[
https://issues.apache.org/jira/browse/MAHOUT-684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028251#comment-13028251
]
Vasil Vasilev commented on MAHOUT-684:
--------------------------------------
Hi Jake,
What statics do you mean? If you are talking about writeNewAlpha and
calculateAlpha, I followed the example of the other methods that are available
in LDADriver. We can certainly find a way to refactor this, but it will take
more time. If you are talking about the fact that I made the calculation of the
digamma function static, this is because I think of it as a kind of utility
function. One option is to extract it into a separate class.
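A minimal sketch of what such a separate utility class might look like. The
class name DigammaUtil is hypothetical and this is not the code from the patch;
it assumes a positive argument and uses the recurrence
psi(x) = psi(x + 1) - 1/x followed by the standard asymptotic series:

```java
/** Hypothetical utility class; the name and layout are assumptions, not Mahout API. */
final class DigammaUtil {

  private DigammaUtil() {
    // static utility, not meant to be instantiated
  }

  /** Approximates the digamma function psi(x) for x > 0. */
  static double digamma(double x) {
    if (!(x > 0.0)) {
      throw new IllegalArgumentException("digamma sketch only handles x > 0, got " + x);
    }
    double result = 0.0;
    // Shift x upwards with psi(x) = psi(x + 1) - 1/x until the
    // asymptotic series below is accurate enough.
    while (x < 6.0) {
      result -= 1.0 / x;
      x += 1.0;
    }
    double inv = 1.0 / x;
    double inv2 = inv * inv;
    // psi(x) ~ ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    //          + 1/(240x^8) - 1/(132x^10)
    result += Math.log(x) - 0.5 * inv
        - inv2 * (1.0 / 12.0
        - inv2 * (1.0 / 120.0
        - inv2 * (1.0 / 252.0
        - inv2 * (1.0 / 240.0
        - inv2 * (1.0 / 132.0)))));
    return result;
  }
}
```

With this layout the driver code would call DigammaUtil.digamma(alpha) instead
of a static method on the driver itself.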
About the heavy I/O - you are right. In fact, createState() loads the old state
(which contains the old values for alpha). What we could potentially do is
write the old alpha into the newly generated state in LDAMapper and then
preserve it in LDAReducer. Then in writeNewAlpha we will be able to take the
old alpha and digamma with a single read of the data.
I didn't get the idea about multiple outputs: can you elaborate on this or
paste a link?
Sure, I can help you. When you check in, I will update the patch in accordance
with your changes.
About the LL: Yes, I ran it based on build-reuters.sh, but with a modified
version of seq2sparse which prunes the words with high DF. Unfortunately I
haven't had time yet to create a patch for this change.
> Topics regularization for LDA
> -----------------------------
>
> Key: MAHOUT-684
> URL: https://issues.apache.org/jira/browse/MAHOUT-684
> Project: Mahout
> Issue Type: Improvement
> Components: Clustering
> Reporter: Vasil Vasilev
> Priority: Minor
> Labels: LDA.
> Attachments: MAHOUT-684.patch
>
>
> Implementation provided for the alpha parameters estimation as described in
> the paper of Blei, Ng and Jordan
> (http://www.cs.princeton.edu/~blei/papers/BleiNgJordan2003.pdf).
> Remark: there is a mistake in the last formula in A.4.2 (the signs are
> wrong). The correct version is described here:
> http://www.cs.cmu.edu/~jch1/research/dirichlet/dirichlet.pdf (page 6).