[ 
https://issues.apache.org/jira/browse/MAHOUT-4?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576885#action_12576885
 ] 

Isabel Drost commented on MAHOUT-4:
-----------------------------------

Your plan of first trying to understand the non-distributed version and then 
map-reducing the algorithm sounds great :) Some comments from my point of view:

You might want to choose more verbose variable names than u, s and z and 
provide the mapping to the names used in the paper in a comment. That should 
make it easier for the reader of your code to distinguish users, stories and 
clusters (z).
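
To illustrate the naming suggestion, here is a minimal sketch. The class and 
field names are made up for this example and are not taken from the patch; 
only the letters u, s and z come from the discussion above.

```java
/** Hypothetical holder for the problem sizes; names are illustrative only. */
class PlsiDimensions {
    /** Number of users (called u in the paper). */
    final int numUsers;
    /** Number of stories (called s in the paper). */
    final int numStories;
    /** Number of latent clusters (called z in the paper). */
    final int numClusters;

    PlsiDimensions(int numUsers, int numStories, int numClusters) {
        this.numUsers = numUsers;
        this.numStories = numStories;
        this.numClusters = numClusters;
    }
}
```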

I think you might want to inline the initialize() method. For me personally 
this makes it easier to follow what is done in the constructors. As for the 
default constructor, you could simply delegate initialization to PLSI_engine(u, 
s, z), passing the default values.
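
A rough sketch of the delegation I have in mind follows. The class name 
PlsiEngine, the default sizes, the two probability tables and the uniform 
starting values are all assumptions made for illustration; the patch may 
store and initialize things differently.

```java
import java.util.Arrays;

/** Hypothetical engine; not the PLSI_engine class from the patch. */
class PlsiEngine {
    // Placeholder defaults, chosen only for this example.
    static final int DEFAULT_USERS = 10;
    static final int DEFAULT_STORIES = 20;
    static final int DEFAULT_CLUSTERS = 5;

    /** Assumed table P(z|u), indexed [user][cluster]. */
    final double[][] pClusterGivenUser;
    /** Assumed table P(s|z), indexed [cluster][story]. */
    final double[][] pStoryGivenCluster;

    /** The default constructor only supplies defaults and delegates. */
    PlsiEngine() {
        this(DEFAULT_USERS, DEFAULT_STORIES, DEFAULT_CLUSTERS);
    }

    /** All initialization is inlined here, so there is one place to read. */
    PlsiEngine(int numUsers, int numStories, int numClusters) {
        pClusterGivenUser = new double[numUsers][numClusters];
        for (double[] row : pClusterGivenUser) {
            Arrays.fill(row, 1.0 / numClusters);
        }
        pStoryGivenCluster = new double[numClusters][numStories];
        for (double[] row : pStoryGivenCluster) {
            Arrays.fill(row, 1.0 / numStories);
        }
    }
}
```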

Concerning the method calculate P_z_u_s - how many clusters do you expect? It 
seems like this computation could become numerically unstable for very large 
numbers of clusters.
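
One common guard against this kind of problem is to normalize in log space 
(the log-sum-exp trick). The sketch below assumes the posterior has the form 
P(z|u,s) proportional to P(z|u) * P(s|z) and that the caller passes in 
log P(z|u) + log P(s|z) for each cluster; the class and method names are 
hypothetical and I have not checked how the patch computes this.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of the patch. */
final class Posterior {
    /**
     * Turns unnormalized log weights (one per cluster) into probabilities
     * that sum to 1, subtracting the maximum first so exp() cannot underflow
     * for every entry at once.
     */
    static double[] normalize(double[] logWeights) {
        double max = Double.NEGATIVE_INFINITY;
        for (double lw : logWeights) {
            max = Math.max(max, lw);
        }
        double[] result = new double[logWeights.length];
        if (max == Double.NEGATIVE_INFINITY) {
            // Every weight is zero: fall back to a uniform distribution.
            Arrays.fill(result, 1.0 / logWeights.length);
            return result;
        }
        double sum = 0.0;
        for (int z = 0; z < logWeights.length; z++) {
            result[z] = Math.exp(logWeights[z] - max);
            sum += result[z];
        }
        for (int z = 0; z < result.length; z++) {
            result[z] /= sum;
        }
        return result;
    }
}
```

With two log weights of -1000 each, a naive exp() would underflow to 0 for 
both and the division would yield NaN; the version above returns 0.5 for each.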

It would be nice if you could provide some unit tests to prove that your code 
is working correctly.

I know EM as a rather general principle - your implementation seems rather 
focussed on the setup of the Google News clustering solution. I was wondering 
whether it would be possible to generalize the implementation a little but 
still support the new personalization use case? Maybe others would like to 
reuse a general EM framework but not the exact same formulas that you used. 
I don't know whether that is possible and whether it can be done in a way that 
is easy to read....
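
Just to make the idea concrete, one possible shape for such a framework is 
sketched below. The EmModel interface, the EmDriver class and the convergence 
test on the log-likelihood are all my own assumptions, not something from the 
patch; the PLSI formulas would then live in one implementation of the 
interface.

```java
/** Hypothetical contract for any model trained with EM. */
interface EmModel {
    /** E-step: recompute expectations over the latent variables and
     *  return the log-likelihood under the current parameters. */
    double expectation();

    /** M-step: re-estimate the parameters from those expectations. */
    void maximization();
}

/** Hypothetical driver that knows nothing about the concrete formulas. */
final class EmDriver {
    /**
     * Alternates E- and M-steps until the log-likelihood changes by less
     * than the tolerance or maxIterations is reached.
     * Returns the number of iterations performed.
     */
    static int run(EmModel model, int maxIterations, double tolerance) {
        double previous = Double.NEGATIVE_INFINITY;
        for (int iteration = 1; iteration <= maxIterations; iteration++) {
            double current = model.expectation();
            model.maximization();
            if (Math.abs(current - previous) < tolerance) {
                return iteration;
            }
            previous = current;
        }
        return maxIterations;
    }
}
```

The point of the split is that the loop and the stopping rule are written 
once, while each use case only supplies its own two steps.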

> Simple prototype for Expectation Maximization (EM)
> --------------------------------------------------
>
>                 Key: MAHOUT-4
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-4
>             Project: Mahout
>          Issue Type: New Feature
>            Reporter: Ankur
>         Attachments: Mahout_EM.patch
>
>
> Create a simple prototype implementing Expectation Maximization - EM that 
> demonstrates the algorithm functionality given a set of (user, click-url) 
> data.
> The prototype should be functionally complete and should serve as a basis for 
> the Map-Reduce version of the EM algorithm.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
