[ 
https://issues.apache.org/jira/browse/MAHOUT-657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Schelter updated MAHOUT-657:
--------------------------------------

    Attachment: MAHOUT-657.patch

> Sample code to apply SVD to the KDD data
> ----------------------------------------
>
>                 Key: MAHOUT-657
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-657
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Collaborative Filtering
>            Reporter: Sebastian Schelter
>            Assignee: Sebastian Schelter
>             Fix For: 0.5
>
>         Attachments: MAHOUT-657.patch
>
>
> Some comments on Twitter prompted me to make our SVD-based 
> recommendation code work on the KDD data. Here are the results so far:
> The patch contains a tweaked version of ExpectationMaximizationSVDFactorizer 
> (org.apache.mahout.cf.taste.example.kddcup.track1.svd.ParallelArraysSGDFactorizer) 
> in the examples module that can load and process the KDD dataset with a 
> constant memory usage of approximately 7 GB (by using primitive arrays 
> for everything). 
> Unfortunately, it's still very slow: a factorization using 40 features and 
> 25 iterations took 10 hours on my desktop PC. As far as I understand the 
> math behind it, the algorithm is not parallelizable, but maybe someone can 
> improve my implementation or make it compute several factorizations at 
> once.
> I took a wild guess at the parameters and got an RMSE of 23.35 on the 
> validation set and an RMSE of 26.1287 on the secret test ratings (that's 
> rank 63 at the time of this writing).
> Would love to see people play with this code and improve it!
> To use this, have a look at the parameters in 
> *org.apache.mahout.cf.taste.example.kddcup.track1.svd.Track1SVDRunner*, 
> change them as you see fit, and run that class with the path to the KDD 
> data directory and the path to the file you want the results stored in as 
> arguments. In my tests I used *-Xms6700M -Xmx6700M* to give the JVM enough 
> memory for 40 features.
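
To illustrate the "primitive arrays for everything" idea described in the quoted text, here is a minimal sketch of an SGD matrix factorization that keeps ratings in three parallel primitive arrays and feature vectors in flat double arrays. It is not the code from the patch: the class name ParallelArraysSgdSketch, its fields, the update rule, and the learning-rate and regularization values are all assumptions made for illustration.

```java
import java.util.Random;

// Hypothetical sketch, NOT the factorizer from MAHOUT-657.patch.
class ParallelArraysSgdSketch {

    // Rating n is the triple (userIndex[n], itemIndex[n], rating[n]);
    // no per-rating objects are allocated.
    final int[] userIndex;
    final int[] itemIndex;
    final float[] rating;

    // Feature vectors stored row-major: entry f of row r is at r * numFeatures + f.
    final double[] userFeatures;
    final double[] itemFeatures;
    final int numFeatures;

    ParallelArraysSgdSketch(int[] userIndex, int[] itemIndex, float[] rating,
                            int numUsers, int numItems, int numFeatures, long seed) {
        this.userIndex = userIndex;
        this.itemIndex = itemIndex;
        this.rating = rating;
        this.numFeatures = numFeatures;
        this.userFeatures = new double[numUsers * numFeatures];
        this.itemFeatures = new double[numItems * numFeatures];
        // Small random start values (an assumed initialization scheme).
        Random random = new Random(seed);
        for (int k = 0; k < userFeatures.length; k++) {
            userFeatures[k] = random.nextGaussian() * 0.1;
        }
        for (int k = 0; k < itemFeatures.length; k++) {
            itemFeatures[k] = random.nextGaussian() * 0.1;
        }
    }

    // Predicted rating: dot product of the user's and the item's feature vectors.
    double predict(int user, int item) {
        int u = user * numFeatures;
        int i = item * numFeatures;
        double sum = 0.0;
        for (int f = 0; f < numFeatures; f++) {
            sum += userFeatures[u + f] * itemFeatures[i + f];
        }
        return sum;
    }

    // Plain regularized SGD: one pass over all ratings per iteration.
    void train(int iterations, double learningRate, double regularization) {
        for (int iteration = 0; iteration < iterations; iteration++) {
            for (int n = 0; n < rating.length; n++) {
                int u = userIndex[n] * numFeatures;
                int i = itemIndex[n] * numFeatures;
                double err = rating[n] - predict(userIndex[n], itemIndex[n]);
                for (int f = 0; f < numFeatures; f++) {
                    // Read both old values before writing either one.
                    double uf = userFeatures[u + f];
                    double itf = itemFeatures[i + f];
                    userFeatures[u + f] += learningRate * (err * itf - regularization * uf);
                    itemFeatures[i + f] += learningRate * (err * uf - regularization * itf);
                }
            }
        }
    }

    // Root-mean-square error over the training ratings.
    double trainingRmse() {
        double sumSquared = 0.0;
        for (int n = 0; n < rating.length; n++) {
            double err = rating[n] - predict(userIndex[n], itemIndex[n]);
            sumSquared += err * err;
        }
        return Math.sqrt(sumSquared / rating.length);
    }

    public static void main(String[] args) {
        // Tiny made-up data set: 3 users, 3 items, 6 ratings.
        int[] users = {0, 0, 1, 1, 2, 2};
        int[] items = {0, 1, 1, 2, 0, 2};
        float[] ratings = {5f, 3f, 4f, 1f, 2f, 5f};
        ParallelArraysSgdSketch sketch =
            new ParallelArraysSgdSketch(users, items, ratings, 3, 3, 2, 42L);
        System.out.println("RMSE before training: " + sketch.trainingRmse());
        sketch.train(500, 0.02, 0.02);
        System.out.println("RMSE after training:  " + sketch.trainingRmse());
    }
}
```

Each rating update reads and writes the shared feature arrays, so one update depends on the ones before it. That is one way to see why the description calls the algorithm hard to parallelize.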

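The RMSE figures quoted above are root-mean-square errors between predicted and actual ratings. A minimal sketch of that computation follows. The class and method names are hypothetical, and the input is assumed to be two equally long arrays of predictions and held-out ratings.

```java
// Hypothetical helper, not part of Mahout or the patch.
class RmseSketch {

    // sqrt( (1/n) * sum over k of (predicted[k] - actual[k])^2 )
    static double rmse(double[] predicted, double[] actual) {
        if (predicted.length != actual.length || predicted.length == 0) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        double sumSquared = 0.0;
        for (int k = 0; k < predicted.length; k++) {
            double err = predicted[k] - actual[k];
            sumSquared += err * err;
        }
        return Math.sqrt(sumSquared / predicted.length);
    }

    public static void main(String[] args) {
        // Made-up predictions and ratings, for illustration only.
        double[] predicted = {50.0, 70.0, 90.0};
        double[] actual = {40.0, 80.0, 90.0};
        System.out.println("RMSE: " + rmse(predicted, actual));
    }
}
```
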
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
