[
https://issues.apache.org/jira/browse/MAHOUT-657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Work on MAHOUT-657 started by Sebastian Schelter.
> Sample code to apply SVD to the KDD data
> ----------------------------------------
>
> Key: MAHOUT-657
> URL: https://issues.apache.org/jira/browse/MAHOUT-657
> Project: Mahout
> Issue Type: New Feature
> Components: Collaborative Filtering
> Reporter: Sebastian Schelter
> Assignee: Sebastian Schelter
> Fix For: 0.5
>
> Attachments: MAHOUT-657.patch
>
>
> I was prompted by some comments on Twitter to make our SVD-based
> recommendation code work on the KDD data. Here are the results so far:
> The patch contains a tweaked version of ExpectationMaximizationSVDFactorizer
> (org.apache.mahout.cf.taste.example.kddcup.track1.svd.ParallelArraysSGDFactorizer)
> in the examples module that is able to load and process the KDD dataset
> with a constant memory usage of approximately 7 GB (by using primitive arrays
> for everything).
> It's still very slow, unfortunately: a factorization using 40 features and 25
> iterations took 10 hours on my desktop PC. As far as I understand the math
> behind it, the algorithm is not parallelizable but maybe someone might be
> able to improve my implementation or make it compute several factorizations
> at once.
> I took a wild guess at the parameters and got an RMSE of 23.35 on the
> validation set and an RMSE of 26.1287 on the secret test ratings (that's
> rank 63 at the time of this writing).
> Would love to see people play with this code and improve it!
> In order to use this, have a look at the parameters in
> *org.apache.mahout.cf.taste.example.kddcup.track1.svd.Track1SVDRunner*,
> change them as you see fit and run that class with the path to the KDD data
> directory and the path to the file you want the results stored in as
> arguments. In my tests I used *-Xms6700M -Xmx6700M* to give the JVM enough
> memory for 40 features.
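
For readers unfamiliar with the approach described above, the following is a minimal sketch of SGD-based matrix factorization that keeps the ratings in parallel primitive arrays and reports the RMSE. It is not the code from the attached patch: the class name ParallelArraysSgdSketch, the toy ratings, and the learning rate, regularization and iteration values are all assumptions chosen for illustration, and the update rule is plain regularized SGD rather than whatever the patch actually does.

```java
import java.util.Random;

/** Illustrative sketch only; hypothetical class, not part of Mahout or the patch. */
public class ParallelArraysSgdSketch {

  // Ratings live in three parallel primitive arrays: entry n is the triple
  // (userIndexes[n], itemIndexes[n], values[n]). No per-rating objects exist.
  private final int[] userIndexes;
  private final int[] itemIndexes;
  private final float[] values;

  private final int numFeatures;
  private final double[][] userFeatures;
  private final double[][] itemFeatures;

  public ParallelArraysSgdSketch(int[] userIndexes, int[] itemIndexes, float[] values,
                                 int numUsers, int numItems, int numFeatures, long seed) {
    this.userIndexes = userIndexes;
    this.itemIndexes = itemIndexes;
    this.values = values;
    this.numFeatures = numFeatures;
    Random random = new Random(seed);
    this.userFeatures = randomMatrix(numUsers, numFeatures, random);
    this.itemFeatures = randomMatrix(numItems, numFeatures, random);
  }

  // Small random starting values so that the first gradients are non-zero.
  private static double[][] randomMatrix(int rows, int cols, Random random) {
    double[][] matrix = new double[rows][cols];
    for (int r = 0; r < rows; r++) {
      for (int c = 0; c < cols; c++) {
        matrix[r][c] = random.nextDouble() * 0.1;
      }
    }
    return matrix;
  }

  /** Predicted rating: dot product of the user and item feature vectors. */
  public double predict(int user, int item) {
    double sum = 0.0;
    for (int f = 0; f < numFeatures; f++) {
      sum += userFeatures[user][f] * itemFeatures[item][f];
    }
    return sum;
  }

  /** One pass over all ratings per iteration, updating both vectors per rating. */
  public void train(int numIterations, double learningRate, double regularization) {
    for (int iteration = 0; iteration < numIterations; iteration++) {
      for (int n = 0; n < values.length; n++) {
        int user = userIndexes[n];
        int item = itemIndexes[n];
        double error = values[n] - predict(user, item);
        for (int f = 0; f < numFeatures; f++) {
          double uf = userFeatures[user][f];
          double itf = itemFeatures[item][f];
          userFeatures[user][f] += learningRate * (error * itf - regularization * uf);
          itemFeatures[item][f] += learningRate * (error * uf - regularization * itf);
        }
      }
    }
  }

  /** Root mean squared error of the predictions against the given ratings. */
  public double rmse(int[] users, int[] items, float[] actual) {
    double sumOfSquares = 0.0;
    for (int n = 0; n < actual.length; n++) {
      double error = actual[n] - predict(users[n], items[n]);
      sumOfSquares += error * error;
    }
    return Math.sqrt(sumOfSquares / actual.length);
  }

  public static void main(String[] args) {
    // Made-up toy ratings: 3 users, 3 items, 6 known ratings.
    int[] users = {0, 0, 1, 1, 2, 2};
    int[] items = {0, 1, 0, 2, 1, 2};
    float[] ratings = {5f, 3f, 4f, 1f, 2f, 1f};

    ParallelArraysSgdSketch model =
        new ParallelArraysSgdSketch(users, items, ratings, 3, 3, 2, 42L);
    System.out.println("RMSE before training: " + model.rmse(users, items, ratings));
    model.train(500, 0.01, 0.02);
    System.out.println("RMSE after training:  " + model.rmse(users, items, ratings));
  }
}
```

Each rating update reads and writes the feature vectors that the next update may also touch, which is one way to see why the description calls the algorithm hard to parallelize. The RMSE here is computed on the training ratings only; the figures quoted in the description are measured on separate validation and test sets.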
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira