Sorry to hijack the thread. This seems like the first steps toward getting Mahout to work on Spark.
There are similar efforts going on with R + Spark, aka SparkR. Not sure if this helps, but I have played with the Spark EC2 scripts: they bring up a multi-node cluster using Mesos, and it is configurable. I am willing to contribute donations for mahout-dev.

On Sun, Mar 23, 2014 at 11:22 PM, Saikat Kanjilal (JIRA) <[email protected]> wrote:

>
> [https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944710#comment-13944710]
>
> Saikat Kanjilal commented on MAHOUT-1464:
> -----------------------------------------
>
> +1 on Andrew's suggestion on using AWS to do this. Andrew is it possible
> to have a shared account so mahout contributors can use this, I 'd even be
> willing to chip in donations :) to have a shared AWS account
>
> > RowSimilarityJob on Spark
> > -------------------------
> >
> >                 Key: MAHOUT-1464
> >                 URL: https://issues.apache.org/jira/browse/MAHOUT-1464
> >             Project: Mahout
> >          Issue Type: Improvement
> >          Components: Collaborative Filtering
> >    Affects Versions: 0.9
> >         Environment: hadoop, spark
> >            Reporter: Pat Ferrel
> >              Labels: performance
> >             Fix For: 1.0
> >
> >         Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch,
> > MAHOUT-1464.patch
> >
> >
> > Create a version of RowSimilarityJob that runs on Spark. Ssc has a
> > prototype here: https://gist.github.com/sscdotopen/8314254. This should
> > be compatible with Mahout Spark DRM DSL so a DRM can be used as input.
> > Ideally this would extend to cover MAHOUT-1422 which is a feature
> > request for RSJ on two inputs to calculate the similarity of rows of one
> > DRM with those of another. This cross-similarity has several applications
> > including cross-action recommendations.
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.2#6252)
>
