On Wed, Mar 26, 2014 at 6:00 AM, Hardik Pandya <smarty.ju...@gmail.com> wrote:

> Sorry to hijack the thread,
>
> this seems like the first steps of getting Mahout to work on Spark
>
> there are similar efforts going on with R+Spark aka Spark R
>

Yeah. And there's rmr, and I wrote a very similar thing, CrunchR (R for
Crunch), a year and a half ago.
The main problem with that, imo, as it turns out, is that in the most general
case one needs 1000 machines to do the job of 30 (aside from other general
criticisms of R).


>
> not sure if this helps, but I played with the Spark EC2 scripts, and they
> bring up a multinode cluster using Mesos, and it's configurable. Willing to
> contribute donations for mahout-dev
>

That would be awesome

>
> On Sun, Mar 23, 2014 at 11:22 PM, Saikat Kanjilal (JIRA) <j...@apache.org> wrote:
>
> >
> >     [ https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944710#comment-13944710 ]
> >
> > Saikat Kanjilal commented on MAHOUT-1464:
> > -----------------------------------------
> >
> > +1 on Andrew's suggestion on using AWS to do this. Andrew, is it possible
> > to have a shared account so Mahout contributors can use this? I'd even be
> > willing to chip in donations :) to have a shared AWS account
> >
> > > RowSimilarityJob on Spark
> > > -------------------------
> > >
> > >                 Key: MAHOUT-1464
> > >                 URL: https://issues.apache.org/jira/browse/MAHOUT-1464
> > >             Project: Mahout
> > >          Issue Type: Improvement
> > >          Components: Collaborative Filtering
> > >    Affects Versions: 0.9
> > >         Environment: hadoop, spark
> > >            Reporter: Pat Ferrel
> > >              Labels: performance
> > >             Fix For: 1.0
> > >
> > >         Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch
> > >
> > >
> > > Create a version of RowSimilarityJob that runs on Spark. Ssc has a
> > > prototype here: https://gist.github.com/sscdotopen/8314254. This should
> > > be compatible with Mahout Spark DRM DSL so a DRM can be used as input.
> > > Ideally this would extend to cover MAHOUT-1422 which is a feature
> > > request for RSJ on two inputs to calculate the similarity of rows of one
> > > DRM with those of another. This cross-similarity has several applications
> > > including cross-action recommendations.
> >
> >
> >
> > --
> > This message was sent by Atlassian JIRA
> > (v6.2#6252)
> >
>
