I still have (as I recall) a thousand dollars' worth of AWS credit the AWS
team gave me specifically for Mahout testing, and we could run stuff on EMR
very easily.

I need to dig up the account details and work out a way to share the
credentials.


On Sun, Mar 23, 2014 at 5:39 PM, Pat Ferrel (JIRA) <[email protected]> wrote:

>
>     [
> https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944673#comment-13944673]
>
> Pat Ferrel commented on MAHOUT-1464:
> ------------------------------------
>
> Adding 16 cores to my closet's cluster next week. Is there a 'large'
> dataset you have in mind? I have one with 4000 rows, 75,000 columns and
> 700,000 values but that seems smallish. Can't say when I'll get to it but
> it's on my list. If someone can jump in quicker--have at it.
>
> > RowSimilarityJob on Spark
> > -------------------------
> >
> >                 Key: MAHOUT-1464
> >                 URL: https://issues.apache.org/jira/browse/MAHOUT-1464
> >             Project: Mahout
> >          Issue Type: Improvement
> >          Components: Collaborative Filtering
> >    Affects Versions: 0.9
> >         Environment: hadoop, spark
> >            Reporter: Pat Ferrel
> >              Labels: performance
> >             Fix For: 1.0
> >
> >         Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch,
> MAHOUT-1464.patch
> >
> >
> > Create a version of RowSimilarityJob that runs on Spark. Ssc has a
> prototype here: https://gist.github.com/sscdotopen/8314254. This should
> be compatible with Mahout Spark DRM DSL so a DRM can be used as input.
> > Ideally this would extend to cover MAHOUT-1422 which is a feature
> request for RSJ on two inputs to calculate the similarity of rows of one
> DRM with those of another. This cross-similarity has several applications
> including cross-action recommendations.
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.2#6252)
>
