On Tue, Apr 15, 2014 at 10:58 AM, Pat Ferrel <[email protected]> wrote:
> Sorry you are sick. Thanks for the tip. Spark has a client launcher method
> "spark-class …Client launch ..." but I’m not having much success with that.
>

This will not work because you need Mahout's classpath too. And Spark's.

The complexity here is the damn jar dependencies. All the Spark (or Hadoop,
for that matter) CLIs do is assume that the application is so simple that it
fits into a single jar and has zero external dependencies. I can do my own
rant about it for ages.

So the task here is to collect all the Spark jars and their dependencies,
merge them with Mahout's (perhaps filtering in only what is really needed in
Spark-based pipelines), and then run it. That is what the specialized
mahoutContext() API does, and there's a crapload of Scala code devoted just
to this single issue of deducing and grabbing dependencies and making sure
Spark takes them.

Hope this clarifies why the Spark helpers' ways of starting standalone Spark
applications are just not helpful for us (or for anyone, to be frank: I have
participated in a healthy dozen Spark-based projects, and none of them could
use helpers like Client or spark-class.sh, for the same reason. They all had
to do their own bootstrap routine).

So we will have to have our own helpers to do that. I wonder if there's a
similar syntax for Mahout already, something like
"mahout run-class <class-name>". Since I have never used that, I don't know
for sure, but Hadoop subordinate projects usually all have one (e.g. there's
"hbase <class-name>" to run any class in the HBase code base with the proper
classpath dependencies taken care of).

> The statement "There is not, nor do i think there will be a way to
> run this stuff with CLI" seems unduly misleading. Really, does anyone
> second this?
>
> There will be Scala scripts to drive this stuff and yes even from the CLI.
> Do you imagine that every Mahout USER will be a Scala + Mahout DSL
> programmer?
> That may be fine for committers but users will be PHP devs, Ruby
> devs, Python or Java devs maybe even a few C# devs. I think you are
> confusing Mahout DEVS with USERS. Few users are R devs moving into
> production work, they are production engineers moving into ML who want a
> blackbox. They will need a language agnostic way to drive Mahout. Making
> statements like this only confuses potential users and drives them away to
> no purpose. I’m happy for the nascent Mahout-Scala shell, but it’s not in
> the typical user’s world view.
>
> Sorry, end-of-rant.
>
> On Apr 15, 2014, at 10:14 AM, Dmitriy Lyubimov (JIRA) <[email protected]>
> wrote:
>
>
> [ https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969763#comment-13969763]
>
> Dmitriy Lyubimov commented on MAHOUT-1464:
> ------------------------------------------
>
> [My] Silence indicates I've been pretty sick :)
>
> I thought I explained in my email we are not planning a CLI. We are planning
> a script shell instead. There is not, nor do i think there will be a way to
> run this stuff with CLI, just like there's no way to invoke a particular
> method in R without writing a short script.
>
> That said, yes, you can try to run it as a java application, i.e.
> [java|scala] -cp <cp> <class name>
>
> where <cp> is what `mahout classpath` returns.
>
> > Cooccurrence Analysis on Spark
> > ------------------------------
> >
> > Key: MAHOUT-1464
> > URL: https://issues.apache.org/jira/browse/MAHOUT-1464
> > Project: Mahout
> > Issue Type: Improvement
> > Components: Collaborative Filtering
> > Environment: hadoop, spark
> > Reporter: Pat Ferrel
> > Assignee: Sebastian Schelter
> > Fix For: 1.0
> >
> > Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch,
> > MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch,
> > MAHOUT-1464.patch, run-spark-xrsj.sh
> >
> >
> > Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR)
> > that runs on Spark. This should be compatible with Mahout Spark DRM DSL so
> > a DRM can be used as input.
> > Ideally this would extend to cover MAHOUT-1422. This cross-cooccurrence
> > has several applications including cross-action recommendations.
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.2#6252)
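
For the "collect Spark's and Mahout's classpaths, merge them, then launch with java -cp" approach described in the reply above, here is a minimal Python sketch. It only builds the command line; it is not how mahoutContext() works internally. The function names merge_classpath and build_java_command and the class name org.example.Main are hypothetical, and the only thing taken from the thread is that `mahout classpath` prints a classpath string.

```python
import os
from typing import Iterable, List


def merge_classpath(*classpaths: str, sep: str = os.pathsep) -> str:
    """Merge several classpath strings into one.

    Empty entries and duplicates are dropped; first-seen order is kept,
    so entries from earlier arguments take precedence on the JVM classpath.
    """
    seen = set()
    merged: List[str] = []
    for cp in classpaths:
        for entry in cp.split(sep):
            entry = entry.strip()  # tolerates a trailing newline from a CLI
            if entry and entry not in seen:
                seen.add(entry)
                merged.append(entry)
    return sep.join(merged)


def build_java_command(classpath: str, main_class: str,
                       args: Iterable[str] = ()) -> List[str]:
    """Build the argv for `java -cp <cp> <class name> [args...]`."""
    return ["java", "-cp", classpath, main_class, *args]
```

A caller might capture the output of `mahout classpath` with subprocess.run, pass it to merge_classpath together with whatever Spark jars are needed, and hand the result of build_java_command(cp, "org.example.Main") to subprocess.run. Where the Spark jars come from is left open here, since that is exactly the part the thread says needs a dedicated helper.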
