[ 
https://issues.apache.org/jira/browse/MAHOUT-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385977#comment-14385977
 ] 

Pat Ferrel commented on MAHOUT-1655:
------------------------------------

Ah, ok. Thanks, I'd never have figured that out.

Just to get this on the table: Mahout's spark-shell doesn't need it, AFAIK. 
mahout-math-scala does, but the build waits until the mahout-spark module to 
create a mahout-spark...dependency-reduced.jar. That jar is passed to the 
context (along with the non-transitive dependency Mahout jars) when the 
context is created, which gets it to the workers, where it is used. Does this 
work for running jobs on Spark?

For MapReduce it will go into the mr...job.jar, which is then passed to 
Hadoop MapReduce.

There is so much confusion over this, in my mind anyhow. Do you think this is 
the right thing to do?

So every module will use 14.0.1?


> Refactor module dependencies
> ----------------------------
>
>                 Key: MAHOUT-1655
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1655
>             Project: Mahout
>          Issue Type: Improvement
>          Components: mrlegacy
>    Affects Versions: 0.9
>            Reporter: Pat Ferrel
>            Assignee: Andrew Musselman
>            Priority: Critical
>             Fix For: 0.10.0
>
>
> Make a new module, call it mahout-hadoop. Move anything there that is 
> currently in mrlegacy but used in math-scala or spark. Remove dependencies on 
> mrlegacy altogether if possible by using other core classes.
> The goal is to have math-scala and spark module depend on math, and a small 
> module called mahout-hadoop (much smaller than mrlegacy). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
