[ 
https://issues.apache.org/jira/browse/WHIRR-384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103312#comment-13103312
 ] 

Tom White commented on WHIRR-384:
---------------------------------

This looks great! Thanks for submitting it, Frank.

* How does Mahout find the Hadoop cluster? Is there some extra configuration 
step needed? (Especially if you install on a node where Hadoop isn't installed.)
* How about calling the role "mahout-client"? I used a similar term over at 
https://github.com/tomwhite/whirr-scm for the client installation.
* Why do you need to install the Mahout examples JAR on the cluster at all? I 
would think you can submit it using "hadoop jar". Either way, this could be a 
follow-on issue. We'd probably have to add something to Hadoop to allow extra 
JARs to be installed.
* Are all the dependencies needed? E.g. I can't see where jsch is used. mvn 
dependency:analyze should help here.

> Add Mahout as a service
> -----------------------
>
>                 Key: WHIRR-384
>                 URL: https://issues.apache.org/jira/browse/WHIRR-384
>             Project: Whirr
>          Issue Type: New Feature
>          Components: new service
>    Affects Versions: 0.7.0
>            Reporter: Frank Scholten
>             Fix For: 0.7.0
>
>         Attachments: WHIRR-384-mahout-home.patch
>
>
> Here is an initial patch to support Mahout as a Whirr service.
> I created the role 'mahout-home' which can be used to install the binary 
> Mahout distribution on a Hadoop namenode.
> By combining this role with configuration for a Hadoop cluster you can SSH 
> into the namenode, su to root and start running Mahout jobs via the mahout 
> script immediately.
> The 'mahout-home' role has two properties:
>   Mahout version                                 whirr.mahout.version
>   URL of the Mahout binary distribution tarball  whirr.mahout.tarball.url
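> For illustration, a minimal sketch of a Whirr properties file that combines 
> the 'mahout-home' role with a Hadoop cluster. Only the two whirr.mahout.* 
> properties and the tarball link come from this issue; the cluster name, 
> provider, credentials lookup, instance templates and the version string are 
> assumptions and may need adjusting for your setup.
>
> ```
> # Hypothetical cluster name (assumption)
> whirr.cluster-name=mahout-test
> # Assumed layout: Mahout installed alongside the namenode role
> whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker+mahout-home,1 hadoop-datanode+hadoop-tasktracker
> # Assumed provider and credentials taken from the environment
> whirr.provider=aws-ec2
> whirr.identity=${env:AWS_ACCESS_KEY_ID}
> whirr.credential=${env:AWS_SECRET_ACCESS_KEY}
> # Properties introduced by the patch; the version value is an assumption
> whirr.mahout.version=0.6-SNAPSHOT
> whirr.mahout.tarball.url=http://dl.dropbox.com/u/13436484/mahout-distribution-0.6-SNAPSHOT.tar.gz
> ```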
> Note that I used a snapshot version of Mahout for testing, revision 1169784, 
> because there were some problems with the Mahout script in 0.5 that have been 
> fixed on trunk, see MAHOUT-680. To test you can set the tarball property to 
> this link 
> http://dl.dropbox.com/u/13436484/mahout-distribution-0.6-SNAPSHOT.tar.gz
> I used configure actions and the onBeforeConfigure(). If there is a better 
> way to express this with the Whirr API let me know.
> Currently I am investigating a 'mahout-jar' role, which installs the Mahout 
> examples job jar under $HADOOP_HOME/lib on a tasktracker node. I already have 
> some code for putting the jar in place but when running a job from my local 
> machine I still get ClassNotFoundExceptions. I believe this is because Hadoop 
> has already started before the jar is put in the lib dir, so the jar won't be 
> picked up, but I have to investigate some more. From WHIRR-221 I understood 
> that there is no support (yet?) for ordering of services but if you have an 
> idea on how to fix this let me know.
> Comments and suggestions welcome!

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
