[ https://issues.apache.org/jira/browse/MAHOUT-520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joe Prasanna Kumar updated MAHOUT-520:
--------------------------------------
Attachment: MAHOUT-520-syntheticcontrol.patch
The attached patch contains a script for running the clustering algorithms on
the synthetic control data.
The script runs in two modes:
1. Interactive mode (the default): from the MAHOUT_HOME directory, run
examples/bin/build-cluster-syntheticcontrol.sh
2. Non-interactive mode: examples/bin/build-cluster-syntheticcontrol.sh -ni
This mode can be used by a Hudson script for automated testing.
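For readers curious how the two modes might be told apart, here is a minimal bash sketch. It assumes the only recognized option is -ni, as described above; the function name parse_mode is hypothetical and not taken from the patch.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: report whether the script was started in the
# default interactive mode or in non-interactive mode (-ni).
# parse_mode is an illustrative name, not taken from the patch.
parse_mode() {
  local mode="interactive"      # interactive is the default
  local arg
  for arg in "$@"; do
    if [ "$arg" = "-ni" ]; then
      mode="non-interactive"    # e.g. when invoked from a Hudson job
    fi
  done
  echo "$mode"
}
```

A script could then call MODE=$(parse_mode "$@") and prompt the user only when MODE is "interactive".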
The script:
1. checks whether HADOOP_HOME is set; if not, it reports an error and halts.
2. checks the health of the DFS by invoking $HADOOP_HOME/bin/hadoop fs -ls; if
the DFS is not healthy, it reports an error and halts.
3. uploads synthetic_control.data to HDFS.
4. asks the user which clustering algorithm to use.
5. runs the algorithm corresponding to the number the user chooses.
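The five steps above might look roughly like the following bash sketch. It is not the code in the patch: the function names, the HDFS directory testdata, the menu numbering and the Mahout example job class names (org.apache.mahout.clustering.syntheticcontrol.<algo>.Job) are all assumptions to check against the actual script and your Mahout version.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the steps above; not the code from the patch.

# Step 1: halt early if HADOOP_HOME is not set.
check_hadoop_home() {
  if [ -z "${HADOOP_HOME:-}" ]; then
    echo "Error: HADOOP_HOME is not set" >&2
    return 1
  fi
}

# Step 2: treat a failing "hadoop fs -ls" as an unhealthy DFS.
check_dfs() {
  if ! "$HADOOP_HOME/bin/hadoop" fs -ls > /dev/null 2>&1; then
    echo "Error: DFS is not healthy (hadoop fs -ls failed)" >&2
    return 1
  fi
}

# Step 3: copy the local data file into HDFS. The target directory
# "testdata" is an assumption; mkdir may fail if it already exists.
upload_data() {
  local local_file="$1"
  "$HADOOP_HOME/bin/hadoop" fs -mkdir testdata > /dev/null 2>&1 || true
  "$HADOOP_HOME/bin/hadoop" fs -put "$local_file" testdata
}

# Step 4: map the user's numeric choice to an algorithm name.
# The numbering here is illustrative only.
algo_for_choice() {
  case "$1" in
    1) echo canopy ;;
    2) echo kmeans ;;
    3) echo fuzzykmeans ;;
    4) echo dirichlet ;;
    5) echo meanshift ;;
    *) return 1 ;;
  esac
}

# Step 4 (interactive): show the menu on stderr, read a number from stdin.
prompt_choice() {
  echo "Select a clustering algorithm:" >&2
  echo "1. canopy 2. kmeans 3. fuzzykmeans 4. dirichlet 5. meanshift" >&2
  local choice
  read -r choice
  echo "$choice"
}

# Step 5: run the synthetic control example job for the chosen algorithm.
# Assumed class naming: org.apache.mahout.clustering.syntheticcontrol.<algo>.Job
run_algo() {
  local algo
  algo=$(algo_for_choice "$1") || { echo "Error: invalid choice '$1'" >&2; return 1; }
  "$MAHOUT_HOME/bin/mahout" "org.apache.mahout.clustering.syntheticcontrol.${algo}.Job"
}
```

A driver could chain these as check_hadoop_home && check_dfs && upload_data path/to/synthetic_control.data && run_algo "$(prompt_choice)", where the local path is a placeholder. Keeping each check in its own function makes the failure cases easy to exercise one at a time.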
I have tested both the failure and the success scenarios on my end. If
someone else wants to verify them as well, that would be wonderful.
regards
Joe.
> Add example scripts / integration tests for various algorithms.
> ---------------------------------------------------------------
>
> Key: MAHOUT-520
> URL: https://issues.apache.org/jira/browse/MAHOUT-520
> Project: Mahout
> Issue Type: Improvement
> Components: Classification
> Affects Versions: 0.4
> Reporter: Drew Farris
> Assignee: Drew Farris
> Priority: Minor
> Attachments: MAHOUT-520-syntheticcontrol.patch, MAHOUT-520.patch
>
>
> Scripts like build-reuters.sh are useful in that they both demonstrate
> typical usage of Mahout from the command line and serve as integration
> tests. We should add additional scripts that drive the algorithms so new
> users can quickly run the examples.
> Perhaps these can also be run from Hudson as part of the nightly builds and
> serve as integration tests.
> As a start towards this goal, provide a build-20news-bayes.sh example (in the
> same vein as build-reuters.sh) that follows
> https://cwiki.apache.org/MAHOUT/twenty-newsgroups.html