Indeed, the wiki is pretty out of date in some areas, and the actual APIs have changed (since 2008!). For users who want to launch clustering jobs using trunk, I suggest checking out TestCDbwEvaluator and TestClusterDumper in utils, which use the latest versions. These tests do not run the jobs from the command line; instead they call the runJob methods in the respective driver classes. Once I get my wiki karma back, I will go over all the pages again and update them for consistency.

On 5/2/10 12:42 PM, Sisir Koppaka wrote:
For GSoC students,
In case anyone was going through the code and having some difficulty
running things, I have updated the kMeans page on the wiki
<https://cwiki.apache.org/confluence/display/MAHOUT/k-Means> with a short
quickstart shell script that will run it for you. You can tweak the
settings and reuse it. Reading the code after running the script will
hopefully help you understand the codebase well.

If any of you have tips to share, or have made notes of quirks to be
aware of, please post them here for everyone's benefit.

