[jira] [Comment Edited] (MAHOUT-1402) Zero clusters using streaming k-means option in cluster-reuters.sh
[ https://issues.apache.org/jira/browse/MAHOUT-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13878304#comment-13878304 ]

Suneel Marthi edited comment on MAHOUT-1402 at 1/22/14 8:57 AM:

The MR version of Streaming KMeans seems to be failing (the sequential mode passes), the reason being that the reducer is reading zero intermediate centroids; need to investigate as to what's going on.

was (Author: smarthi):
The MR version of Streaming KMeans seems to be failing (the sequential mode passes), the reason being that the reducer is reading zero centroids from the mappers; need to investigate as to what's going on.

Zero clusters using streaming k-means option in cluster-reuters.sh
--
Key: MAHOUT-1402
URL: https://issues.apache.org/jira/browse/MAHOUT-1402
Project: Mahout
Issue Type: Bug
Components: Clustering
Affects Versions: 0.8
Environment: AWS default Linux AMI
Reporter: Andrew Musselman
Assignee: Suneel Marthi
Fix For: 0.9

Running cluster-reuters.sh in examples/bin results in this:

[snip]
INFO: Number of Centroids: 0
Jan 22, 2014 1:52:22 AM org.apache.hadoop.mapred.LocalJobRunner$Job run
WARNING: job_local23982482_0001
java.lang.IllegalArgumentException: Must have nonzero number of training and test vectors. Asked for %.1f %% of %d vectors for test [10.00149011612, 0]
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:120)
    at org.apache.mahout.clustering.streaming.cluster.BallKMeans.splitTrainTest(BallKMeans.java:176)
    at org.apache.mahout.clustering.streaming.cluster.BallKMeans.cluster(BallKMeans.java:192)
    at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.getBestCentroids(StreamingKMeansReducer.java:107)
    at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:73)
    at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:37)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
[snip]
WARNING: No qualcluster.props found on classpath, will use command-line arguments only
Num clusters: 0; maxDistance: 0.00
[Dunn Index] First: Infinity
[Davies-Bouldin Index] First: NaN
Jan 22, 2014 1:52:24 AM org.slf4j.impl.JCLLoggerAdapter info
INFO: Program took 535 ms (Minutes: 0.008916)
cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
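The trace above shows a train/test split being asked to divide zero vectors, so its precondition fails. A minimal Java sketch of that failure mode, with a guard in front of it, might look like the following. All names here (`EmptyCentroidGuard`, `splitTrainTest`, `clusterOrSkip`) are hypothetical and are not Mahout's API; the split logic is an assumption modelled on the exception message, not a copy of the real implementation.

```java
import java.util.Collections;
import java.util.List;

// Illustrative only: these names are hypothetical and are not Mahout's API.
class EmptyCentroidGuard {

    // Assumed precondition, modelled on the message in the trace: both the
    // training and the test partition must end up non-empty.
    static int[] splitTrainTest(int numVectors, double testFraction) {
        int numTest = (int) (testFraction * numVectors);
        if (numTest <= 0 || numTest >= numVectors) {
            throw new IllegalArgumentException(String.format(
                "Must have nonzero number of training and test vectors. "
                    + "Asked for %.1f %% of %d vectors for test",
                testFraction * 100, numVectors));
        }
        // Returns {trainSize, testSize}.
        return new int[] {numVectors - numTest, numTest};
    }

    // Guard: report an empty input explicitly instead of failing inside the split.
    static int[] clusterOrSkip(List<double[]> centroids, double testFraction) {
        if (centroids.isEmpty()) {
            System.err.println("No intermediate centroids received; skipping clustering");
            return new int[] {0, 0};
        }
        return splitTrainTest(centroids.size(), testFraction);
    }

    public static void main(String[] args) {
        // Zero centroids, as in the log line "Number of Centroids: 0".
        List<double[]> none = Collections.emptyList();
        int[] sizes = clusterOrSkip(none, 0.1);
        System.out.println("train=" + sizes[0] + ", test=" + sizes[1]);
    }
}
```

Whether to skip or fail fast on empty input depends on the job; the sketch only makes the empty-input case explicit so that it is reported before the split is attempted.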
[jira] [Commented] (MAHOUT-1402) Zero clusters using streaming k-means option in cluster-reuters.sh
[ https://issues.apache.org/jira/browse/MAHOUT-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13878425#comment-13878425 ]

Suneel Marthi commented on MAHOUT-1402:
---
The issue happens when the ReduceStreamingKMeans (-rskm) flag is set to True. Setting this to False has resolved the issue. We are lacking adequate code coverage tests for Streaming KMeans when the -rskm flag is set. This is something that needs to be fixed post 0.9 Release.
[jira] [Resolved] (MAHOUT-1402) Zero clusters using streaming k-means option in cluster-reuters.sh
[ https://issues.apache.org/jira/browse/MAHOUT-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi resolved MAHOUT-1402.
---
Resolution: Fixed

Removed the -rskm flag from the script to fix this, will have to investigate the implementation behavior when -rskm flag is specified for the next release.
[jira] [Commented] (MAHOUT-1402) Zero clusters using streaming k-means option in cluster-reuters.sh
[ https://issues.apache.org/jira/browse/MAHOUT-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13878452#comment-13878452 ]

Hudson commented on MAHOUT-1402:

SUCCESS: Integrated in Mahout-Quality #2432 (See [https://builds.apache.org/job/Mahout-Quality/2432/])
MAHOUT-1402: Zero clusters using streaming k-means option in cluster-reuters.sh (smarthi: rev 1560287)
* /mahout/trunk/CHANGELOG
* /mahout/trunk/examples/bin/cluster-reuters.sh
[jira] [Commented] (MAHOUT-1329) Mahout for hadoop 2
[ https://issues.apache.org/jira/browse/MAHOUT-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13878502#comment-13878502 ]

Jai Kumar Singh commented on MAHOUT-1329:
-
Patch works fine with the trunk. Thanks.

Mahout for hadoop 2
---
Key: MAHOUT-1329
URL: https://issues.apache.org/jira/browse/MAHOUT-1329
Project: Mahout
Issue Type: Task
Components: build
Affects Versions: 0.9
Reporter: Sergey Svinarchuk
Labels: patch
Fix For: 1.0
Attachments: 1329-2.patch, 1329.patch

Update Mahout to work with Hadoop 2.x, targeting this for Mahout 1.0.
Re: MAHOUT 0.9 Release - New URL
Fixed the issues that were reported this week and restored FP mining into the codebase.

Here's the URL for the final release in staging:
https://repository.apache.org/content/repositories/orgapachemahout-1003/org/apache/mahout/mahout-distribution/0.9/

The artifacts have been signed with the following key:
https://people.apache.org/keys/committer/smarthi.asc

a) Verify that you can unpack the release (tar or zip)
b) Verify that you are able to compile the distro
c) Run through the unit tests: mvn clean test
d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run through all the different options in each script.

Committers and PMC: we need a minimum of 3 '+1' votes for the release to be finalized.
Re: MAHOUT 0.9 Release - New URL
I did a) b) c) and d) without noting any problem so far. +1 from me.

--sebastian

On 01/22/2014 11:55 PM, Suneel Marthi wrote:
[snip]
Re: MAHOUT 0.9 Release - New URL
Same here. I did a), b), c) and d) too and all tests pass. Here's my +1, if my vote counts.

On Wednesday, January 22, 2014 7:11 PM, Sebastian Schelter s...@apache.org wrote:
[snip]
Re: MAHOUT 0.9 Release - New URL
Likewise, a) through d) work on an Amazon AMI and Ubuntu 12.04. +1

On Wed, Jan 22, 2014 at 6:38 PM, Suneel Marthi suneel_mar...@yahoo.com wrote:
[snip]