[jira] [Comment Edited] (MAHOUT-1402) Zero clusters using streaming k-means option in cluster-reuters.sh

2014-01-22 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13878304#comment-13878304
 ] 

Suneel Marthi edited comment on MAHOUT-1402 at 1/22/14 8:57 AM:


The MapReduce (MR) version of Streaming KMeans seems to be failing, while the 
sequential mode passes. The reducer is reading zero intermediate centroids; 
I need to investigate why.


was (Author: smarthi):
The MR version of Streaming KMeans seems to be failing (the sequential mode 
passes), the reason being that the reducer is reading zero centroids from the 
mappers; need to investigate as to what's going on.

 Zero clusters using streaming k-means option in cluster-reuters.sh
 --

 Key: MAHOUT-1402
 URL: https://issues.apache.org/jira/browse/MAHOUT-1402
 Project: Mahout
 Issue Type: Bug
 Components: Clustering
 Affects Versions: 0.8
 Environment: AWS default Linux AMI
 Reporter: Andrew Musselman
 Assignee: Suneel Marthi
 Fix For: 0.9


 Running cluster-reuters.sh in examples/bin results in this:
 [snip]
 INFO: Number of Centroids: 0
 Jan 22, 2014 1:52:22 AM org.apache.hadoop.mapred.LocalJobRunner$Job run
 WARNING: job_local23982482_0001
 java.lang.IllegalArgumentException: Must have nonzero number of training and test vectors. Asked for %.1f %% of %d vectors for test [10.00149011612, 0]
 at com.google.common.base.Preconditions.checkArgument(Preconditions.java:120)
 at org.apache.mahout.clustering.streaming.cluster.BallKMeans.splitTrainTest(BallKMeans.java:176)
 at org.apache.mahout.clustering.streaming.cluster.BallKMeans.cluster(BallKMeans.java:192)
 at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.getBestCentroids(StreamingKMeansReducer.java:107)
 at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:73)
 at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:37)
 at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
 at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
 at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
 at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
 [snip]
 WARNING: No qualcluster.props found on classpath, will use command-line arguments only
 Num clusters: 0; maxDistance: 0.00
 [Dunn Index] First: Infinity
 [Davies-Bouldin Index] First: NaN
 Jan 22, 2014 1:52:24 AM org.slf4j.impl.JCLLoggerAdapter info
 INFO: Program took 535 ms (Minutes: 0.008916)
 cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
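
 For readers of the log above, here is a minimal sketch of how a zero-vector 
 input can produce exactly this kind of message. It is not the Mahout or Guava 
 source: the class TrainTestSplitCheck and the methods lenientFormat and 
 testSetSize are hypothetical names, and the check is only assumed to resemble 
 the one in BallKMeans.splitTrainTest. The one library fact it relies on is that 
 Guava's Preconditions.checkArgument substitutes only %s placeholders and 
 appends unused arguments in square brackets, which is why %.1f and %d appear 
 unformatted in the exception text.

```java
import java.util.StringJoiner;

/**
 * Hypothetical stand-in for the argument check that fails in the log.
 * Illustrative only: this is neither Mahout's nor Guava's code.
 */
public class TrainTestSplitCheck {

    /**
     * Mimics a lenient formatter that only understands %s: each %s is replaced
     * by the next argument, and arguments that were never consumed are appended
     * in square brackets.
     */
    public static String lenientFormat(String template, Object... args) {
        StringBuilder out = new StringBuilder();
        int start = 0;
        int used = 0;
        while (used < args.length) {
            int at = template.indexOf("%s", start);
            if (at < 0) {
                break; // no more %s placeholders in the template
            }
            out.append(template, start, at).append(args[used++]);
            start = at + 2;
        }
        out.append(template.substring(start));
        if (used < args.length) {
            StringJoiner rest = new StringJoiner(", ", " [", "]");
            for (int i = used; i < args.length; i++) {
                rest.add(String.valueOf(args[i]));
            }
            out.append(rest);
        }
        return out.toString();
    }

    /** Returns the test-set size, rejecting inputs that leave either side empty. */
    public static int testSetSize(int numVectors, double testProbability) {
        int testSize = (int) (testProbability * numVectors);
        if (testSize <= 0 || testSize >= numVectors) {
            // The template uses %.1f and %d, which the lenient formatter ignores.
            throw new IllegalArgumentException(lenientFormat(
                "Must have nonzero number of training and test vectors. "
                    + "Asked for %.1f %% of %d vectors for test",
                testProbability * 100, numVectors));
        }
        return testSize;
    }

    public static void main(String[] args) {
        System.out.println(testSetSize(100, 0.1)); // a normal, non-empty input
        try {
            testSetSize(0, 0.1); // models a reducer that received zero centroids
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

 With zero input vectors the computed test size is zero, so the check fails 
 regardless of the requested test fraction. Under these assumptions the 
 exception is a symptom of the empty input rather than of the split itself.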



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (MAHOUT-1402) Zero clusters using streaming k-means option in cluster-reuters.sh

2014-01-22 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13878425#comment-13878425
 ] 

Suneel Marthi commented on MAHOUT-1402:
---

The issue happens when the ReduceStreamingKMeans (-rskm) flag is set to true. 
Setting it to false resolves the issue. We lack adequate test coverage for 
Streaming KMeans when the -rskm flag is set; this needs to be fixed after the 
0.9 release.



[jira] [Resolved] (MAHOUT-1402) Zero clusters using streaming k-means option in cluster-reuters.sh

2014-01-22 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved MAHOUT-1402.
---

Resolution: Fixed

Removed the -rskm flag from the script to fix this. The implementation's 
behavior when the -rskm flag is specified still needs to be investigated for 
the next release.



[jira] [Commented] (MAHOUT-1402) Zero clusters using streaming k-means option in cluster-reuters.sh

2014-01-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13878452#comment-13878452
 ] 

Hudson commented on MAHOUT-1402:


SUCCESS: Integrated in Mahout-Quality #2432 (See 
[https://builds.apache.org/job/Mahout-Quality/2432/])
MAHOUT-1402: Zero clusters using streaming k-means option in cluster-reuters.sh 
(smarthi: rev 1560287)
* /mahout/trunk/CHANGELOG
* /mahout/trunk/examples/bin/cluster-reuters.sh




[jira] [Commented] (MAHOUT-1329) Mahout for hadoop 2

2014-01-22 Thread Jai Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13878502#comment-13878502
 ] 

Jai Kumar Singh commented on MAHOUT-1329:
-

Patch works fine with the trunk. 
Thanks.

 Mahout for hadoop 2
 ---

 Key: MAHOUT-1329
 URL: https://issues.apache.org/jira/browse/MAHOUT-1329
 Project: Mahout
 Issue Type: Task
 Components: build
 Affects Versions: 0.9
 Reporter: Sergey Svinarchuk
 Labels: patch
 Fix For: 1.0

 Attachments: 1329-2.patch, 1329.patch


 Update Mahout to work with Hadoop 2.x; targeting this for Mahout 1.0.





Re: MAHOUT 0.9 Release - New URL

2014-01-22 Thread Suneel Marthi
Fixed the issues that were reported this week and restored FP mining into the 
codebase.

Here's the URL for the final release in staging:
https://repository.apache.org/content/repositories/orgapachemahout-1003/org/apache/mahout/mahout-distribution/0.9/

The artifacts have been signed with the following key:
https://people.apache.org/keys/committer/smarthi.asc


a) Verify that you can unpack the release (tar or zip)
b) Verify that you are able to compile the distro
c) Run through the unit tests: mvn clean test
d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run through 
all the different options in each script.

Committers and PMC: a minimum of 3 '+1' votes is needed for the release to be 
finalized.

Re: MAHOUT 0.9 Release - New URL

2014-01-22 Thread Sebastian Schelter

I did a), b), c) and d) without noticing any problems so far. +1 from me.

--sebastian


Re: MAHOUT 0.9 Release - New URL

2014-01-22 Thread Suneel Marthi
Same here. I did a), b), c) and d) too and all tests pass. Here's my +1, if my 
vote counts.







Re: MAHOUT 0.9 Release - New URL

2014-01-22 Thread Andrew Musselman
Likewise, a) through d) work on an Amazon AMI and Ubuntu 12.04.

+1

