Re: JIRA issues 1248/1249

2014-01-20 Thread Sebastian Schelter

Hi Saikat,

I would suggest the following: compute the training error in the mapper 
that recomputes M in step 3, after the item vectors are recomputed. Find 
an efficient way to aggregate the errors from the mappers, e.g. via 
Hadoop's counters, and let the driver check for convergence.


It's ok to give the users an option to specify a threshold for the 
convergence of the error, but we should provide a reasonable default.
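
A minimal sketch of the two suggestions above, assuming each mapper adds the squared prediction error of every rating to a Hadoop counter. Hadoop counters hold long values, so the sketch stores the error as a scaled fixed-point number. The class name, the scale factor and the default threshold are all hypothetical and not part of Mahout; the driver would read the counter totals after each iteration and compare successive RMSE values.

```java
/** Hypothetical helper; none of these names exist in Mahout. */
final class ConvergenceCheck {

  /** Counters are long-valued, so squared errors are stored as fixed-point numbers. */
  static final long SCALE = 1_000_000L;

  /** Assumed default; a real default would need to be tuned on actual data. */
  static final double DEFAULT_THRESHOLD = 1e-4;

  /** Mapper side: value to add to the error counter for one rating. */
  static long toCounterIncrement(double rating, double prediction) {
    double error = rating - prediction;
    return Math.round(error * error * SCALE);
  }

  /** Driver side: RMSE computed from the aggregated counter totals. */
  static double rmse(long scaledSquaredErrorSum, long numRatings) {
    if (numRatings == 0) {
      return 0.0;
    }
    return Math.sqrt((scaledSquaredErrorSum / (double) SCALE) / numRatings);
  }

  /** Driver side: stop once the RMSE changes by less than the threshold. */
  static boolean hasConverged(double previousRmse, double currentRmse, double threshold) {
    return Math.abs(previousRmse - currentRmse) < threshold;
  }
}
```

Comparing the change in RMSE between iterations, rather than the absolute RMSE, is one possible reading of "convergence of the error"; an absolute limit would only change the body of hasConverged.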


Does this answer your questions?

Best,
Sebastian

On 01/16/2014 04:46 PM, Saikat Kanjilal wrote:

Sebastian, can I get some feedback on my plan outlined below? I'm going to get 
started with a design and put it on the JIRA ticket in the interim.
Thanks

From: sxk1...@hotmail.com
To: dev@mahout.apache.org
Subject: RE: JIRA issues 1248/1249
Date: Thu, 9 Jan 2014 21:47:06 -0800




Some more clarifications:
http://www.hpl.hp.com/personal/Robert_Schreiber/papers/2008%20AAIM%20Netflix/netflix_aaim08(submitted).pdf
It seems like we can pretty much follow the strategy below:
Step 1 Initialize matrix M by assigning the average rating as the first row, and 
small random numbers for the remaining entries.
Step 2 Fix M, solve U by minimizing the objective function (the sum of squared 
errors);
Step 3 Fix U, solve M by minimizing the objective function similarly;
Step 4 Repeat Steps 2 and 3 until a stopping criterion is satisfied.

The stopping criterion in this case is that the objective function has been 
minimized to within the specified RMSE limit.
Again, the RMSE limit would be specified as a configuration parameter instead of 
the number of iterations.
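
The four steps can be sketched for the simplest possible case, a rank-1 factorization, where each least-squares solve reduces to a closed-form scalar. This is an illustration under stated assumptions, not Mahout's implementation: the class and method names are hypothetical, missing ratings are marked with NaN, plain L2 regularization is used, and the loop stops when the RMSE changes by less than a threshold, with an iteration cap kept as a safety net.

```java
/** Hypothetical rank-1 illustration; Mahout's solver works with k-dimensional feature vectors. */
final class RankOneAls {

  /** Step 1 (rank-1 case): initialise each item's factor with its average observed rating. */
  static double[] itemAverages(double[][] r) {
    double[] m = new double[r[0].length];
    for (int j = 0; j < m.length; j++) {
      double sum = 0;
      int count = 0;
      for (double[] row : r) {
        if (!Double.isNaN(row[j])) {
          sum += row[j];
          count++;
        }
      }
      m[j] = count == 0 ? 0 : sum / count;
    }
    return m;
  }

  /** Steps 2 and 3: hold one factor fixed and solve the other by regularised least squares. */
  static double[] solve(double[][] r, double[] fixed, boolean solveUsers, double lambda) {
    int n = solveUsers ? r.length : r[0].length;
    double[] out = new double[n];
    for (int a = 0; a < n; a++) {
      double num = 0;
      double den = lambda;
      for (int b = 0; b < fixed.length; b++) {
        double rating = solveUsers ? r[a][b] : r[b][a];
        if (Double.isNaN(rating)) {
          continue; // NaN marks a missing rating
        }
        num += rating * fixed[b];
        den += fixed[b] * fixed[b];
      }
      out[a] = den == 0 ? 0 : num / den;
    }
    return out;
  }

  /** Training RMSE over the observed ratings. */
  static double rmse(double[][] r, double[] u, double[] m) {
    double sum = 0;
    int count = 0;
    for (int i = 0; i < r.length; i++) {
      for (int j = 0; j < r[i].length; j++) {
        if (!Double.isNaN(r[i][j])) {
          double e = r[i][j] - u[i] * m[j];
          sum += e * e;
          count++;
        }
      }
    }
    return count == 0 ? 0 : Math.sqrt(sum / count);
  }

  /** Step 4: alternate until the RMSE changes by less than the threshold; returns the final RMSE. */
  static double factorize(double[][] r, double lambda, double threshold, int maxIterations) {
    double[] m = itemAverages(r);
    double[] u = new double[r.length];
    double previous = Double.MAX_VALUE;
    for (int iteration = 0; iteration < maxIterations; iteration++) {
      u = solve(r, m, true, lambda);
      m = solve(r, u, false, lambda);
      double current = rmse(r, u, m);
      if (Math.abs(previous - current) < threshold) {
        return current;
      }
      previous = current;
    }
    return rmse(r, u, m);
  }
}
```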
Sebastian et al, I would love to get some feedback on my approach.

From: sxk1...@hotmail.com
To: dev@mahout.apache.org
Subject: RE: JIRA issues 1248/1249
Date: Wed, 8 Jan 2014 20:17:04 -0800

I read through 1249 and had some initial questions before coming up with a 
plan. I was looking through ParallelALSFactorizationJob.java and am 
assuming this is the right place to make all the changes. To this end:
1) I was thinking of introducing the convergence training error as another 
configuration parameter, to replace the number of iterations.
2) For the chunk of code below:
for (int currentIteration = 0; currentIteration < numIterations; currentIteration++) {
  /* broadcast M, read A row-wise, recompute U row-wise */
  log.info("Recomputing U (iteration {}/{})", currentIteration, numIterations);
  runSolver(pathToUserRatings(), pathToU(currentIteration), pathToM(currentIteration - 1),
      currentIteration, "U", numItems);
  /* broadcast U, read A' row-wise, recompute M row-wise */
  log.info("Recomputing M (iteration {}/{})", currentIteration, numIterations);
  runSolver(pathToItemRatings(), pathToM(currentIteration), pathToU(currentIteration),
      currentIteration, "M", numUsers);
}

I am proposing we have a while loop similar to the following:
while (currentTrainingError >= specifiedTrainingErrorForConvergence) {
  /* broadcast M, read A row-wise, recompute U row-wise */
  log.info("Recomputing U (iteration {}/{})", currentIteration, numIterations);
  runSolver(pathToUserRatings(), pathToU(currentIteration), pathToM(currentIteration - 1),
      currentIteration, "U", numItems);
  /* broadcast U, read A' row-wise, recompute M row-wise */
  log.info("Recomputing M (iteration {}/{})", currentIteration, numIterations);
  runSolver(pathToItemRatings(), pathToM(currentIteration), pathToU(currentIteration),
      currentIteration, "M", numUsers);
}
However, I am wondering where or how I would compute the training error each 
time. Would that happen inside runSolver, or be an artifact of performing the 
solver computation? Pardon my ignorance on this. I also wanted to get deeper 
insight into ALS; is the following the best paper to read?
http://www.hpl.hp.com/personal/Robert_Schreiber/papers/2008%20AAIM%20Netflix/netflix_aaim08(submitted).pdf
Specifically, I am trying to understand where the training error comes into play 
within the SVD computation.
I would really appreciate some more insight as I explore and dig through the code.
Regards


Date: Tue, 7 Jan 2014 09:11:17 +0100
From: s...@apache.org
To: dev@mahout.apache.org
Subject: Re: JIRA issues 1248/1249

Hi Saikat,

I suggest starting with 1249, which is the easier task. The best way to
proceed is by discussing on the mailing list. Have a look at the issue,
propose a solution here and wait for our feedback.

Best,
Sebastian

On 07.01.2014 04:27, Saikat Kanjilal wrote:

Sebastian et al, after months of not having bandwidth to help out with coding 
tasks, I am finally ready to help with the implementation of the above JIRA 
issues. Before I begin, I wanted to make sure these improvements are still 
needed for ALS; I am targeting to finish these by the 1.0 release. Also, if 
these are relevant, should I just present a design/plan of implementation? I'd 
love some initial guidance and thoughts around these tasks, feel 

Re: MAHOUT 0.9 Release - New URL

2014-01-20 Thread Suneel Marthi
Hmmm... that's an issue. Since both Dirichlet and Meanshift clustering have 
been removed from 0.9, cluster-syntheticcontrol.sh options 4,5 are not gonna 
work and should have been removed for 0.9.

To PMC,

 - roll back the release, fix this issue (and other patches that were submitted 
in the last few days) and put out another release?







On Monday, January 20, 2014 12:33 AM, Andrew Palumbo ap@outlook.com wrote:
 
I ran through the tests on a CentOS VM (AMD64, 2 cores, 4 GB RAM).  I had a bit 
of trouble getting the Hadoop natives to compile and therefore may have run 
into some problems because of the Hadoop setup.  I ran into some problems in the 
example scripts, particularly with ./cluster-syntheticcontrol.sh -4,5.  I 
will run through the rest of the examples when I'm sure I've got Hadoop set up 
right.


Apache Maven 3.1.2-SNAPSHOT 
Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
Java home: /usr/java/jdk1.6.0_45/jre
OS name: linux, version: 2.6.32-358.23.2.el6.x86_64, arch: amd64, family: unix
$MAHOUT_LOCAL=true
Hadoop 2.2.0


a) Verify that u can unpack the release (tar or zip) ...passed (tar) [passed ]

b) Verify u r able to compile the distro

    mvn compile- [passed with warnings]

    [WARNING]  Expected all dependencies to require Scala version: 2.9.3
    [WARNING]  org.apache.mahout:mahout-math-scala:0.9 requires scala version: 2.9.3
    [WARNING]  org.scalatest:scalatest_2.9.2:1.9.1 requires scala version: 2.9.2
    [WARNING] Multiple versions of scala libraries detected!

c)  Run through the unit tests: mvn clean test
    mvn clean test [passed]

d) Run the example scripts under $MAHOUT_HOME/examples/bin. 
Please run through all the different options in each script

    Running example scripts with $MAHOUT_LOCAL=true

    ./cluster-syntheticcontrol.sh -1 [works]
    ./cluster-syntheticcontrol.sh -2 [works]
    ./cluster-syntheticcontrol.sh -3 [works]


    ./cluster-syntheticcontrol.sh -4 [exits, throws exception]
    [...]
    WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
    java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:171)
        at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
    Jan 19, 2014 7:55:31 PM org.slf4j.impl.JCLLoggerAdapter warn


    ./cluster-syntheticcontrol.sh -5 [exits, throws exception]

    WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
    java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:171)
        at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
    Jan 19, 2014 7:59:51 PM org.slf4j.impl.JCLLoggerAdapter warn
    WARNING: No org.apache.mahout.clustering.syntheticcontrol.meanshift.Job.props found on classpath, will use command-line arguments only
    Unknown program 'org.apache.mahout.clustering.syntheticcontrol.meanshift.Job' chosen.


    ./classify-20newsgroups.sh -1 [works]
    ./classify-20newsgroups.sh -2 [works]


    cluster-reuters.sh -1 [works]
    cluster-reuters.sh -2 [works]
    cluster-reuters.sh -3 [works]
    
    Same error as noted previously in the thread:

    cluster-reuters.sh -4 [0 clusters]

    [...]

    WARNING: No qualcluster.props found on classpath, will use command-line arguments only
    Num clusters: 0; maxDistance: 0.00
    [Dunn Index] First: Infinity
    [Davies-Bouldin Index] First: NaN
    Jan 19, 2014 7:13:57 PM org.slf4j.impl.JCLLoggerAdapter info
    INFO: Program took 669 ms (Minutes: 0.01115)
    
cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
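
The stack traces for options 4 and 5 in the report above show a reflective Class.forName lookup failing because the Dirichlet and Meanshift job classes are no longer in the 0.9 jar. A minimal sketch of that lookup, assuming a hypothetical helper class (not part of Mahout) that a launcher could use to check whether a job class is present before offering it as an option:

```java
/** Hypothetical helper, not part of Mahout. */
final class OptionalJobs {

  /** Returns true if the named class can be loaded from the current classpath. */
  static boolean isAvailable(String className) {
    try {
      Class.forName(className);
      return true;
    } catch (ClassNotFoundException e) {
      // The same exception type that appears in the stack traces of the report.
      return false;
    }
  }

  public static void main(String[] args) {
    String[] jobs = {
      "org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job",
      "org.apache.mahout.clustering.syntheticcontrol.meanshift.Job"
    };
    for (String job : jobs) {
      System.out.println(job + " available: " + isAvailable(job));
    }
  }
}
```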






 Date: Thu, 16 Jan 2014 06:41:09 -0800
 From: suneel_mar...@yahoo.com
 Subject: MAHOUT 0.9 Release - New URL 
 To: 

Re: MAHOUT 0.9 Release - New URL

2014-01-20 Thread Suneel Marthi
This is an issue (trivial one though) that needs to be fixed for 0.9 Release, 
will be rerolling the release today (in the next few hrs) and putting out a new 
release candidate in staging.

Thanks for reporting this Andrew P. 





On Monday, January 20, 2014 12:34 AM, Andrew Palumbo ap@outlook.com wrote:
 
[snip]
 Date: Thu, 16 Jan 2014 06:41:09 -0800
 From: suneel_mar...@yahoo.com
 Subject: MAHOUT 0.9 Release - New URL 
 To: u...@mahout.apache.org; dev@mahout.apache.org
 
 Third time's a Charm!!!
 
 
 Here's the new URL for Mahout 0.9 

[jira] [Commented] (MAHOUT-1395) Mahout CMS 404 Pages

2014-01-20 Thread Sotiris Salloumis (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13876549#comment-13876549
 ] 

Sotiris Salloumis commented on MAHOUT-1395:
---

Yes the patch for 1305 contains both fixes

 Mahout CMS 404 Pages
 

 Key: MAHOUT-1395
 URL: https://issues.apache.org/jira/browse/MAHOUT-1395
 Project: Mahout
  Issue Type: Bug
  Components: Documentation
Affects Versions: 0.9
Reporter: Sotiris Salloumis
Priority: Blocker
  Labels: Documentation
 Fix For: 0.9

 Attachments: MAHOUT-1395.patch


 The following pages currently return 404; please provide the correct link/content 
 so I can update them in the CMS:
 1) Developers - Code Quality reports: Broken Link: 
 https://builds.apache.org/hudson/job/Mahout-Quality/clover/
 2) Classification - Design complementary bayes: 
 http://mahout.apache.org/users/classification/complementary-naive-bayes.html



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Comment Edited] (MAHOUT-1395) Mahout CMS 404 Pages

2014-01-20 Thread Sotiris Salloumis (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13876549#comment-13876549
 ] 

Sotiris Salloumis edited comment on MAHOUT-1395 at 1/20/14 4:18 PM:


Yes, please keep in mind that the patch for 1305 also contains an additional fix 
for another broken link


was (Author: sotiris.salloumis):
Yes the patch for 1305 contains both fixes



[jira] [Updated] (MAHOUT-1395) Mahout CMS 404 Pages

2014-01-20 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated MAHOUT-1395:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Has the patch been committed?



Re: MAHOUT 0.9 Release - New URL

2014-01-20 Thread Andrew Musselman
Trying out the build today


On Mon, Jan 20, 2014 at 6:00 AM, Suneel Marthi suneel_mar...@yahoo.com wrote:

 This is an issue (trivial one though) that needs to be fixed for 0.9
 Release, will be rerolling the release today (in the next few hrs) and
 putting out a new release candidate in staging.

 Thanks for reporting this Andrew P.





 On Monday, January 20, 2014 12:34 AM, Andrew Palumbo ap@outlook.com
 wrote:

 [snip]
  Date: Thu, 16 Jan 2014 06:41:09 -0800
  From: 

Re: [jira] [Commented] (MAHOUT-1397) mahaout-math-scala/pom.xml not readable

2014-01-20 Thread Stevo Slavić
See http://scala-ide.org/docs/user/gettingstarted.html and especially
http://scala-ide.org/docs/user/gettingstarted.html#Import_a_Maven_project

Kind regards,
Stevo


On Mon, Jan 20, 2014 at 8:09 AM, Maruf Aytekin (JIRA) j...@apache.org wrote:


 [
 https://issues.apache.org/jira/browse/MAHOUT-1397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13876197#comment-13876197]

 Maruf Aytekin commented on MAHOUT-1397:
 ---

 Any idea how to resolve this issue in Eclipse without changing the
 configuration of the plugin?

  mahaout-math-scala/pom.xml not readable
  ---
 
  Key: MAHOUT-1397
  URL: https://issues.apache.org/jira/browse/MAHOUT-1397
  Project: Mahout
   Issue Type: Bug
   Components: Math
 Affects Versions: 1.0
  Environment: Windows 7 Professional 64 bit
  Eclipse:
  Version: Kepler Service Release 1
  Build id: 20130919-0819
  maven 3.0.5
  Java: jdk1.6.0_45
 Reporter: Maruf Aytekin
 Assignee: Dmitriy Lyubimov
   Labels: maven
  Fix For: 1.0
 
 
  maven-scala-plugin in mahaout-math-scala/pom.xml gives an error.
  {code}
 <plugin>
   <groupId>org.scala-tools</groupId>
   <artifactId>maven-scala-plugin</artifactId>
   <executions>
     <execution>
       <goals>
         <goal>compile</goal>
         <goal>testCompile</goal>
       </goals>
     </execution>
   </executions>
   <configuration>
     <sourceDir>src/main/scala</sourceDir>
     <jvmArgs>
       <jvmArg>-Xms64m</jvmArg>
       <jvmArg>-Xmx1024m</jvmArg>
     </jvmArgs>
   </configuration>
 </plugin>
  {code}
  Error displayed:
  {quote}
  Multiple annotations found at this line:
   - Plugin execution not covered by lifecycle configuration:
     org.scala-tools:maven-scala-plugin:2.15.2:compile (execution: default, phase: compile)
   - Plugin execution not covered by lifecycle configuration:
     org.scala-tools:maven-scala-plugin:2.15.2:testCompile (execution: default, phase: test-compile)
  {quote}



 --
 This message was sent by Atlassian JIRA
 (v6.1.5#6160)



[jira] [Commented] (MAHOUT-1395) Mahout CMS 404 Pages

2014-01-20 Thread Sotiris Salloumis (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13876891#comment-13876891
 ] 

Sotiris Salloumis commented on MAHOUT-1395:
---

Yest I've upload it here 
https://issues.apache.org/jira/secure/attachment/12623674/MAHOUT-1304and1305.patch
 containing both fixes.



[jira] [Comment Edited] (MAHOUT-1395) Mahout CMS 404 Pages

2014-01-20 Thread Sotiris Salloumis (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13876891#comment-13876891
 ] 

Sotiris Salloumis edited comment on MAHOUT-1395 at 1/20/14 9:55 PM:


Yes I've upload it here 
https://issues.apache.org/jira/secure/attachment/12623674/MAHOUT-1304and1305.patch
 containing both fixes.


was (Author: sotiris.salloumis):
Yest I've upload it here 
https://issues.apache.org/jira/secure/attachment/12623674/MAHOUT-1304and1305.patch
 containing both fixes.



Re: MAHOUT 0.9 Release - New URL

2014-01-20 Thread Andrew Musselman
Builds on Ubuntu 12.04 from tarball and zip, and on AWS's default 64-bit
Linux AMI from tarball.

All tests pass.

*Output of examples:*
*asf-email-examples.sh, run on mahout.apache.org:*
*recommendations:*
[ec2-user@ip-10-73-146-199 bin]$ hadoop fs -cat /user/ec2-user/asf-output/prefs/recommendations/part-r-0  | less
1   [21935:1.0,23122:1.0,24084:1.0,26397:1.0,1755:1.0,20743:1.0,13428:1.0,19483:1.0,24067:1.0]
4   [14372:1.0,28069:1.0,12258:1.0,18412:1.0,26707:1.0,14610:1.0,2909:1.0,14777:1.0,11792:1.0,26764:1.0]
6   [5442:1.0,18416:1.0,17554:1.0,14610:1.0,16767:1.0,16740:1.0,26743:1.0,11792:1.0,26707:1.0,28116:1.0]
8   [12758:1.0,19409:1.0,2:1.0]
11  [25890:1.0,26743:1.0,9122:1.0,14512:1.0,28116:1.0,17499:1.0,14976:1.0,14561:1.0,3686:1.0,26707:1.0]
14  [29596:1.0,25567:1.0,19520:1.0,26327:1.0,13809:1.0,29435:1.0,17331:1.0,17290:1.0,17819:1.0,3829:1.0]
15  [15355:1.0,15322:1.0,23191:1.0,7990:1.0,15318:1.0,15236:1.0,17789:1.0,15286:1.0,20916:1.0,2812:1.0]
16  [23647:1.0,18137:1.0,1692:1.0,11490:1.0,4303:1.0,12906:1.0,5120:1.0,29503:1.0,19409:1.0,27700:1.0]
18  [29738:1.0,12070:1.0,24078:1.0,19449:1.0,17819:1.0,11549:1.0,25410:1.0,15228:1.0,24930:1.0,23708:1.0]
19  [28008:1.0,18416:1.0,2909:1.0,29250:1.0,28023:1.0,14974:1.0]
20  [19313:1.0,3464:1.0,12394:1.0,18665:1.0,16601:1.0,25816:1.0,10212:1.0,11626:1.0,18577:1.0,16734:1.0]
[snip]
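
Each line of the recommendations output above is a user ID followed by a bracketed list of item:score pairs. A minimal sketch of reading one such line, assuming whitespace between the ID and the list as shown in this output; the class and method names are hypothetical and not part of Mahout:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical parser for one line of the recommendations output shown above. */
final class RecommendationLine {

  /** Returns the user ID at the start of the line. */
  static long parseUserId(String line) {
    return Long.parseLong(line.trim().split("\\s+", 2)[0]);
  }

  /** Returns item ID to score, in the order the pairs appear on the line. */
  static Map<Long, Double> parseItems(String line) {
    String list = line.trim().split("\\s+", 2)[1].trim();
    list = list.substring(1, list.length() - 1); // drop the surrounding [ and ]
    Map<Long, Double> items = new LinkedHashMap<>();
    if (list.isEmpty()) {
      return items;
    }
    for (String pair : list.split(",")) {
      String[] idAndScore = pair.split(":");
      items.put(Long.parseLong(idAndScore[0].trim()), Double.parseDouble(idAndScore[1].trim()));
    }
    return items;
  }
}
```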

*clustering; kmeans:*
[snip]
Weight : [props - optional]:  Point:
1.0 : [distance-squared=1.0193102046188427]:
/commits/200802.gz/20835820.1202052180347.JavaMail.www-data@brutus =
[1065:0.195, 1977:0.355, 2246:0.091, 3008:0.078, 5336:0.110, 7573:0.204,
7683:0.126, 7715:0.365, 7812:0.180, 7832:0.075, 8268:0.093, 9779:0.159,
10257:0.133, 10972:0.158, 11663:0.143, 15313:0.065, 17007:0.244,
19359:0.183, 19399:0.338, 19525:0.139, 20224:0.140, 24649:0.095,
25003:0.076, 29143:0.156, 30459:0.075, 31537:0.156, 31559:0.075,
31668:0.139, 33208:0.117, 33425:0.218, 36491:0.075, 38378:0.130,
39789:0.110, 40743:0.190, 45775:0.086]
1.0 : [distance-squared=0.9823018320457279]:
/commits/200808.gz/1722278226.1219149603005.JavaMail.www-data@brutus =
[1065:0.188, 2246:0.088, 3008:0.076, 3620:0.239, 5200:0.104, 5336:0.106,
6404:0.088, 7552:0.335, 7683:0.122, 7715:0.376, 7812:0.173, 7832:0.072,
10257:0.128, 11663:0.195, 15313:0.063, 16660:0.094, 19359:0.177,
19525:0.134, 19551:0.101, 20025:0.183, 21233:0.098, 24649:0.092,
25003:0.112, 27650:0.283, 27653:0.216, 29143:0.150, 30459:0.072,
30868:0.208, 31559:0.126, 31565:0.203, 33208:0.113, 36491:0.073,
36610:0.141, 36767:0.208, 38378:0.125, 39789:0.106, 45775:0.083]
1.0 : [distance-squared=0.9509142993214911]:
/commits/201006.gz/5844140.863.1277658000780.JavaMail.confluence@thor =
[648:0.100, 914:0.066, 2040:0.076, 2246:0.078, 3008:0.048, 4419:0.076,
4452:0.070, 5200:0.065, 5203:0.140, 5336:0.067, 6404:0.056, 7235:0.048,
7310:0.077, 7464:0.067, 7471:0.060, 7489:0.093, 7505:0.123, 7683:0.077,
7715:0.145, 7814:0.072, 7912:0.155, 8268:0.098, 9835:0.118, 10225:0.081,
10257:0.114, 11127:0.112, 11510:0.086, 11589:0.139, 11663:0.087,
12641:0.117, 13837:0.052, 14030:0.062, 14089:0.051, 14352:0.061,
14396:0.185, 17015:0.115, 17240:0.097, 18767:0.149, 19774:0.124,
20346:0.159, 21233:0.075, 23657:0.089, 23939:0.078, 23974:0.105,
23998:0.146, 24962:0.122, 25003:0.093, 25084:0.151, 25128:0.052,
29143:0.095, 30459:0.046, 30806:0.075, 31559:0.046, 31727:0.104,
31895:0.105, 31900:0.153, 32149:0.079, 32993:0.069, 33112:0.177,
33208:0.101, 33351:0.089, 33533:0.079, 33638:0.042, 35795:0.066,
36189:0.078, 36491:0.046, 36500:0.093, 36625:0.200, 37111:0.071,
39336:0.079, 39789:0.067, 39933:0.073, 39967:0.079, 41155:0.167,
41280:0.065, 41696:0.072, 41947:0.118, 43685:0.086, 44077:0.308,
44353:0.215, 44423:0.085, 45215:0.151, 45775:0.052, 46766:0.074,
47823:0.082, 48120:0.080, 48212:0.109, 48436:0.110]
[snip]

*clustering; dirichlet:*
Get this complaint:
Running Dirichlet with K = 8
Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
14/01/21 05:16:35 WARN driver.MahoutDriver: Unable to add class: dirichlet
14/01/21 05:16:35 WARN driver.MahoutDriver: No dirichlet.props found on classpath, will use command-line arguments only
Unknown program 'dirichlet' chosen.

*clustering: minhash:*
Running Minhash
Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
14/01/21 05:17:27 WARN driver.MahoutDriver: Unable to add class: minhash
14/01/21 05:17:27 WARN driver.MahoutDriver: No minhash.props found on classpath, will use command-line arguments only
Unknown program 'minhash' chosen.

*classification; standard:*
===
Summary
---
Correctly Classified Instances