Re: Mahout 0.10.0 Bug bash

2015-04-05 Thread Pat Ferrel
Things like that question make me more suspicious. 

We really need to get a handle on the Hadoop version question.

I have run:

spark-itemsimilarity on Hadoop 1.2.1 and 2.6.0 (2.6.0 fails); Andy ran it 
successfully on 2.2, and a user runs it on 2.4-MapR.
On 2.6.0, these lines seem to find the local file system:
  val conf = new Configuration()
  val fs = FileSystem.get(conf)
On the earlier versions of Hadoop, it finds the cluster or pseudo cluster HDFS.
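
For anyone trying to reproduce this, here is a minimal diagnostic sketch around the two lines above. It is not Mahout code, and the names DefaultFsCheck and resolvesToLocal are made up for illustration. It relies on one Hadoop behavior: when no core-site.xml on the classpath sets fs.defaultFS (fs.default.name on Hadoop 1.x), the built-in default is file:///, so FileSystem.get(conf) hands back the local file system. Whether that is what is happening on 2.6.0 here is an assumption, not something established in this thread.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem

// Hypothetical helper, not part of Mahout.
object DefaultFsCheck {
  /** True when the configuration's default file system is the local one (scheme "file"). */
  def resolvesToLocal(conf: Configuration): Boolean =
    FileSystem.getDefaultUri(conf).getScheme == "file"

  def main(args: Array[String]): Unit = {
    // Loads core-default.xml and, if it is on the classpath, core-site.xml.
    val conf = new Configuration()
    println(s"default FS URI: ${FileSystem.getDefaultUri(conf)}")

    // Same call as in the two lines above; print what it actually returned.
    val fs = FileSystem.get(conf)
    println(s"FileSystem class: ${fs.getClass.getName}, uri: ${fs.getUri}")

    if (resolvesToLocal(conf))
      println("No cluster default FS found; core-site.xml may not be on the classpath.")
  }
}
```

If this reports file:/// on a machine where HDFS is running, checking HADOOP_CONF_DIR and the classpath used to launch the job would be a reasonable next step.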

I’ve run Andy’s 20 newsgroups classifier test script on Hadoop 1.2.1 and got a 
classdef mismatch error, which probably means I built it wrong. I’ll be testing 
that again Monday.

I’m building a 2.2.0 pseudo cluster and will run 20 newsgroups and 
spark-itemsimilarity Monday.

I guess the big question is still 2.5 or 2.6. Does anyone know why the two lines 
above would cause a problem in recent Hadoop versions? Does someone have a 
known good 2.6 cluster that they can try a couple of tests on?


On Apr 5, 2015, at 9:52 AM, Andrew Musselman andrew.mussel...@gmail.com wrote:

I wonder if that HDFS/FS issue is the same problem I have with
cluster-reuters.sh.


Re: Mahout 0.10.0 Bug bash

2015-04-05 Thread Pat Ferrel
Very few of these are on the “official” ticket list here:
https://issues.apache.org/jira/browse/MAHOUT-1648?filter=-4jql=project%20%3D%20MAHOUT%20AND%20status%20in%20(Open%2C%20Reopened)%20AND%20priority%20in%20(Blocker%2C%20Critical)%20ORDER%20BY%20createdDate%20DESC

M-1674
M-1665
M-1648

The next time this is published it would be great to get versions of Hadoop 
people are using and what has actually been run on a cluster or pseudo cluster, 
under yarn etc. I’m increasingly suspicious that we don’t run uniformly on 
Hadoop 2.5-2.6 but have no hard evidence. I’ve failed on H2.6.0 but may not 
have an airtight configuration. If anyone has this config working I can supply a 
very simple test.

The failure happens when an HDFS path gets applied to the raw local filesystem, 
even though hadoop 2.6 HDFS is running and MAHOUT-LOCAL is unset. The root of 
the error I’ve seen is in getting the FileSystem, which always returns the 
local one.
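
A possible workaround sketch, assuming the input is given as a fully qualified URI: resolve the FileSystem from the path itself instead of from the default configuration alone. Path.getFileSystem(conf) uses the scheme and authority of the path when present and falls back to the configured default otherwise. The object name FsForPath, its helper methods, and the hdfs://localhost:9000 address are hypothetical, and this is not necessarily how the Mahout drivers obtain their FileSystem.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper, not part of Mahout.
object FsForPath {
  /** Scheme of the file system a path would resolve against:
    * the path's own scheme if it has one, else the configured default. */
  def targetScheme(pathString: String, conf: Configuration): String =
    Option(new Path(pathString).toUri.getScheme)
      .getOrElse(FileSystem.getDefaultUri(conf).getScheme)

  /** FileSystem chosen by the path's URI rather than by FileSystem.get(conf) alone. */
  def fileSystemFor(pathString: String, conf: Configuration): FileSystem =
    new Path(pathString).getFileSystem(conf)

  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    // Hypothetical input; pass a real path as the first argument.
    val input = args.headOption.getOrElse("hdfs://localhost:9000/user/pat/input")
    println(s"$input would resolve against scheme: ${targetScheme(input, conf)}")
  }
}
```

With a path like this, an unqualified /user/pat/input still depends on the configured default, so it would only help when the caller passes the full hdfs:// form.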


M-1674 is new and was found on Friday. Dmitriy already has a private fix but 
can’t commit it so I think we need a workaround.

On Apr 4, 2015, at 8:46 PM, Suneel Marthi suneel.mar...@gmail.com wrote:

Saturday(2 days before code freeze). The code freeze's gonna be on Monday -
April 6.  Please address ur assigned JIRAs on time.

Anand Avati
-

M-1622: Multithreaded batch Item similarities output incorrect similarities
M-1605: Make Visualizer test locale independent

Andrew Palumbo
--
M-1559: Add documentation for Wikipedia example
M-1648: Update CMS for Mahout 0.10.0

Andrew Musselman
-
M-1462: Cleaning up Random Forests documentation on Mahout website
M-1470: LDA Topic dump
M-1655: Refactor module dependencies
M-1658: KMeans fails when run on Hadoop clusters

Frank Scholten
-
M-1625: lucene2seq: failure to convert a document that does not contain a
field (the field is not required)
M-1633: Failure to execute query when solr index contains documents with
different fields
M-1649: Lucene 5 upgrade

Pat Ferrel
-
M-1507: Support input and output using user defined ID wherever possible
M-1588: Multiple Input path support in Recommenders

Stevo Slavic

M-1277: Lose dependency on custom commons-cli
M-1278: Improve inheritance of apache parent pom
M-1562: Publish Scaladocs
M-1585: Javadocs are not hosted By Mahout Quality
M-1650: upgrade 3rd party jars

Suneel Marthi
-
M-1469: Streaming KMeans fails when executed in MR mode and
REDUCE_STREAMING_KMEANS set to true
M-1512: Hadoop 2 compatibility
M-1652: Java 7 update
M-1630: Incorrect SparseMatrix.numColSlices() causes IllegalStateException

Ted Dunning
---

M-1672: TDigest update to 3.1 in OnlineSummarizers

Unassigned
--
M-1551: Add document to describe how to use mlp with command line(Patch
available)
M-1637: RecommenderJob of ALS fails in the mapper because it uses the
instance of other class



Re: Mahout 0.10.0 Bug bash

2015-04-05 Thread Andrew Musselman
I wonder if that HDFS/FS issue is the same problem I have with
cluster-reuters.sh.

On Sunday, April 5, 2015, Pat Ferrel p...@occamsmachete.com wrote:

 Very few of these are on the “official” ticket list here:

 https://issues.apache.org/jira/browse/MAHOUT-1648?filter=-4jql=project%20%3D%20MAHOUT%20AND%20status%20in%20(Open%2C%20Reopened)%20AND%20priority%20in%20(Blocker%2C%20Critical)%20ORDER%20BY%20createdDate%20DESC

 M-1674
 M-1665
 M-1648

 The next time this is published it would be great to get versions of
 Hadoop people are using and what has actually been run on a cluster or
 pseudo cluster, under yarn etc. I’m increasingly suspicious that we don’t
 run uniformly on Hadoop 2.5-2.6 but have no hard evidence. I’ve failed on
 H2.6.0 but may not have an airtight configuration. If anyone has this
 config working I can supply a very simple test.

 The failure happens when an HDFS path gets applied to the raw local
 filesystem, even though hadoop 2.6 HDFS is running and MAHOUT-LOCAL is
 unset. The root of the error I’ve seen is in getting the FileSystem, which
 always returns the local one.


 M-1674 is new and was found on Friday. Dmitriy already has a private fix
 but can’t commit it so I think we need a workaround.



Re: Mahout 0.10.0 Bug bash

2015-04-04 Thread Suneel Marthi
Saturday(2 days before code freeze). The code freeze's gonna be on Monday -
April 6.  Please address ur assigned JIRAs on time.

Anand Avati
-

M-1622: Multithreaded batch Item similarities output incorrect similarities
M-1605: Make Visualizer test locale independent

Andrew Palumbo
--
M-1559: Add documentation for Wikipedia example
M-1648: Update CMS for Mahout 0.10.0

Andrew Musselman
-
M-1462: Cleaning up Random Forests documentation on Mahout website
M-1470: LDA Topic dump
M-1655: Refactor module dependencies
M-1658: KMeans fails when run on Hadoop clusters

Frank Scholten
-
M-1625: lucene2seq: failure to convert a document that does not contain a
field (the field is not required)
M-1633: Failure to execute query when solr index contains documents with
different fields
M-1649: Lucene 5 upgrade

Pat Ferrel
-
M-1507: Support input and output using user defined ID wherever possible
M-1588: Multiple Input path support in Recommenders

Stevo Slavic

M-1277: Lose dependency on custom commons-cli
M-1278: Improve inheritance of apache parent pom
M-1562: Publish Scaladocs
M-1585: Javadocs are not hosted By Mahout Quality
M-1650: upgrade 3rd party jars

Suneel Marthi
-
M-1469: Streaming KMeans fails when executed in MR mode and
REDUCE_STREAMING_KMEANS set to true
M-1512: Hadoop 2 compatibility
M-1652: Java 7 update
M-1630: Incorrect SparseMatrix.numColSlices() causes IllegalStateException

Ted Dunning
---

M-1672: TDigest update to 3.1 in OnlineSummarizers

Unassigned
--
M-1551: Add document to describe how to use mlp with command line(Patch
available)
M-1637: RecommenderJob of ALS fails in the mapper because it uses the
instance of other class


Re: Mahout 0.10.0 Bug bash

2015-04-01 Thread Andrew Musselman
Wednesday(*four days from code freeze Sunday*); some progress:

Andrew Palumbo
--
M-1493: Port Naive Bayes to Spark DSL(Patch available)
M-1559: Documentation and cleanup for Naive Bayes Example
M-1564: Naive Bayes classifier for new Text Documents
M-1617: 404 error on link in cluster-dumper tutorial page
M-1635: Getting an exception when I provide classification labels manually
for Naive Bayes
M-1638: H2O bindings fail at drmParallelizeWithRowLabels
M-1648: Update CMS for Mahout 0.10.0

Andrew Musselman
-
M-1462: Cleaning up Random Forests documentation on Mahout website
M-1470: LDA Topic dump
M-1655: Refactor module dependencies

Dmitriy Lyubimov
--
M-1646: Refactor out all legacy MR dependencies from scala code

Frank Scholten
-
M-1625: lucene2seq: failure to convert a document that does not contain a
field (the field is not required)
M-1633: Failure to execute query when solr index contains documents with
different fields
M-1649: Lucene 5 upgrade

Gokhan Capan
--
M-1626: Support for required quasi-algebraic operations and starting with
aggregating rows/blocks

Pat Ferrel
-
M-1507: Support input and output using user defined ID wherever possible

Sebastian Schelter
--
M-1584: Create a detailed example of how to index an arbitrary dataset and
run LDA on it(Patch available)

Shannon Quinn
---
M-1661: Remove Lanczos from the code base
M-1662: Potential Path bug in SequenceFileVaultIterator breaks
DisplaySpectralKMeans

Stevo Slavic

M-1277: Lose dependency on custom commons-cli
M-1278: Improve inheritance of apache parent pom
M-1562: Publish Scaladocs
M-1585: Javadocs are not hosted By Mahout Quality
M-1650: upgrade 3rd party jars

Suneel Marthi
-
M-1469: Streaming KMeans fails when executed in MR mode and
REDUCE_STREAMING_KMEANS set to true
M-1512: Hadoop 2 compatibility
M-1586: Collections downloads must have hash signatures
M-1647: The release build is incomplete
M-1652: Java 7 update
M-1656: Change SNAPSHOT version from 1.0 to 0.10

Unassigned
--
M-1551: Add document to describe how to use mlp with command line(Patch
available)
M-1637: RecommenderJob of ALS fails in the mapper because it uses the
instance of other class


Re: Mahout 0.10.0 Bug bash

2015-03-30 Thread Andrew Musselman
Monday(six days from code freeze Sunday)

Andrew Palumbo
--
M-1493: Port Naive Bayes to Spark DSL(Patch available)
M-1559: Documentation and cleanup for Naive Bayes Example
M-1564: Naive Bayes classifier for new Text Documents
M-1635: Getting an exception when I provide classification labels manually
for Naive Bayes
M-1638: H2O bindings fail at drmParallelizeWithRowLabels
M-1648: Update CMS for Mahout 0.10.0

Andrew Musselman
-
M-1462: Cleaning up Random Forests documentation on Mahout website
M-1470: LDA Topic dump
M-1522: Handle logging levels via log4j.xml
M-1655: Refactor module dependencies

Dmitriy Lyubimov
--
M-1646: Refactor out all legacy MR dependencies from scala code

Frank Scholten
-
M-1625: lucene2seq: failure to convert a document that does not contain a
field (the field is not required)
M-1633: Failure to execute query when solr index contains documents with
different fields
M-1649: Lucene 5 upgrade

Gokhan Capan
--
M-1626: Support for required quasi-algebraic operations and starting with
aggregating rows/blocks

Pat Ferrel
-
M-1507: Support input and output using user defined ID wherever possible
M-1589: mahout.cmd has duplicated content(Patch available)

Sebastian Schelter
--
M-1584: Create a detailed example of how to index an arbitrary dataset and
run LDA on it(Patch available)

Shannon Quinn
---
M-1661: Remove Lanczos from the code base
M-1662: Potential Path bug in SequenceFileVaultIterator breaks
DisplaySpectralKMeans

Stevo Slavic

M-1277: Lose dependency on custom commons-cli
M-1278: Improve inheritance of apache parent pom
M-1562: Publish Scaladocs
M-1585: Javadocs are not hosted By Mahout Quality
M-1602: Euclidean Distance Similarity Math
M-1650: upgrade 3rd party jars

Suneel Marthi
-
M-1469: Streaming KMeans fails when executed in MR mode and
REDUCE_STREAMING_KMEANS set to true
M-1512: Hadoop 2 compatibility
M-1586: Collections downloads must have hash signatures
M-1619: HighDFWordsPruner overwrites cache files
M-1647: The release build is incomplete
M-1652: Java 7 update
M-1656: Change SNAPSHOT version from 1.0 to 0.10
M-1660: Hadoop1HDFSUtil.readDRMHEader should be taking Hadoop conf

Unassigned
--
M-1551: Add document to describe how to use mlp with command line(Patch
available)
M-1557: Add support for sparse training vectors in MLP(Patch available)
M-1593: cluster-reuters.sh does not work complaining
java.lang.IllegalStateException(Patch available)
M-1594: Example factorize-movielens-1M.sh does not use HDFS(Patch
available)
M-1634: ALS don't work when it adds new files in Distributed Cache
 (Patch available)
M-1637: RecommenderJob of ALS fails in the mapper because it uses the
instance of other class


Re: Mahout 0.10.0 Bug bash

2015-03-29 Thread Andrew Palumbo
Yeah, there's something weird going on with M-1609, but I closed it on 
Friday.


On 03/29/2015 12:36 PM, Andrew Musselman wrote:

Sunday's:

Andrew Palumbo
--
M-1477: Clean up website on Logistic Regression
M-1493: Port Naive Bayes to Spark DSL(Patch available)
M-1559: Documentation and cleanup for Naive Bayes Example
M-1564: Naive Bayes classifier for new Text Documents
M-1609: NullPointerException(This bug is not showing up aside from its
title)
M-1635: Getting an exception when I provide classification labels manually
for Naive Bayes
M-1638: H2O bindings fail at drmParallelizeWithRowLabels
M-1648: Update CMS for Mahout 0.10.0

Andrew Musselman
-
M-1462: Cleaning up Random Forests documentation on Mahout website
M-1470: LDA Topic dump
M-1522: Handle logging levels via log4j.xml
M-1563: cleanup Warnings during Build
M-1655: Refactor module dependencies

Dmitriy Lyubimov
--
M-1646: Refactor out all legacy MR dependencies from scala code

Frank Scholten
-
M-1625: lucene2seq: failure to convert a document that does not contain a
field (the field is not required)
M-1649: Lucene 5 upgrade

Pat Ferrel
-
M-1589: mahout.cmd has duplicated content(Patch available)

Suneel Marthi
-
M-1469: Streaming KMeans fails when executed in MR mode and
REDUCE_STREAMING_KMEANS set to true
M-1512: Hadoop 2 compatibility
M-1585: Javadocs not hosted by Mahout-Quality
M-1586: Collections downloads must have hash signatures
M-1619: HighDFWordsPruner overwrites cache files
M-1647: The release build is incomplete
M-1652: Java 7 update
M-1656: Change SNAPSHOT version from 1.0 to 0.10
M-1660: Hadoop1HDFSUtil.readDRMHEader should be taking Hadoop conf

Stevo Slavic

M-1277: Lose dependency on custom commons-cli
M-1278: Improve inheritance of apache parent pom
M-1562: Publish Scaladocs
M-1602: Euclidean Distance Similarity Math
M-1650: upgrade 3rd party jars

Shannon Quinn
---
M-1538: Port spectral clustering to Mahout DSL
M-1539: Implement affinity matrix computation in Mahout DSL
M-1659: Remove deprecated Lanczos solver from spectral clustering in
mr-legacy

Sebastian Schelter
--
M-1584: Create a detailed example of how to index an arbitrary dataset and
run LDA on it(Patch available)

Gokhan Capan
--
M-1626: Support for required quasi-algebraic operations and starting with
aggregating rows/blocks

Unassigned
--
M-1516: run classify-20newsgroups.sh failed cause by
/tmp/mahout-work-jpan/20news-all does not exists in hdfs.(Patch
available)
M-1551: Add document to describe how to use mlp with command line(Patch
available)
M-1557: Add support for sparse training vectors in MLP(Patch available)
M-1593: cluster-reuters.sh does not work complaining
java.lang.IllegalStateException(Patch available)
M-1594: Example factorize-movielens-1M.sh does not use HDFS(Patch
available)
M-1633: Failure to execute query when solr index contains documents with
different fields
M-1634: ALS don't work when it adds new files in Distributed Cache
  (Patch available)
M-1637: RecommenderJob of ALS fails in the mapper because it uses the
instance of other class





Re: Mahout 0.10.0 Bug bash

2015-03-29 Thread Suneel Marthi
A daily 'politely harsh' reminder of the April 5 code freeze date with the
daily bug bash would be helpful.

On Sun, Mar 29, 2015 at 12:36 PM, Andrew Musselman 
andrew.mussel...@gmail.com wrote:

 Sunday's:

 Andrew Palumbo
 --
 M-1477: Clean up website on Logistic Regression
 M-1493: Port Naive Bayes to Spark DSL(Patch available)
 M-1559: Documentation and cleanup for Naive Bayes Example
 M-1564: Naive Bayes classifier for new Text Documents
 M-1609: NullPointerException(This bug is not showing up aside from its
 title)
 M-1635: Getting an exception when I provide classification labels manually
 for Naive Bayes
 M-1638: H2O bindings fail at drmParallelizeWithRowLabels
 M-1648: Update CMS for Mahout 0.10.0

 Andrew Musselman
 -
 M-1462: Cleaning up Random Forests documentation on Mahout website
 M-1470: LDA Topic dump
 M-1522: Handle logging levels via log4j.xml
 M-1563: cleanup Warnings during Build
 M-1655: Refactor module dependencies

 Dmitriy Lyubimov
 --
 M-1646: Refactor out all legacy MR dependencies from scala code

 Frank Scholten
 -
 M-1625: lucene2seq: failure to convert a document that does not contain a
 field (the field is not required)
 M-1649: Lucene 5 upgrade

 Pat Ferrel
 -
 M-1589: mahout.cmd has duplicated content(Patch available)

 Suneel Marthi
 -
 M-1469: Streaming KMeans fails when executed in MR mode and
 REDUCE_STREAMING_KMEANS set to true
 M-1512: Hadoop 2 compatibility
 M-1585: Javadocs not hosted by Mahout-Quality
 M-1586: Collections downloads must have hash signatures
 M-1619: HighDFWordsPruner overwrites cache files
 M-1647: The release build is incomplete
 M-1652: Java 7 update
 M-1656: Change SNAPSHOT version from 1.0 to 0.10
 M-1660: Hadoop1HDFSUtil.readDRMHEader should be taking Hadoop conf

 Stevo Slavic
 
 M-1277: Lose dependency on custom commons-cli
 M-1278: Improve inheritance of apache parent pom
 M-1562: Publish Scaladocs
 M-1602: Euclidean Distance Similarity Math
 M-1650: upgrade 3rd party jars

 Shannon Quinn
 ---
 M-1538: Port spectral clustering to Mahout DSL
 M-1539: Implement affinity matrix computation in Mahout DSL
 M-1659: Remove deprecated Lanczos solver from spectral clustering in
 mr-legacy

 Sebastian Schelter
 --
 M-1584: Create a detailed example of how to index an arbitrary dataset and
 run LDA on it(Patch available)

 Gokhan Capan
 --
 M-1626: Support for required quasi-algebraic operations and starting with
 aggregating rows/blocks

 Unassigned
 --
 M-1516: run classify-20newsgroups.sh failed cause by
 /tmp/mahout-work-jpan/20news-all does not exists in hdfs.(Patch
 available)
 M-1551: Add document to describe how to use mlp with command line(Patch
 available)
 M-1557: Add support for sparse training vectors in MLP(Patch available)
 M-1593: cluster-reuters.sh does not work complaining
 java.lang.IllegalStateException(Patch available)
 M-1594: Example factorize-movielens-1M.sh does not use HDFS(Patch
 available)
 M-1633: Failure to execute query when solr index contains documents with
 different fields
 M-1634: ALS don't work when it adds new files in Distributed Cache
  (Patch available)
 M-1637: RecommenderJob of ALS fails in the mapper because it uses the
 instance of other class



Re: Mahout 0.10.0 Bug bash

2015-03-29 Thread Andrew Musselman
Yes, reminder we want to freeze/slush next Sunday.

If you won't be able to finish your bugs let's do some more triage and
split up work.

On Sunday, March 29, 2015, Suneel Marthi suneel.mar...@gmail.com wrote:

 A daily 'politely harsh' reminder of the April 5 code freeze date with the
 daily bug bash would be helpful.


Re: Mahout 0.10.0 Bug bash

2015-03-29 Thread Andrew Palumbo

Sometimes it comes up and sometimes it doesn't, but it is resolved.

On 03/29/2015 01:57 PM, Suneel Marthi wrote:

Yeah, I noticed the weirdness with M-1609 too. Well, let's keep that out of
the daily bug bash.

On Sun, Mar 29, 2015 at 1:55 PM, Andrew Palumbo ap@outlook.com wrote:


yeah there's something weird going on with  M-1609, but I closed it on
Friday.



Re: Mahout 0.10.0 Bug bash

2015-03-29 Thread Andrew Musselman
Sunday's:

Andrew Palumbo
--
M-1477: Clean up website on Logistic Regression
M-1493: Port Naive Bayes to Spark DSL(Patch available)
M-1559: Documentation and cleanup for Naive Bayes Example
M-1564: Naive Bayes classifier for new Text Documents
M-1609: NullPointerException(This bug is not showing up aside from its
title)
M-1635: Getting an exception when I provide classification labels manually
for Naive Bayes
M-1638: H2O bindings fail at drmParallelizeWithRowLabels
M-1648: Update CMS for Mahout 0.10.0

Andrew Musselman
-
M-1462: Cleaning up Random Forests documentation on Mahout website
M-1470: LDA Topic dump
M-1522: Handle logging levels via log4j.xml
M-1563: cleanup Warnings during Build
M-1655: Refactor module dependencies

Dmitriy Lyubimov
--
M-1646: Refactor out all legacy MR dependencies from scala code

Frank Scholten
-
M-1625: lucene2seq: failure to convert a document that does not contain a
field (the field is not required)
M-1649: Lucene 5 upgrade

Pat Ferrel
-
M-1589: mahout.cmd has duplicated content(Patch available)

Suneel Marthi
-
M-1469: Streaming KMeans fails when executed in MR mode and
REDUCE_STREAMING_KMEANS set to true
M-1512: Hadoop 2 compatibility
M-1585: Javadocs not hosted by Mahout-Quality
M-1586: Collections downloads must have hash signatures
M-1619: HighDFWordsPruner overwrites cache files
M-1647: The release build is incomplete
M-1652: Java 7 update
M-1656: Change SNAPSHOT version from 1.0 to 0.10
M-1660: Hadoop1HDFSUtil.readDRMHEader should be taking Hadoop conf

Stevo Slavic

M-1277: Lose dependency on custom commons-cli
M-1278: Improve inheritance of apache parent pom
M-1562: Publish Scaladocs
M-1602: Euclidean Distance Similarity Math
M-1650: upgrade 3rd party jars

Shannon Quinn
---
M-1538: Port spectral clustering to Mahout DSL
M-1539: Implement affinity matrix computation in Mahout DSL
M-1659: Remove deprecated Lanczos solver from spectral clustering in
mr-legacy

Sebastian Schelter
--
M-1584: Create a detailed example of how to index an arbitrary dataset and
run LDA on it(Patch available)

Gokhan Capan
--
M-1626: Support for required quasi-algebraic operations and starting with
aggregating rows/blocks

Unassigned
--
M-1516: run classify-20newsgroups.sh failed cause by
/tmp/mahout-work-jpan/20news-all does not exists in hdfs.(Patch
available)
M-1551: Add document to describe how to use mlp with command line(Patch
available)
M-1557: Add support for sparse training vectors in MLP(Patch available)
M-1593: cluster-reuters.sh does not work complaining
java.lang.IllegalStateException(Patch available)
M-1594: Example factorize-movielens-1M.sh does not use HDFS(Patch
available)
M-1633: Failure to execute query when solr index contains documents with
different fields
M-1634: ALS don't work when it adds new files in Distributed Cache
 (Patch available)
M-1637: RecommenderJob of ALS fails in the mapper because it uses the
instance of other class


Re: Mahout 0.10.0 Bug bash

2015-03-28 Thread Andrew Musselman
Today's:

Andrew Palumbo
--
M-1648: Update CMS for Mahout 0.10.0
M-1638: H2O bindings fail at drmParallelizeWithRowLabels
M-1477: Clean up website on Logistic Regression
M-1564: Naive Bayes classifier for new Text Documents
M-1635: Getting an exception when I provide classification labels manually
for Naive Bayes
M-1493: Port Naive Bayes to Spark DSL(Patch available)
M-1559: Documentation and cleanup for Naive Bayes Example
M-1609: NullPointerException
M-1607: Spark-shell DAG scheduler

Andrew Musselman
-
M-1655: Refactor module dependencies
M-1522: Handle logging levels via log4j.xml
M-1563: cleanup Warnings during Build
M-1470: LDA Topic dump
M-1462: Cleaning up Random Forests documentation on Mahout website

Dmitriy Lyubimov
--
M-1646: Refactor out all legacy MR dependencies from scala code

Frank Scholten
-
M-1649: Lucene 5 upgrade
M-1625: lucene2seq: failure to convert a document that does not contain a
field (the field is not required)

Pat Ferrel
-
M-1589: mahout.cmd has duplicated content(Patch available)
M-1618: co-occurence recommender example

Suneel Marthi
-
M-1586: Collections downloads must have hash signatures
M-1647: The release build is incomplete
M-1652: Java 7 update
M-1512: Hadoop 2 compatibility
M-1469: Streaming KMeans fails when executed in MR mode and
REDUCE_STREAMING_KMEANS set to true
M-1443: Update How to Release page(Tagged 0.10.1)
M-1585: Javadocs not hosted by Mahout-Quality
M-1612: NPE during JSON outputformatter for clusterdump
M-1656: Change SNAPSHOT version from 1.0 to 0.10
M-1660: Hadoop1HDFSUtil.readDRMHEader should be taking Hadoop conf
M-1619: HighDFWordsPruner overwrites cache files

Stevo Slavic

M-1650: upgrade 3rd party jars
M-1602: Euclidean Distance Similarity Math
M-1278: Improve inheritance of apache parent pom
M-1562: Publish Scaladocs
M-1277: Lose dependency on custom commons-cli

Shannon Quinn
---
M-1538: Port spectral clustering to Mahout DSL
M-1539: Implement affinity matrix computation in Mahout DSL
M-1540: Reuters Example spectral clustering Also online docs for Spectral
clustering
M-1659: Remove deprecated Lanczos solver from spectral clustering in
mr-legacy

Ted Dunning
---
M-1636: Class dependencies for Spark module are put in job.jar, which is
inefficient

Sebastian Schelter
--
M-1584: Create a detailed example of how to index an arbitrary dataset and
run LDA on it(Patch available)

Gokhan Capan
--
M-1626: Support for required quasi-algebraic operations and starting with
aggregating rows/blocks

Unassigned
--
M-1594: Example factorize-movielens-1M.sh does not use HDFS(Patch
available)
M-1593: cluster-reuters.sh does not work complaining
java.lang.IllegalStateException(Patch available)
M-1557: Add support for sparse training vectors in MLP(Patch available)
M-1516: run classify-20newsgroups.sh failed cause by
/tmp/mahout-work-jpan/20news-all does not exists in hdfs.(Patch
available)
M-1643: CLI arguments are not being processed in spark-shell
M-1637: RecommenderJob of ALS fails in the mapper because it uses the
instance of other class
M-1634: ALS don't work when it adds new files in Distributed Cache
 (Patch available)
M-1633: Failure to execute query when solr index contains documents with
different fields
M-1551: Add document to describe how to use mlp with command line(Patch
available)

On Thu, Mar 26, 2015 at 7:07 PM, Suneel Marthi suneel.mar...@gmail.com
wrote:

 Ok here's the bug bash as of today

 Andrew Palumbo
 --
 M-1648: Update CMS for Mahout 0.10.0
 M-1638: H2O bindings fail at drmParallelizeWithRowLabels
 M-1564: Naive Bayes classifier for new Text Documents
 M-1635: Exception when providing classification Labels
 M-1493: Port Naive Bayes to Spark DSL
 M-1559: Documentation and cleanup for Naive Bayes Example
 M-1609: NullPointerException
 M-1607: Spark-shell DAG scheduler

 Andrew Musselman
 -
 M-1655: Refactor module dependencies
 M-1563: cleanup Warnings during Build
 M-1470: LDA Topic dump

 Dmitriy Lyubimov
 --
 M-1646: Refactor out all legacy MR dependencies from scala code

 Frank Scholten
 -
 M-1649: Lucene 5 upgrade

 Pat Ferrel
 -
 M-1589: mahout.cmd has duplicated content
 M-1618: co-occurence recommender example

 Suneel Marthi
 -
 M-1586: Collections downloads must have hash signatures
 M-1647: Release build
 M-1652: Java 7 update
 M-1512: Hadoop 2 compatibility
 M-1469: Streaming KMeans fails when executed in MR mode and
 REDUCE_STREAMING_KMEANS set to true
 M-1443: Update How to Release page
 M-1585: Javadocs not hosted by Mahout-Quality
 M-1612: NPE during JSON outputformatter for clusterdump

 Stevo Slavic
 

Re: Mahout 0.10.0 Bug bash

2015-03-28 Thread Suneel Marthi
Seems like we are stretched pretty thin given the work load, not to mention
that Mahout work is completely orthogonal to our paychecks.

Ted, Grant, Shannon - possible you guys could take some of the load??

On Sat, Mar 28, 2015 at 1:25 PM, Andrew Musselman 
andrew.mussel...@gmail.com wrote:

 Today's:

 Andrew Palumbo
 --
 M-1648: Update CMS for Mahout 0.10.0
 M-1638: H2O bindings fail at drmParallelizeWithRowLabels
 M-1477: Clean up website on Logistic Regression
 M-1564: Naive Bayes classifier for new Text Documents
 M-1635: Getting an exception when I provide classification labels manually
 for Naive Bayes
 M-1493: Port Naive Bayes to Spark DSL(Patch available)
 M-1559: Documentation and cleanup for Naive Bayes Example
 M-1609: NullPointerException
 M-1607: Spark-shell DAG scheduler

 Andrew Musselman
 -
 M-1655: Refactor module dependencies
 M-1522: Handle logging levels via log4j.xml
 M-1563: cleanup Warnings during Build
 M-1470: LDA Topic dump
 M-1462: Cleaning up Random Forests documentation on Mahout website

 Dmitriy Lyubimov
 --
 M-1646: Refactor out all legacy MR dependencies from scala code

 Frank Scholten
 -
 M-1649: Lucene 5 upgrade
 M-1625: lucene2seq: failure to convert a document that does not contain a
 field (the field is not required)

 Pat Ferrel
 -
 M-1589: mahout.cmd has duplicated content(Patch available)
 M-1618: co-occurence recommender example

 Suneel Marthi
 -
 M-1586: Collections downloads must have hash signatures
 M-1647: The release build is incomplete
 M-1652: Java 7 update
 M-1512: Hadoop 2 compatibility
 M-1469: Streaming KMeans fails when executed in MR mode and
 REDUCE_STREAMING_KMEANS set to true
 M-1443: Update How to Release page(Tagged 0.10.1)
 M-1585: Javadocs not hosted by Mahout-Quality
 M-1612: NPE during JSON outputformatter for clusterdump
 M-1656: Change SNAPSHOT version from 1.0 to 0.10
 M-1660: Hadoop1HDFSUtil.readDRMHEader should be taking Hadoop conf
 M-1619: HighDFWordsPruner overwrites cache files

 Stevo Slavic
 
 M-1650: upgrade 3rd party jars
 M-1602: Euclidean Distance Similarity Math
 M-1278: Improve inheritance of apache parent pom
 M-1562: Publish Scaladocs
 M-1277: Lose dependency on custom commons-cli

 Shannon Quinn
 ---
 M-1538: Port spectral clustering to Mahout DSL
 M-1539: Implement affinity matrix computation in Mahout DSL
 M-1540: Reuters Example spectral clustering Also online docs for Spectral
 clustering
 M-1659: Remove deprecated Lanczos solver from spectral clustering in
 mr-legacy

 Ted Dunning
 ---
 M-1636: Class dependencies for Spark module are put in job.jar, which is
 inefficient

 Sebastian Schelter
 --
 M-1584: Create a detailed example of how to index an arbitrary dataset and
 run LDA on it(Patch available)

 Gokhan Capan
 --
 M-1626: Support for required quasi-algebraic operations and starting with
 aggregating rows/blocks

 Unassigned
 --
 M-1594: Example factorize-movielens-1M.sh does not use HDFS(Patch
 available)
 M-1593: cluster-reuters.sh does not work complaining
 java.lang.IllegalStateException(Patch available)
 M-1557: Add support for sparse training vectors in MLP(Patch available)
 M-1516: run classify-20newsgroups.sh failed cause by
 /tmp/mahout-work-jpan/20news-all does not exists in hdfs.(Patch
 available)
 M-1643: CLI arguments are not being processed in spark-shell
 M-1637: RecommenderJob of ALS fails in the mapper because it uses the
 instance of other class
 M-1634: ALS don't work when it adds new files in Distributed Cache
  (Patch available)
 M-1633: Failure to execute query when solr index contains documents with
 different fields
 M-1551: Add document to describe how to use mlp with command line(Patch
 available)

 On Thu, Mar 26, 2015 at 7:07 PM, Suneel Marthi suneel.mar...@gmail.com
 wrote:

  Ok here's the bug bash as of today
 
  Andrew Palumbo
  --
  M-1648: Update CMS for Mahout 0.10.0
  M-1638: H2O bindings fail at drmParallelizeWithRowLabels
  M-1564: Naive Bayes classifier for new Text Documents
  M-1635: Exception when providing classification Labels
  M-1493: Port Naive Bayes to Spark DSL
  M-1559: Documentation and cleanup for Naive Bayes Example
  M-1609: NullPointerException
  M-1607: Spark-shell DAG scheduler
 
  Andrew Musselman
  -
  M-1655: Refactor module dependencies
  M-1563: cleanup Warnings during Build
  M-1470: LDA Topic dump
 
  Dmitriy Lyubimov
  --
  M-1646: Refactor out all legacy MR dependencies from scala code
 
  Frank Scholten
  -
  M-1649: Lucene 5 upgrade
 
  Pat Ferrel
  -
  M-1589: mahout.cmd has duplicated content
  M-1618: co-occurence recommender example
 
  Suneel 

Re: Mahout 0.10.0 Bug bash

2015-03-28 Thread Shannon Quinn
Wait, I thought all DSL work on spectral clustering was waiting until 0.10.1?

iPhone'd

 On Mar 28, 2015, at 13:49, Suneel Marthi suneel.mar...@gmail.com wrote:
 
 Seems like we are stretched pretty thin given the work load, not to mention
 that Mahout work is completely orthogonal to our paychecks.
 
 Ted, Grant, Shannon - possible you guys could take some of the load??
 
 On Sat, Mar 28, 2015 at 1:25 PM, Andrew Musselman 
 andrew.mussel...@gmail.com wrote:
 
 Today's:
 
 Andrew Palumbo
 --
 M-1648: Update CMS for Mahout 0.10.0
 M-1638: H2O bindings fail at drmParallelizeWithRowLabels
 M-1477: Clean up website on Logistic Regression
 M-1564: Naive Bayes classifier for new Text Documents
 M-1635: Getting an exception when I provide classification labels manually
 for Naive Bayes
 M-1493: Port Naive Bayes to Spark DSL(Patch available)
 M-1559: Documentation and cleanup for Naive Bayes Example
 M-1609: NullPointerException
 M-1607: Spark-shell DAG scheduler
 
 Andrew Musselman
 -
 M-1655: Refactor module dependencies
 M-1522: Handle logging levels via log4j.xml
 M-1563: cleanup Warnings during Build
 M-1470: LDA Topic dump
 M-1462: Cleaning up Random Forests documentation on Mahout website
 
 Dmitriy Lyubimov
 --
 M-1646: Refactor out all legacy MR dependencies from scala code
 
 Frank Scholten
 -
 M-1649: Lucene 5 upgrade
 M-1625: lucene2seq: failure to convert a document that does not contain a
 field (the field is not required)
 
 Pat Ferrel
 -
 M-1589: mahout.cmd has duplicated content(Patch available)
 M-1618: co-occurence recommender example
 
 Suneel Marthi
 -
 M-1586: Collections downloads must have hash signatures
 M-1647: The release build is incomplete
 M-1652: Java 7 update
 M-1512: Hadoop 2 compatibility
 M-1469: Streaming KMeans fails when executed in MR mode and
 REDUCE_STREAMING_KMEANS set to true
 M-1443: Update How to Release page(Tagged 0.10.1)
 M-1585: Javadocs not hosted by Mahout-Quality
 M-1612: NPE during JSON outputformatter for clusterdump
 M-1656: Change SNAPSHOT version from 1.0 to 0.10
 M-1660: Hadoop1HDFSUtil.readDRMHEader should be taking Hadoop conf
 M-1619: HighDFWordsPruner overwrites cache files
 
 Stevo Slavic
 
 M-1650: upgrade 3rd party jars
 M-1602: Euclidean Distance Similarity Math
 M-1278: Improve inheritance of apache parent pom
 M-1562: Publish Scaladocs
 M-1277: Lose dependency on custom commons-cli
 
 Shannon Quinn
 ---
 M-1538: Port spectral clustering to Mahout DSL
 M-1539: Implement affinity matrix computation in Mahout DSL
 M-1540: Reuters Example spectral clustering Also online docs for Spectral
 clustering
 M-1659: Remove deprecated Lanczos solver from spectral clustering in
 mr-legacy
 
 Ted Dunning
 ---
 M-1636: Class dependencies for Spark module are put in job.jar, which is
 inefficient
 
 Sebastian Schelter
 --
 M-1584: Create a detailed example of how to index an arbitrary dataset and
 run LDA on it(Patch available)
 
 Gokhan Capan
 --
 M-1626: Support for required quasi-algebraic operations and starting with
 aggregating rows/blocks
 
 Unassigned
 --
 M-1594: Example factorize-movielens-1M.sh does not use HDFS(Patch
 available)
 M-1593: cluster-reuters.sh does not work complaining
 java.lang.IllegalStateException(Patch available)
 M-1557: Add support for sparse training vectors in MLP(Patch available)
 M-1516: run classify-20newsgroups.sh failed cause by
 /tmp/mahout-work-jpan/20news-all does not exists in hdfs.(Patch
 available)
 M-1643: CLI arguments are not being processed in spark-shell
 M-1637: RecommenderJob of ALS fails in the mapper because it uses the
 instance of other class
 M-1634: ALS don't work when it adds new files in Distributed Cache
 (Patch available)
 M-1633: Failure to execute query when solr index contains documents with
 different fields
 M-1551: Add document to describe how to use mlp with command line(Patch
 available)
 
 On Thu, Mar 26, 2015 at 7:07 PM, Suneel Marthi suneel.mar...@gmail.com
 wrote:
 
 Ok here's the bug bash as of today
 
 Andrew Palumbo
 --
 M-1648: Update CMS for Mahout 0.10.0
 M-1638: H2O bindings fail at drmParallelizeWithRowLabels
 M-1564: Naive Bayes classifier for new Text Documents
 M-1635: Exception when providing classification Labels
 M-1493: Port Naive Bayes to Spark DSL
 M-1559: Documentation and cleanup for Naive Bayes Example
 M-1609: NullPointerException
 M-1607: Spark-shell DAG scheduler
 
 Andrew Musselman
 -
 M-1655: Refactor module dependencies
 M-1563: cleanup Warnings during Build
 M-1470: LDA Topic dump
 
 Dmitriy Lyubimov
 --
 M-1646: Refactor out all legacy MR dependencies from scala code
 
 Frank Scholten
 -

Re: Mahout 0.10.0 Bug bash

2015-03-28 Thread Suneel Marthi
That's right, feel free to edit your Jiras to reflect that.

On Sat, Mar 28, 2015 at 2:22 PM, Shannon Quinn squ...@gatech.edu wrote:

 Wait, I thought all DSL work on spectral clustering was waiting until
 0.10.1?

 iPhone'd

  On Mar 28, 2015, at 13:49, Suneel Marthi suneel.mar...@gmail.com wrote:
 
  Seems like we are stretched pretty thin given the work load, not to mention
  that Mahout work is completely orthogonal to our paychecks.
 
  Ted, Grant, Shannon - possible you guys could take some of the load??
 
  On Sat, Mar 28, 2015 at 1:25 PM, Andrew Musselman 
  andrew.mussel...@gmail.com wrote:
 
  Today's:
 
  Andrew Palumbo
  --
  M-1648: Update CMS for Mahout 0.10.0
  M-1638: H2O bindings fail at drmParallelizeWithRowLabels
  M-1477: Clean up website on Logistic Regression
  M-1564: Naive Bayes classifier for new Text Documents
  M-1635: Getting an exception when I provide classification labels manually
  for Naive Bayes
  M-1493: Port Naive Bayes to Spark DSL(Patch available)
  M-1559: Documentation and cleanup for Naive Bayes Example
  M-1609: NullPointerException
  M-1607: Spark-shell DAG scheduler
 
  Andrew Musselman
  -
  M-1655: Refactor module dependencies
  M-1522: Handle logging levels via log4j.xml
  M-1563: cleanup Warnings during Build
  M-1470: LDA Topic dump
  M-1462: Cleaning up Random Forests documentation on Mahout website
 
  Dmitriy Lyubimov
  --
  M-1646: Refactor out all legacy MR dependencies from scala code
 
  Frank Scholten
  -
  M-1649: Lucene 5 upgrade
  M-1625: lucene2seq: failure to convert a document that does not contain a
  field (the field is not required)
 
  Pat Ferrel
  -
  M-1589: mahout.cmd has duplicated content(Patch available)
  M-1618: co-occurence recommender example
 
  Suneel Marthi
  -
  M-1586: Collections downloads must have hash signatures
  M-1647: The release build is incomplete
  M-1652: Java 7 update
  M-1512: Hadoop 2 compatibility
  M-1469: Streaming KMeans fails when executed in MR mode and
  REDUCE_STREAMING_KMEANS set to true
  M-1443: Update How to Release page(Tagged 0.10.1)
  M-1585: Javadocs not hosted by Mahout-Quality
  M-1612: NPE during JSON outputformatter for clusterdump
  M-1656: Change SNAPSHOT version from 1.0 to 0.10
  M-1660: Hadoop1HDFSUtil.readDRMHEader should be taking Hadoop conf
  M-1619: HighDFWordsPruner overwrites cache files
 
  Stevo Slavic
  
  M-1650: upgrade 3rd party jars
  M-1602: Euclidean Distance Similarity Math
  M-1278: Improve inheritance of apache parent pom
  M-1562: Publish Scaladocs
  M-1277: Lose dependency on custom commons-cli
 
  Shannon Quinn
  ---
  M-1538: Port spectral clustering to Mahout DSL
  M-1539: Implement affinity matrix computation in Mahout DSL
  M-1540: Reuters Example spectral clustering Also online docs for Spectral
  clustering
  M-1659: Remove deprecated Lanczos solver from spectral clustering in
  mr-legacy
 
  Ted Dunning
  ---
  M-1636: Class dependencies for Spark module are put in job.jar, which is
  inefficient
 
  Sebastian Schelter
  --
  M-1584: Create a detailed example of how to index an arbitrary dataset and
  run LDA on it(Patch available)
 
  Gokhan Capan
  --
  M-1626: Support for required quasi-algebraic operations and starting with
  aggregating rows/blocks
 
  Unassigned
  --
  M-1594: Example factorize-movielens-1M.sh does not use HDFS(Patch
  available)
  M-1593: cluster-reuters.sh does not work complaining
  java.lang.IllegalStateException(Patch available)
  M-1557: Add support for sparse training vectors in MLP(Patch available)
  M-1516: run classify-20newsgroups.sh failed cause by
  /tmp/mahout-work-jpan/20news-all does not exists in hdfs.(Patch
  available)
  M-1643: CLI arguments are not being processed in spark-shell
  M-1637: RecommenderJob of ALS fails in the mapper because it uses the
  instance of other class
  M-1634: ALS don't work when it adds new files in Distributed Cache
  (Patch available)
  M-1633: Failure to execute query when solr index contains documents with
  different fields
  M-1551: Add document to describe how to use mlp with command line(Patch
  available)
 
  On Thu, Mar 26, 2015 at 7:07 PM, Suneel Marthi suneel.mar...@gmail.com
  wrote:
 
  Ok here's the bug bash as of today
 
  Andrew Palumbo
  --
  M-1648: Update CMS for Mahout 0.10.0
  M-1638: H2O bindings fail at drmParallelizeWithRowLabels
  M-1564: Naive Bayes classifier for new Text Documents
  M-1635: Exception when providing classification Labels
  M-1493: Port Naive Bayes to Spark DSL
  M-1559: Documentation and cleanup for Naive Bayes Example
  M-1609: NullPointerException
  M-1607: Spark-shell DAG scheduler
 
  Andrew Musselman
  

Re: Mahout 0.10.0 Bug bash

2015-03-28 Thread Shannon Quinn
Ah no worries, just got a bit panicked when I saw that. 

Summer will be better for me but for now these tickets have about maxed me out; 
3 months into the new tenure-track shtick is grueling. 

iPhone'd

 On Mar 28, 2015, at 14:27, Andrew Musselman andrew.mussel...@gmail.com 
 wrote:
 
 Okay, go ahead and move it; I was just moving things from 1.0 to 0.10.0
 almost indiscriminately.
 
 On Sat, Mar 28, 2015 at 11:22 AM, Shannon Quinn squ...@gatech.edu wrote:
 
 Wait, I thought all DSL work on spectral clustering was waiting until
 0.10.1?
 
 iPhone'd
 
 On Mar 28, 2015, at 13:49, Suneel Marthi suneel.mar...@gmail.com
 wrote:
 
 Seems like we are stretched pretty thin given the work load, not to mention
 that Mahout work is completely orthogonal to our paychecks.
 
 Ted, Grant, Shannon - possible you guys could take some of the load??
 
 On Sat, Mar 28, 2015 at 1:25 PM, Andrew Musselman 
 andrew.mussel...@gmail.com wrote:
 
 Today's:
 
 Andrew Palumbo
 --
 M-1648: Update CMS for Mahout 0.10.0
 M-1638: H2O bindings fail at drmParallelizeWithRowLabels
 M-1477: Clean up website on Logistic Regression
 M-1564: Naive Bayes classifier for new Text Documents
 M-1635: Getting an exception when I provide classification labels manually
 for Naive Bayes
 M-1493: Port Naive Bayes to Spark DSL(Patch available)
 M-1559: Documentation and cleanup for Naive Bayes Example
 M-1609: NullPointerException
 M-1607: Spark-shell DAG scheduler
 
 Andrew Musselman
 -
 M-1655: Refactor module dependencies
 M-1522: Handle logging levels via log4j.xml
 M-1563: cleanup Warnings during Build
 M-1470: LDA Topic dump
 M-1462: Cleaning up Random Forests documentation on Mahout website
 
 Dmitriy Lyubimov
 --
 M-1646: Refactor out all legacy MR dependencies from scala code
 
 Frank Scholten
 -
 M-1649: Lucene 5 upgrade
 M-1625: lucene2seq: failure to convert a document that does not contain a
 field (the field is not required)
 
 Pat Ferrel
 -
 M-1589: mahout.cmd has duplicated content(Patch available)
 M-1618: co-occurence recommender example
 
 Suneel Marthi
 -
 M-1586: Collections downloads must have hash signatures
 M-1647: The release build is incomplete
 M-1652: Java 7 update
 M-1512: Hadoop 2 compatibility
 M-1469: Streaming KMeans fails when executed in MR mode and
 REDUCE_STREAMING_KMEANS set to true
 M-1443: Update How to Release page(Tagged 0.10.1)
 M-1585: Javadocs not hosted by Mahout-Quality
 M-1612: NPE during JSON outputformatter for clusterdump
 M-1656: Change SNAPSHOT version from 1.0 to 0.10
 M-1660: Hadoop1HDFSUtil.readDRMHEader should be taking Hadoop conf
 M-1619: HighDFWordsPruner overwrites cache files
 
 Stevo Slavic
 
 M-1650: upgrade 3rd party jars
 M-1602: Euclidean Distance Similarity Math
 M-1278: Improve inheritance of apache parent pom
 M-1562: Publish Scaladocs
 M-1277: Lose dependency on custom commons-cli
 
 Shannon Quinn
 ---
 M-1538: Port spectral clustering to Mahout DSL
 M-1539: Implement affinity matrix computation in Mahout DSL
 M-1540: Reuters Example spectral clustering Also online docs for Spectral
 clustering
 M-1659: Remove deprecated Lanczos solver from spectral clustering in
 mr-legacy
 
 Ted Dunning
 ---
 M-1636: Class dependencies for Spark module are put in job.jar, which is
 inefficient
 
 Sebastian Schelter
 --
 M-1584: Create a detailed example of how to index an arbitrary dataset and
 run LDA on it(Patch available)
 
 Gokhan Capan
 --
 M-1626: Support for required quasi-algebraic operations and starting with
 aggregating rows/blocks
 
 Unassigned
 --
 M-1594: Example factorize-movielens-1M.sh does not use HDFS(Patch
 available)
 M-1593: cluster-reuters.sh does not work complaining
 java.lang.IllegalStateException(Patch available)
 M-1557: Add support for sparse training vectors in MLP(Patch available)
 M-1516: run classify-20newsgroups.sh failed cause by
 /tmp/mahout-work-jpan/20news-all does not exists in hdfs.(Patch
 available)
 M-1643: CLI arguments are not being processed in spark-shell
 M-1637: RecommenderJob of ALS fails in the mapper because it uses the
 instance of other class
 M-1634: ALS don't work when it adds new files in Distributed Cache
 (Patch available)
 M-1633: Failure to execute query when solr index contains documents with
 different fields
 M-1551: Add document to describe how to use mlp with command line(Patch
 available)
 
 On Thu, Mar 26, 2015 at 7:07 PM, Suneel Marthi suneel.mar...@gmail.com
 wrote:
 
 Ok here's the bug bash as of today
 
 Andrew Palumbo
 --
 M-1648: Update CMS for Mahout 0.10.0
 M-1638: H2O bindings fail at drmParallelizeWithRowLabels
 M-1564: Naive Bayes classifier for new Text Documents
 M-1635: Exception when providing 

Re: Mahout 0.10.0 Bug bash

2015-03-27 Thread Shannon Quinn

Yes--removing the Lanczos solver from spectral clustering.

On 3/27/15 10:29 AM, Suneel Marthi wrote:

and this is for 0.10.0 ???

On Fri, Mar 27, 2015 at 10:27 AM, Shannon Quinn squ...@gatech.edu wrote:


Created M-1659 and assigned it to myself to reflect current work.

Shannon


On 3/26/15 10:07 PM, Suneel Marthi wrote:


Ok here's the bug bash as of today

Andrew Palumbo
--
M-1648: Update CMS for Mahout 0.10.0
M-1638: H2O bindings fail at drmParallelizeWithRowLabels
M-1564: Naive Bayes classifier for new Text Documents
M-1635: Exception when providing classification Labels
M-1493: Port Naive Bayes to Spark DSL
M-1559: Documentation and cleanup for Naive Bayes Example
M-1609: NullPointerException
M-1607: Spark-shell DAG scheduler

Andrew Musselman
-
M-1655: Refactor module dependencies
M-1563: cleanup Warnings during Build
M-1470: LDA Topic dump

Dmitriy Lyubimov
--
M-1646: Refactor out all legacy MR dependencies from scala code

Frank Scholten
-
M-1649: Lucene 5 upgrade

Pat Ferrel
-
M-1589: mahout.cmd has duplicated content
M-1618: co-occurence recommender example

Suneel Marthi
-
M-1586: Collections downloads must have hash signatures
M-1647: Release build
M-1652: Java 7 update
M-1512: Hadoop 2 compatibility
M-1469: Streaming KMeans fails when executed in MR mode and
REDUCE_STREAMING_KMEANS set to true
M-1443: Update How to Release page
M-1585: Javadocs not hosted by Mahout-Quality
M-1612: NPE during JSON outputformatter for clusterdump

Stevo Slavic

M-1650: upgrade 3rd party jars
M-1602: Euclidean Distance Similarity Math
M-1278: Improve inheritance of apache parent pom

Shannon Quinn
---
M-1540: Reuters Example spectral clustering
Also online docs for Spectral clustering

Ted Dunning
---
M-1636: Class dependencies for Spark module are put in job.jar, which is
inefficient






Re: Mahout 0.10.0 Bug bash

2015-03-27 Thread Shannon Quinn

Created M-1659 and assigned it to myself to reflect current work.

Shannon

On 3/26/15 10:07 PM, Suneel Marthi wrote:

Ok here's the bug bash as of today

Andrew Palumbo
--
M-1648: Update CMS for Mahout 0.10.0
M-1638: H2O bindings fail at drmParallelizeWithRowLabels
M-1564: Naive Bayes classifier for new Text Documents
M-1635: Exception when providing classification Labels
M-1493: Port Naive Bayes to Spark DSL
M-1559: Documentation and cleanup for Naive Bayes Example
M-1609: NullPointerException
M-1607: Spark-shell DAG scheduler

Andrew Musselman
-
M-1655: Refactor module dependencies
M-1563: cleanup Warnings during Build
M-1470: LDA Topic dump

Dmitriy Lyubimov
--
M-1646: Refactor out all legacy MR dependencies from scala code

Frank Scholten
-
M-1649: Lucene 5 upgrade

Pat Ferrel
-
M-1589: mahout.cmd has duplicated content
M-1618: co-occurence recommender example

Suneel Marthi
-
M-1586: Collections downloads must have hash signatures
M-1647: Release build
M-1652: Java 7 update
M-1512: Hadoop 2 compatibility
M-1469: Streaming KMeans fails when executed in MR mode and
REDUCE_STREAMING_KMEANS set to true
M-1443: Update How to Release page
M-1585: Javadocs not hosted by Mahout-Quality
M-1612: NPE during JSON outputformatter for clusterdump

Stevo Slavic

M-1650: upgrade 3rd party jars
M-1602: Euclidean Distance Similarity Math
M-1278: Improve inheritance of apache parent pom

Shannon Quinn
---
M-1540: Reuters Example spectral clustering
Also online docs for Spectral clustering

Ted Dunning
---
M-1636: Class dependencies for Spark module are put in job.jar, which is
inefficient





Re: Mahout 0.10.0 Bug bash

2015-03-27 Thread Pat Ferrel
Not sure what to do about the Windows mahout.cmd script. I don’t even own a
Windows VM, so there is no way I can look into this except by asking for help,
which I have done. What happens if no one volunteers? Is this a blocker? M-1589

I took M-1636, should be resolved. Need a final test on a cluster, which I am 
trying today.

Aren’t M-1655 and M-1646 the same? Dmitriy is not committing code so any work 
must be reassigned if it needs to be done.


On Mar 26, 2015, at 7:07 PM, Suneel Marthi suneel.mar...@gmail.com wrote:

Ok here's the bug bash as of today

Andrew Palumbo
--
M-1648: Update CMS for Mahout 0.10.0
M-1638: H2O bindings fail at drmParallelizeWithRowLabels
M-1564: Naive Bayes classifier for new Text Documents
M-1635: Exception when providing classification Labels
M-1493: Port Naive Bayes to Spark DSL
M-1559: Documentation and cleanup for Naive Bayes Example
M-1609: NullPointerException
M-1607: Spark-shell DAG scheduler

Andrew Musselman
-
M-1655: Refactor module dependencies
M-1563: cleanup Warnings during Build
M-1470: LDA Topic dump

Dmitriy Lyubimov
--
M-1646: Refactor out all legacy MR dependencies from scala code

Frank Scholten
-
M-1649: Lucene 5 upgrade

Pat Ferrel
-
M-1589: mahout.cmd has duplicated content
M-1618: co-occurence recommender example

Suneel Marthi
-
M-1586: Collections downloads must have hash signatures
M-1647: Release build
M-1652: Java 7 update
M-1512: Hadoop 2 compatibility
M-1469: Streaming KMeans fails when executed in MR mode and
REDUCE_STREAMING_KMEANS set to true
M-1443: Update How to Release page
M-1585: Javadocs not hosted by Mahout-Quality
M-1612: NPE during JSON outputformatter for clusterdump

Stevo Slavic

M-1650: upgrade 3rd party jars
M-1602: Euclidean Distance Similarity Math
M-1278: Improve inheritance of apache parent pom

Shannon Quinn
---
M-1540: Reuters Example spectral clustering
Also online docs for Spectral clustering

Ted Dunning
---
M-1636: Class dependencies for Spark module are put in job.jar, which is
inefficient



Re: Mahout 0.10.0 Bug bash

2015-03-27 Thread Suneel Marthi
It's not a blocker, I would just close it and move on until the next Windows
guy creates a new Jira :)

On Fri, Mar 27, 2015 at 11:29 AM, Pat Ferrel p...@occamsmachete.com wrote:

 Not sure what to do about the Windows mahout.cmd script. I don’t even own
 a Windows VM so there is no way I can look into this except for asking for
 help, which I have done. What happens if no one volunteers? Is this a
 blocker? M-1589

 I took M-1636, should be resolved. Need a final test on a cluster, which I
 am trying today.

 Aren’t M-1655 and M-1646 the same? Dmitriy is not committing code so any
 work must be reassigned if it needs to be done.


 On Mar 26, 2015, at 7:07 PM, Suneel Marthi suneel.mar...@gmail.com
 wrote:

 Ok here's the bug bash as of today

 Andrew Palumbo
 --
 M-1648: Update CMS for Mahout 0.10.0
 M-1638: H2O bindings fail at drmParallelizeWithRowLabels
 M-1564: Naive Bayes classifier for new Text Documents
 M-1635: Exception when providing classification Labels
 M-1493: Port Naive Bayes to Spark DSL
 M-1559: Documentation and cleanup for Naive Bayes Example
 M-1609: NullPointerException
 M-1607: Spark-shell DAG scheduler

 Andrew Musselman
 -
 M-1655: Refactor module dependencies
 M-1563: cleanup Warnings during Build
 M-1470: LDA Topic dump

 Dmitriy Lyubimov
 --
 M-1646: Refactor out all legacy MR dependencies from scala code

 Frank Scholten
 -
 M-1649: Lucene 5 upgrade

 Pat Ferrel
 -
 M-1589: mahout.cmd has duplicated content
 M-1618: co-occurence recommender example

 Suneel Marthi
 -
 M-1586: Collections downloads must have hash signatures
 M-1647: Release build
 M-1652: Java 7 update
 M-1512: Hadoop 2 compatibility
 M-1469: Streaming KMeans fails when executed in MR mode and
 REDUCE_STREAMING_KMEANS set to true
 M-1443: Update How to Release page
 M-1585: Javadocs not hosted by Mahout-Quality
 M-1612: NPE during JSON outputformatter for clusterdump

 Stevo Slavic
 ---
 M-1650: upgrade 3rd party jars
 M-1602: Euclidean Distance Similarity Math
 M-1278: Improve inheritance of apache parent pom

 Shannon Quinn
 ---
 M-1540: Reuters Example spectral clustering
 Also online docs for Spectral clustering

 Ted Dunning
 ---
 M-1636: Class dependencies for Spark module are put in job.jar, which is
 inefficient




Mahout 0.10.0 Bug bash

2015-03-26 Thread Suneel Marthi
Ok here's the bug bash as of today

Andrew Palumbo
--
M-1648: Update CMS for Mahout 0.10.0
M-1638: H2O bindings fail at drmParallelizeWithRowLabels
M-1564: Naive Bayes classifier for new Text Documents
M-1635: Exception when providing classification Labels
M-1493: Port Naive Bayes to Spark DSL
M-1559: Documentation and cleanup for Naive Bayes Example
M-1609: NullPointerException
M-1607: Spark-shell DAG scheduler

Andrew Musselman
-
M-1655: Refactor module dependencies
M-1563: cleanup Warnings during Build
M-1470: LDA Topic dump

Dmitriy Lyubimov
--
M-1646: Refactor out all legacy MR dependencies from scala code

Frank Scholten
-
M-1649: Lucene 5 upgrade

Pat Ferrel
-
M-1589: mahout.cmd has duplicated content
M-1618: co-occurrence recommender example

Suneel Marthi
-
M-1586: Collections downloads must have hash signatures
M-1647: Release build
M-1652: Java 7 update
M-1512: Hadoop 2 compatibility
M-1469: Streaming KMeans fails when executed in MR mode and
REDUCE_STREAMING_KMEANS set to true
M-1443: Update How to Release page
M-1585: Javadocs not hosted by Mahout-Quality
M-1612: NPE during JSON outputformatter for clusterdump

Stevo Slavic
---
M-1650: upgrade 3rd party jars
M-1602: Euclidean Distance Similarity Math
M-1278: Improve inheritance of apache parent pom

Shannon Quinn
---
M-1540: Reuters Example spectral clustering
Also online docs for Spectral clustering

Ted Dunning
---
M-1636: Class dependencies for Spark module are put in job.jar, which is
inefficient