Re: Mahout 0.10.0 Bug bash
Things like that question make me more suspicious. We really need to get a handle on the Hadoop version question.

I have run:

spark-itemsimilarity on Hadoop 1.2.1 and 2.6.0 (fails). Andy ran it successfully on 2.2, and a user runs it on 2.4-MapR.

2.6.0 seems to find the local file system with these lines:

    val conf = new Configuration()
    val fs = FileSystem.get(conf)

On the earlier versions of Hadoop, it finds the cluster or pseudo-cluster HDFS.

I’ve run Andy’s 20 newsgroups classifier test script on Hadoop 1.2.1 and got a classdef mismatch error, which probably means I built wrong. I’ll be testing that again Monday.

I’m building a 2.2.0 pseudo cluster and will run 20 newsgroups and spark-itemsimilarity Monday.

I guess the big question is still 2.5 or 2.6. Does anyone know why the two lines above would cause a problem in recent Hadoop versions? Does someone have a known good 2.6 cluster that they can try a couple of tests on?

On Apr 5, 2015, at 9:52 AM, Andrew Musselman andrew.mussel...@gmail.com wrote:
I wonder if that HDFS/FS issue is the same problem I have with cluster-reuters.sh.
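A note on the two lines quoted above: `FileSystem.get(conf)` returns whatever filesystem `fs.defaultFS` names in the Hadoop configuration found on the classpath, and Hadoop's built-in default for that key is `file:///`. One possible explanation, not confirmed anywhere in this thread, is that the 2.6.0 run is not picking up the cluster's `core-site.xml`. The following is a minimal diagnostic sketch under that assumption; the object name `FsCheck` and the helper `fsFor` are hypothetical and are not part of Mahout.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object FsCheck {
  // Hypothetical helper (not Mahout code): resolve the filesystem from the
  // path's own scheme, e.g. hdfs://... Paths without a scheme still fall
  // back to the default filesystem from the configuration.
  def fsFor(pathStr: String, conf: Configuration): FileSystem =
    new Path(pathStr).getFileSystem(conf)

  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    // Which default filesystem does the configuration on this classpath give?
    println(s"default FS URI       = ${FileSystem.getDefaultUri(conf)}")
    println(s"FileSystem.get(conf) = ${FileSystem.get(conf).getUri}")
    // Resolve each path given on the command line from its own URI.
    args.foreach { p => println(s"$p -> ${fsFor(p, conf).getUri}") }
  }
}
```

If the first two lines print a `file:` URI on a machine where HDFS is running, the configuration directory is probably not on the classpath of the process. Passing a fully qualified `hdfs://` path to something like `fsFor` sidesteps the default, but that is a workaround sketch rather than a fix for the underlying configuration question.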
Re: Mahout 0.10.0 Bug bash
Very few of these are on the “official” ticket list here: https://issues.apache.org/jira/browse/MAHOUT-1648?filter=-4jql=project%20%3D%20MAHOUT%20AND%20status%20in%20(Open%2C%20Reopened)%20AND%20priority%20in%20(Blocker%2C%20Critical)%20ORDER%20BY%20createdDate%20DESC

M-1674 M-1665 M-1648

The next time this is published it would be great to get the versions of Hadoop people are using and what has actually been run on a cluster or pseudo cluster, under YARN etc. I’m increasingly suspicious that we don’t run uniformly on Hadoop 2.5-2.6 but have no hard evidence. I’ve failed on H2.6.0 but may not have an airtight configuration. If anyone has this config working I can supply a very simple test.

The failure happens when an HDFS path gets applied to the raw local filesystem, even though Hadoop 2.6 HDFS is running and MAHOUT_LOCAL is unset. The root of the error I’ve seen is in getting the FileSystem, which always returns the local one.

M-1674 is new and was found on Friday. Dmitriy already has a private fix but can’t commit it, so I think we need a workaround.
Re: Mahout 0.10.0 Bug bash
I wonder if that HDFS/FS issue is the same problem I have with cluster-reuters.sh.
Re: Mahout 0.10.0 Bug bash
Saturday (2 days before code freeze). The code freeze will be on Monday, April 6. Please address your assigned JIRAs on time.

Anand Avati
-----------
M-1622: Multithreaded batch Item similarities output incorrect similarities
M-1605: Make Visualizer test locale independent

Andrew Palumbo
--------------
M-1559: Add documentation for Wikipedia example
M-1648: Update CMS for Mahout 0.10.0

Andrew Musselman
----------------
M-1462: Cleaning up Random Forests documentation on Mahout website
M-1470: LDA Topic dump
M-1655: Refactor module dependencies
M-1658: KMeans fails when run on Hadoop clusters

Frank Scholten
--------------
M-1625: lucene2seq: failure to convert a document that does not contain a field (the field is not required)
M-1633: Failure to execute query when solr index contains documents with different fields
M-1649: Lucene 5 upgrade

Pat Ferrel
----------
M-1507: Support input and output using user defined ID wherever possible
M-1588: Multiple Input path support in Recommenders

Stevo Slavic
------------
M-1277: Lose dependency on custom commons-cli
M-1278: Improve inheritance of apache parent pom
M-1562: Publish Scaladocs
M-1585: Javadocs are not hosted By Mahout Quality
M-1650: upgrade 3rd party jars

Suneel Marthi
-------------
M-1469: Streaming KMeans fails when executed in MR mode and REDUCE_STREAMING_KMEANS set to true
M-1512: Hadoop 2 compatibility
M-1652: Java 7 update
M-1630: Incorrect SparseMatrix.numColSlices() causes IllegalStateException

Ted Dunning
-----------
M-1672: TDigest update to 3.1 in OnlineSummarizers

Unassigned
----------
M-1551: Add document to describe how to use mlp with command line (Patch available)
M-1637: RecommenderJob of ALS fails in the mapper because it uses the instance of other class
Re: Mahout 0.10.0 Bug bash
Wednesday (*four days from code freeze Sunday*); some progress:

Andrew Palumbo
--------------
M-1493: Port Naive Bayes to Spark DSL (Patch available)
M-1559: Documentation and cleanup for Naive Bayes Example
M-1564: Naive Bayes classifier for new Text Documents
M-1617: 404 error on link in cluster-dumper tutorial page
M-1635: Getting an exception when I provide classification labels manually for Naive Bayes
M-1638: H2O bindings fail at drmParallelizeWithRowLabels
M-1648: Update CMS for Mahout 0.10.0

Andrew Musselman
----------------
M-1462: Cleaning up Random Forests documentation on Mahout website
M-1470: LDA Topic dump
M-1655: Refactor module dependencies

Dmitriy Lyubimov
----------------
M-1646: Refactor out all legacy MR dependencies from scala code

Frank Scholten
--------------
M-1625: lucene2seq: failure to convert a document that does not contain a field (the field is not required)
M-1633: Failure to execute query when solr index contains documents with different fields
M-1649: Lucene 5 upgrade

Gokhan Capan
------------
M-1626: Support for required quasi-algebraic operations and starting with aggregating rows/blocks

Pat Ferrel
----------
M-1507: Support input and output using user defined ID wherever possible

Sebastian Schelter
------------------
M-1584: Create a detailed example of how to index an arbitrary dataset and run LDA on it (Patch available)

Shannon Quinn
-------------
M-1661: Remove Lanczos from the code base
M-1662: Potential Path bug in SequenceFileVaultIterator breaks DisplaySpectralKMeans

Stevo Slavic
------------
M-1277: Lose dependency on custom commons-cli
M-1278: Improve inheritance of apache parent pom
M-1562: Publish Scaladocs
M-1585: Javadocs are not hosted By Mahout Quality
M-1650: upgrade 3rd party jars

Suneel Marthi
-------------
M-1469: Streaming KMeans fails when executed in MR mode and REDUCE_STREAMING_KMEANS set to true
M-1512: Hadoop 2 compatibility
M-1586: Collections downloads must have hash signatures
M-1647: The release build is incomplete
M-1652: Java 7 update
M-1656: Change SNAPSHOT version from 1.0 to 0.10

Unassigned
----------
M-1551: Add document to describe how to use mlp with command line (Patch available)
M-1637: RecommenderJob of ALS fails in the mapper because it uses the instance of other class
Re: Mahout 0.10.0 Bug bash
Monday (six days from code freeze Sunday)

Andrew Palumbo
--------------
M-1493: Port Naive Bayes to Spark DSL (Patch available)
M-1559: Documentation and cleanup for Naive Bayes Example
M-1564: Naive Bayes classifier for new Text Documents
M-1635: Getting an exception when I provide classification labels manually for Naive Bayes
M-1638: H2O bindings fail at drmParallelizeWithRowLabels
M-1648: Update CMS for Mahout 0.10.0

Andrew Musselman
----------------
M-1462: Cleaning up Random Forests documentation on Mahout website
M-1470: LDA Topic dump
M-1522: Handle logging levels via log4j.xml
M-1655: Refactor module dependencies

Dmitriy Lyubimov
----------------
M-1646: Refactor out all legacy MR dependencies from scala code

Frank Scholten
--------------
M-1625: lucene2seq: failure to convert a document that does not contain a field (the field is not required)
M-1633: Failure to execute query when solr index contains documents with different fields
M-1649: Lucene 5 upgrade

Gokhan Capan
------------
M-1626: Support for required quasi-algebraic operations and starting with aggregating rows/blocks

Pat Ferrel
----------
M-1507: Support input and output using user defined ID wherever possible
M-1589: mahout.cmd has duplicated content (Patch available)

Sebastian Schelter
------------------
M-1584: Create a detailed example of how to index an arbitrary dataset and run LDA on it (Patch available)

Shannon Quinn
-------------
M-1661: Remove Lanczos from the code base
M-1662: Potential Path bug in SequenceFileVaultIterator breaks DisplaySpectralKMeans

Stevo Slavic
------------
M-1277: Lose dependency on custom commons-cli
M-1278: Improve inheritance of apache parent pom
M-1562: Publish Scaladocs
M-1585: Javadocs are not hosted By Mahout Quality
M-1602: Euclidean Distance Similarity Math
M-1650: upgrade 3rd party jars

Suneel Marthi
-------------
M-1469: Streaming KMeans fails when executed in MR mode and REDUCE_STREAMING_KMEANS set to true
M-1512: Hadoop 2 compatibility
M-1586: Collections downloads must have hash signatures
M-1619: HighDFWordsPruner overwrites cache files
M-1647: The release build is incomplete
M-1652: Java 7 update
M-1656: Change SNAPSHOT version from 1.0 to 0.10
M-1660: Hadoop1HDFSUtil.readDRMHEader should be taking Hadoop conf

Unassigned
----------
M-1551: Add document to describe how to use mlp with command line (Patch available)
M-1557: Add support for sparse training vectors in MLP (Patch available)
M-1593: cluster-reuters.sh does not work complaining java.lang.IllegalStateException (Patch available)
M-1594: Example factorize-movielens-1M.sh does not use HDFS (Patch available)
M-1634: ALS don't work when it adds new files in Distributed Cache (Patch available)
M-1637: RecommenderJob of ALS fails in the mapper because it uses the instance of other class
Re: Mahout 0.10.0 Bug bash
yeah there's something weird going on with M-1609, but I closed it on Friday.
Re: Mahout 0.10.0 Bug bash
A daily 'politely harsh' reminder of the April 5 code freeze date with the daily bug bash would be helpful.
Re: Mahout 0.10.0 Bug bash
Yes, reminder we want to freeze/slush next Sunday. If you won't be able to finish your bugs let's do some more triage and split up work.
Re: Mahout 0.10.0 Bug bash
Sometimes it comes up and sometimes it doesn't, but it is resolved.

On 03/29/2015 01:57 PM, Suneel Marthi wrote:
Yeah, I noticed the weirdness with M-1609 too. Well, let's keep that out of the daily bug bash.

On Sun, Mar 29, 2015 at 1:55 PM, Andrew Palumbo ap@outlook.com wrote:
yeah there's something weird going on with M-1609, but I closed it on Friday.
Re: Mahout 0.10.0 Bug bash
Sunday's:

Andrew Palumbo
--------------
M-1477: Clean up website on Logistic Regression
M-1493: Port Naive Bayes to Spark DSL (Patch available)
M-1559: Documentation and cleanup for Naive Bayes Example
M-1564: Naive Bayes classifier for new Text Documents
M-1609: NullPointerException (This bug is not showing up aside from its title)
M-1635: Getting an exception when I provide classification labels manually for Naive Bayes
M-1638: H2O bindings fail at drmParallelizeWithRowLabels
M-1648: Update CMS for Mahout 0.10.0

Andrew Musselman
----------------
M-1462: Cleaning up Random Forests documentation on Mahout website
M-1470: LDA Topic dump
M-1522: Handle logging levels via log4j.xml
M-1563: cleanup Warnings during Build
M-1655: Refactor module dependencies

Dmitriy Lyubimov
----------------
M-1646: Refactor out all legacy MR dependencies from scala code

Frank Scholten
--------------
M-1625: lucene2seq: failure to convert a document that does not contain a field (the field is not required)
M-1649: Lucene 5 upgrade

Pat Ferrel
----------
M-1589: mahout.cmd has duplicated content (Patch available)

Suneel Marthi
-------------
M-1469: Streaming KMeans fails when executed in MR mode and REDUCE_STREAMING_KMEANS set to true
M-1512: Hadoop 2 compatibility
M-1585: Javadocs not hosted by Mahout-Quality
M-1586: Collections downloads must have hash signatures
M-1619: HighDFWordsPruner overwrites cache files
M-1647: The release build is incomplete
M-1652: Java 7 update
M-1656: Change SNAPSHOT version from 1.0 to 0.10
M-1660: Hadoop1HDFSUtil.readDRMHEader should be taking Hadoop conf

Stevo Slavic
------------
M-1277: Lose dependency on custom commons-cli
M-1278: Improve inheritance of apache parent pom
M-1562: Publish Scaladocs
M-1602: Euclidean Distance Similarity Math
M-1650: upgrade 3rd party jars

Shannon Quinn
-------------
M-1538: Port spectral clustering to Mahout DSL
M-1539: Implement affinity matrix computation in Mahout DSL
M-1659: Remove deprecated Lanczos solver from spectral clustering in mr-legacy

Sebastian Schelter
------------------
M-1584: Create a detailed example of how to index an arbitrary dataset and run LDA on it (Patch available)

Gokhan Capan
------------
M-1626: Support for required quasi-algebraic operations and starting with aggregating rows/blocks

Unassigned
----------
M-1516: run classify-20newsgroups.sh failed cause by /tmp/mahout-work-jpan/20news-all does not exists in hdfs. (Patch available)
M-1551: Add document to describe how to use mlp with command line (Patch available)
M-1557: Add support for sparse training vectors in MLP (Patch available)
M-1593: cluster-reuters.sh does not work complaining java.lang.IllegalStateException (Patch available)
M-1594: Example factorize-movielens-1M.sh does not use HDFS (Patch available)
M-1633: Failure to execute query when solr index contains documents with different fields
M-1634: ALS don't work when it adds new files in Distributed Cache (Patch available)
M-1637: RecommenderJob of ALS fails in the mapper because it uses the instance of other class
Re: Mahout 0.10.0 Bug bash
Today's:

Andrew Palumbo --
M-1648: Update CMS for Mahout 0.10.0
M-1638: H2O bindings fail at drmParallelizeWithRowLabels
M-1477: Clean up website on Logistic Regression
M-1564: Naive Bayes classifier for new Text Documents
M-1635: Getting an exception when I provide classification labels manually for Naive Bayes
M-1493: Port Naive Bayes to Spark DSL (Patch available)
M-1559: Documentation and cleanup for Naive Bayes Example
M-1609: NullPointerException
M-1607: Spark-shell DAG scheduler

Andrew Musselman --
M-1655: Refactor module dependencies
M-1522: Handle logging levels via log4j.xml
M-1563: cleanup Warnings during Build
M-1470: LDA Topic dump
M-1462: Cleaning up Random Forests documentation on Mahout website

Dmitriy Lyubimov --
M-1646: Refactor out all legacy MR dependencies from scala code

Frank Scholten --
M-1649: Lucene 5 upgrade
M-1625: lucene2seq: failure to convert a document that does not contain a field (the field is not required)

Pat Ferrel --
M-1589: mahout.cmd has duplicated content (Patch available)
M-1618: co-occurence recommender example

Suneel Marthi --
M-1586: Collections downloads must have hash signatures
M-1647: The release build is incomplete
M-1652: Java 7 update
M-1512: Hadoop 2 compatibility
M-1469: Streaming KMeans fails when executed in MR mode and REDUCE_STREAMING_KMEANS set to true
M-1443: Update How to Release page (Tagged 0.10.1)
M-1585: Javadocs not hosted by Mahout-Quality
M-1612: NPE during JSON outputformatter for clusterdump
M-1656: Change SNAPSHOT version from 1.0 to 0.10
M-1660: Hadoop1HDFSUtil.readDRMHEader should be taking Hadoop conf
M-1619: HighDFWordsPruner overwrites cache files

Stevo Slavic --
M-1650: upgrade 3rd party jars
M-1602: Euclidean Distance Similarity Math
M-1278: Improve inheritance of apache parent pom
M-1562: Publish Scaladocs
M-1277: Lose dependency on custom commons-cli

Shannon Quinn --
M-1538: Port spectral clustering to Mahout DSL
M-1539: Implement affinity matrix computation in Mahout DSL
M-1540: Reuters Example spectral clustering (also online docs for Spectral clustering)
M-1659: Remove deprecated Lanczos solver from spectral clustering in mr-legacy

Ted Dunning --
M-1636: Class dependencies for Spark module are put in job.jar, which is inefficient

Sebastian Schelter --
M-1584: Create a detailed example of how to index an arbitrary dataset and run LDA on it (Patch available)

Gokhan Capan --
M-1626: Support for required quasi-algebraic operations and starting with aggregating rows/blocks

Unassigned --
M-1594: Example factorize-movielens-1M.sh does not use HDFS (Patch available)
M-1593: cluster-reuters.sh does not work complaining java.lang.IllegalStateException (Patch available)
M-1557: Add support for sparse training vectors in MLP (Patch available)
M-1516: run classify-20newsgroups.sh failed cause by /tmp/mahout-work-jpan/20news-all does not exists in hdfs. (Patch available)
M-1643: CLI arguments are not being processed in spark-shell
M-1637: RecommenderJob of ALS fails in the mapper because it uses the instance of other class
M-1634: ALS don't work when it adds new files in Distributed Cache (Patch available)
M-1633: Failure to execute query when solr index contains documents with different fields
M-1551: Add document to describe how to use mlp with command line (Patch available)
Re: Mahout 0.10.0 Bug bash
Seems like we are stretched pretty thin given the workload, not to mention that Mahout work is completely orthogonal to our paychecks. Ted, Grant, Shannon - is it possible you guys could take some of the load?
Re: Mahout 0.10.0 Bug bash
Wait, I thought all DSL work on spectral clustering was waiting until 0.10.1?
Re: Mahout 0.10.0 Bug bash
That's right, feel free to edit your Jiras to reflect that.
Re: Mahout 0.10.0 Bug bash
Ah no worries, just got a bit panicked when I saw that. Summer will be better for me, but for now these tickets have about maxed me out; 3 months into the new tenure-track shtick is grueling.

On Mar 28, 2015, at 14:27, Andrew Musselman andrew.mussel...@gmail.com wrote:
Okay, go ahead and move it; I was just moving things from 1.0 to 0.10.0 almost indiscriminately.
Re: Mahout 0.10.0 Bug bash
Yes--removing the Lanczos solver from spectral clustering.

On 3/27/15 10:29 AM, Suneel Marthi wrote:
and this is for 0.10.0 ???
Re: Mahout 0.10.0 Bug bash
Created M-1659 and assigned it to myself to reflect current work.

Shannon
Re: Mahout 0.10.0 Bug bash
Not sure what to do about the Windows mahout.cmd script (M-1589). I don’t even own a Windows VM, so there is no way I can look into this except by asking for help, which I have done. What happens if no one volunteers? Is this a blocker?

I took M-1636; it should be resolved. It needs a final test on a cluster, which I am trying today.

Aren’t M-1655 and M-1646 the same? Dmitriy is not committing code, so any work must be reassigned if it needs to be done.
Re: Mahout 0.10.0 Bug bash
It's not a blocker. I would just close it and move on until the next Windows guy creates a new Jira :)
Mahout 0.10.0 Bug bash
Ok, here's the bug bash as of today:

Andrew Palumbo --
M-1648: Update CMS for Mahout 0.10.0
M-1638: H2O bindings fail at drmParallelizeWithRowLabels
M-1564: Naive Bayes classifier for new Text Documents
M-1635: Exception when providing classification Labels
M-1493: Port Naive Bayes to Spark DSL
M-1559: Documentation and cleanup for Naive Bayes Example
M-1609: NullPointerException
M-1607: Spark-shell DAG scheduler

Andrew Musselman --
M-1655: Refactor module dependencies
M-1563: cleanup Warnings during Build
M-1470: LDA Topic dump

Dmitriy Lyubimov --
M-1646: Refactor out all legacy MR dependencies from scala code

Frank Scholten --
M-1649: Lucene 5 upgrade

Pat Ferrel --
M-1589: mahout.cmd has duplicated content
M-1618: co-occurence recommender example

Suneel Marthi --
M-1586: Collections downloads must have hash signatures
M-1647: Release build
M-1652: Java 7 update
M-1512: Hadoop 2 compatibility
M-1469: Streaming KMeans fails when executed in MR mode and REDUCE_STREAMING_KMEANS set to true
M-1443: Update How to Release page
M-1585: Javadocs not hosted by Mahout-Quality
M-1612: NPE during JSON outputformatter for clusterdump

Stevo Slavic --
M-1650: upgrade 3rd party jars
M-1602: Euclidean Distance Similarity Math
M-1278: Improve inheritance of apache parent pom

Shannon Quinn --
M-1540: Reuters Example spectral clustering (also online docs for Spectral clustering)

Ted Dunning --
M-1636: Class dependencies for Spark module are put in job.jar, which is inefficient