[jira] [Created] (MAHOUT-1414) Mahout Stream Analysis Package
Amir Rahnama created MAHOUT-1414: Summary: Mahout Stream Analysis Package Key: MAHOUT-1414 URL: https://issues.apache.org/jira/browse/MAHOUT-1414 Project: Mahout Issue Type: New Feature Reporter: Amir Rahnama I am working with Stream Analysis in Java and unfortunately as I see it, if someone chooses to work with Java in Machine Learning he needs to implement all the stuff. Suggestion is that I contribute to the stream analysis package of Mahout and make it reusable. Guide me on this plz. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (MAHOUT-1415) Clone method on sparse matrices fails if there is an empty row which has not been set explicitly
Till Rohrmann created MAHOUT-1415: - Summary: Clone method on sparse matrices fails if there is an empty row which has not been set explicitly Key: MAHOUT-1415 URL: https://issues.apache.org/jira/browse/MAHOUT-1415 Project: Mahout Issue Type: Bug Components: Math Affects Versions: 1.0 Environment: Mac OS X Mavericks, Darwin Kernel Version 13.0.2 Reporter: Till Rohrmann Priority: Minor Fix For: 1.0 The clone method of the SparseMatrix class fails with a NullPointerException if there exists an empty row in the matrix which has not been explicitly set. The reason for this problem is that the clone operation iterates over all rows and clones them whether there exists a Vector instance for this row or not. The problem should be easily fixed by iterating only over the existing matrix slices. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
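The failure mode described in this report can be illustrated with a minimal, self-contained sketch. The class below is a hypothetical stand-in (`RowSparseSketch` and its methods are invented for illustration) and is not Mahout's `SparseMatrix`; it only assumes, as the report states, that rows are stored lazily and that the clone loop visits every row index.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a row-sparse matrix; NOT Mahout's SparseMatrix.
final class RowSparseSketch {
    private final int rows;
    private final int cols;
    // Only rows that were explicitly set have an entry in this map.
    private final Map<Integer, double[]> rowVectors = new HashMap<>();

    RowSparseSketch(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
    }

    void set(int row, int col, double value) {
        rowVectors.computeIfAbsent(row, r -> new double[cols])[col] = value;
    }

    double get(int row, int col) {
        double[] v = rowVectors.get(row);
        return v == null ? 0.0 : v[col];
    }

    // Pattern described in the report: visit every row index. A row that
    // was never set has no vector, so calling clone() on it throws a
    // NullPointerException.
    RowSparseSketch cloneAllRows() {
        RowSparseSketch copy = new RowSparseSketch(rows, cols);
        for (int r = 0; r < rows; r++) {
            copy.rowVectors.put(r, rowVectors.get(r).clone());
        }
        return copy;
    }

    // Suggested pattern: iterate only over the rows that actually exist.
    RowSparseSketch cloneExistingRows() {
        RowSparseSketch copy = new RowSparseSketch(rows, cols);
        for (Map.Entry<Integer, double[]> e : rowVectors.entrySet()) {
            copy.rowVectors.put(e.getKey(), e.getValue().clone());
        }
        return copy;
    }
}
```

Iterating over the stored entries also keeps the copy as sparse as the original, since no vectors are allocated for rows that were never touched.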
Re: [jira] [Created] (MAHOUT-1415) Clone method on sparse matrices fails if there is an empty row which has not been set explicitly
Cool find, Till. Can you provide a patch? On 02/11/2014 12:58 PM, Till Rohrmann (JIRA) wrote: Till Rohrmann created MAHOUT-1415: - Summary: Clone method on sparse matrices fails if there is an empty row which has not been set explicitly Key: MAHOUT-1415 URL: https://issues.apache.org/jira/browse/MAHOUT-1415 Project: Mahout Issue Type: Bug Components: Math Affects Versions: 1.0 Environment: Mac OS X Mavericks, Darwin Kernel Version 13.0.2 Reporter: Till Rohrmann Priority: Minor Fix For: 1.0 The clone method of the SparseMatrix class fails with a NullPointerException if there exists an empty row in the matrix which has not been explicitly set. The reason for this problem is that the clone operation iterates over all rows and clones them whether there exists a Vector instance for this row or not. The problem should be easily fixed by iterating only over the existing matrix slices. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (MAHOUT-1415) Clone method on sparse matrices fails if there is an empty row which has not been set explicitly
[ https://issues.apache.org/jira/browse/MAHOUT-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated MAHOUT-1415: Status: Patch Available (was: Open) Clone method on sparse matrices fails if there is an empty row which has not been set explicitly Key: MAHOUT-1415 URL: https://issues.apache.org/jira/browse/MAHOUT-1415 Project: Mahout Issue Type: Bug Components: Math Affects Versions: 1.0 Environment: Mac OS X Mavericks, Darwin Kernel Version 13.0.2 Reporter: Till Rohrmann Priority: Minor Labels: newbie Fix For: 1.0 Original Estimate: 10m Remaining Estimate: 10m
[jira] [Updated] (MAHOUT-1415) Clone method on sparse matrices fails if there is an empty row which has not been set explicitly
[ https://issues.apache.org/jira/browse/MAHOUT-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated MAHOUT-1415: Attachment: MAHOUT-1415.patch
[jira] [Commented] (MAHOUT-1414) Mahout Stream Analysis Package
[ https://issues.apache.org/jira/browse/MAHOUT-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897832#comment-13897832 ] Ted Dunning commented on MAHOUT-1414: Happy to help. Can you say what your contribution will do more specifically?
[jira] [Commented] (MAHOUT-1414) Mahout Stream Analysis Package
[ https://issues.apache.org/jira/browse/MAHOUT-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897852#comment-13897852 ] Amir Rahnama commented on MAHOUT-1414: Well, there are a lot of algorithms, such as Adaptive Sliding Window, Random Sampling, Time Frames and so forth, for when data arrives in streams and changes continuously over time, so you do a lot of in-memory calculation. An example is working with the Twitter Stream API, where data flows continuously and is likely to change often.
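To make the kind of algorithm under discussion concrete, here is a minimal sketch of one well-known way to sample from a stream in bounded memory, reservoir sampling (Algorithm R). This is an assumption about what random sampling over a stream might mean in this thread; the class name `ReservoirSketch` is hypothetical and nothing here is taken from Mahout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical illustration of reservoir sampling (Algorithm R).
// Keeps a uniform random sample of fixed size from a stream of unknown length.
final class ReservoirSketch<T> {
    private final int capacity;
    private final Random random;
    private final List<T> sample;
    private long seen;

    ReservoirSketch(int capacity, long seed) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
        this.random = new Random(seed);
        this.sample = new ArrayList<>(capacity);
    }

    void offer(T item) {
        seen++;
        if (sample.size() < capacity) {
            // Fill the reservoir with the first `capacity` items.
            sample.add(item);
            return;
        }
        // Draw j uniformly from [0, seen); the new item replaces slot j
        // only if j falls inside the reservoir (probability capacity / seen).
        long j = (long) (random.nextDouble() * seen);
        if (j < capacity) {
            sample.set((int) j, item);
        }
    }

    List<T> sample() {
        return new ArrayList<>(sample);
    }

    long seen() {
        return seen;
    }
}
```

Memory use depends only on the reservoir capacity, not on how many items have flowed past, which is the property that matters for continuously arriving data.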
[jira] [Commented] (MAHOUT-1414) Mahout Stream Analysis Package
[ https://issues.apache.org/jira/browse/MAHOUT-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897863#comment-13897863 ] Suneel Marthi commented on MAHOUT-1414: Have you looked at Project Samoa - http://yahoo.github.io/samoa/ ? It's ML based on Storm for handling streams.
[jira] [Commented] (MAHOUT-1414) Mahout Stream Analysis Package
[ https://issues.apache.org/jira/browse/MAHOUT-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897867#comment-13897867 ] Amir Rahnama commented on MAHOUT-1414: Thanks Suneel. I will look at it. Don't you think that, since Mahout is a machine learning library, users expect to find some of the algorithms for stream analysis in there too? Or maybe it is not inside the scope of Mahout?!
[jira] [Commented] (MAHOUT-1414) Mahout Stream Analysis Package
[ https://issues.apache.org/jira/browse/MAHOUT-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897873#comment-13897873 ] Ted Dunning commented on MAHOUT-1414: Mahout has quite a number of algorithms for streaming data. See the OnlineSummarizer for instance. That is why I was asking you to be more specific. Without specifics, we can't really give you any kind of answer about whether it is appropriate for Mahout. Note also that Mahout does not have a mission to do all kinds of machine learning.
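For readers unfamiliar with what a single-pass summarizer over streaming data looks like, the following is a minimal sketch using Welford's running mean and variance update. It is not Mahout's OnlineSummarizer and makes no claim about that class's API or internals; the class name `RunningSummarySketch` and its methods are hypothetical.

```java
// Hypothetical single-pass summary of a stream of doubles.
// Uses Welford's update; NOT the implementation of Mahout's OnlineSummarizer.
final class RunningSummarySketch {
    private long n;
    private double mean;
    private double m2; // running sum of squared deviations from the mean
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;

    void add(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        // Uses the updated mean, which keeps the recurrence numerically stable.
        m2 += delta * (x - mean);
        min = Math.min(min, x);
        max = Math.max(max, x);
    }

    long count() {
        return n;
    }

    double mean() {
        return mean;
    }

    // Sample variance; defined as 0 when fewer than two values were seen.
    double variance() {
        return n > 1 ? m2 / (n - 1) : 0.0;
    }

    double min() {
        return min;
    }

    double max() {
        return max;
    }
}
```

Each value is seen once and then discarded, so the state stays constant in size regardless of stream length.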
[jira] [Assigned] (MAHOUT-1415) Clone method on sparse matrices fails if there is an empty row which has not been set explicitly
[ https://issues.apache.org/jira/browse/MAHOUT-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter reassigned MAHOUT-1415: Assignee: Sebastian Schelter
[jira] [Resolved] (MAHOUT-1415) Clone method on sparse matrices fails if there is an empty row which has not been set explicitly
[ https://issues.apache.org/jira/browse/MAHOUT-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter resolved MAHOUT-1415. Resolution: Fixed fixed, thank you Till!
[jira] [Commented] (MAHOUT-1415) Clone method on sparse matrices fails if there is an empty row which has not been set explicitly
[ https://issues.apache.org/jira/browse/MAHOUT-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897921#comment-13897921 ] Hudson commented on MAHOUT-1415: SUCCESS: Integrated in Mahout-Quality #2470 (See [https://builds.apache.org/job/Mahout-Quality/2470/]) MAHOUT-1415: Clone method on sparse matrices fails if there is an empty row which has not been set explicitly (ssc: rev 1567165) * /mahout/trunk/CHANGELOG * /mahout/trunk/math/src/main/java/org/apache/mahout/math/SparseMatrix.java * /mahout/trunk/math/src/test/java/org/apache/mahout/math/TestSparseMatrix.java
Re: Mahout 0.9 Release Notes - First Draft
Here's a draft of the Release Notes for Mahout 0.9. Please review.

--

The Apache Mahout PMC is pleased to announce the release of Mahout 0.9. Mahout's goal is to build scalable machine learning libraries focused primarily in the areas of collaborative filtering (recommenders), clustering and classification (known collectively as the 3Cs), as well as the necessary infrastructure to support those implementations including, but not limited to, math packages for statistics, linear algebra and others as well as Java primitive collections, local and distributed vector and matrix classes and a variety of integrative code to work with popular packages like Apache Hadoop, Apache Lucene, Apache HBase, Apache Cassandra and much more. The 0.9 release is mainly a clean-up release in preparation for an upcoming 1.0 release targeted for the first half of 2014, but there are a few significant new features, which are highlighted below.

To get started with Apache Mahout 0.9, download the release artifacts and signatures at http://www.apache.org/dyn/closer.cgi/mahout or visit the central Maven repository.

As with any release, we wish to thank all of the users and contributors to Mahout. Please see the CHANGELOG [1] and JIRA Release Notes [2] for individual credits, as there are too many to list here.

GETTING STARTED

In the release package, the examples directory contains several working examples of the core functionality available in Mahout. These can be run via scripts in the examples/bin directory and will prompt you for more information to help you try things out. Most examples do not need a Hadoop cluster in order to run.

RELEASE HIGHLIGHTS

The highlights of the Apache Mahout 0.9 release include, but are not limited to, the list below. For further information, see the included CHANGELOG [1] file.

- MAHOUT-1297: Scala DSL Bindings for Mahout Math Linear Algebra. See http://weatheringthrutechdays.blogspot.com/2013/07/scala-dsl-for-mahout-in-core-linear.html
- MAHOUT-1288: Recommenders as a Search. See https://github.com/pferrel/solr-recommender
- MAHOUT-1364: Upgrade Mahout to Lucene 4.6.1
- MAHOUT-1361: Online Algorithm for computing accurate Quantiles using 1-dimensional Clustering. See https://github.com/tdunning/t-digest/blob/master/docs/theory/t-digest-paper/histo.pdf for the details.
- MAHOUT-1265: MultiLayer Perceptron (MLP) classifier. This is an early implementation of MLP to solicit user feedback; it still needs to be integrated into Mahout’s processing pipeline to work with Mahout’s vectors.
- Removed Deprecated algorithms as they have been either replaced by better performing algorithms or lacked user support and maintenance.
- The usual bug fixes. See [2] for more information on the 0.9 release.

A total of 113 separate JIRA issues were addressed in this release.

The following algorithms that were marked deprecated in 0.8 have been removed in 0.9:

- From Clustering:
    Switched LDA implementation from using Dirichlet to Collapsed Variational Bayes (CVB)
    Meanshift
    MinHash - removed due to poor performance, lack of support and lack of usage
- From Classification (both are sequential implementations)
    Winnow - lack of actual usage and support
    Perceptron - lack of actual usage and support
- Collaborative Filtering
    SlopeOne implementations in org.apache.mahout.cf.taste.hadoop.slopeone and org.apache.mahout.cf.taste.impl.recommender.slopeone
    Distributed pseudo recommender in org.apache.mahout.cf.taste.hadoop.pseudo
    TreeClusteringRecommender in org.apache.mahout.cf.taste.impl.recommender
- Mahout Math
    Hadoop entropy stuff in org.apache.mahout.math.stats.entropy

CONTRIBUTING

Mahout is always looking for contributions focused on the 3Cs. If you are interested in contributing, please see our contribution page http://mahout.apache.org/developers/how-to-contribute.html or contact us via email at dev@mahout.apache.org.

As the project moves towards a 1.0 release, the community will be focused on key algorithms that are proven to scale in production and have seen widespread adoption.

[1] http://svn.apache.org/viewvc/mahout/trunk/CHANGELOG?view=markup&pathrev=1563661
[2] https://issues.apache.org/jira/browse/MAHOUT-1411?jql=project%20%3D%20MAHOUT%20AND%20fixVersion%20%3D%20%220.9%22

On Monday, December 23, 2013 7:41 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: On Sun, Dec 22, 2013 at 11:21 AM, Sebastian Schelter ssc.o...@googlemail.com wrote: - Mahout Math Lanczos in favour of SSVD IIRC, we agreed to not remove Lanczos, although it was initially deprecated. We should undeprecate it. Some folks like Lanczos in Mahout (for reasons not really clear to me, aside from accuracy when computing svd of a random noise, there are actually 0 reasons to use Lanczos instead). I agree we don't necessarily want to cull it out -- but IMO there should be a clear steer posted in favor of SSVD in the
Does SSVD support eigendecomposition of non-symmetric, non-positive-semidefinite matrices better than Lanczos?
Just asking for possible replacement of our Lanczos-based PageRank implementation. - Peng
Re: Does SSVD support eigendecomposition of non-symmetric, non-positive-semidefinite matrices better than Lanczos?
SSVD is very probably better than Lanczos for any large decomposition. That said, it does SVD, not eigen decomposition which means that the question of symmetrical matrices or positive definiteness doesn't much matter. Do you really need eigen-decomposition? On Tue, Feb 11, 2014 at 2:55 PM, peng pc...@uowmail.edu.au wrote: Just asking for possible replacement of our Lanczos-based PageRank implementation. - Peng
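The distinction drawn above between SVD and eigendecomposition can be written out as follows. This is a standard linear-algebra summary added for context, not something stated in the thread; the symbols $U$, $\Sigma$, $V$, $Q$ and $\Lambda$ are introduced here for illustration.

```latex
% Any real m x n matrix A has a singular value decomposition,
% with no symmetry or definiteness requirement:
A = U \Sigma V^{\top}, \qquad U^{\top} U = I,\quad V^{\top} V = I,\quad
\Sigma = \operatorname{diag}(\sigma_1 \ge \sigma_2 \ge \dots \ge 0)

% The singular vectors are eigenvectors of the symmetric products, not of A itself:
A^{\top} A = V \Sigma^{2} V^{\top}, \qquad A A^{\top} = U \Sigma^{2} U^{\top}

% When A is symmetric positive semidefinite with eigendecomposition
% A = Q \Lambda Q^{\top}, one valid SVD is given by the same factors:
A = Q \Lambda Q^{\top} \;\Longrightarrow\; U = V = Q,\quad \Sigma = \Lambda
```

For a general non-symmetric square matrix, the singular vectors are not eigenvectors of $A$, so an SVD routine does not directly yield the dominant eigenvector that a PageRank-style computation needs. That is presumably why the reply asks whether eigendecomposition is really required.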