[jira] [Created] (MAHOUT-1414) Mahout Stream Analysis Package

2014-02-11 Thread Amir Rahnama (JIRA)
Amir Rahnama created MAHOUT-1414:


 Summary: Mahout Stream Analysis Package
 Key: MAHOUT-1414
 URL: https://issues.apache.org/jira/browse/MAHOUT-1414
 Project: Mahout
  Issue Type: New Feature
Reporter: Amir Rahnama


I am working on stream analysis in Java and, as far as I can see, anyone who 
chooses Java for machine learning has to implement everything from scratch.

My suggestion is that I contribute to Mahout's stream analysis package and 
make it reusable.

Please guide me on this.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (MAHOUT-1415) Clone method on sparse matrices fails if there is an empty row which has not been set explicitly

2014-02-11 Thread Till Rohrmann (JIRA)
Till Rohrmann created MAHOUT-1415:

 Summary: Clone method on sparse matrices fails if there is an 
empty row which has not been set explicitly
 Key: MAHOUT-1415
 URL: https://issues.apache.org/jira/browse/MAHOUT-1415
 Project: Mahout
  Issue Type: Bug
  Components: Math
Affects Versions: 1.0
 Environment: Mac OS X Mavericks, Darwin Kernel Version 13.0.2
Reporter: Till Rohrmann
Priority: Minor
 Fix For: 1.0


The clone method of the SparseMatrix class fails with a NullPointerException if 
there exists an empty row in the matrix which has not been explicitly set. The 
reason for this problem is that the clone operation iterates over all rows and 
clones them whether there exists a Vector instance for this row or not. The 
problem should be easily fixed by iterating only over the existing matrix 
slices.
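
The failure mode described above can be reproduced with a small stand-in class. This is only a sketch: `ToySparseMatrix`, its row map, and both clone methods are hypothetical names invented for illustration and are not Mahout's `SparseMatrix` code or the attached patch. It assumes rows are stored lazily, so a row that was never set has no backing array.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a row-sparse matrix; NOT Mahout's SparseMatrix.
class ToySparseMatrix {
    private final int numRows;
    private final int numCols;
    // Only rows that were explicitly set have an entry in this map.
    private final Map<Integer, double[]> rows = new HashMap<>();

    ToySparseMatrix(int numRows, int numCols) {
        this.numRows = numRows;
        this.numCols = numCols;
    }

    void set(int row, int col, double value) {
        rows.computeIfAbsent(row, r -> new double[numCols])[col] = value;
    }

    double get(int row, int col) {
        double[] r = rows.get(row);
        return r == null ? 0.0 : r[col];
    }

    // Buggy pattern: visits every row index, so a row that was never set
    // yields null and calling clone() on it throws NullPointerException.
    ToySparseMatrix cloneAllRows() {
        ToySparseMatrix copy = new ToySparseMatrix(numRows, numCols);
        for (int i = 0; i < numRows; i++) {
            copy.rows.put(i, rows.get(i).clone());
        }
        return copy;
    }

    // Fixed pattern: visits only the rows that actually exist.
    ToySparseMatrix cloneExistingRows() {
        ToySparseMatrix copy = new ToySparseMatrix(numRows, numCols);
        for (Map.Entry<Integer, double[]> e : rows.entrySet()) {
            copy.rows.put(e.getKey(), e.getValue().clone());
        }
        return copy;
    }

    public static void main(String[] args) {
        ToySparseMatrix m = new ToySparseMatrix(3, 2);
        m.set(0, 1, 4.0);
        m.set(2, 0, 7.0); // row 1 is never set
        try {
            m.cloneAllRows();
        } catch (NullPointerException e) {
            System.out.println("cloneAllRows failed on the empty row");
        }
        System.out.println("cloneExistingRows value: " + m.cloneExistingRows().get(2, 0));
    }
}
```

Iterating over the stored entries also keeps the copy as sparse as the original, since no empty rows are materialized.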





Re: [jira] [Created] (MAHOUT-1415) Clone method on sparse matrices fails if there is an empty row which has not been set explicitly

2014-02-11 Thread Sebastian Schelter

Cool find, Till. Can you provide a patch?






[jira] [Updated] (MAHOUT-1415) Clone method on sparse matrices fails if there is an empty row which has not been set explicitly

2014-02-11 Thread Till Rohrmann (JIRA)

[ https://issues.apache.org/jira/browse/MAHOUT-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Till Rohrmann updated MAHOUT-1415:

Status: Patch Available  (was: Open)

 Clone method on sparse matrices fails if there is an empty row which has not 
 been set explicitly
 

 Key: MAHOUT-1415
 URL: https://issues.apache.org/jira/browse/MAHOUT-1415
 Project: Mahout
  Issue Type: Bug
  Components: Math
Affects Versions: 1.0
 Environment: Mac OS X Mavericks, Darwin Kernel Version 13.0.2
Reporter: Till Rohrmann
Priority: Minor
  Labels: newbie
 Fix For: 1.0

   Original Estimate: 10m
  Remaining Estimate: 10m



[jira] [Updated] (MAHOUT-1415) Clone method on sparse matrices fails if there is an empty row which has not been set explicitly

2014-02-11 Thread Till Rohrmann (JIRA)

[ https://issues.apache.org/jira/browse/MAHOUT-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Till Rohrmann updated MAHOUT-1415:

Attachment: MAHOUT-1415.patch



[jira] [Commented] (MAHOUT-1414) Mahout Stream Analysis Package

2014-02-11 Thread Ted Dunning (JIRA)

[ https://issues.apache.org/jira/browse/MAHOUT-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897832#comment-13897832 ]

Ted Dunning commented on MAHOUT-1414:

Happy to help.

Can you say what your contribution will do more specifically?





[jira] [Commented] (MAHOUT-1414) Mahout Stream Analysis Package

2014-02-11 Thread Amir Rahnama (JIRA)

[ https://issues.apache.org/jira/browse/MAHOUT-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897852#comment-13897852 ]

Amir Rahnama commented on MAHOUT-1414:

Well, there are many algorithms, such as adaptive sliding windows, random 
sampling, time frames and so forth, for cases where data arrives in streams 
and changes continuously over time, so you do a lot of in-memory calculation. 
An example is working with the Twitter Streaming API, where data flows 
continuously and is likely to change often.
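
To make "random sampling over a stream" concrete, here is a minimal sketch of reservoir sampling, one common technique of this kind. It is chosen purely as an illustration: the comment above does not name a specific sampling algorithm, and the `ReservoirSampler` class and its method names are hypothetical, not part of Mahout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Keeps a uniform random sample of fixed size from a stream of unknown length.
class ReservoirSampler<T> {
    private final int capacity;
    private final List<T> sample;
    private final Random random;
    private long seen = 0;

    ReservoirSampler(int capacity, long seed) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
        this.sample = new ArrayList<>(capacity);
        this.random = new Random(seed);
    }

    // Offers one stream element; memory use never exceeds `capacity` items.
    void offer(T item) {
        seen++;
        if (sample.size() < capacity) {
            // The first `capacity` items always enter the reservoir.
            sample.add(item);
        } else {
            // Draw a slot in [0, seen); the new item replaces an old one
            // only if the slot falls inside the reservoir.
            long slot = (long) (random.nextDouble() * seen);
            if (slot < capacity) {
                sample.set((int) slot, item);
            }
        }
    }

    List<T> sample() {
        return new ArrayList<>(sample);
    }

    long seen() {
        return seen;
    }

    public static void main(String[] args) {
        ReservoirSampler<Integer> sampler = new ReservoirSampler<>(5, 42L);
        for (int i = 0; i < 1000; i++) {
            sampler.offer(i);
        }
        System.out.println("sample of " + sampler.seen() + " items: " + sampler.sample());
    }
}
```

The sampler needs only one pass and constant memory, which is the property that matters when the stream cannot be stored.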



[jira] [Commented] (MAHOUT-1414) Mahout Stream Analysis Package

2014-02-11 Thread Suneel Marthi (JIRA)

[ https://issues.apache.org/jira/browse/MAHOUT-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897863#comment-13897863 ]

Suneel Marthi commented on MAHOUT-1414:

Have you looked at Project Samoa - http://yahoo.github.io/samoa/ ?  It's ML 
based on Storm for handling streams.



[jira] [Commented] (MAHOUT-1414) Mahout Stream Analysis Package

2014-02-11 Thread Amir Rahnama (JIRA)

[ https://issues.apache.org/jira/browse/MAHOUT-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897867#comment-13897867 ]

Amir Rahnama commented on MAHOUT-1414:

Thanks, Suneel. I will look at it.

Don't you think that, since Mahout is a machine learning library, users expect 
to find some stream analysis algorithms there too? Or is that outside the 
scope of Mahout?



[jira] [Commented] (MAHOUT-1414) Mahout Stream Analysis Package

2014-02-11 Thread Ted Dunning (JIRA)

[ https://issues.apache.org/jira/browse/MAHOUT-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897873#comment-13897873 ]

Ted Dunning commented on MAHOUT-1414:

Mahout has quite a number of algorithms for streaming data.  See the 
OnlineSummarizer for instance.  

That is why I was asking you to be more specific.  Without specifics, we can't 
really give you any kind of answer about whether it is appropriate for Mahout.

Note also that Mahout does not have a mission to do all kinds of machine 
learning.
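
As a rough illustration of what a one-pass streaming summary looks like, the sketch below keeps a running mean and variance using Welford's update. It is not Mahout's `OnlineSummarizer` and makes no claim about how that class is implemented; the `RunningStats` name and its methods are hypothetical.

```java
// One-pass running mean and sample variance (Welford's update).
// Hypothetical illustration only; not Mahout's OnlineSummarizer.
class RunningStats {
    private long count = 0;
    private double mean = 0.0;
    private double m2 = 0.0; // sum of squared deviations from the current mean

    // Folds one observation into the summary in constant time and memory.
    void add(double x) {
        count++;
        double delta = x - mean;
        mean += delta / count;
        m2 += delta * (x - mean);
    }

    long count() {
        return count;
    }

    double mean() {
        return mean;
    }

    // Sample variance; defined as 0 until at least two values were seen.
    double variance() {
        return count > 1 ? m2 / (count - 1) : 0.0;
    }

    public static void main(String[] args) {
        RunningStats stats = new RunningStats();
        for (double x : new double[] {2, 4, 4, 4, 5, 5, 7, 9}) {
            stats.add(x);
        }
        System.out.println("n=" + stats.count() + " mean=" + stats.mean()
                + " variance=" + stats.variance());
    }
}
```

A proposal that spells out this level of detail (what is summarized, in how much memory, with what accuracy) is the kind of specificity being asked for here.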



[jira] [Assigned] (MAHOUT-1415) Clone method on sparse matrices fails if there is an empty row which has not been set explicitly

2014-02-11 Thread Sebastian Schelter (JIRA)

[ https://issues.apache.org/jira/browse/MAHOUT-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian Schelter reassigned MAHOUT-1415:

Assignee: Sebastian Schelter



[jira] [Resolved] (MAHOUT-1415) Clone method on sparse matrices fails if there is an empty row which has not been set explicitly

2014-02-11 Thread Sebastian Schelter (JIRA)

[ https://issues.apache.org/jira/browse/MAHOUT-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian Schelter resolved MAHOUT-1415.


Resolution: Fixed

Fixed, thank you Till!



[jira] [Commented] (MAHOUT-1415) Clone method on sparse matrices fails if there is an empty row which has not been set explicitly

2014-02-11 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/MAHOUT-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897921#comment-13897921 ]

Hudson commented on MAHOUT-1415:


SUCCESS: Integrated in Mahout-Quality #2470 (See 
[https://builds.apache.org/job/Mahout-Quality/2470/])
MAHOUT-1415: Clone method on sparse matrices fails if there is an empty row 
which has not been set explicitly (ssc: rev 1567165)
* /mahout/trunk/CHANGELOG
* /mahout/trunk/math/src/main/java/org/apache/mahout/math/SparseMatrix.java
* /mahout/trunk/math/src/test/java/org/apache/mahout/math/TestSparseMatrix.java




Re: Mahout 0.9 Release Notes - First Draft

2014-02-11 Thread Suneel Marthi

Here's a draft of the Release Notes for Mahout 0.9. Please review.

--


The Apache Mahout PMC is pleased to announce the release of Mahout 0.9.
Mahout's goal is to build scalable machine learning libraries focused
primarily on collaborative filtering (recommenders), clustering and
classification (known collectively as the 3Cs). It also provides the
infrastructure needed to support those implementations, including, but
not limited to, math packages for statistics, linear algebra and others,
Java primitive collections, local and distributed vector and matrix
classes, and a variety of integrative code to work with popular packages
such as Apache Hadoop, Apache Lucene, Apache HBase, Apache Cassandra and
more. The 0.9 release is mainly a cleanup release in preparation for an
upcoming 1.0 release targeted for the first half of 2014, but there are a
few significant new features, which are highlighted below.

To get started with Apache Mahout 0.9, download the release artifacts and 
signatures at http://www.apache.org/dyn/closer.cgi/mahout or visit the central 
Maven repository.

As with any release, we wish to thank all of the users and contributors
to Mahout. Please see the CHANGELOG [1] and JIRA Release Notes [2] for
individual credits, as there are too many to list here.

GETTING STARTED

In the release package, the examples directory contains several working
examples of the core functionality available in Mahout. These can be run
via scripts in the examples/bin directory, which prompt you for more
information to help you try things out. Most examples do not need a
Hadoop cluster in order to run.

RELEASE HIGHLIGHTS

The highlights of the Apache Mahout 0.9 release include, but are not
limited to, the list below. For further information, see the included
CHANGELOG [1] file.

-  MAHOUT-1297: Scala DSL Bindings for Mahout Math Linear Algebra.
   See http://weatheringthrutechdays.blogspot.com/2013/07/scala-dsl-for-mahout-in-core-linear.html
-  MAHOUT-1288: Recommenders as a Search.
   See https://github.com/pferrel/solr-recommender
-  MAHOUT-1364: Upgrade Mahout to Lucene 4.6.1.
-  MAHOUT-1361: Online algorithm for computing accurate quantiles using
   1-dimensional clustering.
   See https://github.com/tdunning/t-digest/blob/master/docs/theory/t-digest-paper/histo.pdf
   for the details.
-  MAHOUT-1265: MultiLayer Perceptron (MLP) classifier.
   This is an early implementation of MLP intended to solicit user
   feedback; it still needs to be integrated into Mahout’s processing
   pipeline to work with Mahout’s vectors.
-  Removed deprecated algorithms, as they have either been replaced by
   better-performing algorithms or lacked user support and maintenance.
-  The usual bug fixes. See [2] for more information on the 0.9 release.

A total of 113 separate JIRA issues were addressed in this release.

The following algorithms that were marked deprecated in 0.8 have been
removed in 0.9:

- From Clustering:
    Switched LDA implementation from using Dirichlet to Collapsed
      Variational Bayes (CVB)
    Meanshift
    MinHash - removed due to poor performance, lack of support and lack
      of usage

- From Classification (both are sequential implementations):
    Winnow - lack of actual usage and support
    Perceptron - lack of actual usage and support

- From Collaborative Filtering:
    SlopeOne implementations in org.apache.mahout.cf.taste.hadoop.slopeone
      and org.apache.mahout.cf.taste.impl.recommender.slopeone
    Distributed pseudo recommender in org.apache.mahout.cf.taste.hadoop.pseudo
    TreeClusteringRecommender in org.apache.mahout.cf.taste.impl.recommender

- From Mahout Math:
    Hadoop entropy stuff in org.apache.mahout.math.stats.entropy

CONTRIBUTING

Mahout is always looking for contributions focused on the 3Cs. If you are
interested in contributing, please see our contribution page at
http://mahout.apache.org/developers/how-to-contribute.html or contact us
via email at dev@mahout.apache.org.

As the project moves towards a 1.0 release, the community will focus on
key algorithms that are proven to scale in production and have seen
widespread adoption.

[1] http://svn.apache.org/viewvc/mahout/trunk/CHANGELOG?view=markup&pathrev=1563661
[2] https://issues.apache.org/jira/browse/MAHOUT-1411?jql=project%20%3D%20MAHOUT%20AND%20fixVersion%20%3D%20%220.9%22








On Monday, December 23, 2013 7:41 PM, Dmitriy Lyubimov dlie...@gmail.com 
wrote:
 
On Sun, Dec 22, 2013 at 11:21 AM, Sebastian Schelter 

ssc.o...@googlemail.com wrote:


 
  - Mahout Math
      Lanczos in favour of SSVD

 IIRC, we agreed to not remove Lanczos, although it was initially
 deprecated. We should undeprecate it.


Some folks like Lanczos in Mahout (for reasons not really clear to me,
aside from accuracy when computing svd of a random noise, there are
actually 0 reasons to use Lanczos instead). I agree we don't  necessarily
want to cull it out -- but IMO there should be a clear steer posted in
favor of SSVD in the 

Does SSVD support eigendecomposition of non-symmetric, non-positive-semidefinite matrices better than Lanczos?

2014-02-11 Thread peng

Just asking about a possible replacement for our Lanczos-based PageRank 
implementation. - Peng


Re: Does SSVD support eigendecomposition of non-symmetric, non-positive-semidefinite matrices better than Lanczos?

2014-02-11 Thread Ted Dunning
SSVD is very probably better than Lanczos for any large decomposition.
That said, it computes an SVD, not an eigendecomposition, so the question
of symmetry or positive definiteness doesn't matter much.

Do you really need eigen-decomposition?
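
For reference, the distinction between the two decompositions can be written side by side. This is standard linear algebra rather than anything specific to Mahout's SSVD implementation, and the symbols (A, U, V, Q and so on) are introduced here only for illustration:

```latex
% An SVD exists for every real m x n matrix A, symmetric or not:
A = U \Sigma V^{\top}, \qquad U^{\top}U = I,\quad V^{\top}V = I,\quad
\Sigma = \operatorname{diag}(\sigma_1 \ge \sigma_2 \ge \dots \ge 0)

% The singular values come from the eigenvalues of the symmetric matrix A^T A:
A^{\top}A = V \Sigma^{2} V^{\top}

% If A is symmetric positive semidefinite, its eigendecomposition is itself an SVD:
A = Q \Lambda Q^{\top} \;\Longrightarrow\; U = V = Q,\quad \Sigma = \Lambda
```

For a non-symmetric matrix, such as the transition matrix used in PageRank, the eigenvalues can be complex and the singular vectors are in general not eigenvectors, so an SVD is not automatically a drop-in replacement for an eigensolver.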


