Jenkins build is back to normal : Mahout-Examples-Cluster-Reuters-II #1168

2015-04-24 Thread Apache Jenkins Server
See https://builds.apache.org/job/Mahout-Examples-Cluster-Reuters-II/1168/



[jira] [Commented] (MAHOUT-1696) QRDecomposition.solve(...) can return incorrect Matrix types

2015-04-24 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/MAHOUT-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14511938#comment-14511938 ]

ASF GitHub Bot commented on MAHOUT-1696:


GitHub user andrewpalumbo opened a pull request:

https://github.com/apache/mahout/pull/124

MAHOUT-1696: QRDecomposition.solve(...) can return incorrect Matrix types 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/andrewpalumbo/mahout qrFix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/mahout/pull/124.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #124


commit 69a1e134c3a46bbbff08c9f68c057737fee364d3
Author: Andrew Palumbo apalu...@apache.org
Date:   2015-04-24T22:38:06Z

Store a small matrix




 QRDecomposition.solve(...) can return incorrect Matrix types
 -------------------------------------------------------------

              Key: MAHOUT-1696
              URL: https://issues.apache.org/jira/browse/MAHOUT-1696
          Project: Mahout
       Issue Type: Bug
       Components: Math
 Affects Versions: 0.10.0
         Reporter: Andrew Palumbo
         Assignee: Andrew Palumbo
          Fix For: 0.10.1, 0.11.0


 In QRDecomposition.java, QRDecomposition(Matrix A).solve(Matrix B) returns a
 Matrix of the same type as B when it should return a Matrix of the same type
 as A. This can produce sparse matrices that should be dense, and vice versa.
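
 A minimal sketch of the expected behavior against the Mahout math API; the
 values and the type check are illustrative, not taken from the patch or its
 tests:

   import org.apache.mahout.math.{DenseMatrix, QRDecomposition, SparseRowMatrix}

   // Illustrative only: dense A, sparse B. After the fix, the result of
   // solve() should follow A's (dense) type rather than B's.
   val a = new DenseMatrix(3, 3)
   a.assign(1.0)
   for (i <- 0 until 3) a.set(i, i, i + 2.0)  // keep A comfortably full rank

   val b = new SparseRowMatrix(3, 2)
   b.set(0, 0, 1.0)
   b.set(2, 1, 1.0)

   val x = new QRDecomposition(a).solve(b)
   println(x.getClass.getSimpleName)          // expected: a dense matrix type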





[jira] [Commented] (MAHOUT-1696) QRDecomposition.solve(...) can return incorrect Matrix types

2015-04-24 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/MAHOUT-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512074#comment-14512074 ]

Hudson commented on MAHOUT-1696:


SUCCESS: Integrated in Mahout-Quality #3138 (See 
[https://builds.apache.org/job/Mahout-Quality/3138/])
MAHOUT-1696: QRDecomposition.solve(...) can return incorrect Matrix types. 
closes apache/mahout#124 (apalumbo: rev 
1f9188d0789640a6514e1120bf4b44061a886165)
* math/src/main/java/org/apache/mahout/math/QRDecomposition.java
* CHANGELOG




Re: [jira] [Created] (BIGTOP-1831) Upgrade Mahout to 0.10

2015-04-24 Thread Andrew Musselman
The Spark 1.3 compat is in a near-future release; what do you need from us
to make 1.1 and 1.2 compat work?

On Thursday, April 23, 2015, Konstantin Boudnik (JIRA) j...@apache.org
wrote:


 [ https://issues.apache.org/jira/browse/BIGTOP-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510075#comment-14510075 ]

 Konstantin Boudnik commented on BIGTOP-1831:
 

 How is it going, guys? Looks like this is one of the blockers for 1.0, as we
 cannot use the old 0.9 version. Appreciate the help! Thank you!

  Upgrade Mahout to 0.10
  ----------------------

               Key: BIGTOP-1831
               URL: https://issues.apache.org/jira/browse/BIGTOP-1831
           Project: Bigtop
        Issue Type: Task
        Components: general
  Affects Versions: 0.8.0
          Reporter: David Starina
          Priority: Blocker
            Labels: Mahout
           Fix For: 1.0.0

  Need to upgrade Mahout to the latest 0.10 release (first Hadoop 2.x
  compatible release)






[jira] [Commented] (MAHOUT-1696) QRDecomposition.solve(...) can return incorrect Matrix types

2015-04-24 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/MAHOUT-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512022#comment-14512022 ]

ASF GitHub Bot commented on MAHOUT-1696:


Github user asfgit closed the pull request at:

https://github.com/apache/mahout/pull/124




Re: Streaming and incremental cooccurrence

2015-04-24 Thread Ted Dunning
Sounds about right.

My guess is that memory is now large enough, especially on a cluster, that
the cooccurrence will fit into memory quite often.  Taking a large example
of 10 million items and 10,000 cooccurrences each, there will be 100
billion cooccurrences to store, which shouldn't take more than about half a
TB of data if fully populated.  This isn't that outrageous any more.  With
SSDs as a backing store, even 100GB of RAM or less might well produce very
nice results.  Depending on incoming transaction rates, using spinning disk
as a backing store might also work with small memory.
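
A quick check of that arithmetic; the per-cooccurrence byte cost is an assumed
round number (roughly a packed index plus a small count), since the real cost
depends on the structure used:

  // Back-of-envelope for the estimate above; only bytesPerCooc is assumed.
  val items = 10e6                       // 10 million items
  val coocsPerItem = 10e3                // 10,000 cooccurrences each
  val bytesPerCooc = 5.0                 // assumed: packed index + count
  val totalCoocs = items * coocsPerItem  // 1e11 = 100 billion
  val totalTB = totalCoocs * bytesPerCooc / 1e12
  println(f"$totalTB%.1f TB")            // ~0.5 TB fully populated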

Experiments are in order.






[jira] [Resolved] (MAHOUT-1696) QRDecomposition.solve(...) can return incorrect Matrix types

2015-04-24 Thread Andrew Palumbo (JIRA)

 [ https://issues.apache.org/jira/browse/MAHOUT-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Palumbo resolved MAHOUT-1696.

Resolution: Fixed

Pushed to master and the 0.10.x branch.



[jira] [Created] (MAHOUT-1696) QRDecomposition.solve(...) can return incorrect Matrix types

2015-04-24 Thread Andrew Palumbo (JIRA)
Andrew Palumbo created MAHOUT-1696:
-----------------------------------

 Summary: QRDecomposition.solve(...) can return incorrect Matrix types


Re: Streaming and incremental cooccurrence

2015-04-24 Thread Pat Ferrel
Ok, seems right.

So now to data structures. The input frequency vectors need to be paired with
each input interaction type, and it would be nice to have them as something
that can be copied very fast as they get updated. Random access would also be
nice, but iteration is not needed. Over time they will get larger as all items
get interactions and users get more actions and appear in more vectors (with
multi-interaction data). Seems like hashmaps?

The cooccurrence matrix is more of a question to me. It needs to be updatable
at the row and column level, and random access for both row and column would
be nice. It needs to be expandable. To keep it small, the keys should be
integers, not full-blown ID strings. There will have to be one matrix per
interaction type. It should be simple to update the search engine to either
mirror the matrix or use it directly for index updates. Each indicator update
should cause an index update.

Putting aside speed and size issues, this sounds like a NoSQL DB table that is
cached in-memory.
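
One possible shape for that, as a sketch rather than anything in Mahout:
nested hashmaps keyed by integer, updatable at the cell level, one instance
per interaction type. Column-level access would need either a transposed
companion map or a scan; that is one of the tradeoffs raised above.

  import scala.collection.mutable

  // Sketch only: integer keys, expandable, random access by row and,
  // within a row, by column.
  class IncrementalCooccurrence {
    // row -> (column -> count)
    private val rows = mutable.Map.empty[Int, mutable.Map[Int, Double]]

    def increment(row: Int, col: Int, delta: Double = 1.0): Unit = {
      val r = rows.getOrElseUpdate(row, mutable.Map.empty[Int, Double])
      r(col) = r.getOrElse(col, 0.0) + delta
      // a real version would also trigger the search-engine index update here
    }

    def row(r: Int): Map[Int, Double] =
      rows.get(r).map(_.toMap).getOrElse(Map.empty)
  }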

On Apr 23, 2015, at 3:04 PM, Ted Dunning ted.dunn...@gmail.com wrote:

On Thu, Apr 23, 2015 at 8:53 AM, Pat Ferrel p...@occamsmachete.com wrote:

 This seems to violate the random choice of interactions to cut, but now
 that I think about it, does a random choice really matter?
 

It hasn't ever mattered so far as I could see.  There is also some reason
to claim that earliest is best if items are very focussed in time.  Of
course, the opposite argument also applies.  That leaves us with empiricism,
where the results are not definitive.

So I don't think that it matters, but I can't prove that it doesn't.



Re: [jira] [Created] (BIGTOP-1831) Upgrade Mahout to 0.10

2015-04-24 Thread Andrew Musselman
Might be good to open up a Slack channel for mahout-bigtop; I made one at
http://mahout.slack.com which any u...@apache.org can log into.




Re: [jira] [Created] (BIGTOP-1831) Upgrade Mahout to 0.10

2015-04-24 Thread Andrew Musselman
Yeah, which we tested and which worked fine.



Re: [jira] [Created] (BIGTOP-1831) Upgrade Mahout to 0.10

2015-04-24 Thread Konstantin Boudnik
I am trying to see if anyone is working on the accommodation of 0.10 into the
coming 1.0 release. That's pretty much a release blocker at this point. I am
not very much concerned about Spark compat, but if we are to take 0.10 into
1.0 it needs to work and be tested against Hadoop 2.6.0.

So, is anyone working on the patch or this JIRA?

Cos



Re: [jira] [Created] (BIGTOP-1831) Upgrade Mahout to 0.10

2015-04-24 Thread Andrew Musselman
I'm not educated enough in what has to happen, but we're happy to help.

Are there things we need to do from the Mahout end, or is it changing
recipes and doing regressions of Bigtop builds, etc.? What else?




Re: [jira] [Created] (BIGTOP-1831) Upgrade Mahout to 0.10

2015-04-24 Thread Dmitriy Lyubimov
My guess is this mostly means that mahout-mr has to run on 2.6.0, because the
Spark part would basically run anywhere.

But MR... gosh.




AtA error

2015-04-24 Thread Pat Ferrel
Running on Yarn, I'm getting an error with AtA. A user is running on those 1887
small ~4k Spark streaming files. The DRMs seem to be created properly. There
may be empty rows in A; I'm having the user try with only AtA, no AtB, and so
no empty rows.

Any ideas? This is only 7.5M of data. I've tried a similar calc with the two
larger files from epinions, and it works fine.

The task dies with:
Job aborted due to stage failure: Exception while getting task result:
java.util.NoSuchElementException: key not found: 20070
The stack trace is:

org.apache.spark.rdd.RDD.collect(RDD.scala:774)
org.apache.mahout.sparkbindings.blas.AtA$.at_a_slim(AtA.scala:121)
org.apache.mahout.sparkbindings.blas.AtA$.at_a(AtA.scala:50)
org.apache.mahout.sparkbindings.SparkEngine$.tr2phys(SparkEngine.scala:231)
org.apache.mahout.sparkbindings.SparkEngine$.tr2phys(SparkEngine.scala:242)
org.apache.mahout.sparkbindings.SparkEngine$.toPhysical(SparkEngine.scala:108)
org.apache.mahout.math.drm.logical.CheckpointAction.checkpoint(CheckpointAction.scala:40)
org.apache.mahout.math.drm.package$.drm2Checkpointed(package.scala:90)
org.apache.mahout.math.cf.SimilarityAnalysis$$anonfun$3.apply(SimilarityAnalysis.scala:129)
org.apache.mahout.math.cf.SimilarityAnalysis$$anonfun$3.apply(SimilarityAnalysis.scala:127)
scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
scala.collection.Iterator$class.foreach(Iterator.scala:727)
scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:176)
scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:45)
scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
scala.collection.AbstractIterator.to(Iterator.scala:1157)
scala.collection.TraversableOnce$class.toList(TraversableOnce.scala:257)
scala.collection.AbstractIterator.toList(Iterator.scala:1157)



Re: AtA error

2015-04-24 Thread Dmitriy Lyubimov
In slim, it almost certainly has to do with an incorrect vector length
coming in.

I have written a validate procedure for these things.
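
For what it's worth, a hypothetical version of such a check (not Dmitriy's
actual procedure): given the (key, row vector) pairs backing a DRM, confirm
every row vector carries the declared column count before A'A runs.

  import org.apache.mahout.math.Vector
  import org.apache.spark.rdd.RDD

  // Hypothetical: fail fast if any row vector disagrees with ncol.
  def validateRows(rows: RDD[(Int, Vector)], ncol: Int): Unit = {
    val bad = rows.filter { case (_, v) => v.size != ncol }.count()
    require(bad == 0, s"$bad row vectors with size != $ncol")
  }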




Re: AtA error

2015-04-24 Thread Pat Ferrel
When I concatenate the input into a single file per A, B, etc. it runs fine.

Do you think I'm reading incorrectly and somehow messing up vector sizes?
Should I go through the input matrix and force the vector (row?) sizes to be
correct?
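
If it comes to that, a repair pass might look like the sketch below; it
assumes only the per-row cardinality is wrong, and every name here is
illustrative:

  import org.apache.mahout.math.{RandomAccessSparseVector, Vector}
  import org.apache.spark.rdd.RDD
  import scala.collection.JavaConverters._

  // Illustrative: rebuild each row with the intended column count,
  // copying the non-zero elements, before wrapping the RDD as a DRM.
  def forceNcol(rows: RDD[(Int, Vector)], ncol: Int): RDD[(Int, Vector)] =
    rows.map { case (k, v) =>
      val w = new RandomAccessSparseVector(ncol)
      for (e <- v.nonZeroes.asScala) w.setQuick(e.index, e.get)
      (k, w: Vector)
    }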





Re: AtA error

2015-04-24 Thread Dmitriy Lyubimov
Yes, I thought that's what I said.
