Jenkins build is back to normal : Mahout-Examples-Cluster-Reuters-II #1168
See https://builds.apache.org/job/Mahout-Examples-Cluster-Reuters-II/1168/
[jira] [Commented] (MAHOUT-1696) QRDecomposition.solve(...) can return incorrect Matrix types
[ https://issues.apache.org/jira/browse/MAHOUT-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14511938#comment-14511938 ]

ASF GitHub Bot commented on MAHOUT-1696:

GitHub user andrewpalumbo opened a pull request:

    https://github.com/apache/mahout/pull/124

    MAHOUT-1696: QRDecomposition.solve(...) can return incorrect Matrix types

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/andrewpalumbo/mahout qrFix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/mahout/pull/124.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #124

commit 69a1e134c3a46bbbff08c9f68c057737fee364d3
Author: Andrew Palumbo apalu...@apache.org
Date: 2015-04-24T22:38:06Z

    Store a small matrix

QRDecomposition.solve(...) can return incorrect Matrix types

Key: MAHOUT-1696
URL: https://issues.apache.org/jira/browse/MAHOUT-1696
Project: Mahout
Issue Type: Bug
Components: Math
Affects Versions: 0.10.0
Reporter: Andrew Palumbo
Assignee: Andrew Palumbo
Fix For: 0.10.1, 0.11.0

In QRDecomposition.java, QRDecomposition(Matrix A).solve(Matrix B) is returning a Matrix of type B when it should be returning a matrix of type A. This can lead to sparse matrices which should be dense, and vice versa.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
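The type mix-up is easy to show in miniature. The following is an illustrative Python sketch, not Mahout code: `like` imitates Mahout's `Matrix.like()` factory (a new empty matrix of the same concrete type), and the class and function names are made up for illustration. It shows why allocating the result from B instead of A returns the wrong matrix type:

```python
class DenseMatrix:
    def like(self, rows, cols):
        # Mahout-style factory: new empty matrix of the SAME concrete type.
        return DenseMatrix()

class SparseMatrix:
    def like(self, rows, cols):
        return SparseMatrix()

def solve_buggy(a, b):
    # Bug pattern (MAHOUT-1696): result container allocated from B's type.
    return b.like(3, 3)

def solve_fixed(a, b):
    # Fix: allocate the result from A's type instead.
    return a.like(3, 3)

a, b = DenseMatrix(), SparseMatrix()
print(type(solve_buggy(a, b)).__name__)  # SparseMatrix (wrong)
print(type(solve_fixed(a, b)).__name__)  # DenseMatrix
```

A dense system solved against a sparse right-hand side should not come back sparse; preserving A's type is the behavior the fix restores.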
[jira] [Commented] (MAHOUT-1696) QRDecomposition.solve(...) can return incorrect Matrix types
[ https://issues.apache.org/jira/browse/MAHOUT-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512074#comment-14512074 ]

Hudson commented on MAHOUT-1696:

SUCCESS: Integrated in Mahout-Quality #3138 (See https://builds.apache.org/job/Mahout-Quality/3138/)
MAHOUT-1696: QRDecomposition.solve(...) can return incorrect Matrix types. closes apache/mahout#124 (apalumbo: rev 1f9188d0789640a6514e1120bf4b44061a886165)
* math/src/main/java/org/apache/mahout/math/QRDecomposition.java
* CHANGELOG
Re: [jira] [Created] (BIGTOP-1831) Upgrade Mahout to 0.10
The Spark 1.3 compat is in a near-future release; what do you need from us to make 1.1 and 1.2 compat work?

On Thursday, April 23, 2015, Konstantin Boudnik (JIRA) j...@apache.org wrote:
> [ https://issues.apache.org/jira/browse/BIGTOP-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510075#comment-14510075 ]
>
> Konstantin Boudnik commented on BIGTOP-1831:
>
> How is it going, guys? Looks like this is one of the blockers for 1.0, as we can not use the old 0.9 version. Appreciate the help! Thank you!
>
> Upgrade Mahout to 0.10
> ----------------------
> Key: BIGTOP-1831
> URL: https://issues.apache.org/jira/browse/BIGTOP-1831
> Project: Bigtop
> Issue Type: Task
> Components: general
> Affects Versions: 0.8.0
> Reporter: David Starina
> Priority: Blocker
> Labels: Mahout
> Fix For: 1.0.0
>
> Need to upgrade Mahout to the latest 0.10 release (first Hadoop 2.x compatible release)
>
> --
> This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAHOUT-1696) QRDecomposition.solve(...) can return incorrect Matrix types
[ https://issues.apache.org/jira/browse/MAHOUT-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512022#comment-14512022 ]

ASF GitHub Bot commented on MAHOUT-1696:

Github user asfgit closed the pull request at: https://github.com/apache/mahout/pull/124
Re: Streaming and incremental cooccurrence
Sounds about right. My guess is that memory is now large enough, especially on a cluster, that the cooccurrence will fit into memory quite often. Taking a large example of 10 million items and 10,000 cooccurrences each, there will be 100 billion cooccurrences to store, which shouldn't take more than about half a TB of data if fully populated. This isn't that outrageous any more. With SSDs as backing store, even 100GB of RAM or less might well produce very nice results. Depending on incoming transaction rates, using spinning disk as a backing store might also work with small memory. Experiments are in order.

On Fri, Apr 24, 2015 at 8:12 AM, Pat Ferrel p...@occamsmachete.com wrote:

> Ok, seems right. So now to data structures. The input frequency vectors need to be paired with each input interaction type, and it would be nice to have them as something that can be copied very fast as they get updated. Random access would also be nice, but iteration is not needed. Over time they will get larger as all items get interactions, and users will get more actions and appear in more vectors (with multi-interaction data). Seems like hashmaps?
>
> The cooccurrence matrix is more of a question to me. It needs to be updatable at the row and column level, and random access for both row and column would be nice. It needs to be expandable. To keep it small, the keys should be integers, not full-blown ID strings. There will have to be one matrix per interaction type. It should be simple to update the search engine to either mirror the matrix or use it directly for index updates. Each indicator update should cause an index update. Putting aside speed and size issues, this sounds like a NoSQL DB table that is cached in-memory.
>
> On Apr 23, 2015, at 3:04 PM, Ted Dunning ted.dunn...@gmail.com wrote:
>
> On Thu, Apr 23, 2015 at 8:53 AM, Pat Ferrel p...@occamsmachete.com wrote:
>
>> This seems to violate the random choice of interactions to cut, but now that I think about it, does a random choice really matter?
It hasn't ever mattered such that I could see. There is also some reason to claim that earliest is best if items are very focused in time. Of course, the opposite argument also applies. That leaves us with empiricism, where the results are not definitive. So I don't think that it matters, but I can't say definitively that it doesn't.
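Ted's sizing estimate checks out arithmetically. A quick sketch; the ~5 bytes per stored cooccurrence is my assumption about a compact integer-key/weight encoding, not a figure from the thread beyond its half-TB total:

```python
# Back-of-envelope sizing for a fully populated in-memory cooccurrence store.
items = 10_000_000                 # 10 million items
cooccurrences_per_item = 10_000    # 10,000 cooccurrences each
entries = items * cooccurrences_per_item   # 100 billion entries

# Assume ~5 bytes per entry (e.g. a compactly encoded column id plus a
# small count/weight) -- an assumption for illustration only.
bytes_per_entry = 5
total_tb = entries * bytes_per_entry / 1e12

print(f"{entries:.0e} entries -> {total_tb:.1f} TB")  # 1e+11 entries -> 0.5 TB
```

At that density the half-TB figure follows directly, which is why backing the store with SSD and a modest RAM cache looks plausible.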
[jira] [Resolved] (MAHOUT-1696) QRDecomposition.solve(...) can return incorrect Matrix types
[ https://issues.apache.org/jira/browse/MAHOUT-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Palumbo resolved MAHOUT-1696.
Resolution: Fixed

Pushed to master and the 0.10.x branch.
[jira] [Created] (MAHOUT-1696) QRDecomposition.solve(...) can return incorrect Matrix types
Andrew Palumbo created MAHOUT-1696:
--

Summary: QRDecomposition.solve(...) can return incorrect Matrix types
Key: MAHOUT-1696
URL: https://issues.apache.org/jira/browse/MAHOUT-1696
Project: Mahout
Issue Type: Bug
Components: Math
Affects Versions: 0.10.0
Reporter: Andrew Palumbo
Assignee: Andrew Palumbo
Fix For: 0.10.1, 0.11.0

In QRDecomposition.java, QRDecomposition(Matrix A).solve(Matrix B) is returning a Matrix of type B when it should be returning a matrix of type A. This can lead to sparse matrices which should be dense, and vice versa.
Re: Streaming and incremental cooccurrence
Ok, seems right. So now to data structures. The input frequency vectors need to be paired with each input interaction type, and it would be nice to have them as something that can be copied very fast as they get updated. Random access would also be nice, but iteration is not needed. Over time they will get larger as all items get interactions, and users will get more actions and appear in more vectors (with multi-interaction data). Seems like hashmaps?

The cooccurrence matrix is more of a question to me. It needs to be updatable at the row and column level, and random access for both row and column would be nice. It needs to be expandable. To keep it small, the keys should be integers, not full-blown ID strings. There will have to be one matrix per interaction type. It should be simple to update the search engine to either mirror the matrix or use it directly for index updates. Each indicator update should cause an index update. Putting aside speed and size issues, this sounds like a NoSQL DB table that is cached in-memory.

On Apr 23, 2015, at 3:04 PM, Ted Dunning ted.dunn...@gmail.com wrote:

On Thu, Apr 23, 2015 at 8:53 AM, Pat Ferrel p...@occamsmachete.com wrote:

> This seems to violate the random choice of interactions to cut, but now that I think about it, does a random choice really matter?

It hasn't ever mattered such that I could see. There is also some reason to claim that earliest is best if items are very focused in time. Of course, the opposite argument also applies. That leaves us with empiricism, where the results are not definitive. So I don't think that it matters, but I can't say definitively that it doesn't.
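The structures Pat describes, per-interaction-type frequency hashmaps plus an integer-keyed cooccurrence matrix with cheap row- and column-level updates, can be sketched roughly as follows. All names here are illustrative, not Mahout or proposed APIs:

```python
from collections import defaultdict

class CooccurrenceMatrix:
    """Sketch of an updatable, expandable cooccurrence matrix.

    Integer item ids key both axes; the matrix is stored twice (by row
    and by column) so both access patterns are O(1) hashmap lookups.
    """
    def __init__(self):
        self.rows = defaultdict(dict)  # row id -> {col id: count}
        self.cols = defaultdict(dict)  # col id -> {row id: count}

    def increment(self, row, col, delta=1):
        self.rows[row][col] = self.rows[row].get(col, 0) + delta
        self.cols[col][row] = self.cols[col].get(row, 0) + delta

    def row(self, r):
        return self.rows.get(r, {})

    def col(self, c):
        return self.cols.get(c, {})

# One frequency hashmap per interaction type, keyed by integer item id.
item_frequency = {"purchase": defaultdict(int), "view": defaultdict(int)}

# Items 3 and 7 co-occurring in one user's purchase history update both
# the frequency vectors and the (symmetric) cooccurrence counts.
item_frequency["purchase"][3] += 1
item_frequency["purchase"][7] += 1
m = CooccurrenceMatrix()
m.increment(3, 7)
m.increment(7, 3)
print(m.row(3), m.col(7))  # {7: 1} {3: 1}
```

Storing the matrix twice doubles memory but buys the random row *and* column access the email asks for; a NoSQL table with an in-memory cache is the persistent analogue of the same layout.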
Re: [jira] [Created] (BIGTOP-1831) Upgrade Mahout to 0.10
Might be good to open up a Slack channel for mahout-bigtop; I made one at http://mahout.slack.com which any u...@apache.org can log into.

On Friday, April 24, 2015, Andrew Musselman andrew.mussel...@gmail.com wrote:
> I'm not educated enough in what has to happen, but we're happy to help. Are there things we need to do from the Mahout end, or is it changing recipes and doing regressions of BigTop builds, etc.? What else?
>
> On Friday, April 24, 2015, Konstantin Boudnik c...@apache.org wrote:
>> I am trying to see if anyone is doing the accommodation of 0.10 into the coming 1.0 release. That's pretty much a release blocker at this point. I am not very much concerned about Spark compat, but if we are to take 0.10 into 1.0 it needs to work and be tested against 2.6.0 Hadoop. So, is anyone working on the patch or this JIRA?
>> Cos
>>
>> On Fri, Apr 24, 2015 at 05:48PM, Andrew Musselman wrote:
>>> The Spark 1.3 compat is in a near-future release; what do you need from us to make 1.1 and 1.2 compat work?
Re: [jira] [Created] (BIGTOP-1831) Upgrade Mahout to 0.10
Yeah, which we tested and which worked fine.

On Friday, April 24, 2015, Dmitriy Lyubimov dlie...@gmail.com wrote:
> My guess is this mostly means that mahout-mr has to run on 2.6.0, because the Spark part would basically run anywhere. But MR... gosh.
>
> On Fri, Apr 24, 2015 at 9:26 PM, Andrew Musselman andrew.mussel...@gmail.com wrote:
>> I'm not educated enough in what has to happen, but we're happy to help. Are there things we need to do from the Mahout end, or is it changing recipes and doing regressions of BigTop builds, etc.? What else?
>>
>> On Friday, April 24, 2015, Konstantin Boudnik c...@apache.org wrote:
>>> I am trying to see if anyone is doing the accommodation of 0.10 into the coming 1.0 release. That's pretty much a release blocker at this point. I am not very much concerned about Spark compat, but if we are to take 0.10 into 1.0 it needs to work and be tested against 2.6.0 Hadoop. So, is anyone working on the patch or this JIRA?
>>> Cos
Re: [jira] [Created] (BIGTOP-1831) Upgrade Mahout to 0.10
I am trying to see if anyone is doing the accommodation of 0.10 into the coming 1.0 release. That's pretty much a release blocker at this point. I am not very much concerned about Spark compat, but if we are to take 0.10 into 1.0 it needs to work and be tested against 2.6.0 Hadoop. So, is anyone working on the patch or this JIRA?

Cos

On Fri, Apr 24, 2015 at 05:48PM, Andrew Musselman wrote:
> The Spark 1.3 compat is in a near-future release; what do you need from us to make 1.1 and 1.2 compat work?
>
> On Thursday, April 23, 2015, Konstantin Boudnik (JIRA) j...@apache.org wrote:
>> Konstantin Boudnik commented on BIGTOP-1831:
>> How is it going, guys? Looks like this is one of the blockers for 1.0, as we can not use the old 0.9 version. Appreciate the help! Thank you!
Re: [jira] [Created] (BIGTOP-1831) Upgrade Mahout to 0.10
I'm not educated enough in what has to happen, but we're happy to help. Are there things we need to do from the Mahout end, or is it changing recipes and doing regressions of BigTop builds, etc.? What else?

On Friday, April 24, 2015, Konstantin Boudnik c...@apache.org wrote:
> I am trying to see if anyone is doing the accommodation of 0.10 into the coming 1.0 release. That's pretty much a release blocker at this point. I am not very much concerned about Spark compat, but if we are to take 0.10 into 1.0 it needs to work and be tested against 2.6.0 Hadoop. So, is anyone working on the patch or this JIRA?
> Cos
>
> On Fri, Apr 24, 2015 at 05:48PM, Andrew Musselman wrote:
>> The Spark 1.3 compat is in a near-future release; what do you need from us to make 1.1 and 1.2 compat work?
Re: [jira] [Created] (BIGTOP-1831) Upgrade Mahout to 0.10
My guess is this mostly means that mahout-mr has to run on 2.6.0, because the Spark part would basically run anywhere. But MR... gosh.

On Fri, Apr 24, 2015 at 9:26 PM, Andrew Musselman andrew.mussel...@gmail.com wrote:
> I'm not educated enough in what has to happen, but we're happy to help. Are there things we need to do from the Mahout end, or is it changing recipes and doing regressions of BigTop builds, etc.? What else?
>
> On Friday, April 24, 2015, Konstantin Boudnik c...@apache.org wrote:
>> I am trying to see if anyone is doing the accommodation of 0.10 into the coming 1.0 release. That's pretty much a release blocker at this point. I am not very much concerned about Spark compat, but if we are to take 0.10 into 1.0 it needs to work and be tested against 2.6.0 Hadoop. So, is anyone working on the patch or this JIRA?
>> Cos
AtA error
Running on Yarn. Getting an error with AtA. A user is running on those 1887 small ~4k Spark streaming files. The drms seem to be created properly. There may be empty rows in A; I'm having the user try with only AtA, no AtB, and so no empty rows. Any ideas? This is only 7.5M of data. I've tried a similar calc with the two larger files from epinions, and it works fine.

The task dies with: Job aborted due to stage failure: Exception while getting task result: java.util.NoSuchElementException: key not found: 20070

The stack trace is:

    org.apache.spark.rdd.RDD.collect(RDD.scala:774)
    org.apache.mahout.sparkbindings.blas.AtA$.at_a_slim(AtA.scala:121)
    org.apache.mahout.sparkbindings.blas.AtA$.at_a(AtA.scala:50)
    org.apache.mahout.sparkbindings.SparkEngine$.tr2phys(SparkEngine.scala:231)
    org.apache.mahout.sparkbindings.SparkEngine$.tr2phys(SparkEngine.scala:242)
    org.apache.mahout.sparkbindings.SparkEngine$.toPhysical(SparkEngine.scala:108)
    org.apache.mahout.math.drm.logical.CheckpointAction.checkpoint(CheckpointAction.scala:40)
    org.apache.mahout.math.drm.package$.drm2Checkpointed(package.scala:90)
    org.apache.mahout.math.cf.SimilarityAnalysis$$anonfun$3.apply(SimilarityAnalysis.scala:129)
    org.apache.mahout.math.cf.SimilarityAnalysis$$anonfun$3.apply(SimilarityAnalysis.scala:127)
    scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    scala.collection.Iterator$class.foreach(Iterator.scala:727)
    scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
    scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:176)
    scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:45)
    scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
    scala.collection.AbstractIterator.to(Iterator.scala:1157)
    scala.collection.TraversableOnce$class.toList(TraversableOnce.scala:257)
    scala.collection.AbstractIterator.toList(Iterator.scala:1157)
Re: AtA error
In slim, it almost certainly has to do with an incorrect vector length coming in. I have written a validate procedure for these things.

On Fri, Apr 24, 2015 at 9:43 AM, Pat Ferrel p...@occamsmachete.com wrote:
> Running on Yarn. Getting an error with AtA. A user is running on those 1887 small ~4k Spark streaming files. The drms seem to be created properly. There may be empty rows in A; I'm having the user try with only AtA, no AtB, and so no empty rows. Any ideas? This is only 7.5M of data. I've tried a similar calc with the two larger files from epinions, and it works fine.
>
> The task dies with: Job aborted due to stage failure: Exception while getting task result: java.util.NoSuchElementException: key not found: 20070
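A validation pass of the kind Dmitriy mentions can be sketched like this; it is illustrative only, not Mahout's actual validator. Before computing AtA, check that every row vector of the assembled matrix reports the same cardinality, since rows read from many small part files can disagree:

```python
def validate_row_cardinalities(rows):
    """Check that all row vectors of a matrix agree on length.

    `rows` is an iterable of (row_key, vector) pairs, where a vector is
    anything with a length. Returns the common cardinality, or raises
    ValueError on the first row that disagrees.
    """
    expected = None
    for key, vec in rows:
        n = len(vec)
        if expected is None:
            expected = n
        elif n != expected:
            raise ValueError(
                f"row {key}: cardinality {n} != expected {expected}")
    return expected

# Rows as read from two part files that disagree on vector length:
good = [(0, [1.0, 0.0, 2.0]), (1, [0.0, 3.0, 0.0])]
bad = good + [(2, [1.0, 0.0])]   # truncated row from another part file
print(validate_row_cardinalities(good))  # 3
try:
    validate_row_cardinalities(bad)
except ValueError as e:
    print(e)
```

Running such a check before the multiply turns a cryptic `key not found` failure deep in the slim AtA path into an immediate, named bad row.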
Re: AtA error
When I concatenate the input into a single file per A, B, etc., it runs fine. Do you think I'm reading incorrectly and somehow messing up vector sizes? Should I go through the input matrix and force vector (row?) sizes to be correct?

On Apr 24, 2015, at 10:46 AM, Dmitriy Lyubimov dlie...@gmail.com wrote:
> In slim, it almost certainly has to do with an incorrect vector length coming in. I have written a validate procedure for these things.
>
> On Fri, Apr 24, 2015 at 9:43 AM, Pat Ferrel p...@occamsmachete.com wrote:
>> Running on Yarn. Getting an error with AtA. A user is running on those 1887 small ~4k Spark streaming files. The drms seem to be created properly. There may be empty rows in A; I'm having the user try with only AtA, no AtB, and so no empty rows. Any ideas? This is only 7.5M of data. I've tried a similar calc with the two larger files from epinions, and it works fine.
>>
>> The task dies with: Job aborted due to stage failure: Exception while getting task result: java.util.NoSuchElementException: key not found: 20070
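Forcing consistent sizes, as Pat suggests, would mean conforming every row to the global column count before handing the matrix to AtA. Again a sketch with made-up names, not a Mahout API:

```python
def conform_rows(rows, ncols):
    """Pad short row vectors with zeros to a common cardinality.

    Rows wider than `ncols` are treated as an error rather than silently
    truncated, since that would drop data.
    """
    fixed = []
    for key, vec in rows:
        if len(vec) < ncols:
            vec = vec + [0.0] * (ncols - len(vec))
        elif len(vec) > ncols:
            raise ValueError(
                f"row {key} wider than matrix: {len(vec)} > {ncols}")
        fixed.append((key, vec))
    return fixed

# Rows assembled from part files with differing apparent widths:
rows = [(0, [1.0, 2.0]), (1, [3.0])]
print(conform_rows(rows, 3))  # [(0, [1.0, 2.0, 0.0]), (1, [3.0, 0.0, 0.0])]
```

Concatenating the input into one file per matrix works for the same reason: every row is then parsed against a single, consistent view of the column space.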
Re: AtA error
Yes, I thought that's what I said.

On Fri, Apr 24, 2015 at 12:54 PM, Pat Ferrel p...@occamsmachete.com wrote:
> When I concatenate the input into a single file per A, B, etc., it runs fine. Do you think I'm reading incorrectly and somehow messing up vector sizes? Should I go through the input matrix and force vector (row?) sizes to be correct?
>
> On Apr 24, 2015, at 10:46 AM, Dmitriy Lyubimov dlie...@gmail.com wrote:
>> In slim, it almost certainly has to do with an incorrect vector length coming in. I have written a validate procedure for these things.