Re: MLlib: Linear regression: Loss was due to java.lang.ArrayIndexOutOfBoundsException

2014-12-15 Thread Xiangrui Meng
Is it possible that the feature dimension changed after filtering?
This can happen if you use the LIBSVM format but don't specify the number
of features. -Xiangrui
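
(For reference, a minimal sketch of what loading with an explicit feature
count looks like; the path and the feature count below are placeholders, not
values from this thread. Passing numFeatures keeps the filtered subset in the
same vector space as the superset instead of letting the loader infer a
smaller dimension from the subset alone:)

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.regression.LinearRegressionWithSGD
    import org.apache.spark.mllib.util.MLUtils

    val sc = new SparkContext(new SparkConf().setAppName("lr-fixed-dimension"))

    // Placeholder: use the feature count of the FULL dataset. 150324 is simply
    // one more than the failing index in the trace below; substitute the real value.
    val numFeatures = 150324

    // Load the subset with the same explicit dimension as the superset.
    val training = MLUtils.loadLibSVMFile(sc, "hdfs:///path/to/filtered.libsvm", numFeatures).cache()

    val model = LinearRegressionWithSGD.train(training, 100)  // 100 iterations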

On Tue, Dec 9, 2014 at 4:54 AM, Sameer Tilak ssti...@live.com wrote:
 Hi All,

 I was able to run LinearRegressionWithSGD on a larger dataset (2 GB, sparse).
 I have now filtered the data and am running regression on a subset of it
 (~200 MB). I see this error, which is strange since the run was fine with the
 superset data. Is this a formatting issue (which I doubt), or is it some other
 issue in data preparation? I confirmed that there are no empty lines in my
 dataset. Any help with this will be highly appreciated.


 14/12/08 20:32:03 WARN TaskSetManager: Lost TID 5 (task 3.0:1)
 14/12/08 20:32:03 WARN TaskSetManager: Loss was due to java.lang.ArrayIndexOutOfBoundsException
 java.lang.ArrayIndexOutOfBoundsException: 150323
     at breeze.linalg.operators.DenseVector_SparseVector_Ops$$anon$129.apply(SparseVectorOps.scala:231)
     at breeze.linalg.operators.DenseVector_SparseVector_Ops$$anon$129.apply(SparseVectorOps.scala:216)
     at breeze.linalg.operators.BinaryRegistry$class.apply(BinaryOp.scala:60)
     at breeze.linalg.VectorOps$$anon$178.apply(Vector.scala:391)
     at breeze.linalg.NumericOps$class.dot(NumericOps.scala:83)
     at breeze.linalg.DenseVector.dot(DenseVector.scala:47)
     at org.apache.spark.mllib.optimization.LeastSquaresGradient.compute(Gradient.scala:125)
     at org.apache.spark.mllib.optimization.GradientDescent$$anonfun$runMiniBatchSGD$1$$anonfun$1.apply(GradientDescent.scala:180)
     at org.apache.spark.mllib.optimization.GradientDescent$$anonfun$runMiniBatchSGD$1$$anonfun$1.apply(GradientDescent.scala:179)
     at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:144)
     at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:144)
     at scala.collection.Iterator$class.foreach(Iterator.scala:727)
     at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
     at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:144)
     at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1157)
     at scala.collection.TraversableOnce$class.aggregate(TraversableOnce.scala:201)
     at scala.collection.AbstractIterator.aggregate(Iterator.scala:1157)
     at org.apache.spark.rdd.RDD$$anonfun$21.apply(RDD.scala:838)
     at org.apache.spark.rdd.RDD$$anonfun$21.apply(RDD.scala:838)
     at org.apache.spark.SparkContext$$anonfun$23.apply(SparkContext.scala:1116)
     at org.apache.spark.SparkContext$$anonfun$23.apply(SparkContext.scala:1116)
     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
     at org.apache.spark.scheduler.Task.run(Task.scala:51)
     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
     at java.lang.Thread.run(Thread.java:745)
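
(For reference, a rough way to sanity-check the filtered data is to compare the
largest feature index that actually occurs with the dimension of the parsed
vectors. The path below is a placeholder, and "filtered" stands for whatever
RDD[LabeledPoint] the regression is actually run on:)

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.linalg.SparseVector
    import org.apache.spark.mllib.util.MLUtils

    val sc = new SparkContext(new SparkConf().setAppName("dimension-check"))

    // Placeholder path; substitute however the filtered RDD[LabeledPoint] is built.
    val filtered = MLUtils.loadLibSVMFile(sc, "hdfs:///path/to/filtered.libsvm")

    // Largest feature index that actually occurs in the data.
    val maxIndex = filtered.map { p =>
      p.features match {
        case sv: SparseVector => if (sv.indices.isEmpty) -1 else sv.indices.max
        case v                => v.size - 1
      }
    }.max()

    // Dimension of the first record's vector; without explicit initial weights
    // this is typically the size used for the weight vector, so any index at or
    // above it would produce an ArrayIndexOutOfBoundsException like the one above.
    val dim = filtered.first().features.size
    println(s"max feature index = $maxIndex, first record dimension = $dim")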








