[jira] [Comment Edited] (SPARK-12494) Array out of bound Exception in KMeans Yarn Mode

2015-12-28 Thread Anandraj (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15073246#comment-15073246
 ] 

Anandraj edited comment on SPARK-12494 at 12/28/15 11:11 PM:

I couldn't reply over the Christmas break. Please find the sample data 
attached. 

vectors1.tar.gz -> Sample data for reproducing the K-Means error in yarn 
cluster mode. Program works in local mode but fails in yarn cluster mode. 


was (Author: anandr...@gmail.com):
Sample data for reproducing the K-Means error in yarn cluster mode. Program 
works in local mode but fails in yarn cluster mode. 

> Array out of bound Exception in KMeans Yarn Mode
> 
>
> Key: SPARK-12494
> URL: https://issues.apache.org/jira/browse/SPARK-12494
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Affects Versions: 1.5.0
>Reporter: Anandraj
> Attachments: vectors1.tar.gz
>
>
> Hi,
> I am trying to run k-means clustering on word2vec data. I tested the code in 
> local mode with small data, and clustering completes fine. But when I run with the 
> same data in YARN cluster mode, it fails with the error below. 
> 15/12/23 00:49:01 ERROR yarn.ApplicationMaster: User class threw exception: 
> java.lang.ArrayIndexOutOfBoundsException: 0
> java.lang.ArrayIndexOutOfBoundsException: 0
>   at 
> scala.collection.mutable.WrappedArray$ofRef.apply(WrappedArray.scala:126)
>   at 
> org.apache.spark.mllib.clustering.KMeans$$anonfun$19.apply(KMeans.scala:377)
>   at 
> org.apache.spark.mllib.clustering.KMeans$$anonfun$19.apply(KMeans.scala:377)
>   at scala.Array$.tabulate(Array.scala:331)
>   at 
> org.apache.spark.mllib.clustering.KMeans.initKMeansParallel(KMeans.scala:377)
>   at 
> org.apache.spark.mllib.clustering.KMeans.runAlgorithm(KMeans.scala:249)
>   at org.apache.spark.mllib.clustering.KMeans.run(KMeans.scala:213)
>   at org.apache.spark.mllib.clustering.KMeans$.train(KMeans.scala:520)
>   at org.apache.spark.mllib.clustering.KMeans$.train(KMeans.scala:531)
>   at 
> com.tempurer.intelligence.adhocjobs.spark.kMeans$delayedInit$body.apply(kMeans.scala:24)
>   at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
>   at 
> scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
>   at scala.App$$anonfun$main$1.apply(App.scala:71)
>   at scala.App$$anonfun$main$1.apply(App.scala:71)
>   at scala.collection.immutable.List.foreach(List.scala:318)
>   at 
> scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
>   at scala.App$class.main(App.scala:71)
>   at 
> com.tempurer.intelligence.adhocjobs.spark.kMeans$.main(kMeans.scala:9)
>   at com.tempurer.intelligence.adhocjobs.spark.kMeans.main(kMeans.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:525)
> 15/12/23 00:49:01 INFO yarn.ApplicationMaster: Final app status: FAILED, 
> exitCode: 15, (reason: User class threw exception: 
> java.lang.ArrayIndexOutOfBoundsException: 0)
> In local mode with large data (2,375,849 vectors of size 200), the first 
> sampling stage completes, but the second stage suspends execution without any 
> error message and no active execution is in progress. I could only see the 
> warning messages below:
> 15/12/23 01:24:13 INFO TaskSetManager: Finished task 9.0 in stage 1.0 (TID 
> 37) in 29 ms on localhost (4/34)
> 15/12/23 01:24:14 WARN SparkContext: Requesting executors is only supported 
> in coarse-grained mode
> 15/12/23 01:24:14 WARN ExecutorAllocationManager: Unable to reach the cluster 
> manager to request 2 total executors!
> 15/12/23 01:24:15 WARN SparkContext: Requesting executors is only supported 
> in coarse-grained mode
> 15/12/23 01:24:15 WARN ExecutorAllocationManager: Unable to reach the cluster 
> manager to request 3 total executors!
> 15/12/23 01:24:16 WARN SparkContext: Requesting executors is only supported 
> in coarse-grained mode
> 15/12/23 01:24:16 WARN ExecutorAllocationManager: Unable to reach the cluster 
> manager to request 4 total executors!
> 15/12/23 01:24:17 WARN SparkContext: Requesting executors is only supported 
> in coarse-grained mode
> 15/12/23 01:24:17 WARN ExecutorAllocationManager: Unable to reach the cluster 
> manager to request 5 total executors!
> 15/12/23 01:24:18 WARN SparkContext: Requesting executors is only supported 
> in coarse-grained mode
> 15/12/23 01:24:18 WARN ExecutorAllocationManager: Unable to reach the cluster 

[jira] [Comment Edited] (SPARK-12494) Array out of bound Exception in KMeans Yarn Mode

2015-12-24 Thread Yanbo Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070840#comment-15070840
 ] 

Yanbo Liang edited comment on SPARK-12494 at 12/24/15 10:15 AM:


[~anandr...@gmail.com]
Can this issue be reproduced? It looks like it's caused by 
{code}
val sample = data.takeSample(true, runs, seed).toSeq
{code}
returning a null array. Could you provide the smallest dataset that can help us 
reproduce it?
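As a hedged illustration of why that line would produce the reported trace: `initKMeansParallel` samples `runs` starting points and then indexes the sample inside `Array.tabulate`, so if the input RDD resolves to zero rows (for example, a data path that is not visible to the YARN executors), the sample is empty and `sample(0)` throws `ArrayIndexOutOfBoundsException: 0`. A minimal sketch in plain Scala, with no Spark dependency; the object and method names here are hypothetical stand-ins for the MLlib internals named in the trace:

```scala
// Sketch of the suspected failure mode in KMeans.initKMeansParallel
// (KMeans.scala:377): the initializer does essentially
// Array.tabulate(runs)(r => sample(r)), which fails on an empty sample.
object EmptySampleSketch {
  // Stand-in for the Array.tabulate call over the sampled points
  def pickCenters(sample: Seq[Int], runs: Int): Array[Int] =
    Array.tabulate(runs)(r => sample(r))

  def main(args: Array[String]): Unit = {
    // Non-empty sample: indexing succeeds
    println(pickCenters(Seq(10, 20), 2).mkString(","))

    // Empty sample: same exception shape as the reported stack trace
    try {
      pickCenters(Seq.empty, 2)
    } catch {
      case e: IndexOutOfBoundsException => println("caught: " + e)
    }
  }
}
```

If this is the cause, the underlying question is why the RDD is empty only in cluster mode, which points at the input path or permissions rather than the clustering code itself.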


was (Author: yanboliang):
[~anandr...@gmail.com]
Can this issue be reproduced? It looks like it's caused by 
{code}
val sample = data.takeSample(true, runs, seed).toSeq
{code}
returning a null array, but I think we should reproduce this issue first.
Could you provide the smallest dataset that can reproduce it?

> Array out of bound Exception in KMeans Yarn Mode
> 
>
> Key: SPARK-12494
> URL: https://issues.apache.org/jira/browse/SPARK-12494
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Affects Versions: 1.5.0
>Reporter: Anandraj
>Priority: Blocker

[jira] [Comment Edited] (SPARK-12494) Array out of bound Exception in KMeans Yarn Mode

2015-12-24 Thread Yanbo Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070840#comment-15070840
 ] 

Yanbo Liang edited comment on SPARK-12494 at 12/24/15 10:14 AM:


[~anandr...@gmail.com]
Can this issue be reproduced? It looks like it's caused by 
{code}
val sample = data.takeSample(true, runs, seed).toSeq
{code}
returning a null array, but I think we should reproduce this issue first.
Could you provide the smallest dataset that can reproduce it?


was (Author: yanboliang):
[~anandr...@gmail.com]
Can this issue be reproduced? It looks like it's caused by 
{code}
val sample = data.takeSample(true, runs, seed).toSeq
{code}
returning a null array, but I think we should reproduce this issue first.
