[jira] [Comment Edited] (SPARK-12494) Array out of bound Exception in KMeans Yarn Mode
[ https://issues.apache.org/jira/browse/SPARK-12494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15073246#comment-15073246 ] Anandraj edited comment on SPARK-12494 at 12/28/15 11:11 PM:

I couldn't reply over the Christmas break. Please find the sample data attached: vectors1.tar.gz -> sample data for reproducing the K-Means error in yarn-cluster mode. The program works in local mode but fails in yarn-cluster mode.

was (Author: anandr...@gmail.com): Sample data for reproducing the K-Means error in yarn-cluster mode. The program works in local mode but fails in yarn-cluster mode.

> Array out of bound Exception in KMeans Yarn Mode
>
> Key: SPARK-12494
> URL: https://issues.apache.org/jira/browse/SPARK-12494
> Project: Spark
> Issue Type: Bug
> Components: MLlib
> Affects Versions: 1.5.0
> Reporter: Anandraj
> Priority: Blocker
> Attachments: vectors1.tar.gz
>
> Hi,
> I am trying to run k-means clustering on word2vec data. I tested the code in
> local mode with small data, and clustering completes fine. But when I run the
> same data in yarn-cluster mode, it fails with the error below.
> 15/12/23 00:49:01 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.ArrayIndexOutOfBoundsException: 0
> java.lang.ArrayIndexOutOfBoundsException: 0
>     at scala.collection.mutable.WrappedArray$ofRef.apply(WrappedArray.scala:126)
>     at org.apache.spark.mllib.clustering.KMeans$$anonfun$19.apply(KMeans.scala:377)
>     at org.apache.spark.mllib.clustering.KMeans$$anonfun$19.apply(KMeans.scala:377)
>     at scala.Array$.tabulate(Array.scala:331)
>     at org.apache.spark.mllib.clustering.KMeans.initKMeansParallel(KMeans.scala:377)
>     at org.apache.spark.mllib.clustering.KMeans.runAlgorithm(KMeans.scala:249)
>     at org.apache.spark.mllib.clustering.KMeans.run(KMeans.scala:213)
>     at org.apache.spark.mllib.clustering.KMeans$.train(KMeans.scala:520)
>     at org.apache.spark.mllib.clustering.KMeans$.train(KMeans.scala:531)
>     at com.tempurer.intelligence.adhocjobs.spark.kMeans$delayedInit$body.apply(kMeans.scala:24)
>     at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
>     at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
>     at scala.App$$anonfun$main$1.apply(App.scala:71)
>     at scala.App$$anonfun$main$1.apply(App.scala:71)
>     at scala.collection.immutable.List.foreach(List.scala:318)
>     at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
>     at scala.App$class.main(App.scala:71)
>     at com.tempurer.intelligence.adhocjobs.spark.kMeans$.main(kMeans.scala:9)
>     at com.tempurer.intelligence.adhocjobs.spark.kMeans.main(kMeans.scala)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:525)
> 15/12/23 00:49:01 INFO yarn.ApplicationMaster: Final app status: FAILED,
> exitCode: 15, (reason: User class threw exception: java.lang.ArrayIndexOutOfBoundsException: 0)
>
> In local mode with large data (2375849 vectors of size 200), the first
> sampling stage completes, but the second stage suspends execution without any
> error message and with no active execution in progress. I could only see the
> warning messages below:
>
> 15/12/23 01:24:13 INFO TaskSetManager: Finished task 9.0 in stage 1.0 (TID 37) in 29 ms on localhost (4/34)
> 15/12/23 01:24:14 WARN SparkContext: Requesting executors is only supported in coarse-grained mode
> 15/12/23 01:24:14 WARN ExecutorAllocationManager: Unable to reach the cluster manager to request 2 total executors!
> 15/12/23 01:24:15 WARN SparkContext: Requesting executors is only supported in coarse-grained mode
> 15/12/23 01:24:15 WARN ExecutorAllocationManager: Unable to reach the cluster manager to request 3 total executors!
> 15/12/23 01:24:16 WARN SparkContext: Requesting executors is only supported in coarse-grained mode
> 15/12/23 01:24:16 WARN ExecutorAllocationManager: Unable to reach the cluster manager to request 4 total executors!
> 15/12/23 01:24:17 WARN SparkContext: Requesting executors is only supported in coarse-grained mode
> 15/12/23 01:24:17 WARN ExecutorAllocationManager: Unable to reach the cluster manager to request 5 total executors!
> 15/12/23 01:24:18 WARN SparkContext: Requesting executors is only supported in coarse-grained mode
> 15/12/23 01:24:18 WARN ExecutorAllocationManager: Unable to reach the cluster
[jira] [Comment Edited] (SPARK-12494) Array out of bound Exception in KMeans Yarn Mode
[ https://issues.apache.org/jira/browse/SPARK-12494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070840#comment-15070840 ] Yanbo Liang edited comment on SPARK-12494 at 12/24/15 10:15 AM:

[~anandr...@gmail.com] Can this issue be reproduced? It looks like it's caused by
{code}
val sample = data.takeSample(true, runs, seed).toSeq
{code}
returning an empty array. Could you provide the smallest dataset that can help us reproduce it?

was (Author: yanboliang): [~anandr...@gmail.com] Can this issue be reproduced? It looks like it's caused by
{code}
val sample = data.takeSample(true, runs, seed).toSeq
{code}
returning an empty array, but I think we should reproduce this issue first. Could you provide the smallest dataset that can reproduce it?
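To illustrate the hypothesis above without a Spark cluster: in the 1.5 code path, initKMeansParallel builds the initial centers with Array.tabulate(runs)(r => ... sample(r) ...), so if takeSample came back empty, indexing the wrapped sample at 0 would throw exactly the ArrayIndexOutOfBoundsException: 0 seen in the stack trace. A stdlib-only Scala sketch of that failure pattern (illustrative only; the names sample and runs mirror KMeans.scala, but this is not the MLlib code itself):

```scala
// Stand-in for data.takeSample(true, runs, seed).toSeq coming back empty.
// Wrapping an Array gives the same WrappedArray.apply seen in the stack trace.
val sample: Seq[Int] = Array.empty[Int].toSeq
val runs = 1

// Mimics the failing pattern in KMeans.initKMeansParallel: tabulate indexes
// the sampled seq once per run, so an empty sample fails at index 0.
val thrown =
  try { Array.tabulate(runs)(r => sample(r)); false }
  catch { case _: IndexOutOfBoundsException => true }

println(s"empty sample triggers IndexOutOfBoundsException: $thrown")
```

This would point the investigation at why takeSample returns nothing in yarn-cluster mode (e.g. the input RDD being empty there), rather than at the tabulate call itself.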