[jira] [Commented] (SPARK-16833) [Spark2.0]when creating temporary function,command "add jar" doesn't work unless restart spark
[ https://issues.apache.org/jira/browse/SPARK-16833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15568566#comment-15568566 ]

jeffonia Tung commented on SPARK-16833:
---------------------------------------

I have the same problem when running queries through the JDBC Thrift server on YARN.

> [Spark2.0]when creating temporary function,command "add jar" doesn't work unless restart spark
> ---
>
> Key: SPARK-16833
> URL: https://issues.apache.org/jira/browse/SPARK-16833
> Project: Spark
> Issue Type: Bug
> Affects Versions: 2.0.0
> Reporter: marymwu
>
> Steps:
> 1. add jar /tmp/GeoIP-0.6.8.jar;
> 2. create temporary function GeoIP2 as 'com.lenovo.lps.device.hive.udf.UDFGeoIP';
> 3. select GeoIP2('tdy');
> Error: org.apache.spark.SparkException: Job aborted due to stage failure:
> Task 0 in stage 527.0 failed 8 times, most recent failure: Lost task 0.7 in
> stage 527.0 (TID 140171, smokeslave2.avatar.lenovomm.com):
> java.lang.RuntimeException: Stream '/jars/GeoIP-0.6.8.jar' was not found.
> Note: After restarting Spark, it works.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12876) Race condition when driver rapidly shutdown after started.
[ https://issues.apache.org/jira/browse/SPARK-12876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106111#comment-15106111 ]

jeffonia Tung commented on SPARK-12876:
---------------------------------------

I've tested this and it still happens in 1.4.0, this time on the driver side rather than during a worker shutdown. I've also learned that a similar case was already fixed for 1.6.0 by https://github.com/apache/spark/pull/10714, so I'm wondering whether this problem would likewise be fixed by catching the exception at the inputStream.read call in FileAppender. My bad, I intended to file this problem and link it with SPARK-4300, so the two can be dealt with together.

> Race condition when driver rapidly shutdown after started.
> ---
>
> Key: SPARK-12876
> URL: https://issues.apache.org/jira/browse/SPARK-12876
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.4.0
> Reporter: jeffonia Tung
> Priority: Minor
>
> It's much the same as SPARK-4300, but this time it happens on the driver occasionally.
>
> [INFO 2016-01-18 17:12:35 (Logging.scala:59)] Asked to launch driver driver-20160118171237-0009
> [INFO 2016-01-18 17:12:35 (Logging.scala:59)] Copying user jar file:/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/mylib/spark-ly-streaming-v2-201601141018.jar to /data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/work/driver-20160118171237-0009/spark-ly-streaming-v2-201601141018.jar
> [INFO 2016-01-18 17:12:35 (Logging.scala:59)] Copying /data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/mylib/spark-ly-streaming-v2-201601141018.jar to /data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/work/driver-20160118171237-0009/spark-ly-streaming-v2-201601141018.jar
> [INFO 2016-01-18 17:12:35 (Logging.scala:59)] Launch Command: "/data/dbcenter/jdk1.7.0_79/bin/java" "-cp" ... "org.apache.spark.deploy.worker.DriverWrapper" ...
> [INFO 2016-01-18 17:12:39 (Logging.scala:59)] Asked to launch executor app-20160118171240-0256/15 for DirectKafkaStreamingV2
> [INFO 2016-01-18 17:12:39 (Logging.scala:59)] Launch command: "/data/dbcenter/jdk1.7.0_79/bin/java" "-cp" ... "org.apache.spark.executor.CoarseGrainedExecutorBackend" ...
> [INFO 2016-01-18 17:12:49 (Logging.scala:59)] Asked to kill driver driver-20160118164724-0008
> [INFO 2016-01-18 17:12:49 (Logging.scala:59)] Redirection to /data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/work/driver-20160118164724-0008/stdout closed: Stream closed
> [INFO 2016-01-18 17:12:49 (Logging.scala:59)] Asked to kill executor app-20160118164728-0250/15
> [INFO 2016-01-18 17:12:49 (Logging.scala:59)] Runner thread for executor app-20160118164728-0250/15 interrupted
> [INFO 2016-01-18 17:12:49 (Logging.scala:59)] Killing process!
> [ERROR 2016-01-18 17:12:49 (Logging.scala:96)] Error writing stream to file /data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/work/app-20160118164728-0250/15/stdout
> java.io.IOException: Stream closed
>     at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:162)
>     at java.io.BufferedInputStream.read1(BufferedInputStream.java:272)
>     at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>     at java.io.FilterInputStream.read(FilterInputStream.java:107)
>     at org.apache.spark.util.logging.FileAppender.appendStreamToFile(FileAppender.scala:70)
>     at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply$mcV$sp(FileAppender.scala:39)
>     at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
>     at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
>     at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1772)
>     at org.apache.spark.util.logging.FileAppender$$anon$1.run(FileAppender.scala:38)
> [INFO 2016-01-18 17:12:49 (Logging.scala:59)] Executor app-20160118164728-0250/15 finished with state KILLED exitStatus 143
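The fix the comment above points at, catching the exception raised at the inputStream.read call in FileAppender once the process's stdout is closed, can be sketched in plain Java. This is a hypothetical, self-contained illustration of the idea, not Spark's actual FileAppender code; the StreamCopier class and copyUntilClosed method are invented names:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class StreamCopier {
    // Copy bytes from `in` to `out` until EOF, returning the byte count.
    // If the input stream is closed underneath us mid-read (the race in
    // SPARK-12876, surfacing as IOException("Stream closed")), treat it
    // as a normal end-of-stream instead of letting the appender thread die.
    static long copyUntilClosed(InputStream in, OutputStream out) {
        byte[] buf = new byte[8192];
        long total = 0;
        try {
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
                total += n;
            }
        } catch (IOException e) {
            // A racing close during shutdown lands here; stop copying quietly.
        }
        return total;
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream("hello".getBytes());
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        System.out.println(copyUntilClosed(in, out)); // prints 5
    }
}
```

The real worker-side fix referenced above (PR 10714) distinguishes an expected stream closure during shutdown from a genuine I/O error; the sketch collapses that distinction for brevity.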
[jira] [Updated] (SPARK-12876) Race condition when driver rapidly shutdown after started.
[ https://issues.apache.org/jira/browse/SPARK-12876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jeffonia Tung updated SPARK-12876:
----------------------------------

Description:
It's much the same as SPARK-4300, but this time it happens on the driver occasionally.

[INFO 2016-01-18 17:12:35 (Logging.scala:59)] Asked to launch driver driver-20160118171237-0009
[INFO 2016-01-18 17:12:35 (Logging.scala:59)] Copying user jar file:/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/mylib/spark-ly-streaming-v2-201601141018.jar to /data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/work/driver-20160118171237-0009/spark-ly-streaming-v2-201601141018.jar
[INFO 2016-01-18 17:12:35 (Logging.scala:59)] Copying /data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/mylib/spark-ly-streaming-v2-201601141018.jar to /data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/work/driver-20160118171237-0009/spark-ly-streaming-v2-201601141018.jar
[INFO 2016-01-18 17:12:35 (Logging.scala:59)] Launch Command: "/data/dbcenter/jdk1.7.0_79/bin/java" "-cp" ... "org.apache.spark.deploy.worker.DriverWrapper" ...
[INFO 2016-01-18 17:12:39 (Logging.scala:59)] Asked to launch executor app-20160118171240-0256/15 for DirectKafkaStreamingV2
[INFO 2016-01-18 17:12:39 (Logging.scala:59)] Launch command: "/data/dbcenter/jdk1.7.0_79/bin/java" "-cp" ... "org.apache.spark.executor.CoarseGrainedExecutorBackend" ...
[INFO 2016-01-18 17:12:49 (Logging.scala:59)] Asked to kill driver driver-20160118164724-0008
[INFO 2016-01-18 17:12:49 (Logging.scala:59)] Redirection to /data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/work/driver-20160118164724-0008/stdout closed: Stream closed
[INFO 2016-01-18 17:12:49 (Logging.scala:59)] Asked to kill executor app-20160118164728-0250/15
[INFO 2016-01-18 17:12:49 (Logging.scala:59)] Runner thread for executor app-20160118164728-0250/15 interrupted
[INFO 2016-01-18 17:12:49 (Logging.scala:59)] Killing process!
[ERROR 2016-01-18 17:12:49 (Logging.scala:96)] Error writing stream to file /data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/work/app-20160118164728-0250/15/stdout
java.io.IOException: Stream closed
    at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:162)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:272)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
    at java.io.FilterInputStream.read(FilterInputStream.java:107)
    at org.apache.spark.util.logging.FileAppender.appendStreamToFile(FileAppender.scala:70)
    at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply$mcV$sp(FileAppender.scala:39)
    at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
    at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1772)
    at org.apache.spark.util.logging.FileAppender$$anon$1.run(FileAppender.scala:38)
[INFO 2016-01-18 17:12:49 (Logging.scala:59)] Executor app-20160118164728-0250/15 finished with state KILLED exitStatus 143
[jira] [Created] (SPARK-12876) Race condition when driver rapidly shutdown after started.
jeffonia Tung created SPARK-12876: - Summary: Race condition when driver rapidly shutdown after started. Key: SPARK-12876 URL: https://issues.apache.org/jira/browse/SPARK-12876 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.4.0 Reporter: jeffonia Tung [INFO 2016-01-18 17:12:35 (Logging.scala:59)] Asked to launch driver driver-20160118171237-0009 [INFO 2016-01-18 17:12:35 (Logging.scala:59)] Copying user jar file:/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/mylib/spark-ly-streaming-v2-201601141018.jar to /data/dbcenter/cdh5/spark-1.4.0-bin-hado op2.4/work/driver-20160118171237-0009/spark-ly-streaming-v2-201601141018.jar [INFO 2016-01-18 17:12:35 (Logging.scala:59)] Copying /data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/mylib/spark-ly-streaming-v2-201601141018.jar to /data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/work/dri ver-20160118171237-0009/spark-ly-streaming-v2-201601141018.jar [INFO 2016-01-18 17:12:35 (Logging.scala:59)] Launch Command: "/data/dbcenter/jdk1.7.0_79/bin/java" "-cp" . 
[INFO 2016-01-18 17:12:39 (Logging.scala:59)] Asked to launch executor app-20160118171240-0256/15 for DirectKafkaStreamingV2 [INFO 2016-01-18 17:12:39 (Logging.scala:59)] Launch command: "/data/dbcenter/jdk1.7.0_79/bin/java" "-cp" "/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/postgresql-9.2-1004-jdbc41.jar:/data/dbcenter/cdh 5/spark-1.4.0-bin-hadoop2.4/lib/hive-contrib-0.13.1-cdh5.2.0.jar:/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/sbin/../conf/:/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/lib/spark-assembly-1.4.0-hadoop 2.4.0.jar:/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar:/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar:/data/dbcenter/cdh5/spark-1.4.0-bi n-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar" "-Xms1024M" "-Xmx1024M" "-Dspark.akka.frameSize=100" "-Dspark.driver.port=35133" "-XX:MaxPermSize=128m" "org.apache.spark.executor.CoarseGrainedExecutorBacke nd" "--driver-url" "akka.tcp://sparkDriver@10.12.201.205:35133/user/CoarseGrainedScheduler" "--executor-id" "15" "--hostname" "10.12.201.205" "--cores" "1" "--app-id" "app-20160118171240-0256" "--worker -url" "akka.tcp://sparkWorker@10.12.201.205:5/user/Worker" [INFO 2016-01-18 17:12:49 (Logging.scala:59)] Asked to kill driver driver-20160118164724-0008 [INFO 2016-01-18 17:12:49 (Logging.scala:59)] Redirection to /data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/work/driver-20160118164724-0008/stdout closed: Stream closed [INFO 2016-01-18 17:12:49 (Logging.scala:59)] Asked to kill executor app-20160118164728-0250/15 [INFO 2016-01-18 17:12:49 (Logging.scala:59)] Runner thread for executor app-20160118164728-0250/15 interrupted [INFO 2016-01-18 17:12:49 (Logging.scala:59)] Killing process! 
[ERROR 2016-01-18 17:12:49 (Logging.scala:96)] Error writing stream to file /data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/work/app-20160118164728-0250/15/stdout
java.io.IOException: Stream closed
at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:162)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:272)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
at java.io.FilterInputStream.read(FilterInputStream.java:107)
at org.apache.spark.util.logging.FileAppender.appendStreamToFile(FileAppender.scala:70)
at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply$mcV$sp(FileAppender.scala:39)
at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1772)
at org.apache.spark.util.logging.FileAppender$$anon$1.run(FileAppender.scala:38)
[INFO 2016-01-18 17:12:49 (Logging.scala:59)] Executor app-20160118164728-0250/15 finished with state KILLED exitStatus 143
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
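The linked fix (SPARK-10714) catches the IOException that the `inputStream.read` call raises when the child process's stream is closed during a kill. A minimal, hypothetical sketch of that pattern (simplified; Spark's actual FileAppender differs in structure and naming):

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, IOException, InputStream, OutputStream}

// Copy `in` to `out` until EOF, but treat the IOException thrown when the
// stream is closed mid-read as a normal part of shutdown rather than
// letting it kill the appender thread.
def copyStream(in: InputStream, out: OutputStream, stopRequested: () => Boolean): Unit = {
  val buf = new Array[Byte](8192)
  try {
    var n = in.read(buf) // this is the read that raises "Stream closed" on a kill
    while (n != -1) {
      out.write(buf, 0, n)
      n = in.read(buf)
    }
  } catch {
    // Swallow the exception only when a stop was actually requested,
    // so genuine I/O errors are still surfaced.
    case e: IOException if stopRequested() => ()
  }
  out.flush()
}
```

With a guard like this, the "Error writing stream to file" stack trace above becomes a silent, expected part of killing a driver or executor instead of an uncaught exception.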
[jira] [Comment Edited] (SPARK-5928) Remote Shuffle Blocks cannot be more than 2 GB
[ https://issues.apache.org/jira/browse/SPARK-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15057998#comment-15057998 ]

jeffonia Tung edited comment on SPARK-5928 at 12/15/15 12:42 PM:
-

org.apache.spark.shuffle.FetchFailedException: Adjusted frame length exceeds 2147483647: 9307521944 - discarded
at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$.org$apache$spark$shuffle$hash$BlockStoreShuffleFetcher$$unpackBlock$1(BlockStoreShuffleFetcher.scala:67)
at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:84)
at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:84)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)

I've got the same error after a matrix CROSS JOIN. OMG, it produced 21 TB of shuffle write data.

was (Author: jeffonia):
org.apache.spark.shuffle.FetchFailedException: Adjusted frame length exceeds 2147483647: 9307521944 - discarded
at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$.org$apache$spark$shuffle$hash$BlockStoreShuffleFetcher$$unpackBlock$1(BlockStoreShuffleFetcher.scala:67)
at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:84)
at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:84)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)

I've got the save error after i got a matrix CROSS JOIN. OMG, it produce 21TB shuffle write data.
> Remote Shuffle Blocks cannot be more than 2 GB
> --
>
> Key: SPARK-5928
> URL: https://issues.apache.org/jira/browse/SPARK-5928
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Reporter: Imran Rashid
>
> If a shuffle block is over 2GB, the shuffle fails, with an uninformative exception. The tasks get retried a few times and then eventually the job fails.
> Here is an example program which can cause the exception:
> {code}
> val rdd = sc.parallelize(1 to 1e6.toInt, 1).map{ ignore =>
>   val n = 3e3.toInt
>   val arr = new Array[Byte](n)
>   // need to make sure the array doesn't compress to something small
>   scala.util.Random.nextBytes(arr)
>   arr
> }
> rdd.map { x => (1, x)}.groupByKey().count()
> {code}
> Note that you can't trigger this exception in local mode, it only happens on remote fetches. I triggered these exceptions running with {{MASTER=yarn-client spark-shell --num-executors 2 --executor-memory 4000m}}
> {noformat}
> 15/02/20 11:10:23 WARN TaskSetManager: Lost task 0.0 in stage 3.0 (TID 3, imran-3.ent.cloudera.com): FetchFailed(BlockManagerId(1, imran-2.ent.cloudera.com, 55028), shuffleId=1, mapId=0, reduceId=0, message=
> org.apache.spark.shuffle.FetchFailedException: Adjusted frame length exceeds 2147483647: 3021252889 - discarded
> at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$.org$apache$spark$shuffle$hash$BlockStoreShuffleFetcher$$unpackBlock$1(BlockStoreShuffleFetcher.scala:67)
> at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:83)
> at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:83)
> at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
> at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
> at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
> at org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:125)
> at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:58)
> at org.apache.spark.shuffle.hash.HashShuffleReader.read(HashShuffleReader.scala:46)
> at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:92)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
> at org.apache.spark.scheduler.Task.run(Task.scala:56)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
> at java.util.concurrent.ThreadPoolExecuto
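The "Adjusted frame length exceeds 2147483647" message reflects a per-block cap of Int.MaxValue bytes (about 2 GB) on the fetch path, so the usual workaround is to raise the partition count until every shuffle block fits under the cap. A rough sizing sketch (hypothetical helper, assuming keys are spread roughly evenly across partitions):

```scala
// Hypothetical sizing helper: how many reduce partitions keep every remote
// shuffle block under the ~2 GB frame cap (Int.MaxValue bytes)?
def minPartitions(totalShuffleBytes: Long, maxBlockBytes: Long = Int.MaxValue.toLong): Int =
  math.ceil(totalShuffleBytes.toDouble / maxBlockBytes).toInt

// The 21 TB of shuffle write from the cross join mentioned above would need
// on the order of ten thousand partitions to stay under the cap.
val needed = minPartitions(21L * 1024 * 1024 * 1024 * 1024)
```

In practice a heavily skewed key (as in the `groupByKey` reproduction above, where every record shares key 1) defeats this estimate, because all the data lands in one block no matter how many partitions exist.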
[jira] [Commented] (SPARK-4049) Storage web UI "fraction cached" shows as > 100%
[ https://issues.apache.org/jira/browse/SPARK-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15057400#comment-15057400 ]

jeffonia Tung commented on SPARK-4049:
--

I've seen the same behavior: the fraction cached goes up to 200%, yet everything runs fine. I'm just confused about that.

> Storage web UI "fraction cached" shows as > 100%
> --
>
> Key: SPARK-4049
> URL: https://issues.apache.org/jira/browse/SPARK-4049
> Project: Spark
> Issue Type: Bug
> Components: Web UI
> Affects Versions: 1.2.0
> Reporter: Josh Rosen
> Priority: Minor
>
> In the Storage tab of the Spark Web UI, I saw a case where the "Fraction Cached" was greater than 100%:
> !http://i.imgur.com/Gm2hEeL.png!
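The root cause isn't stated in the ticket, but a fraction above 100% is exactly what you get if duplicate block reports (for example, replicated cached blocks) are counted against the partition total. A hypothetical illustration of that arithmetic only, not the actual Web UI code:

```scala
// Hypothetical: one report per cached block replica. With 2x replication,
// counting every report against the partition count inflates the fraction.
case class BlockReport(partitionId: Int)

def fractionCachedNaive(reports: Seq[BlockReport], numPartitions: Int): Double =
  100.0 * reports.size / numPartitions // replicas double-counted

def fractionCachedDistinct(reports: Seq[BlockReport], numPartitions: Int): Double =
  100.0 * reports.map(_.partitionId).distinct.size / numPartitions

// 4 partitions, each cached with 2 replicas => 8 block reports.
val reports = (0 until 4).flatMap(p => Seq(BlockReport(p), BlockReport(p)))
```

Under this model the naive computation reports 200%, matching the figure in the comment above, while deduplicating by partition id yields 100%.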
[jira] [Commented] (SPARK-10141) Number of tasks on executors still become negative after failures
[ https://issues.apache.org/jira/browse/SPARK-10141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987119#comment-14987119 ]

jeffonia Tung commented on SPARK-10141:
---

I've hit the same problem in version 1.4.0.

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 10 in stage 2.0 failed 4 times, most recent failure: Lost task 10.3 in stage 2.0 (TID 178, 10.12.201.160): java.io.IOException: Failed to connect to /10.12.201.159:55632
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:193)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:156)
at org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:88)
at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
at org.apache.spark.network.shuffle.RetryingBlockFetcher.access$200(RetryingBlockFetcher.java:43)
at org.apache.spark.network.shuffle.RetryingBlockFetcher$1.run(RetryingBlockFetcher.java:170)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:208)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:287)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
... 1 more
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1266)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1257)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1256)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1256)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1450)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1411)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)

> Number of tasks on executors still become negative after failures
> -
>
> Key: SPARK-10141
> URL: https://issues.apache.org/jira/browse/SPARK-10141
> Project: Spark
> Issue Type: Bug
> Components: Web UI
> Affects Versions: 1.5.0
> Reporter: Joseph K. Bradley
> Priority: Minor
> Attachments: Screen Shot 2015-08-20 at 3.14.49 PM.png
>
> I hit this failure when running LDA on EC2 (after I made the model size really big).
> I was using the LDAExample.scala code on an EC2 cluster with 16 workers (r3.2xlarge), on a Wikipedia dataset:
> {code}
> Training set size (documents)  4534059
> Vocabulary size (terms)        1
> Training set size (tokens)     895575317
> EM optimizer
> 1K topics
> {code}
> Failure message:
> {code}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 55 in stage 22.0 failed 4 times, most recent failure: Lost task 55.3 in stage 22.0 (TID 2881, 10.0.202.128): java.io.IOException: Failed to connect to /10.0.202.128:54740
> at
> org.apache.
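A task counter going negative is consistent with the decrement firing more than once for the same task, e.g. once for the task failure and again when the lost executor's tasks are reported. A hedged sketch of that pattern with the obvious clamp; the event names and structure here are hypothetical, not Spark's listener API:

```scala
// Hypothetical sketch: a per-executor active-task counter that receives a
// duplicate "task end" event (task failure plus executor loss reporting the
// same task). Clamping at zero keeps the UI value from going negative.
var activeTasks = 0

def onTaskStart(): Unit = activeTasks += 1

def onTaskEnd(): Unit =
  // Without the clamp, a duplicate event would drive this to -1.
  activeTasks = math.max(0, activeTasks - 1)

onTaskStart()
onTaskEnd()
onTaskEnd() // duplicate event delivered after the executor is lost
```

Tracking finished task IDs in a Set and ignoring repeats would be the more principled fix than clamping, since it keeps the count accurate rather than merely non-negative.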