[jira] [Commented] (SPARK-16833) [Spark2.0]when creating temporary function,command "add jar" doesn't work unless restart spark

2016-10-12 Thread jeffonia Tung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15568566#comment-15568566
 ] 

jeffonia Tung commented on SPARK-16833:
---

I have the same problem when running queries through the JDBC Thrift server on YARN.
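In case it helps, here is a minimal sketch of how I trigger it (the jar path and UDF class are copied from the description below; everything else is assumed, and in my setup the equivalent statements actually go through beeline to the Thrift server):

{code}
// Hypothetical repro sketch, not a verified test case.
val spark = org.apache.spark.sql.SparkSession.builder()
  .appName("add-jar-temp-udf-repro")
  .enableHiveSupport()
  .getOrCreate()

// Register the jar and the temporary function in the same session.
spark.sql("ADD JAR /tmp/GeoIP-0.6.8.jar")
spark.sql("CREATE TEMPORARY FUNCTION GeoIP2 AS 'com.lenovo.lps.device.hive.udf.UDFGeoIP'")

// The function resolves, but the tasks fail with
// "Stream '/jars/GeoIP-0.6.8.jar' was not found" until Spark is restarted.
spark.sql("SELECT GeoIP2('tdy')").show()
{code}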

> [Spark2.0]when creating temporary function,command "add jar" doesn't work 
> unless restart spark 
> ---
>
> Key: SPARK-16833
> URL: https://issues.apache.org/jira/browse/SPARK-16833
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: marymwu
>
> [Spark2.0]when creating temporary function,command "add jar" doesn't work 
> unless restart spark 
> Steps:
> 1. add jar /tmp/GeoIP-0.6.8.jar;
> 2. create temporary function GeoIP2 as 
> 'com.lenovo.lps.device.hive.udf.UDFGeoIP';
> 3. select GeoIP2('tdy');
> Error: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 0 in stage 527.0 failed 8 times, most recent failure: Lost task 0.7 in 
> stage 527.0 (TID 140171, smokeslave2.avatar.lenovomm.com): 
> java.lang.RuntimeException: Stream '/jars/GeoIP-0.6.8.jar'' was not found.
> Note: After restarting Spark, it works.






[jira] [Commented] (SPARK-12876) Race condition when driver rapidly shutdown after started.

2016-01-18 Thread jeffonia Tung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106111#comment-15106111
 ] 

jeffonia Tung commented on SPARK-12876:
---

I've tested and it still happens in 1.4.0, this time on the driver rather than during a worker shutdown. I've also learned that a related case was already fixed in 1.6.0 with https://github.com/apache/spark/pull/10714, so I'm wondering whether this problem could be fixed the same way, by catching the exception at the inputStream.read call in FileAppender.

My bad, I intended to list the problem and link it with SPARK-4300, so we can deal with the problems together.
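To make the suggestion concrete, here is a rough sketch of what I mean by catching the exception around the inputStream.read call (simplified for illustration; the method name and buffer size are mine, this is not the real FileAppender source):

{code}
// Simplified copy loop that tolerates the "Stream closed" IOException raised when the
// child process is killed while we are still reading its output.
import java.io.{IOException, InputStream, OutputStream}

def appendStream(in: InputStream, out: OutputStream): Unit = {
  val buf = new Array[Byte](8192)
  try {
    var n = in.read(buf)
    while (n != -1) {
      out.write(buf, 0, n)
      n = in.read(buf)
    }
  } catch {
    // Expected when the process has just been killed; treat it as a normal shutdown
    // instead of reporting an uncaught error.
    case e: IOException if e.getMessage == "Stream closed" => ()
  } finally {
    out.flush()
  }
}
{code}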

> Race condition when driver rapidly shutdown after started.
> --
>
> Key: SPARK-12876
> URL: https://issues.apache.org/jira/browse/SPARK-12876
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.4.0
>Reporter: jeffonia Tung
>Priority: Minor
>
> This is similar to SPARK-4300; this time, it happens on the driver occasionally.
> [INFO 2016-01-18 17:12:35 (Logging.scala:59)] Asked to launch driver 
> driver-20160118171237-0009
> [INFO 2016-01-18 17:12:35 (Logging.scala:59)] Copying user jar 
> file:/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/mylib/spark-ly-streaming-v2-201601141018.jar
>  to /data/dbcenter/cdh5/spark-1.4.0-bin-hado
> op2.4/work/driver-20160118171237-0009/spark-ly-streaming-v2-201601141018.jar
> [INFO 2016-01-18 17:12:35 (Logging.scala:59)] Copying 
> /data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/mylib/spark-ly-streaming-v2-201601141018.jar
>  to /data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/work/dri
> ver-20160118171237-0009/spark-ly-streaming-v2-201601141018.jar
> [INFO 2016-01-18 17:12:35 (Logging.scala:59)] Launch Command: 
> "/data/dbcenter/jdk1.7.0_79/bin/java" "-cp" 
> ."org.apache.spark.deploy.worker.DriverWrapper"..
> [INFO 2016-01-18 17:12:39 (Logging.scala:59)] Asked to launch executor 
> app-20160118171240-0256/15 for DirectKafkaStreamingV2
> [INFO 2016-01-18 17:12:39 (Logging.scala:59)] Launch command: 
> "/data/dbcenter/jdk1.7.0_79/bin/java" "-cp"  
> ."org.apache.spark.executor.CoarseGrainedExecutorBackend"..
> [INFO 2016-01-18 17:12:49 (Logging.scala:59)] Asked to kill driver 
> driver-20160118164724-0008
> [INFO 2016-01-18 17:12:49 (Logging.scala:59)] Redirection to 
> /data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/work/driver-20160118164724-0008/stdout
>  closed: Stream closed
> [INFO 2016-01-18 17:12:49 (Logging.scala:59)] Asked to kill executor 
> app-20160118164728-0250/15
> [INFO 2016-01-18 17:12:49 (Logging.scala:59)] Runner thread for executor 
> app-20160118164728-0250/15 interrupted
> [INFO 2016-01-18 17:12:49 (Logging.scala:59)] Killing process!
> [ERROR 2016-01-18 17:12:49 (Logging.scala:96)] Error writing stream to file 
> /data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/work/app-20160118164728-0250/15/stdout
> java.io.IOException: Stream closed
> at 
> java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:162)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:272)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
> at java.io.FilterInputStream.read(FilterInputStream.java:107)
> at 
> org.apache.spark.util.logging.FileAppender.appendStreamToFile(FileAppender.scala:70)
> at 
> org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply$mcV$sp(FileAppender.scala:39)
> at 
> org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
> at 
> org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
> at 
> org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1772)
> at 
> org.apache.spark.util.logging.FileAppender$$anon$1.run(FileAppender.scala:38)
> [INFO 2016-01-18 17:12:49 (Logging.scala:59)] Executor 
> app-20160118164728-0250/15 finished with state KILLED exitStatus 143






[jira] [Updated] (SPARK-12876) Race condition when driver rapidly shutdown after started.

2016-01-18 Thread jeffonia Tung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jeffonia Tung updated SPARK-12876:
--
Description: 
This is similar to SPARK-4300; this time, it happens on the driver occasionally.


[INFO 2016-01-18 17:12:35 (Logging.scala:59)] Asked to launch driver 
driver-20160118171237-0009
[INFO 2016-01-18 17:12:35 (Logging.scala:59)] Copying user jar 
file:/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/mylib/spark-ly-streaming-v2-201601141018.jar
 to /data/dbcenter/cdh5/spark-1.4.0-bin-hado
op2.4/work/driver-20160118171237-0009/spark-ly-streaming-v2-201601141018.jar
[INFO 2016-01-18 17:12:35 (Logging.scala:59)] Copying 
/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/mylib/spark-ly-streaming-v2-201601141018.jar
 to /data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/work/dri
ver-20160118171237-0009/spark-ly-streaming-v2-201601141018.jar
[INFO 2016-01-18 17:12:35 (Logging.scala:59)] Launch Command: 
"/data/dbcenter/jdk1.7.0_79/bin/java" "-cp" 
."org.apache.spark.deploy.worker.DriverWrapper"..
[INFO 2016-01-18 17:12:39 (Logging.scala:59)] Asked to launch executor 
app-20160118171240-0256/15 for DirectKafkaStreamingV2
[INFO 2016-01-18 17:12:39 (Logging.scala:59)] Launch command: 
"/data/dbcenter/jdk1.7.0_79/bin/java" "-cp"  
."org.apache.spark.executor.CoarseGrainedExecutorBackend"..
[INFO 2016-01-18 17:12:49 (Logging.scala:59)] Asked to kill driver 
driver-20160118164724-0008
[INFO 2016-01-18 17:12:49 (Logging.scala:59)] Redirection to 
/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/work/driver-20160118164724-0008/stdout
 closed: Stream closed
[INFO 2016-01-18 17:12:49 (Logging.scala:59)] Asked to kill executor 
app-20160118164728-0250/15
[INFO 2016-01-18 17:12:49 (Logging.scala:59)] Runner thread for executor 
app-20160118164728-0250/15 interrupted
[INFO 2016-01-18 17:12:49 (Logging.scala:59)] Killing process!
[ERROR 2016-01-18 17:12:49 (Logging.scala:96)] Error writing stream to file 
/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/work/app-20160118164728-0250/15/stdout
java.io.IOException: Stream closed
at 
java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:162)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:272)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
at java.io.FilterInputStream.read(FilterInputStream.java:107)
at 
org.apache.spark.util.logging.FileAppender.appendStreamToFile(FileAppender.scala:70)
at 
org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply$mcV$sp(FileAppender.scala:39)
at 
org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
at 
org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1772)
at 
org.apache.spark.util.logging.FileAppender$$anon$1.run(FileAppender.scala:38)
[INFO 2016-01-18 17:12:49 (Logging.scala:59)] Executor 
app-20160118164728-0250/15 finished with state KILLED exitStatus 143

  was:
It's a little same as the issue: SPARK-4300. Well, this time, it's happen on 
the driver occasionally.


[INFO 2016-01-18 17:12:35 (Logging.scala:59)] Asked to launch driver 
driver-20160118171237-0009
[INFO 2016-01-18 17:12:35 (Logging.scala:59)] Copying user jar 
file:/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/mylib/spark-ly-streaming-v2-201601141018.jar
 to /data/dbcenter/cdh5/spark-1.4.0-bin-hado
op2.4/work/driver-20160118171237-0009/spark-ly-streaming-v2-201601141018.jar
[INFO 2016-01-18 17:12:35 (Logging.scala:59)] Copying 
/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/mylib/spark-ly-streaming-v2-201601141018.jar
 to /data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/work/dri
ver-20160118171237-0009/spark-ly-streaming-v2-201601141018.jar
[INFO 2016-01-18 17:12:35 (Logging.scala:59)] Launch Command: 
"/data/dbcenter/jdk1.7.0_79/bin/java" "-cp" .
[INFO 2016-01-18 17:12:39 (Logging.scala:59)] Asked to launch executor 
app-20160118171240-0256/15 for DirectKafkaStreamingV2
[INFO 2016-01-18 17:12:39 (Logging.scala:59)] Launch command: 
"/data/dbcenter/jdk1.7.0_79/bin/java" "-cp" 
"/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/postgresql-9.2-1004-jdbc41.jar:/data/dbcenter/cdh
5/spark-1.4.0-bin-hadoop2.4/lib/hive-contrib-0.13.1-cdh5.2.0.jar:/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/sbin/../conf/:/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/lib/spark-assembly-1.4.0-hadoop
2.4.0.jar:/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar:/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar:/data/dbcenter/cdh5/spark-1.4.0-bi
n-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar" "-Xms1024M" "-Xmx1024M" 
"-Dspark.akka.frameSize=100" "-Dspark.driver.port=35133" "-XX:MaxPermSize=128m" 
"org.apache.spark.executor.CoarseGrainedExe

[jira] [Updated] (SPARK-12876) Race condition when driver rapidly shutdown after started.

2016-01-18 Thread jeffonia Tung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jeffonia Tung updated SPARK-12876:
--
Description: 
This is similar to SPARK-4300; this time, it happens on the driver occasionally.


[INFO 2016-01-18 17:12:35 (Logging.scala:59)] Asked to launch driver 
driver-20160118171237-0009
[INFO 2016-01-18 17:12:35 (Logging.scala:59)] Copying user jar 
file:/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/mylib/spark-ly-streaming-v2-201601141018.jar
 to /data/dbcenter/cdh5/spark-1.4.0-bin-hado
op2.4/work/driver-20160118171237-0009/spark-ly-streaming-v2-201601141018.jar
[INFO 2016-01-18 17:12:35 (Logging.scala:59)] Copying 
/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/mylib/spark-ly-streaming-v2-201601141018.jar
 to /data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/work/dri
ver-20160118171237-0009/spark-ly-streaming-v2-201601141018.jar
[INFO 2016-01-18 17:12:35 (Logging.scala:59)] Launch Command: 
"/data/dbcenter/jdk1.7.0_79/bin/java" "-cp" .
[INFO 2016-01-18 17:12:39 (Logging.scala:59)] Asked to launch executor 
app-20160118171240-0256/15 for DirectKafkaStreamingV2
[INFO 2016-01-18 17:12:39 (Logging.scala:59)] Launch command: 
"/data/dbcenter/jdk1.7.0_79/bin/java" "-cp" 
"/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/postgresql-9.2-1004-jdbc41.jar:/data/dbcenter/cdh
5/spark-1.4.0-bin-hadoop2.4/lib/hive-contrib-0.13.1-cdh5.2.0.jar:/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/sbin/../conf/:/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/lib/spark-assembly-1.4.0-hadoop
2.4.0.jar:/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar:/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar:/data/dbcenter/cdh5/spark-1.4.0-bi
n-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar" "-Xms1024M" "-Xmx1024M" 
"-Dspark.akka.frameSize=100" "-Dspark.driver.port=35133" "-XX:MaxPermSize=128m" 
"org.apache.spark.executor.CoarseGrainedExecutorBacke
nd" "--driver-url" 
"akka.tcp://sparkDriver@10.12.201.205:35133/user/CoarseGrainedScheduler" 
"--executor-id" "15" "--hostname" "10.12.201.205" "--cores" "1" "--app-id" 
"app-20160118171240-0256" "--worker
-url" "akka.tcp://sparkWorker@10.12.201.205:5/user/Worker"
[INFO 2016-01-18 17:12:49 (Logging.scala:59)] Asked to kill driver 
driver-20160118164724-0008
[INFO 2016-01-18 17:12:49 (Logging.scala:59)] Redirection to 
/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/work/driver-20160118164724-0008/stdout
 closed: Stream closed
[INFO 2016-01-18 17:12:49 (Logging.scala:59)] Asked to kill executor 
app-20160118164728-0250/15
[INFO 2016-01-18 17:12:49 (Logging.scala:59)] Runner thread for executor 
app-20160118164728-0250/15 interrupted
[INFO 2016-01-18 17:12:49 (Logging.scala:59)] Killing process!
[ERROR 2016-01-18 17:12:49 (Logging.scala:96)] Error writing stream to file 
/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/work/app-20160118164728-0250/15/stdout
java.io.IOException: Stream closed
at 
java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:162)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:272)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
at java.io.FilterInputStream.read(FilterInputStream.java:107)
at 
org.apache.spark.util.logging.FileAppender.appendStreamToFile(FileAppender.scala:70)
at 
org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply$mcV$sp(FileAppender.scala:39)
at 
org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
at 
org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1772)
at 
org.apache.spark.util.logging.FileAppender$$anon$1.run(FileAppender.scala:38)
[INFO 2016-01-18 17:12:49 (Logging.scala:59)] Executor 
app-20160118164728-0250/15 finished with state KILLED exitStatus 143

  was:
It's a little same as the issue: SPARK-4300. Well, this time, it's happen on 
the driver occasionally.

[INFO 2016-01-18 17:12:35 (Logging.scala:59)] Asked to launch driver 
driver-20160118171237-0009
[INFO 2016-01-18 17:12:35 (Logging.scala:59)] Copying user jar 
file:/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/mylib/spark-ly-streaming-v2-201601141018.jar
 to /data/dbcenter/cdh5/spark-1.4.0-bin-hado
op2.4/work/driver-20160118171237-0009/spark-ly-streaming-v2-201601141018.jar
[INFO 2016-01-18 17:12:35 (Logging.scala:59)] Copying 
/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/mylib/spark-ly-streaming-v2-201601141018.jar
 to /data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/work/dri
ver-20160118171237-0009/spark-ly-streaming-v2-201601141018.jar
[INFO 2016-01-18 17:12:35 (Logging.scala:59)] Launch Command: 
"/data/dbcenter/jdk1.7.0_79/bin/java" "-cp" .
[INFO 2016-01-18 17:12:39 (Logging.scala:59)] Asked to launch executor 
app-2

[jira] [Updated] (SPARK-12876) Race condition when driver rapidly shutdown after started.

2016-01-18 Thread jeffonia Tung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jeffonia Tung updated SPARK-12876:
--
Description: 
This is similar to SPARK-4300; this time, it happens on the driver.

[INFO 2016-01-18 17:12:35 (Logging.scala:59)] Asked to launch driver 
driver-20160118171237-0009
[INFO 2016-01-18 17:12:35 (Logging.scala:59)] Copying user jar 
file:/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/mylib/spark-ly-streaming-v2-201601141018.jar
 to /data/dbcenter/cdh5/spark-1.4.0-bin-hado
op2.4/work/driver-20160118171237-0009/spark-ly-streaming-v2-201601141018.jar
[INFO 2016-01-18 17:12:35 (Logging.scala:59)] Copying 
/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/mylib/spark-ly-streaming-v2-201601141018.jar
 to /data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/work/dri
ver-20160118171237-0009/spark-ly-streaming-v2-201601141018.jar
[INFO 2016-01-18 17:12:35 (Logging.scala:59)] Launch Command: 
"/data/dbcenter/jdk1.7.0_79/bin/java" "-cp" .
[INFO 2016-01-18 17:12:39 (Logging.scala:59)] Asked to launch executor 
app-20160118171240-0256/15 for DirectKafkaStreamingV2
[INFO 2016-01-18 17:12:39 (Logging.scala:59)] Launch command: 
"/data/dbcenter/jdk1.7.0_79/bin/java" "-cp" 
"/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/postgresql-9.2-1004-jdbc41.jar:/data/dbcenter/cdh
5/spark-1.4.0-bin-hadoop2.4/lib/hive-contrib-0.13.1-cdh5.2.0.jar:/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/sbin/../conf/:/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/lib/spark-assembly-1.4.0-hadoop
2.4.0.jar:/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar:/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar:/data/dbcenter/cdh5/spark-1.4.0-bi
n-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar" "-Xms1024M" "-Xmx1024M" 
"-Dspark.akka.frameSize=100" "-Dspark.driver.port=35133" "-XX:MaxPermSize=128m" 
"org.apache.spark.executor.CoarseGrainedExecutorBacke
nd" "--driver-url" 
"akka.tcp://sparkDriver@10.12.201.205:35133/user/CoarseGrainedScheduler" 
"--executor-id" "15" "--hostname" "10.12.201.205" "--cores" "1" "--app-id" 
"app-20160118171240-0256" "--worker
-url" "akka.tcp://sparkWorker@10.12.201.205:5/user/Worker"
[INFO 2016-01-18 17:12:49 (Logging.scala:59)] Asked to kill driver 
driver-20160118164724-0008
[INFO 2016-01-18 17:12:49 (Logging.scala:59)] Redirection to 
/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/work/driver-20160118164724-0008/stdout
 closed: Stream closed
[INFO 2016-01-18 17:12:49 (Logging.scala:59)] Asked to kill executor 
app-20160118164728-0250/15
[INFO 2016-01-18 17:12:49 (Logging.scala:59)] Runner thread for executor 
app-20160118164728-0250/15 interrupted
[INFO 2016-01-18 17:12:49 (Logging.scala:59)] Killing process!
[ERROR 2016-01-18 17:12:49 (Logging.scala:96)] Error writing stream to file 
/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/work/app-20160118164728-0250/15/stdout
java.io.IOException: Stream closed
at 
java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:162)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:272)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
at java.io.FilterInputStream.read(FilterInputStream.java:107)
at 
org.apache.spark.util.logging.FileAppender.appendStreamToFile(FileAppender.scala:70)
at 
org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply$mcV$sp(FileAppender.scala:39)
at 
org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
at 
org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1772)
at 
org.apache.spark.util.logging.FileAppender$$anon$1.run(FileAppender.scala:38)
[INFO 2016-01-18 17:12:49 (Logging.scala:59)] Executor 
app-20160118164728-0250/15 finished with state KILLED exitStatus 143

  was:
It's a little same as the issue: SPARK-4300

[INFO 2016-01-18 17:12:35 (Logging.scala:59)] Asked to launch driver 
driver-20160118171237-0009
[INFO 2016-01-18 17:12:35 (Logging.scala:59)] Copying user jar 
file:/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/mylib/spark-ly-streaming-v2-201601141018.jar
 to /data/dbcenter/cdh5/spark-1.4.0-bin-hado
op2.4/work/driver-20160118171237-0009/spark-ly-streaming-v2-201601141018.jar
[INFO 2016-01-18 17:12:35 (Logging.scala:59)] Copying 
/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/mylib/spark-ly-streaming-v2-201601141018.jar
 to /data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/work/dri
ver-20160118171237-0009/spark-ly-streaming-v2-201601141018.jar
[INFO 2016-01-18 17:12:35 (Logging.scala:59)] Launch Command: 
"/data/dbcenter/jdk1.7.0_79/bin/java" "-cp" .
[INFO 2016-01-18 17:12:39 (Logging.scala:59)] Asked to launch executor 
app-20160118171240-0256/15 for DirectKafkaStreamingV2
[INFO 2016-01-18 17:12:3

[jira] [Updated] (SPARK-12876) Race condition when driver rapidly shutdown after started.

2016-01-18 Thread jeffonia Tung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jeffonia Tung updated SPARK-12876:
--
Description: 
This is similar to SPARK-4300; this time, it happens on the driver occasionally.

[INFO 2016-01-18 17:12:35 (Logging.scala:59)] Asked to launch driver 
driver-20160118171237-0009
[INFO 2016-01-18 17:12:35 (Logging.scala:59)] Copying user jar 
file:/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/mylib/spark-ly-streaming-v2-201601141018.jar
 to /data/dbcenter/cdh5/spark-1.4.0-bin-hado
op2.4/work/driver-20160118171237-0009/spark-ly-streaming-v2-201601141018.jar
[INFO 2016-01-18 17:12:35 (Logging.scala:59)] Copying 
/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/mylib/spark-ly-streaming-v2-201601141018.jar
 to /data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/work/dri
ver-20160118171237-0009/spark-ly-streaming-v2-201601141018.jar
[INFO 2016-01-18 17:12:35 (Logging.scala:59)] Launch Command: 
"/data/dbcenter/jdk1.7.0_79/bin/java" "-cp" .
[INFO 2016-01-18 17:12:39 (Logging.scala:59)] Asked to launch executor 
app-20160118171240-0256/15 for DirectKafkaStreamingV2
[INFO 2016-01-18 17:12:39 (Logging.scala:59)] Launch command: 
"/data/dbcenter/jdk1.7.0_79/bin/java" "-cp" 
"/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/postgresql-9.2-1004-jdbc41.jar:/data/dbcenter/cdh
5/spark-1.4.0-bin-hadoop2.4/lib/hive-contrib-0.13.1-cdh5.2.0.jar:/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/sbin/../conf/:/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/lib/spark-assembly-1.4.0-hadoop
2.4.0.jar:/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar:/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar:/data/dbcenter/cdh5/spark-1.4.0-bi
n-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar" "-Xms1024M" "-Xmx1024M" 
"-Dspark.akka.frameSize=100" "-Dspark.driver.port=35133" "-XX:MaxPermSize=128m" 
"org.apache.spark.executor.CoarseGrainedExecutorBacke
nd" "--driver-url" 
"akka.tcp://sparkDriver@10.12.201.205:35133/user/CoarseGrainedScheduler" 
"--executor-id" "15" "--hostname" "10.12.201.205" "--cores" "1" "--app-id" 
"app-20160118171240-0256" "--worker
-url" "akka.tcp://sparkWorker@10.12.201.205:5/user/Worker"
[INFO 2016-01-18 17:12:49 (Logging.scala:59)] Asked to kill driver 
driver-20160118164724-0008
[INFO 2016-01-18 17:12:49 (Logging.scala:59)] Redirection to 
/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/work/driver-20160118164724-0008/stdout
 closed: Stream closed
[INFO 2016-01-18 17:12:49 (Logging.scala:59)] Asked to kill executor 
app-20160118164728-0250/15
[INFO 2016-01-18 17:12:49 (Logging.scala:59)] Runner thread for executor 
app-20160118164728-0250/15 interrupted
[INFO 2016-01-18 17:12:49 (Logging.scala:59)] Killing process!
[ERROR 2016-01-18 17:12:49 (Logging.scala:96)] Error writing stream to file 
/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/work/app-20160118164728-0250/15/stdout
java.io.IOException: Stream closed
at 
java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:162)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:272)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
at java.io.FilterInputStream.read(FilterInputStream.java:107)
at 
org.apache.spark.util.logging.FileAppender.appendStreamToFile(FileAppender.scala:70)
at 
org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply$mcV$sp(FileAppender.scala:39)
at 
org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
at 
org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1772)
at 
org.apache.spark.util.logging.FileAppender$$anon$1.run(FileAppender.scala:38)
[INFO 2016-01-18 17:12:49 (Logging.scala:59)] Executor 
app-20160118164728-0250/15 finished with state KILLED exitStatus 143

  was:
It's a little same as the issue: SPARK-4300. Well, this time, it's happen on 
the driver.

[INFO 2016-01-18 17:12:35 (Logging.scala:59)] Asked to launch driver 
driver-20160118171237-0009
[INFO 2016-01-18 17:12:35 (Logging.scala:59)] Copying user jar 
file:/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/mylib/spark-ly-streaming-v2-201601141018.jar
 to /data/dbcenter/cdh5/spark-1.4.0-bin-hado
op2.4/work/driver-20160118171237-0009/spark-ly-streaming-v2-201601141018.jar
[INFO 2016-01-18 17:12:35 (Logging.scala:59)] Copying 
/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/mylib/spark-ly-streaming-v2-201601141018.jar
 to /data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/work/dri
ver-20160118171237-0009/spark-ly-streaming-v2-201601141018.jar
[INFO 2016-01-18 17:12:35 (Logging.scala:59)] Launch Command: 
"/data/dbcenter/jdk1.7.0_79/bin/java" "-cp" .
[INFO 2016-01-18 17:12:39 (Logging.scala:59)] Asked to launch executor 
app-20160118171240-

[jira] [Updated] (SPARK-12876) Race condition when driver rapidly shutdown after started.

2016-01-18 Thread jeffonia Tung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jeffonia Tung updated SPARK-12876:
--
Description: 
This is similar to SPARK-4300.

[INFO 2016-01-18 17:12:35 (Logging.scala:59)] Asked to launch driver 
driver-20160118171237-0009
[INFO 2016-01-18 17:12:35 (Logging.scala:59)] Copying user jar 
file:/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/mylib/spark-ly-streaming-v2-201601141018.jar
 to /data/dbcenter/cdh5/spark-1.4.0-bin-hado
op2.4/work/driver-20160118171237-0009/spark-ly-streaming-v2-201601141018.jar
[INFO 2016-01-18 17:12:35 (Logging.scala:59)] Copying 
/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/mylib/spark-ly-streaming-v2-201601141018.jar
 to /data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/work/dri
ver-20160118171237-0009/spark-ly-streaming-v2-201601141018.jar
[INFO 2016-01-18 17:12:35 (Logging.scala:59)] Launch Command: 
"/data/dbcenter/jdk1.7.0_79/bin/java" "-cp" .
[INFO 2016-01-18 17:12:39 (Logging.scala:59)] Asked to launch executor 
app-20160118171240-0256/15 for DirectKafkaStreamingV2
[INFO 2016-01-18 17:12:39 (Logging.scala:59)] Launch command: 
"/data/dbcenter/jdk1.7.0_79/bin/java" "-cp" 
"/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/postgresql-9.2-1004-jdbc41.jar:/data/dbcenter/cdh
5/spark-1.4.0-bin-hadoop2.4/lib/hive-contrib-0.13.1-cdh5.2.0.jar:/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/sbin/../conf/:/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/lib/spark-assembly-1.4.0-hadoop
2.4.0.jar:/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar:/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar:/data/dbcenter/cdh5/spark-1.4.0-bi
n-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar" "-Xms1024M" "-Xmx1024M" 
"-Dspark.akka.frameSize=100" "-Dspark.driver.port=35133" "-XX:MaxPermSize=128m" 
"org.apache.spark.executor.CoarseGrainedExecutorBacke
nd" "--driver-url" 
"akka.tcp://sparkDriver@10.12.201.205:35133/user/CoarseGrainedScheduler" 
"--executor-id" "15" "--hostname" "10.12.201.205" "--cores" "1" "--app-id" 
"app-20160118171240-0256" "--worker
-url" "akka.tcp://sparkWorker@10.12.201.205:5/user/Worker"
[INFO 2016-01-18 17:12:49 (Logging.scala:59)] Asked to kill driver 
driver-20160118164724-0008
[INFO 2016-01-18 17:12:49 (Logging.scala:59)] Redirection to 
/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/work/driver-20160118164724-0008/stdout
 closed: Stream closed
[INFO 2016-01-18 17:12:49 (Logging.scala:59)] Asked to kill executor 
app-20160118164728-0250/15
[INFO 2016-01-18 17:12:49 (Logging.scala:59)] Runner thread for executor 
app-20160118164728-0250/15 interrupted
[INFO 2016-01-18 17:12:49 (Logging.scala:59)] Killing process!
[ERROR 2016-01-18 17:12:49 (Logging.scala:96)] Error writing stream to file 
/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/work/app-20160118164728-0250/15/stdout
java.io.IOException: Stream closed
at 
java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:162)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:272)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
at java.io.FilterInputStream.read(FilterInputStream.java:107)
at 
org.apache.spark.util.logging.FileAppender.appendStreamToFile(FileAppender.scala:70)
at 
org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply$mcV$sp(FileAppender.scala:39)
at 
org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
at 
org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1772)
at 
org.apache.spark.util.logging.FileAppender$$anon$1.run(FileAppender.scala:38)
[INFO 2016-01-18 17:12:49 (Logging.scala:59)] Executor 
app-20160118164728-0250/15 finished with state KILLED exitStatus 143

  was:
[INFO 2016-01-18 17:12:35 (Logging.scala:59)] Asked to launch driver 
driver-20160118171237-0009
[INFO 2016-01-18 17:12:35 (Logging.scala:59)] Copying user jar 
file:/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/mylib/spark-ly-streaming-v2-201601141018.jar
 to /data/dbcenter/cdh5/spark-1.4.0-bin-hado
op2.4/work/driver-20160118171237-0009/spark-ly-streaming-v2-201601141018.jar
[INFO 2016-01-18 17:12:35 (Logging.scala:59)] Copying 
/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/mylib/spark-ly-streaming-v2-201601141018.jar
 to /data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/work/dri
ver-20160118171237-0009/spark-ly-streaming-v2-201601141018.jar
[INFO 2016-01-18 17:12:35 (Logging.scala:59)] Launch Command: 
"/data/dbcenter/jdk1.7.0_79/bin/java" "-cp" .
[INFO 2016-01-18 17:12:39 (Logging.scala:59)] Asked to launch executor 
app-20160118171240-0256/15 for DirectKafkaStreamingV2
[INFO 2016-01-18 17:12:39 (Logging.scala:59)] Launch command: 
"/data/dbcenter/jdk1.7.0_79/bin/java" "-cp" 
"/data/

[jira] [Created] (SPARK-12876) Race condition when driver rapidly shutdown after started.

2016-01-18 Thread jeffonia Tung (JIRA)
jeffonia Tung created SPARK-12876:
-

 Summary: Race condition when driver rapidly shutdown after started.
 Key: SPARK-12876
 URL: https://issues.apache.org/jira/browse/SPARK-12876
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.4.0
Reporter: jeffonia Tung


[INFO 2016-01-18 17:12:35 (Logging.scala:59)] Asked to launch driver 
driver-20160118171237-0009
[INFO 2016-01-18 17:12:35 (Logging.scala:59)] Copying user jar 
file:/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/mylib/spark-ly-streaming-v2-201601141018.jar
 to /data/dbcenter/cdh5/spark-1.4.0-bin-hado
op2.4/work/driver-20160118171237-0009/spark-ly-streaming-v2-201601141018.jar
[INFO 2016-01-18 17:12:35 (Logging.scala:59)] Copying 
/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/mylib/spark-ly-streaming-v2-201601141018.jar
 to /data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/work/dri
ver-20160118171237-0009/spark-ly-streaming-v2-201601141018.jar
[INFO 2016-01-18 17:12:35 (Logging.scala:59)] Launch Command: 
"/data/dbcenter/jdk1.7.0_79/bin/java" "-cp" .
[INFO 2016-01-18 17:12:39 (Logging.scala:59)] Asked to launch executor 
app-20160118171240-0256/15 for DirectKafkaStreamingV2
[INFO 2016-01-18 17:12:39 (Logging.scala:59)] Launch command: 
"/data/dbcenter/jdk1.7.0_79/bin/java" "-cp" 
"/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/postgresql-9.2-1004-jdbc41.jar:/data/dbcenter/cdh
5/spark-1.4.0-bin-hadoop2.4/lib/hive-contrib-0.13.1-cdh5.2.0.jar:/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/sbin/../conf/:/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/lib/spark-assembly-1.4.0-hadoop
2.4.0.jar:/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar:/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar:/data/dbcenter/cdh5/spark-1.4.0-bi
n-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar" "-Xms1024M" "-Xmx1024M" 
"-Dspark.akka.frameSize=100" "-Dspark.driver.port=35133" "-XX:MaxPermSize=128m" 
"org.apache.spark.executor.CoarseGrainedExecutorBacke
nd" "--driver-url" 
"akka.tcp://sparkDriver@10.12.201.205:35133/user/CoarseGrainedScheduler" 
"--executor-id" "15" "--hostname" "10.12.201.205" "--cores" "1" "--app-id" 
"app-20160118171240-0256" "--worker
-url" "akka.tcp://sparkWorker@10.12.201.205:5/user/Worker"
[INFO 2016-01-18 17:12:49 (Logging.scala:59)] Asked to kill driver 
driver-20160118164724-0008
[INFO 2016-01-18 17:12:49 (Logging.scala:59)] Redirection to 
/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/work/driver-20160118164724-0008/stdout
 closed: Stream closed
[INFO 2016-01-18 17:12:49 (Logging.scala:59)] Asked to kill executor 
app-20160118164728-0250/15
[INFO 2016-01-18 17:12:49 (Logging.scala:59)] Runner thread for executor 
app-20160118164728-0250/15 interrupted
[INFO 2016-01-18 17:12:49 (Logging.scala:59)] Killing process!
[ERROR 2016-01-18 17:12:49 (Logging.scala:96)] Error writing stream to file 
/data/dbcenter/cdh5/spark-1.4.0-bin-hadoop2.4/work/app-20160118164728-0250/15/stdout
java.io.IOException: Stream closed
at 
java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:162)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:272)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
at java.io.FilterInputStream.read(FilterInputStream.java:107)
at 
org.apache.spark.util.logging.FileAppender.appendStreamToFile(FileAppender.scala:70)
at 
org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply$mcV$sp(FileAppender.scala:39)
at 
org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
at 
org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1772)
at 
org.apache.spark.util.logging.FileAppender$$anon$1.run(FileAppender.scala:38)
[INFO 2016-01-18 17:12:49 (Logging.scala:59)] Executor 
app-20160118164728-0250/15 finished with state KILLED exitStatus 143






[jira] [Comment Edited] (SPARK-5928) Remote Shuffle Blocks cannot be more than 2 GB

2015-12-15 Thread jeffonia Tung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15057998#comment-15057998
 ] 

jeffonia Tung edited comment on SPARK-5928 at 12/15/15 12:42 PM:
-

org.apache.spark.shuffle.FetchFailedException: Adjusted frame length exceeds 
2147483647: 9307521944 - discarded
at 
org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$.org$apache$spark$shuffle$hash$BlockStoreShuffleFetcher$$unpackBlock$1(BlockStoreShuffleFetcher.scala:67)
at 
org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:84)
at 
org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:84)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at 
org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at 
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)


I've got the same error after a matrix CROSS JOIN. OMG, it produced 21 TB of shuffle write data.
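For context, a rough sketch of one way around it (not taken from this ticket; the RDDs and the partition count below are made up): since the "Adjusted frame length" is the size of a single shuffle block, spreading the cross-join output over many more partitions keeps each block below the 2 GB frame limit.

{code}
// Illustrative sketch only: more partitions => smaller shuffle blocks in later stages.
val left = sc.parallelize(1 to 1000000).map(i => (i, s"row-$i"))
val right = sc.parallelize(1 to 1000)

val crossed = left.cartesian(right)   // the cross join that inflates the shuffle
  .repartition(4096)                  // split the output across many partitions

crossed.map { case ((k, _), _) => (k % 1000, 1L) }
  .reduceByKey(_ + _)
  .count()
{code}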


was (Author: jeffonia):
org.apache.spark.shuffle.FetchFailedException: Adjusted frame length exceeds 
2147483647: 9307521944 - discarded
at 
org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$.org$apache$spark$shuffle$hash$BlockStoreShuffleFetcher$$unpackBlock$1(BlockStoreShuffleFetcher.scala:67)
at 
org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:84)
at 
org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:84)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at 
org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at 
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)


I've got the save error after i got a matrix  CROSS JOIN.  OMG, it produce 21TB 
shuffle write data. 

> Remote Shuffle Blocks cannot be more than 2 GB
> --
>
> Key: SPARK-5928
> URL: https://issues.apache.org/jira/browse/SPARK-5928
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Imran Rashid
>
> If a shuffle block is over 2GB, the shuffle fails, with an uninformative 
> exception.  The tasks get retried a few times and then eventually the job 
> fails.
> Here is an example program which can cause the exception:
> {code}
> val rdd = sc.parallelize(1 to 1e6.toInt, 1).map{ ignore =>
>   val n = 3e3.toInt
>   val arr = new Array[Byte](n)
>   //need to make sure the array doesn't compress to something small
>   scala.util.Random.nextBytes(arr)
>   arr
> }
> rdd.map { x => (1, x)}.groupByKey().count()
> {code}
> Note that you can't trigger this exception in local mode, it only happens on 
> remote fetches.   I triggered these exceptions running with 
> {{MASTER=yarn-client spark-shell --num-executors 2 --executor-memory 4000m}}
> {noformat}
> 15/02/20 11:10:23 WARN TaskSetManager: Lost task 0.0 in stage 3.0 (TID 3, 
> imran-3.ent.cloudera.com): FetchFailed(BlockManagerId(1, 
> imran-2.ent.cloudera.com, 55028), shuffleId=1, mapId=0, reduceId=0, message=
> org.apache.spark.shuffle.FetchFailedException: Adjusted frame length exceeds 
> 2147483647: 3021252889 - discarded
>   at 
> org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$.org$apache$spark$shuffle$hash$BlockStoreShuffleFetcher$$unpackBlock$1(BlockStoreShuffleFetcher.scala:67)
>   at 
> org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:83)
>   at 
> org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:83)
>   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>   at 
> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
>   at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>   at 
> org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:125)
>   at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:58)
>   at 
> org.apache.spark.shuffle.hash.HashShuffleReader.read(HashShuffleReader.scala:46)
>   at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:92)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>   at org.apache.spark.scheduler.Task.run(Task.scala:56)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>   at 
> java.util.concurrent.ThreadPoolExecuto

[jira] [Commented] (SPARK-5928) Remote Shuffle Blocks cannot be more than 2 GB

2015-12-15 Thread jeffonia Tung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15057998#comment-15057998
 ] 

jeffonia Tung commented on SPARK-5928:
--

org.apache.spark.shuffle.FetchFailedException: Adjusted frame length exceeds 
2147483647: 9307521944 - discarded
at 
org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$.org$apache$spark$shuffle$hash$BlockStoreShuffleFetcher$$unpackBlock$1(BlockStoreShuffleFetcher.scala:67)
at 
org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:84)
at 
org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:84)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at 
org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at 
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)


I've got the same error after a matrix CROSS JOIN. OMG, it produced 21 TB of shuffle write data.

> Remote Shuffle Blocks cannot be more than 2 GB
> --
>
> Key: SPARK-5928
> URL: https://issues.apache.org/jira/browse/SPARK-5928
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Imran Rashid
>
> If a shuffle block is over 2GB, the shuffle fails, with an uninformative 
> exception.  The tasks get retried a few times and then eventually the job 
> fails.
> Here is an example program which can cause the exception:
> {code}
> val rdd = sc.parallelize(1 to 1e6.toInt, 1).map{ ignore =>
>   val n = 3e3.toInt
>   val arr = new Array[Byte](n)
>   //need to make sure the array doesn't compress to something small
>   scala.util.Random.nextBytes(arr)
>   arr
> }
> rdd.map { x => (1, x)}.groupByKey().count()
> {code}
> Note that you can't trigger this exception in local mode, it only happens on 
> remote fetches.   I triggered these exceptions running with 
> {{MASTER=yarn-client spark-shell --num-executors 2 --executor-memory 4000m}}
> {noformat}
> 15/02/20 11:10:23 WARN TaskSetManager: Lost task 0.0 in stage 3.0 (TID 3, 
> imran-3.ent.cloudera.com): FetchFailed(BlockManagerId(1, 
> imran-2.ent.cloudera.com, 55028), shuffleId=1, mapId=0, reduceId=0, message=
> org.apache.spark.shuffle.FetchFailedException: Adjusted frame length exceeds 
> 2147483647: 3021252889 - discarded
>   at 
> org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$.org$apache$spark$shuffle$hash$BlockStoreShuffleFetcher$$unpackBlock$1(BlockStoreShuffleFetcher.scala:67)
>   at 
> org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:83)
>   at 
> org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:83)
>   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>   at 
> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
>   at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>   at 
> org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:125)
>   at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:58)
>   at 
> org.apache.spark.shuffle.hash.HashShuffleReader.read(HashShuffleReader.scala:46)
>   at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:92)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>   at org.apache.spark.scheduler.Task.run(Task.scala:56)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: io.netty.handler.codec.TooLongFrameException: Adjusted frame 
> length exceeds 2147483647: 3021252889 - discarded
>   at 
> io.netty.handler.codec.LengthFieldBasedFrameDecoder.fail(LengthFieldBasedFrameDecoder.java:501)
>   at 
> io.netty.handler.codec.LengthFieldBasedFrameDecoder.failIfNecessary(LengthFieldBasedFrameDecoder.java:477)
>   at 
> io.netty.handler.codec.LengthFieldBasedFrameDecoder.decode(LengthFieldBasedFrameDecoder.java:403)
>   at 
> io.netty.handler.codec.LengthFieldBasedFrameDecoder.decode(LengthFieldBasedFrameDecoder.java:343)
>   at 
> io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:249)
>   at 
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java

[jira] [Commented] (SPARK-4049) Storage web UI "fraction cached" shows as > 100%

2015-12-14 Thread jeffonia Tung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15057400#comment-15057400
 ] 

jeffonia Tung commented on SPARK-4049:
--

I hit the same situation: the fraction cached goes up to 200%, yet everything keeps running fine. I'm just confused about that.

> Storage web UI "fraction cached" shows as > 100%
> 
>
> Key: SPARK-4049
> URL: https://issues.apache.org/jira/browse/SPARK-4049
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.2.0
>Reporter: Josh Rosen
>Priority: Minor
>
> In the Storage tab of the Spark Web UI, I saw a case where the "Fraction 
> Cached" was greater than 100%:
> !http://i.imgur.com/Gm2hEeL.png!






[jira] [Commented] (SPARK-10141) Number of tasks on executors still become negative after failures

2015-11-03 Thread jeffonia Tung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987119#comment-14987119
 ] 

jeffonia Tung commented on SPARK-10141:
---

I've hit the same problem in version 1.4.0.

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to 
stage failure: Task 10 in stage 2.0 failed 4 times, most recent failure: Lost 
task 10.3 in stage 2.0 (TID 178, 10.12.201.160): java.io.IOException: Failed to 
connect to /10.12.201.159:55632
at 
org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:193)
at 
org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:156)
at 
org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:88)
at 
org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
at 
org.apache.spark.network.shuffle.RetryingBlockFetcher.access$200(RetryingBlockFetcher.java:43)
at 
org.apache.spark.network.shuffle.RetryingBlockFetcher$1.run(RetryingBlockFetcher.java:170)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at 
io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:208)
at 
io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:287)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
... 1 more

Driver stacktrace:
at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1266)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1257)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1256)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1256)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
at scala.Option.foreach(Option.scala:236)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1450)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1411)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
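Not a fix, but for reference: the fetch in that trace goes through RetryingBlockFetcher, whose behaviour is controlled by the shuffle I/O retry settings. A sketch of raising them (the values are arbitrary, and this only helps if the connection failures are transient):

{code}
// Sketch only: give the block fetcher more chances before the task is failed.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.shuffle.io.maxRetries", "10")   // default is 3
  .set("spark.shuffle.io.retryWait", "10s")   // wait between retries
{code}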



> Number of tasks on executors still become negative after failures
> -
>
> Key: SPARK-10141
> URL: https://issues.apache.org/jira/browse/SPARK-10141
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.5.0
>Reporter: Joseph K. Bradley
>Priority: Minor
> Attachments: Screen Shot 2015-08-20 at 3.14.49 PM.png
>
>
> I hit this failure when running LDA on EC2 (after I made the model size 
> really big).
> I was using the LDAExample.scala code on an EC2 cluster with 16 workers 
> (r3.2xlarge), on a Wikipedia dataset:
> {code}
> Training set size (documents) 4534059
> Vocabulary size (terms)   1
> Training set size (tokens)895575317
> EM optimizer
> 1K topics
> {code}
> Failure message:
> {code}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 55 in 
> stage 22.0 failed 4 times, most recent failure: Lost task 55.3 in stage 22.0 
> (TID 2881, 10.0.202.128): java.io.IOException: Failed to connect to 
> /10.0.202.128:54740
> at 
> org.apache.