Re: Spark job failed
Have you considered posting on the vendor forum? FYI.

On Mon, Sep 14, 2015 at 6:09 AM, Renu Yadav wrote:

> -- Forwarded message --
> From: Renu Yadav
> Date: Mon, Sep 14, 2015 at 4:51 PM
> Subject: Spark job failed
> To: d...@spark.apache.org
>
> I am getting the below error while running a Spark job:
>
> storage.DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file /data/vol5/hadoop/yarn/local/usercache/renu_yadav/appcache/application_1438196554863_31545/spark-4686a622-82be-418e-a8b0-1653458bc8cb/22/temp_shuffle_8c437ba7-55d2-4520-80ec-adcfe932b3bd
> java.io.FileNotFoundException: /data/vol5/hadoop/yarn/local/usercache/renu_yadav/appcache/application_1438196554863_31545/spark-4686a622-82be-418e-a8b0-1653458bc8cb/22/temp_shuffle_8c437ba7-55d2-4520-80ec-adcfe932b3bd (No such file or directory)
>
> I am processing 1.3 TB of data. The transformations are:
>
> read from Hadoop -> map(key/value).coalesce(2000).groupByKey,
> then sort each record by server_ts and select the most recent,
> then save the data as Parquet.
> Following is the command:
>
> spark-submit --class com.test.Myapp --master yarn-cluster \
>   --driver-memory 16g --executor-memory 20g \
>   --executor-cores 5 --num-executors 150 \
>   --files /home/renu_yadav/fmyapp/hive-site.xml \
>   --conf spark.yarn.preserve.staging.files=true \
>   --conf spark.shuffle.memoryFraction=0.6 \
>   --conf spark.storage.memoryFraction=0.1 \
>   --conf SPARK_SUBMIT_OPTS="-XX:MaxPermSize=768m" \
>   --conf spark.akka.timeout=40 \
>   --conf spark.locality.wait=10 \
>   --conf spark.yarn.executor.memoryOverhead=8000 \
>   --conf SPARK_JAVA_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
>   --conf spark.reducer.maxMbInFlight=96 \
>   --conf spark.shuffle.file.buffer.kb=64 \
>   --conf spark.core.connection.ack.wait.timeout=120 \
>   --jars /usr/hdp/2.2.6.0-2800/hive/lib/datanucleus-api-jdo-3.2.6.jar,/usr/hdp/2.2.6.0-2800/hive/lib/datanucleus-core-3.2.10.jar,/usr/hdp/2.2.6.0-2800/hive/lib/datanucleus-rdbms-3.2.9.jar \
>   myapp_2.10-1.0.jar
>
> Cluster configuration:
>
> 20 nodes
> 32 cores per node
> 125 GB RAM per node
>
> Please help.
>
> Thanks & Regards,
> Renu Yadav
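[Editor's note] On the pipeline above: groupByKey followed by a per-key sort materializes every value for a key during the shuffle, which at 1.3 TB scale can contribute to exactly this kind of temp_shuffle file failure. When only the most recent record per key is needed, reduceByKey shuffles at most one record per key per partition. A minimal sketch of the merge logic, with a hypothetical `Event` record and `serverTs` field standing in for the real schema; in Spark the same `latest` function would be passed to `rdd.reduceByKey`:

```scala
// Hypothetical stand-in for the real key/value payload.
case class Event(serverTs: Long, payload: String)

// Keep the most recent record; this is the function you would pass to reduceByKey.
def latest(a: Event, b: Event): Event =
  if (a.serverTs >= b.serverTs) a else b

// Plain-collections demonstration of what rdd.reduceByKey(latest) computes per key.
val events = Seq(
  ("k1", Event(100L, "old")),
  ("k1", Event(200L, "new")),
  ("k2", Event(50L, "only"))
)

val mostRecent: Map[String, Event] =
  events.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).reduce(latest) }
```

Because the merge is associative, Spark can apply it map-side before the shuffle, so the full value list for a key is never held in memory at once.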
Re: Spark Job Failed (Executor Lost then FS closed)
Can you look more into the worker logs and see what's going on? It looks like a memory issue (GC overhead etc.; you need to look in the worker logs).

Thanks
Best Regards

On Fri, Aug 7, 2015 at 3:21 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote:

Re-attaching the images.

On Thu, Aug 6, 2015 at 2:50 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote:

Code:

import java.text.SimpleDateFormat
import java.util.Calendar
import java.sql.Date
import org.apache.spark.storage.StorageLevel

def extract(array: Array[String], index: Integer) = {
  if (index < array.length) {
    array(index).replaceAll("\"", "")
  } else {
    ""
  }
}

case class GuidSess(
  guid: String,
  sessionKey: String,
  sessionStartDate: String,
  siteId: String,
  eventCount: String,
  browser: String,
  browserVersion: String,
  operatingSystem: String,
  experimentChannel: String,
  deviceName: String)

val rowStructText = sc.textFile("/user/zeppelin/guidsess/2015/08/05/part-m-1.gz")
val guidSessRDD = rowStructText.filter(s => s.length != 1).map(s => s.split(",")).map { s =>
  GuidSess(extract(s, 0), extract(s, 1), extract(s, 2), extract(s, 3), extract(s, 4),
    extract(s, 5), extract(s, 6), extract(s, 7), extract(s, 8), extract(s, 9))
}
val guidSessDF = guidSessRDD.toDF()
guidSessDF.registerTempTable("guidsess")

Once the temp table is created, I ran this query:

select siteid, count(distinct guid) total_visitor, count(sessionKey) as total_visits
from guidsess group by siteid

Metrics:
Data size: 170 MB
Spark version: 1.3.1
YARN: 2.7.x

Timeline: There is 1 job, 2 stages with 1 task each.

1st stage: mapPartitions
[image: Inline image 1]

1st stage: Task 1 started to fail, and a second attempt started for the 1st task of the first stage. The first attempt failed with "Executor LOST"; when I go to the YARN resource manager and to that particular host, I see that it is running fine.

Attempt #1
[image: Inline image 2]

Attempt #2: Executor LOST again
[image: Inline image 3]

Attempt 34
[image: Inline image 4]

2nd stage, runJob: SKIPPED
[image: Inline image 5]

Any suggestions?
-- Deepak
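[Editor's note] One hazard worth flagging in the parsing code above: Java's `String.split(",")` drops trailing empty fields, so a row that ends in empty columns yields a short array and `extract` silently returns "" for those columns. Passing a limit of -1 preserves every field. A small self-contained check:

```scala
val row = "a,b,,"

// The default split discards trailing empty strings...
val lossy = row.split(",")

// ...while a negative limit keeps every column, empty or not.
val exact = row.split(",", -1)
```

With the default split, `lossy` has only 2 elements; with the -1 limit, `exact` has all 4, the last two being empty strings.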
Re: Spark Job Failed - Class not serializable
This thread might give you some insights:
http://mail-archives.apache.org/mod_mbox/incubator-spark-user/201311.mbox/%3CCA+WVT8WXbEHac=N0GWxj-s9gqOkgG0VRL5B=ovjwexqm8ev...@mail.gmail.com%3E

Thanks
Best Regards

On Fri, Apr 3, 2015 at 3:53 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote:

My Spark job failed with:

15/04/03 03:15:36 INFO scheduler.DAGScheduler: Job 0 failed: saveAsNewAPIHadoopFile at AbstractInputHelper.scala:103, took 2.480175 s
15/04/03 03:15:36 ERROR yarn.ApplicationMaster: User class threw exception: Job aborted due to stage failure: Task 0.0 in stage 2.0 (TID 0) had a not serializable result: com.ebay.ep.poc.spark.reporting.process.model.dw.SpsLevelMetricSum
Serialization stack:
 - object not serializable (class: com.ebay.ep.poc.spark.reporting.process.model.dw.SpsLevelMetricSum, value: {userId: 0, spsPrgrmId: 0, spsSlrLevelCd: 0, spsSlrLevelSumStartDt: null, spsSlrLevelSumEndDt: null, currPsLvlId: null})
 - field (class: scala.Tuple2, name: _2, type: class java.lang.Object)
 - object (class scala.Tuple2, (0,{userId: 0, spsPrgrmId: 0, spsSlrLevelCd: 0, spsSlrLevelSumStartDt: null, spsSlrLevelSumEndDt: null, currPsLvlId: null}))
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0 in stage 2.0 (TID 0) had a not serializable result: com.ebay.ep.poc.spark.reporting.process.model.dw.SpsLevelMetricSum
Serialization stack:
 - object not serializable (class: com.ebay.ep.poc.spark.reporting.process.model.dw.SpsLevelMetricSum, value: {userId: 0, spsPrgrmId: 0, spsSlrLevelCd: 0, spsSlrLevelSumStartDt: null, spsSlrLevelSumEndDt: null, currPsLvlId: null})
 - field (class: scala.Tuple2, name: _2, type: class java.lang.Object)
 - object (class scala.Tuple2, (0,{userId: 0, spsPrgrmId: 0, spsSlrLevelCd: 0, spsSlrLevelSumStartDt: null, spsSlrLevelSumEndDt: null, currPsLvlId: null}))
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1191)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1191)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
	at scala.Option.foreach(Option.scala:236)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
15/04/03 03:15:36 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: Job aborted due to stage failure: Task 0.0 in stage 2.0 (TID 0) had a not serializable result: com.ebay.ep.poc.spark.reporting.process.model.dw.SpsLevelMetricSum Serialization stack:

com.ebay.ep.poc.spark.reporting.process.model.dw.SpsLevelMetricSum is auto-generated from an Avro schema using the avro-generate-sources Maven plugin:

package com.ebay.ep.poc.spark.reporting.process.model.dw;

@SuppressWarnings("all")
@org.apache.avro.specific.AvroGenerated
public class SpsLevelMetricSum extends org.apache.avro.specific.SpecificRecordBase
    implements org.apache.avro.specific.SpecificRecord {
  ...
  ...
}

Can anyone suggest how to fix this?

-- Deepak
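[Editor's note] The root cause in this trace is generic: Spark's default result serialization is plain Java serialization, which recurses through the object graph and fails on the first field whose class does not implement java.io.Serializable. Tuple2 itself is Serializable, which is why the trace points at its _2 field rather than the tuple. A self-contained illustration of the same failure mode, using a hypothetical non-serializable record class in place of the Avro-generated one:

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for an Avro-generated record: it does NOT extend Serializable.
class PlainRecord(val userId: Int)

// Attempt plain Java serialization; report failure instead of throwing.
def javaSerialize(obj: AnyRef): Either[String, Array[Byte]] =
  try {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    Right(bytes.toByteArray)
  } catch {
    case e: NotSerializableException => Left(e.getMessage)
  }

// Wrapping the record in a Serializable Tuple2 does not help:
// serialization recurses into the _2 field and fails there.
val result = javaSerialize((0, new PlainRecord(0)))
```

A tuple of ordinary serializable values (strings, boxed numbers) goes through fine; it is specifically the non-Serializable payload that aborts the write.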
Re: Spark Job Failed - Class not serializable
I was able to write a record that extends SpecificRecord (Avro); that class was not auto-generated. Do we need to do something extra for auto-generated classes?

Sent from my iPhone

On 03-Apr-2015, at 5:06 pm, Akhil Das ak...@sigmoidanalytics.com wrote:

> This thread might give you some insights:
> http://mail-archives.apache.org/mod_mbox/incubator-spark-user/201311.mbox/%3CCA+WVT8WXbEHac=N0GWxj-s9gqOkgG0VRL5B=ovjwexqm8ev...@mail.gmail.com%3E
Re: Spark Job Failed - Class not serializable
I meant that I did not have to use Kryo before. Why will Kryo help fix this issue now?

Sent from my iPhone

On 03-Apr-2015, at 5:36 pm, Deepak Jain deepuj...@gmail.com wrote:

> I was able to write a record that extends SpecificRecord (Avro); that class was not auto-generated. Do we need to do something extra for auto-generated classes?
Re: Spark Job Failed - Class not serializable
Because it's throwing serialization exceptions, and Kryo is a serializer you can use to serialize your objects.

Thanks
Best Regards

On Fri, Apr 3, 2015 at 5:37 PM, Deepak Jain deepuj...@gmail.com wrote:

> I meant that I did not have to use Kryo before. Why will Kryo help fix this issue now?
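[Editor's note] For reference, switching Spark to Kryo is a SparkConf setting; Kryo does not require classes to implement Serializable, and registering the generated class keeps the serialized form compact. A configuration sketch only, untested against this exact setup (the class name is taken from the thread; `registerKryoClasses` is available from Spark 1.2 onward):

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("reporting")
  // Replace default Java serialization with Kryo.
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  // Register the Avro-generated class so Kryo writes a small class ID
  // instead of the full class name.
  .registerKryoClasses(Array(
    classOf[com.ebay.ep.poc.spark.reporting.process.model.dw.SpsLevelMetricSum]))
```

As the next reply in the thread notes, plain Kryo field serialization can still struggle with Avro records, so a Kryo serializer that delegates to Avro's own binary encoding is often the more robust choice.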
Re: Spark Job Failed - Class not serializable
You'll definitely want to use a Kryo-based serializer for Avro. We have a Kryo-based serializer that wraps Avro's efficient serializer here.

Frank Austin Nothaft
fnoth...@berkeley.edu
fnoth...@eecs.berkeley.edu
202-340-0466

On Apr 3, 2015, at 5:41 AM, Akhil Das ak...@sigmoidanalytics.com wrote:

> Because it's throwing serialization exceptions, and Kryo is a serializer you can use to serialize your objects.