Could you provide a script to reproduce this problem? Thanks!
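
In case it helps frame what such a script might look like, here is a minimal PySpark sketch of the workload Arun describes below: many chained filter/count operations on a small persisted RDD, so each count replays an ever-longer lineage. This is purely illustrative — the app name, record count, loop length, and predicates are all made up, and the Kryo setting mirrors Arun's mention of a Kryo serializer:

    # Hypothetical repro sketch -- names and parameters are illustrative only.
    from pyspark import SparkConf, SparkContext, StorageLevel

    conf = (SparkConf()
            .setAppName("filter-count-repro-sketch")
            # Arun mentions a Kryo serializer; this is the standard config key.
            .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer"))
    sc = SparkContext(conf=conf)

    # Small record set (~1k-10k), persisted in memory as described in the thread.
    records = sc.parallelize(range(10000)).persist(StorageLevel.MEMORY_ONLY)

    rdd = records
    for i in range(200):                      # many chained filters
        rdd = rdd.filter(lambda x, i=i: x % (i + 2) != 0)
        rdd.count()                           # each count re-walks the lineage

    sc.stop()
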
On Wed, Oct 8, 2014 at 9:13 PM, Sung Hwan Chung <coded...@cs.stanford.edu> wrote:
> This is also happening to me on a regular basis, when the job is large with
> relatively large serialized objects used in each RDD lineage. A bad thing
> about this is that this exception always stops the whole job.
>
>
> On Fri, Sep 26, 2014 at 11:17 AM, Brad Miller <bmill...@eecs.berkeley.edu> wrote:
>>
>> FWIW I suspect that each count operation is an opportunity for you to
>> trigger the bug, and each filter operation increases the likelihood of
>> setting up the bug. I normally don't come across this error until my job
>> has been running for an hour or two and has had a chance to build up longer
>> lineages for some RDDs. It sounds like your data is a bit smaller, so it's
>> feasible for you to build up longer lineages more quickly.
>>
>> If you can reduce your number of filter operations (for example by
>> combining some into a single function), that may help. It may also help to
>> introduce persistence or checkpointing at intermediate stages so that the
>> lineages that have to get replayed aren't as long (a sketch follows at the
>> end of this thread).
>>
>> On Fri, Sep 26, 2014 at 11:10 AM, Arun Ahuja <aahuj...@gmail.com> wrote:
>>>
>>> No, for me as well it is non-deterministic. It happens in a piece of code
>>> that does many filters and counts on a small set of records (~1k-10k). The
>>> original set is persisted in memory and we have a Kryo serializer set for
>>> it. The task itself takes in just a few filtering parameters. The same
>>> code with the same settings has sometimes completed successfully and
>>> sometimes failed during this step.
>>>
>>> Arun
>>>
>>> On Fri, Sep 26, 2014 at 1:32 PM, Brad Miller <bmill...@eecs.berkeley.edu> wrote:
>>>>
>>>> I've had multiple jobs crash due to "java.io.IOException: unexpected
>>>> exception type"; I've been running the 1.1 branch for some time and am
>>>> now running the 1.1 release binaries. Note that I only use PySpark. I
>>>> haven't kept detailed notes or the tracebacks around, since there are
>>>> other problems that have caused me greater grief (namely "key not found"
>>>> errors).
>>>>
>>>> For me the exception seems to occur non-deterministically, which is a
>>>> bit interesting since the error message shows that the same stage has
>>>> failed multiple times. Are you able to consistently reproduce the bug
>>>> across multiple invocations at the same place?
>>>>
>>>> On Fri, Sep 26, 2014 at 6:11 AM, Arun Ahuja <aahuj...@gmail.com> wrote:
>>>>>
>>>>> Has anyone else seen this error in task deserialization? The task is
>>>>> processing a small amount of data and doesn't seem to have much data
>>>>> hanging off the closure.
>>>>> I've only seen this with Spark 1.1.
>>>>>
>>>>> Job aborted due to stage failure: Task 975 in stage 8.0 failed 4 times,
>>>>> most recent failure: Lost task 975.3 in stage 8.0 (TID 24777, host.com):
>>>>> java.io.IOException: unexpected exception type
>>>>>   java.io.ObjectStreamClass.throwMiscException(ObjectStreamClass.java:1538)
>>>>>   java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1025)
>>>>>   java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
>>>>>   java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>>>>   java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>>>>   java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>>>>>   java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>>>>>   java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>>>>   java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>>>>   java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>>>>   org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
>>>>>   org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
>>>>>   org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:159)
>>>>>   java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>   java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>   java.lang.Thread.run(Thread.java:744)
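
For readers hitting the same thing, here is a minimal PySpark sketch of the two mitigations Brad suggests above: combining several filter predicates into a single function, and persisting/checkpointing an intermediate RDD so later stages don't replay the full lineage. The checkpoint directory, predicates, and sizes are all hypothetical:

    # Sketch of the mitigations suggested above; details are illustrative.
    from pyspark import SparkContext, StorageLevel

    sc = SparkContext(appName="lineage-mitigation-sketch")
    sc.setCheckpointDir("hdfs:///tmp/spark-checkpoints")  # hypothetical path

    records = sc.parallelize(range(10000)).persist(StorageLevel.MEMORY_ONLY)

    # 1) Combine several filter predicates into a single function, so one
    #    filter() replaces a long chain of them.
    def combined_predicate(x):
        return x % 2 != 0 and x % 3 != 0 and x > 100   # made-up conditions

    filtered = records.filter(combined_predicate)

    # 2) Persist, then checkpoint, an intermediate RDD; persisting first
    #    avoids recomputing it when the checkpoint is written.
    filtered.persist(StorageLevel.MEMORY_AND_DISK)
    filtered.checkpoint()
    filtered.count()   # an action forces the checkpoint to materialize

    sc.stop()

Once the checkpoint is written, downstream tasks serialize against the checkpointed data rather than the whole chain of filter closures, which is what makes the replayed lineage shorter.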