Hi All,

I sent a mail about this StreamCorruptedException problem a few days ago.
I am using the published Spark 0.8.0-incubating release, not the latest,
and my Java version is 1.6.0_30, so I don't think this is a recently
introduced problem. It has been blocking me lately, so I was wondering
whether you guys have any clue.

Thanks
Jerry

-----Original Message-----
From: Mark Hamstra [mailto:m...@clearstorydata.com]
Sent: Thursday, October 31, 2013 7:25 AM
To: dev@spark.incubator.apache.org
Subject: Re: Getting failures in FileServerSuite

Maybe I was bailing too early, Kay. I'm sure I waited at least 15 mins,
but maybe not 30.

On Wed, Oct 30, 2013 at 3:45 PM, Kay Ousterhout <k...@eecs.berkeley.edu> wrote:

> Patrick: I don't think this was caused by a recent merge -- pretty
> sure I was seeing it last week.
>
> Mark: Are you sure the examples assembly is hanging, as opposed to
> just taking a long time? It takes ~30 minutes on my machine (not
> doubting that the Java version update fixes it -- just pointing out
> that if you wait, it may actually finish).
>
> Evan: One thing to note is that the log message is wrong (see
> https://github.com/apache/incubator-spark/pull/126): the task is
> actually failing just once, not 4 times. Doesn't help fix the issue
> -- but just thought I'd point it out in case anyone else is trying
> to look into this.
>
>
> On Wed, Oct 30, 2013 at 2:08 PM, Patrick Wendell <pwend...@gmail.com>
> wrote:
>
> > This may have been caused by a recent merge since a bunch of people
> > independently hit it in the last 48 hours.
> >
> > One debugging step would be to narrow it down to which merge caused
> > it. I don't have time personally today, but just a suggestion for
> > ppl for whom this is blocking progress.
> >
> > - Patrick
> >
> > On Wed, Oct 30, 2013 at 1:44 PM, Mark Hamstra
> > <m...@clearstorydata.com> wrote:
> > > What JDK version are you using, Evan?
> > >
> > > I tried to reproduce your problem earlier today, but I wasn't even
> > > able to get through the assembly build -- kept hanging when trying
> > > to build the examples assembly. Foregoing the assembly and running
> > > the tests would hang on FileServerSuite "Dynamically adding JARS
> > > locally" -- no stack trace, just hung. And I was actually seeing a
> > > very similar stack trace to yours from a test suite of our own
> > > running against 0.8.1-SNAPSHOT -- not exactly the same because line
> > > numbers were different once it went into the java runtime, and it
> > > eventually ended up someplace a little different. That got me
> > > curious about differences in Java versions, so I updated to the
> > > latest Oracle release (1.7.0_45). Now it cruises right through the
> > > build and test of Spark master from before Matei merged your PR.
> > > Then I logged into a machine that has 1.7.0_15
> > > (7u15-2.3.7-0ubuntu1~11.10.1, actually) installed, and I'm right
> > > back to the hanging during the examples assembly (but passes
> > > FileServerSuite, oddly enough.) Upgrading the JDK didn't improve
> > > the results of the ClearStory test suite I was looking at, so my
> > > misery isn't over; but yours might be with a newer JDK....
> > >
> > >
> > > On Wed, Oct 30, 2013 at 12:44 PM, Evan Chan <e...@ooyala.com> wrote:
> > >
> > >> Must be a local environment thing, because AmpLab Jenkins can't
> > >> reproduce it..... :-p
> > >>
> > >> On Wed, Oct 30, 2013 at 11:10 AM, Josh Rosen
> > >> <rosenvi...@gmail.com> wrote:
> > >> > Someone on the users list also encountered this exception:
> > >> >
> > >> > https://mail-archives.apache.org/mod_mbox/incubator-spark-user/201310.mbox/%3C64474308D680D540A4D8151B0F7C03F7025EF289%40SHSMSX104.ccr.corp.intel.com%3E
> > >> >
> > >> >
> > >> > On Wed, Oct 30, 2013 at 9:40 AM, Evan Chan <e...@ooyala.com> wrote:
> > >> >
> > >> >> I'm at the latest
> > >> >>
> > >> >> commit f0e23a023ce1356bc0f04248605c48d4d08c2d05
> > >> >> Merge: aec9bf9 a197137
> > >> >> Author: Reynold Xin <r...@apache.org>
> > >> >> Date: Tue Oct 29 01:41:44 2013 -0400
> > >> >>
> > >> >>
> > >> >> and seeing this when I do a "test-only FileServerSuite":
> > >> >>
> > >> >> 13/10/30 09:35:04.300 INFO DAGScheduler: Completed ResultTask(0, 0)
> > >> >> 13/10/30 09:35:04.307 INFO LocalTaskSetManager: Loss was due to java.io.StreamCorruptedException
> > >> >> java.io.StreamCorruptedException: invalid type code: AC
> > >> >>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
> > >> >>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
> > >> >>     at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:39)
> > >> >>     at org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:101)
> > >> >>     at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
> > >> >>     at scala.collection.Iterator$$anon$21.hasNext(Iterator.scala:440)
> > >> >>     at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:26)
> > >> >>     at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:27)
> > >> >>     at org.apache.spark.Aggregator.combineCombinersByKey(Aggregator.scala:53)
> > >> >>     at org.apache.spark.rdd.PairRDDFunctions$$anonfun$combineByKey$2.apply(PairRDDFunctions.scala:95)
> > >> >>     at org.apache.spark.rdd.PairRDDFunctions$$anonfun$combineByKey$2.apply(PairRDDFunctions.scala:94)
> > >> >>     at org.apache.spark.rdd.MapPartitionsWithContextRDD.compute(MapPartitionsWithContextRDD.scala:40)
> > >> >>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:237)
> > >> >>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:226)
> > >> >>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:107)
> > >> >>     at org.apache.spark.scheduler.Task.run(Task.scala:53)
> > >> >>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:212)
> > >> >>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
> > >> >>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
> > >> >>     at java.lang.Thread.run(Thread.java:680)
> > >> >>
> > >> >>
> > >> >> Anybody else seen this yet?
> > >> >>
> > >> >> I have a really simple PR and this fails without my change, so
> > >> >> I may go ahead and submit it anyways.
> > >> >>
> > >> >> --
> > >> >> --
> > >> >> Evan Chan
> > >> >> Staff Engineer
> > >> >> e...@ooyala.com |
> > >> >>
> > >>
> > >>
> > >>
> > >> --
> > >> --
> > >> Evan Chan
> > >> Staff Engineer
> > >> e...@ooyala.com |
> > >>
> > >