RE: Getting failures in FileServerSuite

2013-10-30 Thread Shao, Saisai
Hi All,

I sent a mail about this stream corruption problem a few days ago.

I am using the released Spark 0.8.0-incubating, not the latest code, and my
Java version is 1.6.0_30, so I don't think this is a recently introduced
problem. It has been blocking me lately; I was wondering if you guys have any clue.

Thanks
Jerry


Re: Getting failures in FileServerSuite

2013-10-30 Thread Mark Hamstra
Maybe I was bailing too early, Kay.  I'm sure I waited at least 15 mins,
but maybe not 30.




Re: Getting failures in FileServerSuite

2013-10-30 Thread Kay Ousterhout
Patrick: I don't think this was caused by a recent merge -- pretty sure I
was seeing it last week.

Mark: Are you sure the examples assembly is hanging, as opposed to just
taking a long time?  It takes ~30 minutes on my machine (not doubting that
the Java version update fixes it -- just pointing out that if you wait, it
may actually finish).

Evan: One thing to note is that the log message is wrong (see
https://github.com/apache/incubator-spark/pull/126): the task is actually
failing just once, not 4 times.  Doesn't help fix the issue -- but just
thought I'd point it out in case anyone else is trying to look into this.



Re: [PySpark]: reading arbitrary Hadoop InputFormats

2013-10-30 Thread Nick Pentreath
Thanks Josh, Patrick for the feedback.

Based on Josh's pointers I have something working for JavaPairRDD ->
PySpark RDD[(String, String)]. This just calls the toString method on each
key and value as before, but without the need for a delimiter. For
SequenceFile, it uses SequenceFileAsTextInputFormat which itself calls
toString to convert to Text for keys and values. We then call toString
(again) ourselves to get Strings to feed to writeAsPickle.

Details here: https://gist.github.com/MLnick/7230588

This also illustrates where the "wrapper function" API would fit in. All
that is required is to define a T => String for the key and for the value.
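
To make the shape of that API concrete, here is a minimal Scala sketch (the
helper names are mine, not from the gist above): a generic function taking a
T => String for the key and for the value, defaulting to toString, plus a
usage sketch that reads a SequenceFile through SequenceFileAsTextInputFormat
as described above.

import org.apache.hadoop.io.Text
import org.apache.hadoop.mapred.SequenceFileAsTextInputFormat
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Hypothetical wrapper: callers supply a converter for keys and one for
// values; the defaults just call toString, matching the behavior above.
def stringifyPairs[K, V](rdd: RDD[(K, V)],
                         keyConv: K => String = (k: K) => k.toString,
                         valConv: V => String = (v: V) => v.toString): RDD[(String, String)] =
  rdd.map { case (k, v) => (keyConv(k), valConv(v)) }

// Usage sketch, assuming an existing SparkContext `sc`:
// SequenceFileAsTextInputFormat hands back (Text, Text) pairs, and the
// default converters then produce the plain Strings fed to writeAsPickle.
def readSequenceFileAsStrings(sc: SparkContext, path: String): RDD[(String, String)] =
  stringifyPairs(sc.hadoopFile[Text, Text, SequenceFileAsTextInputFormat](path))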

I started playing around with MsgPack and can sort of get things to work in
Scala, but I'm struggling to get the raw bytes written properly in PythonRDD
(I think it is treating them as pickled byte arrays when they are not; when
I removed the 'stripPickle' calls and amended the length (-6), I got
"UnpicklingError: invalid load key, ' '. ").

Another issue is that MsgPack does well at writing "structures" -- fairly
simple Java classes with public fields -- but the Writables, for example,
have private fields, so you end up with nothing being written. This looks
like it would require custom "Templates" (effectively, serialization
functions) for many classes, which means a lot of custom code for a user to
write in order to use it. Fortunately, for most of the common Writables a
toString does the job. Will keep looking into it though.

Anyway, Josh, if you have ideas or examples of the "Wrapper API from Python"
that you mentioned, I'd be interested to hear them.

If you think this is worth working up as a Pull Request covering
SequenceFiles and custom InputFormats with default toString conversions and
the ability to specify Wrapper functions, I can clean things up more, add
some functionality and tests, and also test to see if common things like
the "normal" Writables and reading from things like HBase and Cassandra can
be made to work nicely (any other common use cases that you think make
sense?).

Thoughts, comments etc welcome.

Nick



On Fri, Oct 25, 2013 at 11:03 PM, Patrick Wendell wrote:

> As a starting point, a version where people just write their own "wrapper"
> functions to convert various HadoopFiles into String  files could go
> a long way. We could even have a few built-in versions, such as dealing
> with Sequence files that are . Basically, the user needs to
> write a translator in Java/Scala that produces textual records from
> whatever format they want. Then, they make sure this is included in the
> classpath when running PySpark.
>
> As Josh is saying, I'm pretty sure this is already possible, but we may
> want to document it for users. In many organizations they might have 1-2
> people who can write the Java/Scala to do this but then many more people
> who are comfortable using python once it's setup.
>
> - Patrick
>
> On Fri, Oct 25, 2013 at 11:00 AM, Josh Rosen  wrote:
>
> > Hi Nick,
> >
> > I've seen several requests for SequenceFile support in PySpark, so
> > there's definitely demand for this feature.
> >
> > I like the idea of passing MsgPack'ed data (or some other structured
> > format) from Java to the Python workers.  My early prototype of custom
> > serializers (described at
> > https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals#PySparkInternals-customserializers)
> > might be useful for implementing this.  Proper custom serializer support
> > would handle the bookkeeping for tracking each stage's input and output
> > formats and supplying the appropriate deserialization functions to the
> > Python worker, so the Python worker would be able to directly read the
> > MsgPack'd data that's sent to it.
> >
> > Regarding a wrapper API, it's actually possible to initially transform
> > data using Scala/Java and perform the remainder of the processing in
> > PySpark.  This involves adding the appropriate compiled classes to the
> > Java classpath and a bit of work in Py4J to create the Java/Scala RDD
> > and wrap it for use by PySpark.  I can hack together a rough example of
> > this if anyone's interested, but it would need some work to be developed
> > into a user-friendly API.
> >
> > If you wanted to extend your proof-of-concept to handle the cases where
> > keys and values have parseable toString() values, I think you could
> > remove the need for a delimiter by creating a PythonRDD from the
> > newHadoopFile JavaPairRDD and adding a new method to writeAsPickle
> > (https://github.com/apache/incubator-spark/blob/master/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala#L224)
> > to dump its contents as a pickled pair of strings.  (Aside: most of
> > writeAsPickle() would probably need to be eliminated or refactored when
> > adding general custom serializer support.)
> >
> > - Josh
> >
> > On Thu, Oct 24, 2013 at 11:18 PM, Nick Pentreath wrote:
> >
> > > Hi Spark Devs
> > >
> > > I was 

Re: Getting failures in FileServerSuite

2013-10-30 Thread Patrick Wendell
This may have been caused by a recent merge since a bunch of people
independently hit it in the last 48 hours.

One debugging step would be to narrow it down to which merge caused
it. I don't have time personally today, but just a suggestion for ppl
for whom this is blocking progress.

- Patrick



Re: Getting failures in FileServerSuite

2013-10-30 Thread Mark Hamstra
What JDK version are you using, Evan?

I tried to reproduce your problem earlier today, but I wasn't even able to
get through the assembly build -- it kept hanging when trying to build the
examples assembly.  Forgoing the assembly and running the tests would hang
on FileServerSuite "Dynamically adding JARS locally" -- no stack trace,
just hung.  And I was actually seeing a very similar stack trace to yours
from a test suite of our own running against 0.8.1-SNAPSHOT -- not exactly
the same, because the line numbers were different once it went into the
Java runtime, and it eventually ended up someplace a little different.
That got me curious about differences in Java versions, so I updated to
the latest Oracle release (1.7.0_45).  Now it cruises right through the
build and test of Spark master from before Matei merged your PR.  Then I
logged into a machine that has 1.7.0_15 (7u15-2.3.7-0ubuntu1~11.10.1,
actually) installed, and I'm right back to the hanging during the examples
assembly (but it passes FileServerSuite, oddly enough).  Upgrading the JDK
didn't improve the results of the ClearStory test suite I was looking at,
so my misery isn't over; but yours might be, with a newer JDK.





Re: Getting failures in FileServerSuite

2013-10-30 Thread Evan Chan
Must be a local environment thing, because AmpLab Jenkins can't
reproduce it. :-p




-- 
--
Evan Chan
Staff Engineer
e...@ooyala.com  |


Re: Getting failures in FileServerSuite

2013-10-30 Thread Josh Rosen
Someone on the users list also encountered this exception:

https://mail-archives.apache.org/mod_mbox/incubator-spark-user/201310.mbox/%3C64474308D680D540A4D8151B0F7C03F7025EF289%40SHSMSX104.ccr.corp.intel.com%3E




Getting failures in FileServerSuite

2013-10-30 Thread Evan Chan
I'm at the latest

commit f0e23a023ce1356bc0f04248605c48d4d08c2d05
Merge: aec9bf9 a197137
Author: Reynold Xin 
Date:   Tue Oct 29 01:41:44 2013 -0400


and seeing this when I do a "test-only FileServerSuite":

13/10/30 09:35:04.300 INFO DAGScheduler: Completed ResultTask(0, 0)
13/10/30 09:35:04.307 INFO LocalTaskSetManager: Loss was due to java.io.StreamCorruptedException
java.io.StreamCorruptedException: invalid type code: AC
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:39)
    at org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:101)
    at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
    at scala.collection.Iterator$$anon$21.hasNext(Iterator.scala:440)
    at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:26)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:27)
    at org.apache.spark.Aggregator.combineCombinersByKey(Aggregator.scala:53)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$combineByKey$2.apply(PairRDDFunctions.scala:95)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$combineByKey$2.apply(PairRDDFunctions.scala:94)
    at org.apache.spark.rdd.MapPartitionsWithContextRDD.compute(MapPartitionsWithContextRDD.scala:40)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:237)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:226)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:107)
    at org.apache.spark.scheduler.Task.run(Task.scala:53)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:212)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:680)


Anybody else seen this yet?

I have a really simple PR, and this fails without my change, so I may
go ahead and submit it anyway.

-- 
--
Evan Chan
Staff Engineer
e...@ooyala.com  |
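
A closing note on the exception itself: "invalid type code: AC" is the
classic symptom of a second Java serialization stream header appearing
mid-stream. 0xAC is the first byte of STREAM_MAGIC (0xACED), which every
new ObjectOutputStream writes before its first object, so a reader that
expects a type code and instead hits a fresh header fails in exactly this
way. Below is a minimal, Spark-free Scala sketch that reproduces the error;
it demonstrates the JDK behavior only and says nothing about where Spark's
streams actually get crossed.

import java.io._

object InvalidTypeCodeDemo {
  def main(args: Array[String]): Unit = {
    val buf = new ByteArrayOutputStream()

    // The first ObjectOutputStream writes the 0xACED stream header,
    // then one serialized object.
    val out1 = new ObjectOutputStream(buf)
    out1.writeObject("first")
    out1.flush()

    // Constructing a second ObjectOutputStream over the SAME underlying
    // stream writes another 0xACED header in the middle of the byte stream.
    val out2 = new ObjectOutputStream(buf)
    out2.writeObject("second")
    out2.flush()

    val in = new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray))
    println(in.readObject())  // reads "first" successfully
    println(in.readObject())  // StreamCorruptedException: invalid type code: AC
  }
}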