Sure, do you have a URL for your patch?

Kyle
On Nov 12, 2013 5:59 PM, "Xia, Junluan" <[email protected]> wrote:

> Hi Kyle,
>
> I have also built a patch for this issue, and it passes the tests. Could
> you help me review it if you are free?
>
> -----Original Message-----
> From: Kyle Ellrott [mailto:[email protected]]
> Sent: Wednesday, November 13, 2013 8:44 AM
> To: [email protected]
> Subject: Re: SPARK-942
>
> I've posted a patch that I think produces the correct behavior at
>
> https://github.com/kellrott/incubator-spark/commit/efe1102c8a7436b2fe112d3bece9f35fedea0dc8
>
> It works fine on my programs, but if I run the unit tests, I get errors
> like:
>
> [info] - large number of iterations *** FAILED ***
> [info]   org.apache.spark.SparkException: Job aborted: Task 4.0:0 failed
> more than 0 times; aborting job java.lang.ClassCastException:
> scala.collection.immutable.StreamIterator cannot be cast to
> scala.collection.mutable.ArrayBuffer
> [info]   at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:818)
> [info]   at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:816)
> [info]   at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
> [info]   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> [info]   at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:816)
> [info]   at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:431)
> [info]   at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:493)
> [info]   at org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:158)
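>
> For context, here is a minimal standalone sketch of the kind of cast that
> blows up this way (hypothetical code to illustrate the error, not the
> patch itself):
>
>   import scala.collection.mutable.ArrayBuffer
>
>   // A lazily evaluated iterator backed by a Stream...
>   val lazyValues: Iterator[Int] = Stream.from(1).take(5).iterator
>
>   // ...handed to code that assumes the values were materialized into an
>   // ArrayBuffer. This compiles, but at runtime it fails with
>   // java.lang.ClassCastException: StreamIterator cannot be cast to
>   // ArrayBuffer.
>   val buf = lazyValues.asInstanceOf[ArrayBuffer[Int]]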
>
>
> I can't figure out the line number where the original error occurred, or
> why I can't replicate it in my various test programs. Any help would be
> appreciated.
>
> Kyle
>
> On Tue, Nov 12, 2013 at 11:35 AM, Alex Boisvert <[email protected]> wrote:
>
> > On Tue, Nov 12, 2013 at 11:07 AM, Stephen Haberman <[email protected]> wrote:
> >
> > > Huge disclaimer that this is probably a big PITA to implement, and
> > > likely not as worthwhile as I naively think it would be.
> > >
> >
> > My perspective on this is that it's already a big PITA for Spark users
> > today.
> >
> > In the absence of explicit directions/hints, Spark should be able to
> > make ballpark estimates and conservatively pick the number of
> > partitions, storage strategies (e.g., memory vs. disk), and other
> > runtime parameters that fit the deployment architecture/capacities. If
> > this requires code and extra runtime resources for sampling/measuring
> > data, guesstimating job size, and so on, so be it.
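> >
> > As a rough sketch of what such an estimate could look like (hypothetical
> > helper, made-up names; nothing like this exists in Spark today):
> >
> >   // Sample a few records, estimate the average serialized record size,
> >   // and derive a conservative partition count from a target partition
> >   // size.
> >   def estimatePartitions(sampleRecords: Seq[Array[Byte]],
> >                          totalRecords: Long,
> >                          targetPartitionBytes: Long = 64L * 1024 * 1024): Int = {
> >     val avgBytes = sampleRecords.map(_.length.toLong).sum / math.max(sampleRecords.size, 1)
> >     val totalBytes = avgBytes * totalRecords
> >     math.max(1, math.ceil(totalBytes.toDouble / targetPartitionBytes).toInt)
> >   }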
> >
> > Users want working jobs first.  Optimal performance / resource
> > utilization follow from that.
> >
>
