Re: spark-shell working in scala-2.11 (breaking change?)

2015-01-31 Thread Stephen Haberman
Looking at https://github.com/apache/spark/pull/1222/files, the following change may have caused what Stephen described: + if (!fileSystem.isDirectory(new Path(logBaseDir))) { When there is no scheme associated with logBaseDir, local path should be assumed. Yes, that looks right. In
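To illustrate the "no scheme means local path" idea from the quoted reply, here is a minimal sketch using only java.net.URI. The helper name resolveLogDir is hypothetical and this is not the code from the pull request; it only shows one way such a fallback could look, assuming logBaseDir is a syntactically valid URI string.

```scala
import java.io.File
import java.net.URI

// Hypothetical helper, NOT Spark's actual implementation.
// Assumption: logBaseDir parses as a URI (no spaces or backslashes).
def resolveLogDir(logBaseDir: String): URI = {
  val uri = new URI(logBaseDir)
  if (uri.getScheme != null) {
    // An explicit scheme (hdfs://, file://, ...) is kept as given.
    uri
  } else {
    // No scheme: assume a path on the local file system.
    new File(logBaseDir).getAbsoluteFile.toURI
  }
}
```

With this sketch, a value such as hdfs://namenode:8020/logs is returned unchanged, while a bare path such as /tmp/spark-events comes back with the file scheme.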

Re: spark-shell working in scala-2.11 (breaking change?)

2015-01-30 Thread Stephen Haberman
...@gmail.com wrote: Stephen, Scala 2.11 worked fine for me. Did the dev change and then compile. Not using in production, but I go back and forth between 2.10 and 2.11. Cheers k/ On Wed, Jan 28, 2015 at 12:18 PM, Stephen Haberman stephen.haber...@gmail.com wrote: Hey, I recently compiled

spark-shell working in scala-2.11

2015-01-28 Thread Stephen Haberman
Hey, I recently compiled Spark master against scala-2.11 (by running the dev/change-versions script), but when I run spark-shell, it looks like the sc variable is missing. Is this a known/unknown issue? Are others successfully using Spark with scala-2.11, and specifically spark-shell? It is

Re: recent join/iterator fix

2014-12-29 Thread Stephen Haberman
It wasn't so much the cogroup that was optimized here, but what is done to the result of cogroup. Right. Yes, it was a matter of not materializing the entire result of a flatMap-like function after the cogroup, since this will accept just an Iterator (actually, TraversableOnce). Yeah...I
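The difference between materializing the flatMap-like result and handing back an Iterator can be sketched in plain Scala collections, without Spark. The names eagerPairs and lazyPairs are made up for this illustration and are not taken from the commit under discussion; the two Iterable arguments stand in for the already-materialized value groups that cogroup yields for one key.

```scala
// Eager version: builds the full cross product for one key in memory
// before returning it.
def eagerPairs[V, W](vs: Iterable[V], ws: Iterable[W]): List[(V, W)] =
  vs.toList.flatMap(v => ws.toList.map(w => (v, w)))

// Lazy version: returns an Iterator, so each pair is produced only when
// the consumer asks for it. The inputs are still fully in memory; only
// the joined output is not materialized.
def lazyPairs[V, W](vs: Iterable[V], ws: Iterable[W]): Iterator[(V, W)] =
  for (v <- vs.iterator; w <- ws.iterator) yield (v, w)
```

Both versions yield the same pairs in the same order. The lazy one avoids holding |vs| * |ws| tuples at once, which is the part that matters when a single key has many values on both sides.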

Re: recent join/iterator fix

2014-12-29 Thread Stephen Haberman
Hi Shixiong, The Iterable from cogroup is CompactBuffer, which is already materialized. It's not a lazy Iterable. So now Spark cannot handle skewed data where some key has too many values to fit into memory. Cool, thanks for the confirmation. - Stephen

recent join/iterator fix

2014-12-28 Thread Stephen Haberman
Hey, I saw this commit go by, and find it fairly fascinating: https://github.com/apache/spark/commit/c233ab3d8d75a33495298964fe73dbf7dd8fe305 For two reasons: 1) we have a report that is bogging down exactly in a .join with lots of elements, so, glad to see the fix, but, more interesting I