Looking at https://github.com/apache/spark/pull/1222/files ,
the following change may have caused what Stephen described:
+ if (!fileSystem.isDirectory(new Path(logBaseDir))) {
When there is no scheme associated with logBaseDir, a local path
should be assumed.
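A minimal sketch of the idea, assuming the Hadoop FileSystem API (not taken from the PR itself): resolving the FileSystem from the path's own URI means a scheme-less logBaseDir falls back to the default (local) filesystem, rather than whichever FileSystem instance happens to be in scope.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// `logBaseDir` as in the quoted diff; may or may not carry a URI scheme
// such as hdfs:// or file://.
val logPath = new Path(logBaseDir)
// Resolve the filesystem from the path itself: with no scheme, this
// yields the configured default (typically the local filesystem).
val fs = logPath.getFileSystem(new Configuration())
require(fs.isDirectory(logPath), s"$logBaseDir is not a directory")
```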
Yes, that looks right.
...@gmail.com wrote:
Stephen,
Scala 2.11 worked fine for me. I ran the dev change script and then
compiled. Not using it in production, but I go back and forth
between 2.10 and 2.11. Cheers
k/
On Wed, Jan 28, 2015 at 12:18 PM, Stephen Haberman
stephen.haber...@gmail.com wrote:
Hey,
I recently compiled Spark master against scala-2.11 (by running
the dev/change-versions script), but when I run spark-shell,
it looks like the sc variable is missing.
Is this a known/unknown issue? Are others successfully using
Spark with scala-2.11, and specifically spark-shell?
It wasn't so much the cogroup that was optimized here, but what is
done to the result of cogroup.
Right.
Yes, it was a matter of not materializing the entire result of a
flatMap-like function after the cogroup, since the result is now
consumed as just an Iterator (actually, a TraversableOnce).
Hi Shixiong,
The Iterable from cogroup is a CompactBuffer, which is already
materialized; it's not a lazy Iterable. So Spark still cannot handle
skewed data where some key has too many values to fit into memory.
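The distinction above can be sketched as follows (a hypothetical illustration, not code from the commit): the per-key Iterables that cogroup produces are fully materialized CompactBuffers, while the optimization applies only to the function applied *after* the cogroup, whose output can be streamed as an iterator instead of being buffered.

```scala
import org.apache.spark.rdd.RDD

// Join-like operation built on cogroup. `ls` and `rs` arrive as
// already-materialized buffers (CompactBuffer under the hood), so a
// heavily skewed key can still exhaust memory on the input side. Only
// the cross-product result below is produced lazily, because
// flatMapValues accepts a TraversableOnce and can stream it.
def joinLike(left: RDD[(Int, String)],
             right: RDD[(Int, String)]): RDD[(Int, (String, String))] =
  left.cogroup(right).flatMapValues { case (ls, rs) =>
    for (l <- ls.iterator; r <- rs.iterator) yield (l, r)
  }
```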
Cool, thanks for the confirmation.
- Stephen
Hey,
I saw this commit go by, and find it fairly fascinating:
https://github.com/apache/spark/commit/c233ab3d8d75a33495298964fe73dbf7dd8fe305
For two reasons: 1) we have a report that bogs down exactly in
a .join with lots of elements, so I'm glad to see the fix, but, more
interesting, I