+1
On Tue, Jun 30, 2015 at 5:27 PM, Reynold Xin r...@databricks.com wrote:
+1
On Tue, Jun 23, 2015 at 10:37 PM, Patrick Wendell pwend...@gmail.com wrote:
Please vote on releasing the following candidate as Apache Spark version 1.4.1!
This release fixes a handful of known issues in Spark 1.4.0, listed here:
http://s.apache.org/spark-1.4.1
The tag to be voted on
Why is reduce in DStream implemented with a map, reduceByKey and another
map, given that we have an RDD.reduce?
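The short answer is that RDD.reduce is an action that returns a value, while a DStream transformation has to return another DStream, so the per-batch reduction is expressed through keyed transformations instead. A minimal sketch of that same map → reduceByKey → map shape on plain Scala collections (no Spark needed; `reduceViaKeying` is a name invented here):

```scala
// Sketch of the shape DStream.reduce uses: wrap every element under a
// single dummy key, reduce per key, then strip the key back off.
def reduceViaKeying[T](xs: Seq[T])(f: (T, T) => T): T = {
  val keyed = xs.map(x => ((), x))               // map: attach one dummy key
  val merged = keyed                             // "reduceByKey" over that key
    .groupBy(_._1)
    .map { case (k, vs) => (k, vs.map(_._2).reduce(f)) }
  merged(())                                     // map: drop the key again
}
```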
I ran into a similar problem: reading a CSV file into a DataFrame and saving
it to Parquet with partitionBy, I got an OutOfMemory error even though the
input isn't a large data file. I discovered that by default Spark appears to
allocate a 128MB in-memory block for each output Parquet file.
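If those 128MB blocks are Parquet's row-group write buffers, one possible workaround (a sketch only, not verified against 1.4; `parquet.block.size` is Parquet's standard Hadoop setting, while `sc`, `df`, the column names, and the output path are all hypothetical here) is to shrink the row-group size before writing:

```scala
// Parquet buffers roughly one row group (parquet.block.size, 128MB by
// default) per open output file, and partitionBy can hold many files
// open at once, multiplying that footprint. Shrinking the row-group
// size caps the per-file buffer at the cost of smaller row groups.
sc.hadoopConfiguration.setInt("parquet.block.size", 16 * 1024 * 1024) // 16MB
df.write.partitionBy("year", "month").parquet("/path/to/output")      // hypothetical
```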
Try mapPartitions, which gives you an iterator and lets you produce an
iterator back.
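The iterator-in, iterator-out shape looks like this on plain Scala iterators (a sketch; `pairwiseSums` and the pairwise-sum task are just an illustration, not the poster's actual function):

```scala
// Take an Iterator in, give an Iterator back, exactly the contract a
// function passed to RDD.mapPartitions must satisfy. Here: combine
// each pair of successive elements. sliding(2) with withPartial(false)
// drops any trailing window shorter than 2.
def pairwiseSums(it: Iterator[Int]): Iterator[Int] =
  it.sliding(2).withPartial(false).map(_.sum)

// With an RDD this would be used as rdd.mapPartitions(pairwiseSums),
// though each partition is then processed independently.
```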
On Tue, Jun 30, 2015 at 11:01 AM, RJ Nowling rnowl...@gmail.com wrote:
Hi all,
I have a problem where I have an RDD of elements:
Item1 Item2 Item3 Item4 Item5 Item6 ...
and I want to run a function over
https://issues.apache.org/jira/browse/SPARK-8597
A JIRA ticket discussing the same problem (with more insights than here)!
--
View this message in context:
http://apache-spark-developers-list.1001551.n3.nabble.com/DataFrame-partitionBy-issues-tp12838p12974.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
That's an interesting idea! I hadn't considered that. However, looking at
the Partitioner interface, I would need to determine the partition from a
single key, which doesn't fit my case, unfortunately: I need to compare
successive pairs of keys. (I'm trying to re-join lines that were split
Thanks, Reynold. I still need to handle incomplete groups that fall
between partition boundaries. So, I need a two-pass approach. I came up
with a somewhat hacky way to handle those using the partition indices and
key-value pairs as a second pass after the first.
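That two-pass shape can be sketched on plain Scala collections, with partitions simulated as nested sequences (everything here, including the names `groupRuns` and `stitch`, is a hypothetical illustration of the idea, not the poster's actual code):

```scala
// Pass 1: within each partition, group runs of consecutive equal keys.
// A run cut off at a partition edge comes out as a partial group.
def groupRuns[K, V](part: Seq[(K, V)]): Seq[(K, Seq[V])] =
  part.foldLeft(Vector.empty[(K, Vector[V])]) {
    case (acc :+ ((k, vs)), (k2, v)) if k == k2 => acc :+ ((k, vs :+ v))
    case (acc, (k, v))                          => acc :+ ((k, Vector(v)))
  }

// Pass 2: walk the per-partition results in partition-index order and
// merge the last run of one partition with the first run of the next
// whenever they share a key, repairing groups split across boundaries.
def stitch[K, V](parts: Seq[Seq[(K, Seq[V])]]): Seq[(K, Seq[V])] =
  parts.flatten.foldLeft(Vector.empty[(K, Vector[V])]) {
    case (acc :+ ((k, vs)), (k2, vs2)) if k == k2 => acc :+ ((k, vs ++ vs2))
    case (acc, (k, vs))                           => acc :+ ((k, vs.toVector))
  }
```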
OCaml's std library provides a
could you use a custom partitioner to preserve boundaries such that all related
tuples end up on the same partition?
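For reference, Spark's Partitioner contract is reproduced below so the example is self-contained; it also shows why this only works when a single key determines its partition. The `BlockPartitioner` is a hypothetical example assuming keys are Long line indices:

```scala
// Spark's Partitioner contract: the partitioner is asked about one key
// at a time, so it cannot compare successive keys.
abstract class Partitioner extends Serializable {
  def numPartitions: Int
  def getPartition(key: Any): Int
}

// Hypothetical range-style partitioner: if keys were Long line indices,
// keeping each contiguous block of lines in one partition would land
// related tuples together.
class BlockPartitioner(val numPartitions: Int, linesPerBlock: Long)
    extends Partitioner {
  def getPartition(key: Any): Int =
    ((key.asInstanceOf[Long] / linesPerBlock) % numPartitions).toInt
}
```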