If you are joining successive lines together based on a predicate, then you
are doing a flatMap, not an aggregate. You are on the right track with a
multi-pass solution. I had the same challenge when I needed a sliding
window over an RDD (see below).
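A minimal local sketch of that multi-pass idea for a sliding window of size 2. It is written in Python, and partitions are simulated as plain lists rather than a real RDD. The function name `sliding_pairs` is hypothetical. The first pass grabs at most one element from the front of each partition. The second pass appends the head of the next non-empty partition, so the pair that straddles a boundary still gets formed. Each partition emits zero or more windows regardless of its size, which is the flatMap shape rather than an aggregate. In PySpark, the first pass might be a `mapPartitions(...).collect()` and the second a `mapPartitionsWithIndex`, but treat that mapping as an assumption, not tested Spark code.

```python
from typing import List, Tuple, TypeVar

T = TypeVar("T")


def sliding_pairs(partitions: List[List[T]]) -> List[List[Tuple[T, T]]]:
    """Size-2 sliding window over a partitioned sequence, done in two passes.

    `partitions` is a local stand-in for an RDD's partitions (not Spark).
    """
    # Pass 1: take at most one element from the front of each partition.
    # This is tiny, so in a real job it could be collected to the driver.
    heads = [part[:1] for part in partitions]

    windows = []
    for i, part in enumerate(partitions):
        # Pass 2: borrow the head of the next non-empty partition so the
        # pair that straddles the boundary is emitted by this partition.
        borrowed = next((h for h in heads[i + 1:] if h), [])
        extended = part + borrowed
        windows.append(list(zip(extended, extended[1:])))
    return windows


print(sliding_pairs([[1, 2], [3], [], [4, 5]]))
# [[(1, 2), (2, 3)], [(3, 4)], [], [(4, 5)]]
```

Flattening the per-partition results gives the same pairs a single unpartitioned list would produce.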
[I had suggested that the sliding window API be
Thanks, Mohit. It sounds like we're on the same page -- I used a similar
approach.
On Thu, Jul 2, 2015 at 12:27 PM, Mohit Jaggi <mohitja...@gmail.com> wrote:
> if you are joining successive lines together based on a predicate, then
> you are doing a flatMap not an aggregate. you are on the right
Try mapPartitions: it hands you an iterator over each partition, and you
can return an iterator back.
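Here is a sketch of the kind of function that could be passed to PySpark's `RDD.mapPartitions`. It takes an iterator of lines and lazily yields re-joined records. The names `join_lines` and `continues(prev, line)` are hypothetical. The predicate compares each line with the one before it. The `;` record terminator in `ends_without_semicolon` is an assumption made up for illustration. The function only ever sees one partition, so a record split across a partition boundary still comes out in pieces.

```python
from typing import Callable, Iterable, Iterator, List, Optional


def join_lines(lines: Iterable[str],
               continues: Callable[[str, str], bool],
               sep: str = " ") -> Iterator[str]:
    """Lazily re-join lines: `line` is glued onto the current record whenever
    continues(previous_line, line) is True. Iterator in, iterator out, which
    is the shape a mapPartitions function needs."""
    buffer: List[str] = []      # pieces of the record being rebuilt
    prev: Optional[str] = None  # previous raw line, for the pairwise test
    for line in lines:
        if buffer and continues(prev, line):
            buffer.append(line)
        else:
            if buffer:
                yield sep.join(buffer)
            buffer = [line]
        prev = line
    if buffer:
        yield sep.join(buffer)  # flush the last record of the partition


# Hypothetical rule for illustration: a record ends at a line ending in ";".
def ends_without_semicolon(prev: str, line: str) -> bool:
    return not prev.endswith(";")


print(list(join_lines(iter(["a", "b;", "c;", "d", "e;"]), ends_without_semicolon)))
# ['a b;', 'c;', 'd e;']
```

In PySpark this would be wired up as `rdd.mapPartitions(lambda it: join_lines(it, ends_without_semicolon))`. Because it is a generator, a partition is never materialized in memory all at once.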
On Tue, Jun 30, 2015 at 11:01 AM, RJ Nowling <rnowl...@gmail.com> wrote:
> Hi all,
> I have a problem where I have an RDD of elements:
> Item1 Item2 Item3 Item4 Item5 Item6 ...
> and I want to run a function over
That's an interesting idea! I hadn't considered that. However, looking at
the Partitioner interface, I would need to determine the partition from a
single key, which unfortunately doesn't fit my case. For my case, I need to
compare successive pairs of keys. (I'm trying to re-join lines that were
split
Thanks, Reynold. I still need to handle incomplete groups that fall
between partition boundaries. So, I need a two-pass approach. I came up
with a somewhat hacky second pass that handles those using the partition
indices and key-value pairs.
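Here is a rough local sketch of a two-pass scheme along those lines, in Python. It is not the code described above, since the message doesn't include it. Partitions are plain lists, and `split_partition`, `rejoin`, and the `continues(prev, line)` predicate are hypothetical names.

- **Pass 1:** groups lines into records within each partition independently.
- **Pass 2:** stitches boundaries using only a small summary keyed by partition index: first line, last line, leading record, and record count. In a real job, that summary is what you would collect or broadcast. For brevity, this pass loops over the partitions locally.

```python
from typing import Callable, Dict, List, Tuple

Pred = Callable[[str, str], bool]


def split_partition(lines: List[str], continues: Pred) -> List[List[str]]:
    """Pass 1, run independently on each partition: group lines into records."""
    records: List[List[str]] = []
    for i, line in enumerate(lines):
        if records and continues(lines[i - 1], line):
            records[-1].append(line)
        else:
            records.append([line])
    return records


def rejoin(partitions: List[List[str]], continues: Pred, sep: str = " ") -> List[str]:
    split = {i: split_partition(p, continues) for i, p in enumerate(partitions) if p}
    # Small per-partition summary keyed by partition index:
    # (first line, last line, leading record, number of records).
    meta: Dict[int, Tuple[str, str, List[str], int]] = {
        i: (partitions[i][0], partitions[i][-1], recs[0], len(recs))
        for i, recs in split.items()
    }
    order = sorted(meta)

    out: List[str] = []
    for pos, i in enumerate(order):
        records = [list(r) for r in split[i]]
        # Pass 2a: a leading fragment that continues the previous partition
        # is emitted by that partition, so drop it here.
        if pos > 0 and continues(meta[order[pos - 1]][1], meta[i][0]):
            records = records[1:]
        if not records:
            continue
        # Pass 2b: extend the trailing record with leading fragments of the
        # following partitions while the boundary pair keeps continuing.
        for nxt in range(pos + 1, len(order)):
            first, _, head, count = meta[order[nxt]]
            if not continues(meta[order[nxt - 1]][1], first):
                break
            records[-1].extend(head)
            if count > 1:  # the record ends inside that partition
                break
        out.extend(sep.join(r) for r in records)
    return out


print(rejoin([["a", "b;"], ["c", "d;"], ["e"], ["f;", "g;"]],
             lambda prev, line: not prev.endswith(";")))
# ['a b;', 'c d;', 'e f;', 'g;']
```

The inner loop in pass 2b keeps going only while the next partition consists of a single record. That handles the case where one record spans several partitions. Empty partitions are skipped entirely.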
OCaml's std library provides a
Could you use a custom partitioner to preserve boundaries, such that all
related tuples end up in the same partition?
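To make the custom-partitioner suggestion concrete, here is a local Python sketch of hash partitioning on a group key. It assumes each tuple already carries a key identifying its group, and `partition_by_group` is a hypothetical name. In PySpark, the analogous call on a key-value RDD would be `pair_rdd.partitionBy(num_partitions, partition_func)`.

The catch, as the Partitioner reply above points out, is that the partition is decided from one key at a time. That only helps when group membership can be read off a single record. Also, a shuffle does not generally preserve the original line order.

```python
from typing import Hashable, List, Tuple, TypeVar

V = TypeVar("V")


def partition_by_group(pairs: List[Tuple[Hashable, V]],
                       num_partitions: int) -> List[List[Tuple[Hashable, V]]]:
    """Hash-partition (group_key, value) pairs so that every pair sharing a
    group_key lands in the same partition."""
    parts: List[List[Tuple[Hashable, V]]] = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        # The partition depends on this one key only, never on its neighbours.
        parts[hash(key) % num_partitions].append((key, value))
    return parts


print(partition_by_group([(1, "a"), (2, "b"), (1, "c"), (3, "d")], 2))
# [[(2, 'b')], [(1, 'a'), (1, 'c'), (3, 'd')]]
```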
On Jun 30, 2015, at 12:00 PM, RJ Nowling <rnowl...@gmail.com> wrote:
> Thanks, Reynold. I still need to handle incomplete groups that fall between
> partition boundaries. So, I