Re: Equivalent to Storm's 'field grouping' in Spark.

2015-06-04 Thread luke89
Hi Matei, thank you for answering. According to what you said, am I mistaken in saying that tuples with the same key might eventually be spread across more than one node in case an overloaded worker can no longer accept tuples? In other words, suppose a worker (processing key K) cannot accept

Equivalent to Storm's 'field grouping' in Spark.

2015-06-03 Thread allonsy
Hi everybody, does Spark have anything that shares the philosophy of Storm's field grouping? I'd like to manage data partitioning across the workers by sending tuples that share the same key to the very same worker in the cluster, but I did not find any method to do that. Suggestions? :)

Re: Equivalent to Storm's 'field grouping' in Spark.

2015-06-03 Thread Matei Zaharia
This happens automatically when you use the byKey operations, e.g. reduceByKey, updateStateByKey, etc. Spark Streaming keeps the state for a given set of keys on a specific node and sends new tuples with that key to that same node. Matei On Jun 3, 2015, at 6:31 AM, allonsy luke1...@gmail.com wrote:
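
A minimal sketch (Scala, Spark 1.x Streaming API) of the byKey behavior Matei describes: reduceByKey shuffles same-key tuples to one partition per batch, and updateStateByKey keeps per-key state resident on a specific node. The socket source, host, port, and checkpoint path are illustrative placeholders, not details from this thread.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object FieldGroupingSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("FieldGroupingSketch")
    val ssc = new StreamingContext(conf, Seconds(5))
    ssc.checkpoint("/tmp/checkpoint")  // required by updateStateByKey

    // Hypothetical source: lines of "key value" pairs from a socket.
    val pairs = ssc.socketTextStream("host", 9999)
      .map(_.split(" "))
      .map(a => (a(0), a(1).toLong))

    // reduceByKey shuffles all tuples with the same key to the same
    // partition (hence the same worker) within each batch.
    val perBatchSums = pairs.reduceByKey(_ + _)
    perBatchSums.print()

    // updateStateByKey keeps running per-key state; Spark Streaming
    // keeps the state partition for a key on a specific node and
    // routes new tuples with that key to it across batches.
    val runningSums = pairs.updateStateByKey[Long] {
      (newValues: Seq[Long], state: Option[Long]) =>
        Some(state.getOrElse(0L) + newValues.sum)
    }
    runningSums.print()

    ssc.start()
    ssc.awaitTermination()
  }
}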