Re: How to make spark partition sticky, i.e. stay with node?

2015-01-23 Thread Tathagata Das
Hello mingyu,

That is a reasonable way of doing this. Spark Streaming does not
natively support sticky partitions, because Spark launches tasks based
on data locality. If there is no locality preference (for example,
reduce tasks can run anywhere), the location is assigned randomly. The
cogroup or join introduces a locality preference, which forces the
Spark scheduler to be sticky. Another way to achieve this is
updateStateByKey, which internally uses cogroup but presents a nicer,
streaming-like API for per-key stateful operations.
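
As a rough illustration of the updateStateByKey route, here is a minimal
PySpark sketch. The names update_running_total and attach_running_totals,
and the choice of a running sum as the per-key state, are assumptions
made for the example; the thread does not say what the state looks like.
It also assumes checkpointing is already enabled on the StreamingContext,
which updateStateByKey requires.

```python
def update_running_total(new_values, previous_total):
    """State update for one key, called once per batch.

    new_values     -- list of values that arrived for the key in this batch
    previous_total -- state kept from the previous batch, or None if the
                      key has no state yet
    """
    return (previous_total or 0) + sum(new_values)


def attach_running_totals(pairs_dstream):
    """Return a DStream of (key, running_total).

    pairs_dstream is assumed to be a pyspark.streaming DStream of
    (key, number) pairs whose StreamingContext already has a checkpoint
    directory set with ssc.checkpoint(...).
    """
    return pairs_dstream.updateStateByKey(update_running_total)
```

The update function is kept free of Spark objects so it can be checked
on its own before it is wired into a stream.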

TD

On Fri, Jan 23, 2015 at 8:23 AM, mingyu mingyut...@gmail.com wrote:
 I found a workaround.
 I can make my auxiliary data an RDD, then partition it and cache it.
 Later, I can cogroup it with other RDDs and Spark will try to keep the
 cached RDD partitions where they are and not shuffle them.





-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: How to make spark partition sticky, i.e. stay with node?

2015-01-23 Thread mingyu
I found a workaround.
I can make my auxiliary data an RDD, then partition it and cache it.
Later, I can cogroup it with other RDDs and Spark will try to keep the
cached RDD partitions where they are and not shuffle them. 
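
A minimal PySpark sketch of this workaround, under stated assumptions:
the helper names (build_aux_rdd, cogroup_with_aux, enrich) are
hypothetical, the auxiliary data is assumed to fit the shape of
(key, aux_value) pairs, and the stream is assumed to carry
(key, event) pairs. Whether the scheduler actually keeps the cogroup
tasks on the executors holding the cached partitions is a locality
preference that depends on cluster state; it is not guaranteed by this
code.

```python
def build_aux_rdd(sc, aux_pairs, num_partitions):
    """Hash-partition the auxiliary (key, aux_value) pairs and cache them.

    sc is an existing SparkContext. A real job would load the data from
    storage instead of a driver-side list.
    """
    return sc.parallelize(aux_pairs).partitionBy(num_partitions).cache()


def enrich(record):
    """Turn one cogroup result into (key, event, aux_value) tuples.

    record has the shape (key, (events, aux_values)), where both members
    are iterables. Keys with no auxiliary entry get None.
    """
    key, (events, aux_values) = record
    aux = next(iter(aux_values), None)
    return [(key, event, aux) for event in events]


def cogroup_with_aux(pairs_dstream, aux_rdd):
    """Cogroup every batch of the stream with the cached auxiliary RDD."""
    n = aux_rdd.getNumPartitions()
    return pairs_dstream.transform(
        # Partition each batch the same way as the auxiliary data so the
        # small batch is what gets shuffled toward the cached partitions.
        lambda rdd: rdd.partitionBy(n).cogroup(aux_rdd, n).flatMap(enrich)
    )
```

The auxiliary RDD is built once, outside the per-batch function, so each
batch reuses the same cached partitions.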



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-make-spark-partition-sticky-i-e-stay-with-node-tp21322p21338.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: How to make spark partition sticky, i.e. stay with node?

2015-01-22 Thread mingyu
Also, setting spark.locality.wait=100 did not work for me.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-make-spark-partition-sticky-i-e-stay-with-node-tp21322p21325.html



How to make spark partition sticky, i.e. stay with node?

2015-01-22 Thread mingyu
I posted a question on Stack Overflow and haven't gotten an answer yet.
http://stackoverflow.com/questions/28079037/how-to-make-spark-partition-sticky-i-e-stay-with-node

Is there a way to make a partition stay with a node in Spark Streaming? I
need this because I have to load a large amount of partition-specific
auxiliary data for processing the stream. I noticed that the partitions
move among the nodes, and I cannot afford to move the large auxiliary
data around.

Thanks,

Mingyu



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-make-spark-partition-sticky-i-e-stay-with-node-tp21322.html