[ https://issues.apache.org/jira/browse/SPARK-8133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15665590#comment-15665590 ]
sandeep pournami commented on SPARK-8133:
-----------------------------------------

+1. When using Spark Streaming, the underlying storage could be anything, and depending on that storage we may need to avoid a read for the same key in every batch. Avoiding those repeated reads can improve performance many times over.

> sticky partitions
> -----------------
>
>                 Key: SPARK-8133
>                 URL: https://issues.apache.org/jira/browse/SPARK-8133
>             Project: Spark
>          Issue Type: New Feature
>          Components: DStreams
>    Affects Versions: 1.3.1
>            Reporter: sid
>
> We are trying to replace Apache Storm with Apache Spark Streaming.
> In Storm we partitioned the stream by "Customer ID" so that messages within a given range of customer IDs are routed to the same bolt (worker).
> We do this because each worker caches customer details (from the DB).
> So we split the stream into 4 partitions, and each bolt (worker) handles 1/4 of the entire ID range.
> I am hoping Spark Streaming has a solution for this.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
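The range-based routing the reporter describes (contiguous customer-ID ranges pinned to a fixed set of workers, so each worker's customer-details cache stays warm) can be sketched as a plain partition function. This is a minimal illustration, not a Spark feature: `MAX_ID` and `NUM_PARTITIONS` are hypothetical values, and in PySpark such a function could be supplied as the `partitionFunc` argument of `RDD.partitionBy`.

```python
# Sketch of range-based "sticky" partitioning: all messages whose
# customer ID falls in the same contiguous range land in the same
# partition, so the worker owning that partition can cache customer
# details once instead of re-reading them from the DB every batch.

MAX_ID = 1_000_000          # hypothetical upper bound on customer IDs
NUM_PARTITIONS = 4          # the 4 ranges mentioned in the ticket
RANGE_SIZE = -(-MAX_ID // NUM_PARTITIONS)  # ceiling division

def partition_for(customer_id: int) -> int:
    """Map a customer ID to one of NUM_PARTITIONS contiguous ranges."""
    if not 0 <= customer_id < MAX_ID:
        raise ValueError("customer ID out of range")
    return customer_id // RANGE_SIZE

# Hypothetical PySpark usage (requires a running SparkContext, not run here):
#   pairs_rdd.partitionBy(NUM_PARTITIONS, partition_for)
```

Because the mapping is deterministic, two messages for the same customer (or neighboring customers in the same range) always hit the same partition, which is the "sticky" behavior Storm's fields grouping provided.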