[ https://issues.apache.org/jira/browse/SPARK-8133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15665590#comment-15665590 ]
sandeep pournami commented on SPARK-8133:
-----------------------------------------

+1. When using Spark Streaming, the underlying storage could be anything, and depending on that storage we may need to avoid a read for the same key in every batch. Avoiding those repeated reads can improve performance many times over.

> sticky partitions
> -----------------
>
>                 Key: SPARK-8133
>                 URL: https://issues.apache.org/jira/browse/SPARK-8133
>             Project: Spark
>          Issue Type: New Feature
>          Components: DStreams
>    Affects Versions: 1.3.1
>            Reporter: sid
>
> We are trying to replace Apache Storm with Apache Spark Streaming.
> In Storm we partitioned the stream by "Customer ID" so that messages within a given range of customer IDs are routed to the same bolt (worker).
> We do this because each worker caches customer details (from the DB).
> So we split the stream into 4 partitions, and each bolt (worker) handles 1/4 of the entire ID range.
> I am hoping Spark Streaming has a solution for this.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
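The range-based routing the reporter describes (contiguous customer-ID ranges pinned to a fixed set of workers, so each worker's customer-details cache stays warm) can be sketched as a plain partition function. This is a minimal illustration, not a Spark feature: `MAX_ID` and `NUM_PARTITIONS` are hypothetical values, and in PySpark such a function could be supplied as the `partitionFunc` argument of `RDD.partitionBy`.

```python
# Sketch of range-based "sticky" partitioning: all messages whose
# customer ID falls in the same contiguous range land in the same
# partition, so the worker owning that partition can cache customer
# details once instead of re-reading them from the DB every batch.

MAX_ID = 1_000_000          # hypothetical upper bound on customer IDs
NUM_PARTITIONS = 4          # the 4 ranges mentioned in the ticket
RANGE_SIZE = -(-MAX_ID // NUM_PARTITIONS)  # ceiling division

def partition_for(customer_id: int) -> int:
    """Map a customer ID to one of NUM_PARTITIONS contiguous ranges."""
    if not 0 <= customer_id < MAX_ID:
        raise ValueError("customer ID out of range")
    return customer_id // RANGE_SIZE

# Hypothetical PySpark usage (requires a running SparkContext, not run here):
#   pairs_rdd.partitionBy(NUM_PARTITIONS, partition_for)
```

Because the mapping is deterministic, two messages for the same customer (or neighboring customers in the same range) always hit the same partition, which is the "sticky" behavior Storm's fields grouping provided.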