[ https://issues.apache.org/jira/browse/SPARK-35212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331362#comment-17331362 ]
Wang Yuan commented on SPARK-35212:
-----------------------------------

By the way, after looking into the getPreferredLocations method of KafkaRDD, I see that its implementation uses the "PreferConsistent" logic by default. We could also change KafkaRDD's constructor to take a LocationStrategy in place of "preferredHosts", and implement getPreferredLocations with "random" as the default.

> Spark Streaming LocationStrategy should provide a random option that mapping kafka partitions randomly to spark executors
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-35212
>                 URL: https://issues.apache.org/jira/browse/SPARK-35212
>             Project: Spark
>          Issue Type: New Feature
>          Components: DStreams, Spark Core
>   Affects Versions: 3.1.1
>            Reporter: Wang Yuan
>            Priority: Critical
>              Labels: pull-request-available
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> There are three LocationStrategy options: PreferBrokers, PreferConsistent, and PreferFixed. I have a scenario that needs a random one. There are plenty of topic partitions that differ from each other in the records they contain, and I have a lot of executors. PreferBrokers does not help here. PreferConsistent makes things worse, because some executors will always get the heavy tasks. PreferFixed does not help either, because the mapping is fixed, not to mention that I would have to create it manually.
> A random LocationStrategy should dispatch a topic partition to different executors in different windows. This would balance the load among Spark executors.
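To make the difference between the two placement policies concrete, here is a minimal, Spark-free sketch in Python. It is only an illustration under stated assumptions: the function names `consistent_executor` and `random_executor`, the executor labels, and the use of a CRC32 hash are all hypothetical and are not part of Spark's API or of KafkaRDD's actual code. The "consistent" variant models the idea of hashing the topic partition into a sorted executor list, so the same partition keeps landing on the same executor; the "random" variant models the proposal, drawing an executor independently for each batch.

```python
import random
import zlib


def consistent_executor(topic, partition, executors):
    """Pick the same executor for a topic partition in every batch.

    Hypothetical helper: models "consistent" placement, not Spark's real code.
    """
    if not executors:
        return None
    ordered = sorted(executors)
    # A stable hash of (topic, partition) gives a stable index into the
    # sorted executor list, so the choice never changes between batches.
    key = f"{topic}-{partition}".encode("utf-8")
    return ordered[zlib.crc32(key) % len(ordered)]


def random_executor(executors, rng):
    """Pick an executor uniformly at random, independently for each batch.

    Hypothetical helper: models the proposed "random" placement.
    """
    if not executors:
        return None
    return rng.choice(sorted(executors))


if __name__ == "__main__":
    # Executor labels are made up for the example.
    executors = ["host1:exec-1", "host2:exec-2", "host3:exec-3", "host4:exec-4"]
    rng = random.Random(42)
    for batch in range(5):
        print(
            batch,
            consistent_executor("events", 0, executors),
            random_executor(executors, rng),
        )
```

With the consistent policy, a heavy partition stays on one executor for the life of the stream; with the random policy its load is spread over all executors across batches. The likely trade-off is that a consumer cached on one executor is less often reused for the same partition.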