[ https://issues.apache.org/jira/browse/SPARK-32341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17167043#comment-17167043 ]
gaokui commented on SPARK-32341:
--------------------------------

Yes, I can do that. But in that situation I would need to create a separate Kafka topic for every dataset, and I have more than 1,000 datasets, so that would mean creating a very large number of Kafka topics. I would then also have to launch the same number of Spark jobs, again over 1,000. At that scale it becomes unmanageable to allocate machine CPU and memory. That is why I need this multiple-filter feature to solve the problem. Thanks.

> add multiple filter in rdd function
> -----------------------------------
>
>                 Key: SPARK-32341
>                 URL: https://issues.apache.org/jira/browse/SPARK-32341
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>    Affects Versions: 2.4.6, 3.0.0
>            Reporter: gaokui
>            Priority: Major
>
> When I use Spark RDDs, I often read Kafka data, and that Kafka data mixes many kinds of datasets.
> I filter the RDD by Kafka key so that I can fill an Array[RDD] with one RDD per kind of data.
> But each rdd.filter generates its own stage: the same data is processed by many separate tasks, which consumes far more time than necessary.
> I hope to add a multiple-filter function, in place of repeated rdd.filter calls, that returns an Array[RDD] in a single pass, dividing the mixed-data RDD into single-dataset RDDs.
> The function would look like: Array[RDD] = rdd.multiplefilter(setcondition).
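For reference, a minimal sketch of the workaround available with today's RDD API, assuming a Scala Spark application. multiFilter is a hypothetical helper, not an existing Spark method: it persists the mixed RDD once so the expensive Kafka read happens a single time, then derives one filtered RDD per predicate. All names here (MultiFilter, multiFilter, predicates, kafkaRdd) are illustrative.

{code:scala}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

object MultiFilter {
  // Returns one RDD per predicate. The source is materialized once into
  // the cache; each returned RDD rescans the cached partitions instead of
  // re-reading the upstream source (e.g. Kafka).
  def multiFilter[T](rdd: RDD[T])(predicates: Seq[T => Boolean]): Array[RDD[T]] = {
    val cached = rdd.persist(StorageLevel.MEMORY_AND_DISK)
    predicates.map(p => cached.filter(p)).toArray
  }
}

// Hypothetical usage: split a keyed Kafka RDD into per-dataset RDDs.
// val keys = Seq("orders", "clicks", "logs")
// val parts: Array[RDD[(String, String)]] =
//   MultiFilter.multiFilter(kafkaRdd)(keys.map(k => (rec: (String, String)) => rec._1 == k))
{code}

Note the caveat: each downstream action on a returned RDD still runs as its own job over the cached data, so this only removes the repeated source read, not the repeated scans. A true single-scan split into multiple RDDs is not expressible with the public RDD API, which is presumably what motivates the multiplefilter proposal in this ticket.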