[ 
https://issues.apache.org/jira/browse/SPARK-4817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244048#comment-14244048
 ] 

Sean Owen commented on SPARK-4817:
----------------------------------

Quite simply, what is wrong with calling {{print(20)}} at the end of your line?
(Again, assuming we had committed SPARK-3325; right now there is only
{{print()}}.) I can't see why a new method is needed for this.

Really, you should separate this into two operations. One should be a sequence
of DStream operations that ends with a {{foreachRDD}} that writes to a database;
it's not quite right to use {{map}} for this. The other operation continues
from the first {{filter}} instead, applies further filters and maps, and prints.
But this is a separate question.
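The split described above can be sketched with plain Scala collections standing in for DStreams (the object name, input range, and transformations here are illustrative, not from the issue): one branch handles every element with a side effect, the other continues from the same filter and takes only the first 20 for display.

```scala
// Plain-Scala sketch of the two-branch structure. Collections stand in
// for DStreams; the ArrayBuffer stands in for a database write. All names
// and numbers are hypothetical.
object PipelineSketch {
  def run(input: Seq[Int]): (Int, Int) = {
    val filtered = input.filter(_ % 2 == 0)      // the shared first filter

    // Branch 1: handle every element (analogue of a foreachRDD that
    // writes each record to a database)
    val written = scala.collection.mutable.ArrayBuffer[Int]()
    filtered.foreach(written += _)               // stand-in for the DB write

    // Branch 2: continue from the same filtered data, transform further,
    // and keep only the first 20 for display (analogue of print(20))
    val shown = filtered.map(_ * 10).take(20)

    (written.size, shown.size)                   // (handled, printed)
  }

  def main(args: Array[String]): Unit = {
    val (handled, printed) = run(1 to 100)
    println(s"handled=$handled, printed=$printed")
  }
}
```

In actual Spark Streaming code both branches would hang off the same filtered DStream, so the shared upstream work is expressed once rather than being folded into a single {{map}} with side effects.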

See my comments above for another way to structure this. No, there is only one 
job. There is no repeated computation in any of these scenarios. Unless this 
method can be clearly motivated, all I can see here is an exact duplicate of 
the request in SPARK-3325.

> [streaming] Print a specified number of elements but handle all elements
> in the RDD
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-4817
>                 URL: https://issues.apache.org/jira/browse/SPARK-4817
>             Project: Spark
>          Issue Type: New Feature
>          Components: Streaming
>            Reporter: 宿荣全
>            Priority: Minor
>
> The existing {{DStream.print}} function prints 10 elements and handles only
> 11 elements.
> A new function based on {{DStream.print}} is proposed: print a specified
> number of elements while handling all of the elements in the RDD.
> There is a work scene:
> val dstream = stream.map->filter->mapPartitions->print
> The data that passes the filter needs to update a database in
> {{mapPartitions}}, but each element does not need to be printed; only the
> top 20 need to be printed to view the data processing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
