[ 
https://issues.apache.org/jira/browse/SPARK-4817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240954#comment-14240954
 ] 

宿荣全 commented on SPARK-4817:
----------------------------

Yes, it can be replaced with other streaming operations, but some of those
operations look redundant and are not very elegant. Also, using the print
operation to process all elements in the RDD is not efficient.

For example:
1. val dstream = stream.map->filter->mapPartitions->count->print(num)
2. val dstream = stream.map->filter->mapPartitions->filter(false)->print(num)
Both of these are equivalent to:
val dstream = stream.map->filter->mapPartitions->printTop(num)
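
The difference between a print-style operation and the proposed printTop can be modeled without Spark. The sketch below is only an illustration: a lazy Scala Iterator stands in for a single RDD partition, and the names PrintTopModel, handledBy, printLike and printTopLike are hypothetical, not Spark APIs. It assumes, as the issue description states, that print pulls only num + 1 elements through the pipeline.

```scala
// Plain-Scala model (no Spark): a lazy Iterator stands in for one RDD partition.
object PrintTopModel {
  // Runs `action` over a 100-element pipeline and returns how many elements
  // the upstream map stage actually processed.
  def handledBy(action: Iterator[Int] => Unit): Int = {
    var handled = 0
    val pipeline = Iterator.range(0, 100).map { x => handled += 1; x }
    action(pipeline)
    handled
  }

  // Models print(num): take(num + 1) pulls only num + 1 elements through
  // the pipeline, then the first num of them are printed.
  def printLike(it: Iterator[Int], num: Int): Unit =
    it.take(num + 1).toList.take(num).foreach(println)

  // Models the proposed printTop(num): print the first num elements,
  // but keep draining the iterator so every element is still processed.
  def printTopLike(it: Iterator[Int], num: Int): Unit = {
    var seen = 0
    it.foreach { x =>
      if (seen < num) println(x)
      seen += 1
    }
  }

  def main(args: Array[String]): Unit = {
    val byPrint = handledBy(it => printLike(it, 10))
    val byPrintTop = handledBy(it => printTopLike(it, 10))
    println(s"print-like handled $byPrint elements")       // 11
    println(s"printTop-like handled $byPrintTop elements") // 100
  }
}
```

In this model the upstream stage runs for only 11 elements under the print-style action, which is why the workarounds in 1 and 2 insert an extra operation to force the whole RDD to be processed.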

> [streaming]Print the specified number of data and handle all of the elements 
> in RDD
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-4817
>                 URL: https://issues.apache.org/jira/browse/SPARK-4817
>             Project: Spark
>          Issue Type: New Feature
>          Components: Streaming
>            Reporter: 宿荣全
>            Priority: Minor
>
> The DStream.print function prints 10 elements and handles only 11 elements.
> A new function based on DStream.print is proposed:
> it prints the specified number of elements and handles all of the elements
> in the RDD.
> A use case:
> val dstream = stream.map->filter->mapPartitions->print
> The data after the filter needs to update a database in mapPartitions, but
> there is no need to print every element; printing only the top 20 is enough
> to check the data processing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
