[jira] [Commented] (SPARK-4817) [streaming]Print the specified number of data and handle all of the elements in RDD

Sean Owen (JIRA) Thu, 11 Dec 2014 03:23:06 -0800

    [ 
https://issues.apache.org/jira/browse/SPARK-4817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242402#comment-14242402
 ]


Sean Owen commented on SPARK-4817:
----------------------------------

[~tianyi] I agree that it would be nice to add a parameter to {{print()}}, but 
that is precisely what SPARK-3326 already covers. It's fine to want to 
"process" all of an RDD and then print only some of it, but that does not 
require a new method at all.

[~surq] I don't understand your examples, relative to what you say you want to 
do. The first example only prints elements, and does so with needless 
complexity. Why {{collect()}}? The second example also does nothing but 
print, yet it tries to manually run a new job?

Something as simple as this seems to be just what you want. It does something 
with the entire RDD, then prints just the first 100 elements:

{code}
stream.foreachRDD { rdd =>
  rdd.foreach(row => ... do whatever you want with every element ...)
  rdd.take(100).foreach(println)
}
{code}

I think this also works fine:

{code}
stream.foreachRDD { rdd =>
  rdd.foreach(row => ... do whatever you want with every element ...)
}
stream.print(100)
{code}

... if SPARK-3326 is implemented to add an argument to {{print()}}.
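
For anyone who wants to try the first pattern end to end, here is a minimal 
self-contained sketch. Everything outside the {{foreachRDD}} body is an 
assumption made for illustration: the object name, the {{handle}} function 
(a hypothetical stand-in for something like a database update), the 
{{local[2]}} master, the one-second batch interval, and the single test batch 
fed through {{queueStream}}. It also assumes a Spark version that provides 
{{awaitTerminationOrTimeout}}.

```scala
import scala.collection.mutable

import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

object ProcessAllPrintSome {
  // Hypothetical stand-in for per-element work, e.g. a database update
  def handle(row: Int): Unit = ()

  def main(args: Array[String]): Unit = {
    // Local master and 1-second batches are arbitrary choices for a demo
    val conf = new SparkConf().setMaster("local[2]").setAppName("ProcessAllPrintSome")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Assumed input: one batch of 1000 integers delivered via queueStream
    val queue = new mutable.Queue[RDD[Int]]()
    queue += ssc.sparkContext.parallelize(1 to 1000)
    val stream = ssc.queueStream(queue)

    stream.foreachRDD { rdd =>
      // Runs on the executors, once for every element in the batch
      rdd.foreach(row => handle(row))
      // Brings at most 100 elements back to the driver and prints them
      rdd.take(100).foreach(println)
    }

    ssc.start()
    // Let a few batches run, then shut down the streaming and Spark contexts
    ssc.awaitTerminationOrTimeout(3000)
    ssc.stop()
  }
}
```

Note that {{foreach}} and {{take}} are two separate actions, so the RDD's 
lineage may be evaluated twice per batch. If the upstream transformations are 
expensive, calling {{rdd.cache()}} at the top of the {{foreachRDD}} block is 
one option to consider.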

> [streaming]Print the specified number of data and handle all of the elements 
> in RDD
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-4817
>                 URL: https://issues.apache.org/jira/browse/SPARK-4817
>             Project: Spark
>          Issue Type: New Feature
>          Components: Streaming
>            Reporter: 宿荣全
>            Priority: Minor
>
> The DStream.print function prints 10 elements and handles 11 elements.
> A new function based on DStream.print is proposed:
> the new function:
> Print a specified number of elements and handle all of the elements in the RDD.
> Here is a working scenario:
> val dstream = stream.map->filter->mapPartitions->print
> The data after the filter needs to update a database in mapPartitions, but not 
> every element needs to be printed; only the top 20 are needed to view the data processing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
