[ https://issues.apache.org/jira/browse/SPARK-4817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242402#comment-14242402 ]
Sean Owen commented on SPARK-4817:
----------------------------------

[~tianyi] I agree that it would be nice to add a parameter to {{print()}}. But that is precisely what SPARK-3326 already covered. It's fine to want to "process" all of an RDD and then print some of it, but this does not require a new method at all.

[~surq] I don't understand your examples, relative to what you say you want to do. The first example only prints elements, and does it with needless complexity. Why {{collect()}}? The second example also does nothing but print, yet tries to manually run a new job?

Something as simple as this seems to be just what you want. It does something with the entire RDD, then prints just the first 100 elements:

{code}
stream.foreachRDD { rdd =>
  rdd.foreach(row => ... do whatever you want with every element ...)
  rdd.take(100).foreach(println)
}
{code}

I think this also works fine:

{code}
stream.foreachRDD { rdd =>
  rdd.foreach(row => ... do whatever you want with every element ...)
}
stream.print(100)
{code}

... if SPARK-3326 is implemented to add an argument to {{print()}}.

> [streaming]Print the specified number of data and handle all of the elements in RDD
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-4817
>                 URL: https://issues.apache.org/jira/browse/SPARK-4817
>             Project: Spark
>          Issue Type: New Feature
>          Components: Streaming
>            Reporter: 宿荣全
>            Priority: Minor
>
> Dstream.print function:Print 10 elements and handle 11 elements.
> A new function based on Dstream.print function is presented:
> the new function:
> Print the specified number of data and handle all of the elements in RDD.
> there is a work scene:
> val dstream = stream.map->filter->mapPartitions->print
> the data after filter need update database in mapPartitions,but don't need
> print each data,only need to print the top 20 for view the data processing.
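For readers who want to try the first suggestion in the comment, here is a minimal, self-contained sketch of it. It is an illustration only, not code from the issue or from Spark itself: the object name {{ProcessAllPrintSome}}, the {{updateDatabase}} helper, the queue-backed test input, and the 5 second timeout are all assumptions made for the example. It assumes a Spark release that provides {{awaitTerminationOrTimeout}}, and it uses a sample size of 20 to match the issue description.

```scala
import scala.collection.mutable

import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

object ProcessAllPrintSome {
  // Hypothetical stand-in for the database update mentioned in the issue.
  def updateDatabase(row: String): Unit = ()

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("ProcessAllPrintSome")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Assumed test input: one batch of 1000 strings fed through a queue-backed stream.
    val batches = mutable.Queue[RDD[String]](
      ssc.sparkContext.parallelize((1 to 1000).map(i => s"row-$i")))
    val stream = ssc.queueStream(batches)

    stream.foreachRDD { rdd =>
      // Handle every element of the batch, one partition at a time.
      rdd.foreachPartition(rows => rows.foreach(updateDatabase))
      // Print only a small sample of the batch for inspection.
      rdd.take(20).foreach(println)
    }

    ssc.start()
    ssc.awaitTerminationOrTimeout(5000) // let a few batch intervals run, then shut down
    ssc.stop()
  }
}
```

Because each batch RDD is used by two actions here, its lineage may be evaluated twice. Calling {{rdd.cache()}} before the first action is one way to avoid that, at the cost of holding the batch in memory.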