[ https://issues.apache.org/jira/browse/SPARK-4817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243683#comment-14243683 ]
宿荣全 commented on SPARK-4817:
----------------------------

[~srowen] I'm sorry I didn't describe the problem clearly. Consider a scenario with multiple outputs: *data from HDFS files -> map -> filter -> map (each row updates a MySQL DB) -> filter -> map -> print (print 20 records to the console)*
# {color:red}output to the MySQL DB{color}
# {color:red}output to the console{color}
With this patch (the function {{processAllAndPrintFirst}} is newly defined):
{code}
ssc.textFileStream("path")
  .map(func1)
  .filter(func2)
  .map { f => updataMysql(f); f }  // update MySQL, then pass the row along
  .filter(func3)
  .map(func4)
  .processAllAndPrintFirst(20)
{code}
How could this scenario be handled with {{foreachRDD}} plus {{take}}, or with {{print(num)}}? Either combination, [{{rdd.foreach}} and {{rdd.take}}] or [{{rdd.foreach}} and {{stream.print(100)}}], launches two jobs in each streaming batch. Compared with running a single job, isn't that less efficient? (See the sketches after the quoted description below.)

> [streaming] Print the specified number of elements and handle all of the elements in the RDD
> --------------------------------------------------------------------------------------------
>
>          Key: SPARK-4817
>          URL: https://issues.apache.org/jira/browse/SPARK-4817
>      Project: Spark
>   Issue Type: New Feature
>   Components: Streaming
>     Reporter: 宿荣全
>     Priority: Minor
>
> The {{DStream.print}} function prints 10 elements but handles only 11 elements of each RDD.
> A new function based on {{DStream.print}} is proposed:
> print the specified number of elements while handling all of the elements in the RDD.
> A typical work scenario:
> {{val dstream = stream.map -> filter -> mapPartitions -> print}}
> The data that passes the filter must update a database inside {{mapPartitions}}, but not every record needs to be printed; only the top 20 should be printed so the data processing can be inspected.
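For concreteness, here is a minimal sketch of the two-job alternative discussed above. The pipeline functions {{func1}}..{{func4}} and the {{updataMysql}} writer are placeholders for illustration only; the Spark calls ({{textFileStream}}, {{foreachRDD}}, {{cache}}, {{foreach}}, {{take}}) are real APIs.
{code}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Placeholder pipeline functions and DB writer (assumptions, not Spark APIs).
def func1(s: String): String = s
def func2(s: String): Boolean = true
def func3(s: String): Boolean = true
def func4(s: String): String = s
def updataMysql(s: String): Unit = { /* write one row to MySQL */ }

val ssc = new StreamingContext(
  new SparkConf().setMaster("local[2]").setAppName("two-job-sketch"), Seconds(10))

val processed = ssc.textFileStream("path")
  .map(func1).filter(func2)
  .map { f => updataMysql(f); f }   // the DB update sits in the lineage
  .filter(func3).map(func4)

processed.foreachRDD { rdd =>
  rdd.cache()                       // otherwise job 2 re-runs updataMysql on the rows it reads
  rdd.foreach(_ => ())              // job 1: evaluates (and writes) every element
  rdd.take(20).foreach(println)     // job 2: fetches the first 20 for the console
  rdd.unpersist()
}

ssc.start()
ssc.awaitTermination()
{code}
Even with {{cache()}}, every batch pays the scheduling overhead of a second job just to display 20 records, which is the inefficiency raised above.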
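And one possible single-job shape for the proposed behavior, as a rough sketch only, not the actual SPARK-4817 patch: buffer at most {{num}} elements per partition while still consuming every element (so mid-chain side effects such as the DB update run for all of them), then collect the small per-partition heads and print on the driver.
{code}
import scala.collection.mutable.ArrayBuffer
import scala.reflect.ClassTag
import org.apache.spark.streaming.dstream.DStream

// Hypothetical helper, not the actual patch: one job per batch that
// evaluates every element but ships back at most `num` per partition.
def processAllAndPrintFirst[T: ClassTag](stream: DStream[T], num: Int): Unit = {
  stream.foreachRDD { rdd =>
    val heads = rdd.mapPartitions { iter =>
      val buf = ArrayBuffer.empty[T]
      iter.foreach { x =>              // consuming the whole iterator forces
        if (buf.size < num) buf += x   // upstream side effects for every element
      }
      buf.iterator
    }.collect()                        // a single job over all partitions
    heads.take(num).foreach(println)   // print at most `num` on the driver
  }
}
{code}
With a helper like this, the example above would be written as {{processAllAndPrintFirst(processed, 20)}}, and each batch submits one job instead of two.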