[ https://issues.apache.org/jira/browse/SPARK-32735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Takeshi Yamamuro updated SPARK-32735:
-------------------------------------
    Labels:   (was: pull-request-available)

> RDD actions in DStream.transform don't show at batch page
> ----------------------------------------------------------
>
>                 Key: SPARK-32735
>                 URL: https://issues.apache.org/jira/browse/SPARK-32735
>             Project: Spark
>          Issue Type: Bug
>          Components: DStreams, Web UI
>    Affects Versions: 3.0.0
>            Reporter: Liechuan Ou
>            Priority: Major
>
> h4. Issue
> {code:scala}
> val lines = ssc.socketTextStream("localhost", 9999)
> val words = lines.flatMap(_.split(" "))
> val mappedStream = words.transform { rdd =>
>   val c = rdd.count()
>   rdd.map(x => s"$c $x")
> }
> mappedStream.foreachRDD(rdd => rdd.foreach(x => println(x)))
> {code}
> Every batch, two Spark jobs are created. Only the second one is associated
> with the streaming output operation and shown on the batch page.
>
> h4. Investigation
> The first action, rdd.count(), is invoked by JobGenerator.generateJobs. The
> batch time and output op id are not available in the Spark context at that
> point because they are set later, by JobScheduler.
>
> h4. Proposal
> Delegate DStream.getOrCompute to JobScheduler so that all RDD actions run
> in a Spark context with the correct local properties.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
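The investigation above turns on thread-local job properties: an action fired eagerly during job generation sees no batch time or output op id, while an action run inside the scheduled output operation does. A minimal Scala simulation of that timing, with no Spark dependency (the object `LocalPropsDemo`, the `runJob` helper, and the `"batchTime"`/`"outputOpId"` keys are all illustrative stand-ins, not Spark APIs):

```scala
object LocalPropsDemo {
  // Stand-in for SparkContext's per-thread local properties.
  private val props = new ThreadLocal[Map[String, String]] {
    override def initialValue(): Map[String, String] = Map.empty
  }

  def get(key: String): Option[String] = props.get.get(key)

  // Rough analogue of the scheduler's job handler: set the streaming
  // properties, run the job body, then clear them again.
  def runJob[A](batchTime: Long, outputOpId: Int)(body: => A): A = {
    props.set(Map("batchTime" -> batchTime.toString,
                  "outputOpId" -> outputOpId.toString))
    try body finally props.set(Map.empty)
  }

  def main(args: Array[String]): Unit = {
    // Phase 1: job generation. A transform closure that calls an action
    // (like rdd.count() in the report) would execute here, before any
    // properties have been set, so the UI cannot attribute its job.
    val duringGenerate = get("outputOpId")

    // Phase 2: the scheduled output operation. foreachRDD's action runs
    // here, so its job carries the batch time and output op id.
    val duringJob = runJob(batchTime = 1000L, outputOpId = 0) {
      get("outputOpId")
    }

    println(s"generateJobs sees: $duringGenerate, job sees: $duringJob")
    // prints "generateJobs sees: None, job sees: Some(0)"
  }
}
```

This is why the proposal moves getOrCompute under the scheduler: once the transform closure also executes inside phase 2, both jobs carry the properties the batch page needs.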