Liechuan Ou created SPARK-32735:
-----------------------------------

             Summary: RDD actions in DStream.transform don't show on the batch page
                 Key: SPARK-32735
                 URL: https://issues.apache.org/jira/browse/SPARK-32735
             Project: Spark
          Issue Type: Bug
          Components: DStreams, Web UI
    Affects Versions: 3.0.0
            Reporter: Liechuan Ou


h4. Issue
{code:java}
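val lines = ssc.socketTextStream("localhost", 9999)
val words = lines.flatMap(_.split(" "))
val mappedStream = words.transform { rdd =>
  // RDD action inside transform: this is the job missing from the batch page
  val c = rdd.count()
  rdd.map(x => s"$c $x")
}
// Output operation: its job is the only one shown on the batch page
mappedStream.foreachRDD(rdd => rdd.foreach(x => println(x)))
{code}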
Every batch creates two Spark jobs. Only the second one is associated with 
the streaming output operation and shows up on the batch page.
h4. Investigation

The first action, rdd.count(), is invoked by JobGenerator.generateJobs. At 
that point the batch time and output op id are not yet available in the Spark 
context, because JobScheduler only sets them later.
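
One way to check this is to print the local properties from both places. The following is a minimal sketch, not part of the original report. The two property keys are assumptions: they are the string values that the internal JobScheduler object is believed to use in Spark 3.0.0, and the object name LocalPropertyProbe and its helpers are hypothetical.
{code:scala}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}

object LocalPropertyProbe {
  // Assumed keys: believed to match the private constants in Spark's JobScheduler.
  val BatchTimeKey = "spark.streaming.internal.batchTime"
  val OutputOpIdKey = "spark.streaming.internal.outputOpId"

  // Formats the two property values; getLocalProperty returns null when a key is unset.
  def describe(where: String, batchTime: String, outputOpId: String): String = {
    val bt = Option(batchTime).getOrElse("unset")
    val op = Option(outputOpId).getOrElse("unset")
    s"$where: batchTime=$bt, outputOpId=$op"
  }

  // Prints the local properties visible to the calling (driver-side) thread.
  def probe(where: String, sc: SparkContext): Unit =
    println(describe(where, sc.getLocalProperty(BatchTimeKey), sc.getLocalProperty(OutputOpIdKey)))

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("LocalPropertyProbe")
    val ssc = new StreamingContext(conf, Seconds(5))
    val words = ssc.socketTextStream("localhost", 9999).flatMap(_.split(" "))

    val mappedStream = words.transform { rdd =>
      probe("transform", rdd.sparkContext) // runs while the batch's jobs are generated
      val c = rdd.count()
      rdd.map(x => s"$c $x")
    }
    mappedStream.foreachRDD { rdd =>
      probe("foreachRDD", rdd.sparkContext) // runs inside the output operation's job
      rdd.foreach(x => println(x))
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
{code}
If the investigation above is right, the "transform" line should report both values as unset while the "foreachRDD" line reports the batch time and output op id. This has not been verified here.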
h4. Proposal

Delegate dstream.getOrCompute to JobScheduler so that all RDD actions run in 
a Spark context with the correct local properties.



