Hi,

While reviewing StreamExecution and how batches are displayed in the web UI, I've noticed that currentBatchId is -1 when StreamExecution is created [1] and becomes 0 when no offsets are available [2].

That leads to my question about setting the job description for a query using getBatchDescriptionString [3]. It branches on currentBatchId and gives "init" when it is -1 [4], which never happens, as shown above.

The PR for SPARK-20464 "Add a job group and description for streaming queries and fix cancellation of running jobs using the job group" sets the job description after populateStartOffsets [5]. Shouldn't it be set before populateStartOffsets, so that getBatchDescriptionString has a chance of giving "init" and we don't see two 0s?

Help appreciated.

[1] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala#L116
[2] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala?utf8=%E2%9C%93#L516
[3] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala?utf8=%E2%9C%93#L878-L883
[4] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala?utf8=%E2%9C%93#L879
[5] https://github.com/apache/spark/commit/6fc6cf88d871f5b05b0ad1a504e0d6213cf9d331#diff-6532dd3b63bdab0364fbcf2303e290e4R294

Regards,
Jacek Laskowski
----
https://about.me/JacekLaskowski
Spark Structured Streaming (Apache Spark 2.2+) https://bit.ly/spark-structured-streaming
Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski
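
P.S. To make the ordering question concrete, here is a minimal, self-contained sketch of what I mean. It is a toy model, not the actual Spark code: ToyExecution, descriptions and describeBeforePopulate are hypothetical names, populateStartOffsets only models the "no offsets available" case where the batch id becomes 0, and I am assuming the description is computed once during initialization and once more when the first batch runs.

```scala
object BatchDescriptionOrdering {
  // Toy stand-in for the relevant state of StreamExecution; NOT the real class.
  final class ToyExecution {
    // Starts at -1, as described for a freshly created StreamExecution.
    var currentBatchId: Long = -1L

    // Models only the "no offsets available" case, where the id becomes 0.
    def populateStartOffsets(): Unit = {
      currentBatchId = 0L
    }

    // Same branching as described for getBatchDescriptionString:
    // a negative batch id gives "init", anything else gives the id itself.
    def batchDescription: String =
      if (currentBatchId < 0) "init" else currentBatchId.toString
  }

  // Returns the description seen during initialization, followed by the
  // description seen when the first batch runs (an assumption of this sketch).
  def descriptions(describeBeforePopulate: Boolean): Seq[String] = {
    val exec = new ToyExecution
    val atInit =
      if (describeBeforePopulate) {
        val description = exec.batchDescription // id is still -1 here
        exec.populateStartOffsets()
        description
      } else {
        exec.populateStartOffsets() // id is already 0 when we describe
        exec.batchDescription
      }
    val atFirstBatch = exec.batchDescription
    Seq(atInit, atFirstBatch)
  }

  def main(args: Array[String]): Unit = {
    // Description set after populateStartOffsets: prints "0, 0"
    println(descriptions(describeBeforePopulate = false).mkString(", "))
    // Description set before populateStartOffsets: prints "init, 0"
    println(descriptions(describeBeforePopulate = true).mkString(", "))
  }
}
```

In this model, setting the description after populateStartOffsets never yields "init" and produces "0" twice, while setting it before yields "init" followed by "0".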
