Yazhi Wang created SPARK-35862:
----------------------------------

             Summary: Watermark timestamp only can be format in UTC timeZone, unfriendly to users in other time zones
                 Key: SPARK-35862
                 URL: https://issues.apache.org/jira/browse/SPARK-35862
             Project: Spark
          Issue Type: Improvement
          Components: Structured Streaming
    Affects Versions: 3.1.2
            Reporter: Yazhi Wang

Timestamps for the watermark and eventTime stats are formatted in `ProgressReporter` by `formatTimestamp`. The `timestampFormat` it uses is hardcoded to the UTC time zone:

```scala
private val timestampFormat = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'") // ISO8601
timestampFormat.setTimeZone(DateTimeUtils.getTimeZone("UTC"))
```

When users set a different time zone with the Java option `-Duser.timezone`, the output can be confusing because it mixes time zones. In the example below, the log line prefix (16:12:07) follows the JVM time zone, while the `timestamp` and `eventTime` fields (08:11:...) are in UTC:

```
2021-06-23 16:12:07 [stream execution thread for [id = 92f4f363-df85-48e9-aef9-5ea6f2b70316, runId = 5733ef8e-11d1-46c4-95cc-219bde6e7a20]] INFO [MicroBatchExecution:54]: Streaming query made progress: {
  "id" : "92f4f363-df85-48e9-aef9-5ea6f2b70316",
  "runId" : "5733ef8e-11d1-46c4-95cc-219bde6e7a20",
  "name" : null,
  "timestamp" : "2021-06-23T08:11:56.790Z",
  "batchId" : 91740,
  "numInputRows" : 2577,
  "inputRowsPerSecond" : 155.33453887884266,
  "processedRowsPerSecond" : 242.29033471229786,
  "durationMs" : {
    "addBatch" : 8671,
    "getBatch" : 3,
    "getOffset" : 1139,
    "queryPlanning" : 79,
    "triggerExecution" : 10636,
    "walCommit" : 162
  },
  "eventTime" : {
    "avg" : "2021-06-23T08:11:46.307Z",
    "max" : "2021-06-23T08:11:55.000Z",
    "min" : "2021-06-23T08:11:37.000Z",
    "watermark" : "2021-06-23T07:41:39.000Z"
  },
```

Maybe we need to unify the time zone used for time formatting.
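
To make the difference concrete, here is a minimal, self-contained sketch that formats the watermark instant from the log above in two ways: pinned to UTC, as the quoted `timestampFormat` does, and in a caller-supplied zone with an explicit numeric offset. This is not Spark code and not a proposed patch. The names `TimestampFormatSketch`, `formatUtc` and `formatInZone` are hypothetical, `java.util.TimeZone` stands in for `DateTimeUtils.getTimeZone` so the example runs without Spark, and `Asia/Shanghai` is only an assumption based on the eight-hour gap between the log prefix and the UTC fields.

```scala
import java.text.SimpleDateFormat
import java.util.{Date, TimeZone}

// Hypothetical illustration only; none of these names exist in Spark.
object TimestampFormatSketch {

  // Same pattern and zone as the snippet quoted in the issue:
  // the trailing 'Z' is a literal, so the zone must be UTC for it to be truthful.
  def formatUtc(millis: Long): String = {
    val fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'")
    fmt.setTimeZone(TimeZone.getTimeZone("UTC"))
    fmt.format(new Date(millis))
  }

  // One possible alternative: format in the given zone and let the
  // XXX pattern letter emit the real offset (for example +08:00).
  def formatInZone(millis: Long, zone: TimeZone): String = {
    val fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSXXX")
    fmt.setTimeZone(zone)
    fmt.format(new Date(millis))
  }

  def main(args: Array[String]): Unit = {
    // Epoch milliseconds for the watermark in the log: 2021-06-23T07:41:39.000Z
    val watermarkMs = 1624434099000L

    println(formatUtc(watermarkMs))
    // Assumed zone, chosen to match the UTC+8 log prefix in the example.
    println(formatInZone(watermarkMs, TimeZone.getTimeZone("Asia/Shanghai")))
    // Follows -Duser.timezone when that option is set on the JVM.
    println(formatInZone(watermarkMs, TimeZone.getDefault))
  }
}
```

With the assumed zone, the same instant renders as `2021-06-23T07:41:39.000Z` in the first form and `2021-06-23T15:41:39.000+08:00` in the second. Emitting a numeric offset rather than a literal `Z` keeps the string unambiguous whichever zone is chosen. Whether the progress output should follow the JVM default, a session-level setting, or stay in UTC is the open question the issue raises.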