[jira] [Commented] (SPARK-35862) Watermark timestamp only can be format in UTC timeZone, unfriendly to users in other time zones
[ https://issues.apache.org/jira/browse/SPARK-35862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17368801#comment-17368801 ]

Apache Spark commented on SPARK-35862:
--------------------------------------

User 'toujours33' has created a pull request for this issue:
https://github.com/apache/spark/pull/33061

> Watermark timestamp only can be format in UTC timeZone, unfriendly to users in other time zones
> -----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-35862
>                 URL: https://issues.apache.org/jira/browse/SPARK-35862
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 3.1.2
>            Reporter: Yazhi Wang
>            Assignee: Apache Spark
>            Priority: Minor
>
> Timestamps are formatted in `ProgressReporter` by `formatTimestamp` for the
> watermark and eventTime stats. The timestampFormat is hardcoded to the UTC
> time zone.
> `
> private val timestampFormat = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'") // ISO8601
> timestampFormat.setTimeZone(DateTimeUtils.getTimeZone("UTC"))
> `
> When users set a different time zone with the Java option `-Duser.timezone`,
> they may be confused by log output that mixes time zones: the log line prefix
> uses the JVM time zone, while the progress timestamps are always in UTC.
> e.g.
> `
> 2021-06-23 16:12:07 [stream execution thread for [id = 92f4f363-df85-48e9-aef9-5ea6f2b70316, runId = 5733ef8e-11d1-46c4-95cc-219bde6e7a20]] INFO [MicroBatchExecution:54]: Streaming query made progress: {
>   "id" : "92f4f363-df85-48e9-aef9-5ea6f2b70316",
>   "runId" : "5733ef8e-11d1-46c4-95cc-219bde6e7a20",
>   "name" : null,
>   "timestamp" : "2021-06-23T08:11:56.790Z",
>   "batchId" : 91740,
>   "numInputRows" : 2577,
>   "inputRowsPerSecond" : 155.33453887884266,
>   "processedRowsPerSecond" : 242.29033471229786,
>   "durationMs" : {
>     "addBatch" : 8671,
>     "getBatch" : 3,
>     "getOffset" : 1139,
>     "queryPlanning" : 79,
>     "triggerExecution" : 10636,
>     "walCommit" : 162
>   },
>   "eventTime" : {
>     "avg" : "2021-06-23T08:11:46.307Z",
>     "max" : "2021-06-23T08:11:55.000Z",
>     "min" : "2021-06-23T08:11:37.000Z",
>     "watermark" : "2021-06-23T07:41:39.000Z"
>   },
> `
> Maybe we need to unify the time zone used for time formatting.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
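The mismatch described in the issue can be reproduced outside Spark with plain `java.text.SimpleDateFormat`. The following is a minimal sketch, not the change made in the pull request above: the object name `WatermarkTimeZoneSketch`, its helper methods, the `XXX` offset pattern, and the `Asia/Shanghai` zone are all assumptions chosen for illustration. The reporter's actual time zone is not stated; a UTC+8 zone merely matches the eight-hour gap visible in the log.

```scala
import java.text.SimpleDateFormat
import java.time.Instant
import java.util.{Date, TimeZone}

// Hypothetical helper object, for illustration only (not part of Spark).
object WatermarkTimeZoneSketch {
  // Same pattern as the snippet quoted in the issue. The trailing 'Z' is a
  // quoted literal, so it is printed regardless of the formatter's time zone.
  private val utcPattern = "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"

  // Assumed alternative pattern: XXX prints the real ISO 8601 offset.
  private val offsetPattern = "yyyy-MM-dd'T'HH:mm:ss.SSSXXX"

  // SimpleDateFormat is not thread-safe, so this sketch builds one per call.
  private def format(epochMillis: Long, pattern: String, zoneId: String): String = {
    val fmt = new SimpleDateFormat(pattern)
    fmt.setTimeZone(TimeZone.getTimeZone(zoneId))
    fmt.format(new Date(epochMillis))
  }

  // Mirrors the hardcoded-UTC behaviour described in the issue.
  def formatUtc(epochMillis: Long): String =
    format(epochMillis, utcPattern, "UTC")

  // Formats the same instant in a caller-chosen zone with an explicit offset.
  def formatWithOffset(epochMillis: Long, zoneId: String): String =
    format(epochMillis, offsetPattern, zoneId)

  def main(args: Array[String]): Unit = {
    // The watermark value from the log in the issue.
    val watermarkMs = Instant.parse("2021-06-23T07:41:39.000Z").toEpochMilli
    println(formatUtc(watermarkMs))                         // 2021-06-23T07:41:39.000Z
    println(formatWithOffset(watermarkMs, "Asia/Shanghai")) // 2021-06-23T15:41:39.000+08:00
  }
}
```

Both lines describe the same instant. Note that simply switching the time zone on the original formatter would not be enough, because the quoted `'Z'` would still be printed and would then mislabel a local time as UTC; that is why the sketch uses an offset pattern for the non-UTC case.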
[jira] [Commented] (SPARK-35862) Watermark timestamp only can be format in UTC timeZone, unfriendly to users in other time zones
[ https://issues.apache.org/jira/browse/SPARK-35862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17368800#comment-17368800 ]

Apache Spark commented on SPARK-35862:
--------------------------------------

User 'toujours33' has created a pull request for this issue:
https://github.com/apache/spark/pull/33061
[jira] [Commented] (SPARK-35862) Watermark timestamp only can be format in UTC timeZone, unfriendly to users in other time zones
[ https://issues.apache.org/jira/browse/SPARK-35862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17368022#comment-17368022 ]

Yazhi Wang commented on SPARK-35862:
------------------------------------

I'm working on it