[jira] [Commented] (SPARK-35862) Watermark timestamp only can be format in UTC timeZone, unfriendly to users in other time zones

2021-06-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17368801#comment-17368801
 ] 

Apache Spark commented on SPARK-35862:
--

User 'toujours33' has created a pull request for this issue:
https://github.com/apache/spark/pull/33061

> Watermark timestamp only can be format in UTC timeZone, unfriendly to users 
> in other time zones
> ---
>
> Key: SPARK-35862
> URL: https://issues.apache.org/jira/browse/SPARK-35862
> Project: Spark
> Issue Type: Improvement
> Components: Structured Streaming
> Affects Versions: 3.1.2
> Reporter: Yazhi Wang
> Assignee: Apache Spark
> Priority: Minor
>
> Timestamps for the watermark and eventTime stats are formatted in 
> `ProgressReporter` by `formatTimestamp`. The `timestampFormat` is hardcoded 
> to the UTC time zone:
> `
> private val timestampFormat = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'") // ISO8601
> timestampFormat.setTimeZone(DateTimeUtils.getTimeZone("UTC"))
> `
> When users set a different time zone through the JVM option `-Duser.timezone`, 
> they may be confused by output that mixes time zones. For example, the log 
> prefix below is in local time, while every timestamp in the progress JSON is in UTC:
> `
> 2021-06-23 16:12:07 [stream execution thread for [id = 92f4f363-df85-48e9-aef9-5ea6f2b70316, runId = 5733ef8e-11d1-46c4-95cc-219bde6e7a20]] INFO [MicroBatchExecution:54]: Streaming query made progress: {
>   "id" : "92f4f363-df85-48e9-aef9-5ea6f2b70316",
>   "runId" : "5733ef8e-11d1-46c4-95cc-219bde6e7a20",
>   "name" : null,
>   "timestamp" : "2021-06-23T08:11:56.790Z",
>   "batchId" : 91740,
>   "numInputRows" : 2577,
>   "inputRowsPerSecond" : 155.33453887884266,
>   "processedRowsPerSecond" : 242.29033471229786,
>   "durationMs" : {
>     "addBatch" : 8671,
>     "getBatch" : 3,
>     "getOffset" : 1139,
>     "queryPlanning" : 79,
>     "triggerExecution" : 10636,
>     "walCommit" : 162
>   },
>   "eventTime" : {
>     "avg" : "2021-06-23T08:11:46.307Z",
>     "max" : "2021-06-23T08:11:55.000Z",
>     "min" : "2021-06-23T08:11:37.000Z",
>     "watermark" : "2021-06-23T07:41:39.000Z"
>   },
> `
> Maybe we need to unify the time zone used for timestamp formatting.
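To make the mismatch concrete, here is a minimal Scala sketch that reproduces the hardcoded-UTC formatter and renders the same instant with an explicit offset. Assumptions: `java.util.TimeZone.getTimeZone` stands in for Spark's internal `DateTimeUtils.getTimeZone`; `ProgressTimestampDemo` and `formatInZone` are hypothetical names, not Spark APIs and not necessarily the approach taken in the linked pull request; UTC+8 is inferred from the roughly eight-hour gap between the log prefix and the JSON timestamps.

```scala
import java.text.SimpleDateFormat
import java.time.{Instant, ZoneId, ZoneOffset}
import java.time.format.DateTimeFormatter
import java.util.{Date, TimeZone}

object ProgressTimestampDemo {
  // Mirrors the formatter quoted in the issue. java.util.TimeZone stands in for
  // Spark's internal DateTimeUtils.getTimeZone. The pattern ends in a literal 'Z'
  // and the zone is pinned to UTC, so -Duser.timezone does not change the output.
  def formatUtc(epochMillis: Long): String = {
    val fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'")
    fmt.setTimeZone(TimeZone.getTimeZone("UTC"))
    fmt.format(new Date(epochMillis))
  }

  // Hypothetical zone-aware variant (an assumption, not Spark's API). The XXX
  // pattern letters print the real offset ("+08:00"), or "Z" when the offset is
  // zero, so output in UTC stays identical to the current format.
  private val offsetFormat =
    DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSXXX")

  def formatInZone(epochMillis: Long, zone: ZoneId): String =
    offsetFormat.format(Instant.ofEpochMilli(epochMillis).atZone(zone))

  def main(args: Array[String]): Unit = {
    // The "timestamp" field from the progress JSON in the issue.
    val batchStart = Instant.parse("2021-06-23T08:11:56.790Z").toEpochMilli
    println(formatUtc(batchStart))                           // 2021-06-23T08:11:56.790Z
    println(formatInZone(batchStart, ZoneOffset.UTC))        // 2021-06-23T08:11:56.790Z
    println(formatInZone(batchStart, ZoneOffset.ofHours(8))) // 2021-06-23T16:11:56.790+08:00
  }
}
```

In UTC+8 the batch `timestamp` reads 16:11:56.790; adding the 10636 ms `triggerExecution` gives about 16:12:07.4, which is consistent with the `2021-06-23 16:12:07` log prefix. Printing the offset (`XXX`) instead of a literal `'Z'` leaves UTC output unchanged while making any other zone explicit.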



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35862) Watermark timestamp only can be format in UTC timeZone, unfriendly to users in other time zones

2021-06-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17368800#comment-17368800
 ] 

Apache Spark commented on SPARK-35862:
--

User 'toujours33' has created a pull request for this issue:
https://github.com/apache/spark/pull/33061




[jira] [Commented] (SPARK-35862) Watermark timestamp only can be format in UTC timeZone, unfriendly to users in other time zones

2021-06-23 Thread Yazhi Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17368022#comment-17368022
 ] 

Yazhi Wang commented on SPARK-35862:


I'm working on it
