[ 
https://issues.apache.org/jira/browse/SPARK-53690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jayant Sharma updated SPARK-53690:
----------------------------------
    Attachment: driver-log4j.log

> SS | `avgOffsetsBehindLatest` and `estimatedTotalBytesBehindLatest` in 
> progress report show values in exponential notation for large numbers
> --------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-53690
>                 URL: https://issues.apache.org/jira/browse/SPARK-53690
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 3.5.0, 3.5.2, 4.0.0
>            Reporter: Jayant Sharma
>            Priority: Major
>             Fix For: 3.5.2, 4.1.0, 4.0.0
>
>         Attachments: driver-log4j.log
>
>
> In Structured Streaming, the Apache Kafka sources object has metrics 
> *inputRowsPerSecond* and *processedRowsPerSecond* which are currently stored 
> as *Double* and passed as Double in the JSON representation. For large 
> values, they can be reported as exponential/scientific notation (e.g., 
> {*}6.923076923076923E7{*}) or double values with very large scale (e.g., 
> {*}638297.8723404256{*})
> This formatting is not user-friendly. A user can easily interpret 
> *6.923076923076923E7* as *6.92* instead of {*}69,230,769.23{*}, as *E* can be 
> missed to be spotted.
> Example from a driver-log4j file attached below:
> {code:java}
> INFO ProgressReporter: [queryId = d21e9] [batchId = 1248] Streaming query 
> made progress: {
>   "id" : "d21e9dc9-95be-4548-8b1c-d5a576691abf",
>   "runId" : "5023fd98-6e3d-44b1-ba52-8499c24ab8a0",
>   "name" : null,
>   "timestamp" : "2025-09-17T05:59:43.218Z",
>   "batchId" : 1248,
>   "batchDuration" : 192206,
>   "numInputRows" : 8000000,
>   "inputRowsPerSecond" : 78886.12787441329,
>   "processedRowsPerSecond" : 41622.009718739275,
>   "durationMs" : {
>     "addBatch" : 191458,
>     "commitBatch" : 411,
>     "commitOffsets" : 82,
>     "getBatch" : 0,
>     "latestOffset" : 3,
>     "queryPlanning" : 169,
>     "triggerExecution" : 192206,
>     "walCommit" : 80
>   },
>   "stateOperators" : [ ],
>   "sources" : [ {
>     "description" : "KafkaV2[Subscribe[bigdata_omsTrade_cdc]]",
>     "startOffset" : {
>       "bigdata_omsTrade_cdc" : {
>         "0" : 18408809459
>       }
>     },
>     "endOffset" : {
>       "bigdata_omsTrade_cdc" : {
>         "0" : 18416809459
>       }
>     },
>     "latestOffset" : {
>       "bigdata_omsTrade_cdc" : {
>         "0" : 18700472399
>       }
>     },
>     "numInputRows" : 8000000,
>     "inputRowsPerSecond" : 78886.12787441329,
>     "processedRowsPerSecond" : 41622.009718739275,
>     "metrics" : {
>       "avgOffsetsBehindLatest" : "2.8366294E8",
>       "estimatedTotalBytesBehindLatest" : "7.187828359657416E11",
>       "maxOffsetsBehindLatest" : "283662940",
>       "minOffsetsBehindLatest" : "283662940"
>     }
>   } ],
>   "sink" : {
>     "description" : 
> "DeltaSink[s3://<dummy-storage>/__unitystorage/schemas/75bc9f38-9af3-4af9-852b-7077489f93d3/tables/c963dc05-f685-4fe8-a744-500cb40ce28a]",
>     "numOutputRows" : -1
>   }
> }{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to