[
https://issues.apache.org/jira/browse/SPARK-53690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jayant Sharma updated SPARK-53690:
----------------------------------
Attachment: driver-log4j.log
> SS | `avgOffsetsBehindLatest` and `estimatedTotalBytesBehindLatest` in
> progress report show values in exponential notation for large numbers
> --------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-53690
> URL: https://issues.apache.org/jira/browse/SPARK-53690
> Project: Spark
> Issue Type: Bug
> Components: Structured Streaming
> Affects Versions: 3.5.0, 3.5.2, 4.0.0
> Reporter: Jayant Sharma
> Priority: Major
> Fix For: 3.5.2, 4.1.0, 4.0.0
>
> Attachments: driver-log4j.log
>
>
> In Structured Streaming, the Apache Kafka sources object has metrics
> *inputRowsPerSecond* and *processedRowsPerSecond* which are currently stored
> as *Double* and passed as Double in the JSON representation. For large
> values, they can be reported as exponential/scientific notation (e.g.,
> {*}6.923076923076923E7{*}) or double values with very large scale (e.g.,
> {*}638297.8723404256{*})
> This formatting is not user-friendly. A user can easily interpret
> *6.923076923076923E7* as *6.92* instead of {*}69,230,769.23{*}, as *E* can be
> missed to be spotted.
> Example from a driver-log4j file attached below:
> {code:java}
> INFO ProgressReporter: [queryId = d21e9] [batchId = 1248] Streaming query
> made progress: {
> "id" : "d21e9dc9-95be-4548-8b1c-d5a576691abf",
> "runId" : "5023fd98-6e3d-44b1-ba52-8499c24ab8a0",
> "name" : null,
> "timestamp" : "2025-09-17T05:59:43.218Z",
> "batchId" : 1248,
> "batchDuration" : 192206,
> "numInputRows" : 8000000,
> "inputRowsPerSecond" : 78886.12787441329,
> "processedRowsPerSecond" : 41622.009718739275,
> "durationMs" : {
> "addBatch" : 191458,
> "commitBatch" : 411,
> "commitOffsets" : 82,
> "getBatch" : 0,
> "latestOffset" : 3,
> "queryPlanning" : 169,
> "triggerExecution" : 192206,
> "walCommit" : 80
> },
> "stateOperators" : [ ],
> "sources" : [ {
> "description" : "KafkaV2[Subscribe[bigdata_omsTrade_cdc]]",
> "startOffset" : {
> "bigdata_omsTrade_cdc" : {
> "0" : 18408809459
> }
> },
> "endOffset" : {
> "bigdata_omsTrade_cdc" : {
> "0" : 18416809459
> }
> },
> "latestOffset" : {
> "bigdata_omsTrade_cdc" : {
> "0" : 18700472399
> }
> },
> "numInputRows" : 8000000,
> "inputRowsPerSecond" : 78886.12787441329,
> "processedRowsPerSecond" : 41622.009718739275,
> "metrics" : {
> "avgOffsetsBehindLatest" : "2.8366294E8",
> "estimatedTotalBytesBehindLatest" : "7.187828359657416E11",
> "maxOffsetsBehindLatest" : "283662940",
> "minOffsetsBehindLatest" : "283662940"
> }
> } ],
> "sink" : {
> "description" :
> "DeltaSink[s3://<dummy-storage>/__unitystorage/schemas/75bc9f38-9af3-4af9-852b-7077489f93d3/tables/c963dc05-f685-4fe8-a744-500cb40ce28a]",
> "numOutputRows" : -1
> }
> }{code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]