[ 
https://issues.apache.org/jira/browse/SPARK-54039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nishanth updated SPARK-54039:
-----------------------------
    Fix Version/s: 4.1.0

> SS | `Add TaskID to KafkaDataConsumer logs for better correlation between 
> Spark tasks and Kafka operations`
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-54039
>                 URL: https://issues.apache.org/jira/browse/SPARK-54039
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 3.5.6
>            Reporter: Nishanth
>            Priority: Major
>             Fix For: 4.1.0
>
>
> Currently, the Kafka consumer logs generated by {{KafkaDataConsumer}} (in 
> {{org.apache.spark.sql.kafka010}}) do not include the Spark {{TaskID}} or 
> task context information.
> This makes it difficult to correlate specific Kafka fetch or poll operations 
> with the corresponding Spark tasks during debugging and performance 
> investigations—especially when multiple tasks consume from the same Kafka 
> topic partitions concurrently on different executors.
> Adding the Spark {{TaskID}} to the Kafka consumer log statements would 
> significantly improve traceability and observability. It would allow 
> engineers to:
>  * Identify which Spark task triggered a specific Kafka poll or fetch
>  * Diagnose task-level Kafka latency or deserialization bottlenecks
>  * Simplify debugging of offset commit or consumer lag anomalies when 
> multiple concurrent consumers are running
> h3. *Current Behavior*
> The current Kafka source consumer logging framework does not include 
> {{TaskID}} information in the {{KafkaDataConsumer}} class.
> *Example log output:*
> {code:java}
> 14:30:16 INFO Executor: Running task 83.0 in stage 32.0 (TID 4188)
> 14:30:16 INFO KafkaBatchReaderFactoryWithRowBytesAccumulator: Creating Kafka 
> reader... taskId=4188 partitionId=83
> 14:35:19 INFO KafkaDataConsumer: From Kafka topicPartition=******* 
> groupId=******* read 12922 records through 80 polls (polled out 12979 
> records), taking 299911.706198 ms, during time span of 303617.473973 ms  <-- 
> No Task Context Here
> 14:35:19 INFO Executor: Finished task 83.0 in stage 32.0 (TID 4188){code}
> Although the Executor logs include the task context (TID), the 
> KafkaDataConsumer logs do not, making it difficult to correlate Kafka read 
> durations or latency with the specific Spark tasks that executed them. This 
> limits visibility into which part of a job spent the most time consuming from 
> Kafka.
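> The proposed change could look roughly like the following (a minimal, 
> self-contained sketch, not an actual patch: {{TaskCtx}} and 
> {{taskLogPrefix}} are hypothetical stand-ins; a real implementation would 
> use {{org.apache.spark.TaskContext.get()}}, which returns {{null}} when 
> called outside a running task):
> {code:java}
> // Sketch of the proposal: prefix KafkaDataConsumer log lines with task
> // context. TaskCtx is a hypothetical stand-in for Spark's TaskContext,
> // whose get() method returns null when called outside a running task.
> public class TaskLogPrefixSketch {
>     record TaskCtx(long taskAttemptId, int stageId, int partitionId) {}
> 
>     static String taskLogPrefix(TaskCtx ctx) {
>         if (ctx == null) return "";   // off-task: no context available
>         return "TID=" + ctx.taskAttemptId()
>              + " stage=" + ctx.stageId()
>              + " partition=" + ctx.partitionId() + " ";
>     }
> 
>     public static void main(String[] args) {
>         // The summary line from the example above, now carrying task context:
>         String line = taskLogPrefix(new TaskCtx(4188L, 32, 83))
>             + "From Kafka topicPartition=... read 12922 records through 80 polls";
>         System.out.println(line);
>     }
> }
> {code}
> With such a prefix, a {{KafkaDataConsumer}} line can be joined to the 
> Executor's {{TID 4188}} lines with a single-pattern search instead of 
> cross-referencing timestamps.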
> h3. *Impact*
> The absence of task context in {{KafkaDataConsumer}} logs creates several 
> challenges:
>  * *Performance Debugging:* Unable to pinpoint which specific Spark tasks 
> experience slow Kafka consumption.
>  * *Support/Troubleshooting:* Difficult to correlate customer incidents or 
> partition-level delays with specific task executions.
>  * *Metrics Analysis:* Cannot easily aggregate Kafka consumption metrics per 
> task for performance profiling or benchmarking.
>  * *Log Analysis:* Requires complex log correlation or pattern matching 
> across multiple executors to map Kafka operations to specific tasks.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
