[
https://issues.apache.org/jira/browse/SPARK-55271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18062540#comment-18062540
]
Charl P. Botha edited comment on SPARK-55271 at 3/3/26 6:52 PM:
----------------------------------------------------------------
We ran into this bug in production after upgrading our streaming pipelines from
Spark 3.5.x to 4.1.x.
With Claude's help, we independently implemented the same fix as in the linked
PR. The only difference is that we also changed `Some(...)` to `Option(...)` in
the RTM path, although we are not sure it is required there.
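
For anyone wondering why swapping `Some(...)` for `Option(...)` matters, here is a minimal sketch. It is not the connector code: `lookup`, `wrapUnsafe`, `wrapSafe` and `length` are hypothetical names used only to show the difference, assuming the wrapped call can return null.

```scala
object SomeVsOption {
  // Hypothetical stand-in for any call that may return null (for example a Java API).
  def lookup(key: String): String =
    if (key == "known") "42" else null

  // Some(...) wraps whatever it is given, so a null result becomes Some(null).
  def wrapUnsafe(key: String): Option[String] = Some(lookup(key))

  // Option(...) converts a null result to None.
  def wrapSafe(key: String): Option[String] = Option(lookup(key))

  // Dereferences the wrapped value; throws NullPointerException for Some(null).
  def length(value: Option[String]): Int = value.map(_.length).getOrElse(0)
}
```

With these definitions, `SomeVsOption.length(SomeVsOption.wrapSafe("missing"))` falls through to the default, while the same call with `wrapUnsafe` throws a NullPointerException inside `map`.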
We built the patched spark-sql-kafka-0-10 jar and deployed it to the relevant
pipelines, which are now running without the NPE.
I came here tonight to report the issue, but then happily discovered this
existing report. Would be really great to see this work reviewed and merged!
> NullPointerException in Kafka Micro-Batch Streaming Progress Reporting
> ----------------------------------------------------------------------
>
> Key: SPARK-55271
> URL: https://issues.apache.org/jira/browse/SPARK-55271
> Project: Spark
> Issue Type: Bug
> Components: Structured Streaming
> Affects Versions: 4.1.1
> Reporter: Dipesh Chandra Maurya
> Priority: Critical
> Labels: pull-request-available
>
> h3. Root Cause
> The error originates in {{KafkaMicroBatchStream.metrics()}} at line 520,
> where the code calls {{.get}} on a Scala {{Option}} that wraps {{null}}
> (that is, {{Some(null)}}) and then immediately invokes {{.map()}} on the
> null result. An {{Option}} is not expected to hold null, so this
> dereference throws the NullPointerException.
> h3. Error Flow
> # The streaming query completes a micro-batch trigger
> # Progress reporting attempts to extract source metrics via
> {{ProgressContext.extractSourceProgress()}}
> # The KafkaMicroBatchStream is queried for its metrics
> # An unsafe {{.get}} call on an Option that wraps null returns null
> # Attempting to call {{.map()}} on the null value throws NullPointerException
> h3. Impact
> * The streaming query crashes and stops processing
> * Progress information cannot be generated for the current batch
> * The failure occurs during the finishTrigger phase, after batch processing
> completes
> h3. Technical Details
> * *Component*: Spark Structured Streaming with Kafka source
> * *Class*: {{org.apache.spark.sql.kafka010.KafkaMicroBatchStream}}
> * *Method*: {{metrics()}} at line 520
> * *Trigger Type*: ProcessingTime (micro-batch mode)
> h3. Likely Causes
> * Missing or unavailable Kafka metrics configuration
> * Empty/uninitialized metrics collection in the Kafka source
> * Race condition where metrics are accessed before initialization
> * Incompatibility between Spark and Kafka connector versions
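
The error flow in the quoted description can be sketched as follows. This is an illustration under stated assumptions, not the code at line 520: `Offsets`, `latestOffsets`, `lagUnsafe` and `lagSafe` are hypothetical names, and the per-partition lag calculation is invented for the example. It also shows one way a consumer of such an Option could guard itself when the producer cannot be changed.

```scala
object MetricsFlow {
  type Offsets = Map[String, Long]

  // Hypothetical producer: wraps a possibly-null value with Some,
  // so the result can be Some(null).
  def latestOffsets(raw: Offsets): Option[Offsets] = Some(raw)

  // Mirrors the described failure: .get returns null, then .map is called on it.
  def lagUnsafe(latest: Option[Offsets], consumed: Offsets): Map[String, Long] =
    latest.get.map { case (tp, end) => tp -> (end - consumed.getOrElse(tp, 0L)) }

  // Null-safe consumer: flatMap(Option(_)) collapses Some(null) to None,
  // and an absent value yields an empty result instead of an exception.
  def lagSafe(latest: Option[Offsets], consumed: Offsets): Map[String, Long] =
    latest
      .flatMap(Option(_))
      .map(_.map { case (tp, end) => tp -> (end - consumed.getOrElse(tp, 0L)) })
      .getOrElse(Map.empty[String, Long])
}
```

Calling `lagUnsafe(latestOffsets(null), consumed)` throws a NullPointerException, whereas `lagSafe` returns an empty map for the same input.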