[I] [Bug] [seatunnel-formats][seatunnel-format-json] The process of collecting 'SeaTunnelRow' may lead to NPE [seatunnel]

via GitHub Wed, 19 Nov 2025 19:07:36 -0800


LiJie20190102 opened a new issue, #10089:
URL: https://github.com/apache/seatunnel/issues/10089


   ### Search before asking
   
   - [x] I had searched in the [issues](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22bug%22) and found no similar issues.
   
   
   ### What happened
   
   When I use the HTTP source with the `content_field` parameter set, the job fails with a NullPointerException (NPE).
   
   ### SeaTunnel Version
   
   2.3.11
   
   ### SeaTunnel Config
   
   ```conf
   env {
     # You can set spark configuration here
     # see available properties defined by spark: https://spark.apache.org/docs/latest/configuration.html#available-properties
     #job.mode = BATCH
     job.name = "SeaTunnel"
     spark.executor.instances = 1
     spark.executor.cores = 1
     spark.executor.memory = "1g"
     spark.master = "local[2]"
   }
   source {
     Http {
       url = "http://localhost:30810/datastudio-flow/api/v1/project-spaces/1952551147977969664/single-tasks/1964983360845647872/detail";
       method = "GET"
       format = "json"
       schema = {
         fields {
           id = long
           createdBy = string
         }
       }
       headers={
         user_id="CIDC-U-5e15075d98684cc88a9a4b30aa72ddcf"
       }
       content_field = "$.data.*"
       plugin_output = "http"
     }
   }
   
   transform {
     # split data by specific delimiter
   
     # you can also use other transform plugins, such as sql
     sql {
       plugin_input = "http"
       query = "select * from dual"
       plugin_output = "console"
     }
   
     # If you would like to get more information about how to configure seatunnel and see full list of transform plugins,
     # please go to https://seatunnel.apache.org/docs/category/transform-v2
   }
   
   # Print the data read from Http to the console
   sink {
     Console {
       plugin_input = "console"
       parallelism = 1
     }
   }
   ```
   
   ### Running Command
   
   ```shell
   bin/start-seatunnel-spark-3-connector-v2.sh \
   --master local[2] \
   --deploy-mode client \
   --config ./config/fttp2console.config
   ```
   
   ### Error Exception
   
   ```log
   Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2672)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2608)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2607)
        at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
        at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2607)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1182)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1182)
        at scala.Option.foreach(Option.scala:407)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1182)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2860)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2802)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2791)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:952)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2228)
        at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2(WriteToDataSourceV2Exec.scala:377)
        ... 38 more
   Caused by: java.lang.RuntimeException: java.lang.NullPointerException
        at org.apache.seatunnel.translation.spark.source.partition.batch.SeaTunnelBatchPartitionReader.next(SeaTunnelBatchPartitionReader.java:38)
        at org.apache.spark.sql.execution.datasources.v2.PartitionIterator.hasNext(DataSourceRDD.scala:119)
        at org.apache.spark.sql.execution.datasources.v2.MetricsIterator.hasNext(DataSourceRDD.scala:156)
        at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.$anonfun$hasNext$1(DataSourceRDD.scala:63)
        at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.$anonfun$hasNext$1$adapted(DataSourceRDD.scala:63)
        at scala.Option.exists(Option.scala:376)
        at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63)
        at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.advanceToNextIter(DataSourceRDD.scala:97)
        at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
        at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
        at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
        at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
        at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
        at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
        at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
        at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
        at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
        at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.$anonfun$run$1(WriteToDataSourceV2Exec.scala:435)
        at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1538)
        at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:480)
        at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.$anonfun$writeWithV2$2(WriteToDataSourceV2Exec.scala:381)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
        at org.apache.spark.scheduler.Task.run(Task.scala:136)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
   Caused by: java.lang.NullPointerException
        at org.apache.seatunnel.format.json.JsonDeserializationSchema.setCollectorTablePath(JsonDeserializationSchema.java:162)
        at org.apache.seatunnel.format.json.JsonDeserializationSchema.collect(JsonDeserializationSchema.java:148)
        at org.apache.seatunnel.connectors.seatunnel.http.source.DeserializationCollector.collect(DeserializationCollector.java:36)
        at org.apache.seatunnel.connectors.seatunnel.http.source.HttpSourceReader.collect(HttpSourceReader.java:419)
        at org.apache.seatunnel.connectors.seatunnel.http.source.HttpSourceReader.pollAndCollectData(HttpSourceReader.java:137)
        at org.apache.seatunnel.connectors.seatunnel.http.source.HttpSourceReader.internalPollNext(HttpSourceReader.java:362)
        at org.apache.seatunnel.connectors.seatunnel.http.source.HttpSourceReader.pollNext(HttpSourceReader.java:330)
        at org.apache.seatunnel.translation.source.ParallelSource.run(ParallelSource.java:144)
   ```
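   
   The innermost frame is `JsonDeserializationSchema.setCollectorTablePath`, called from `collect`. One plausible reading (an assumption, not confirmed against the SeaTunnel source) is that a row which failed to deserialize arrives as `null` and is dereferenced while its table path is being set. The sketch below only illustrates that failure mode and a possible guard. `Row`, `setTablePathUnsafe`, and `collectSafely` are hypothetical stand-ins, not SeaTunnel classes or methods.
   
   ```java
   import java.util.ArrayList;
   import java.util.List;
   
   public class NullSafeCollectSketch {
   
       /** Hypothetical stand-in for a deserialized row; not the real SeaTunnelRow. */
       static final class Row {
           String tableId;
           final Object[] fields;
   
           Row(Object... fields) {
               this.fields = fields;
           }
       }
   
       /** Unguarded variant: dereferences the row without checking it first. */
       static void setTablePathUnsafe(Row row, String tablePath) {
           row.tableId = tablePath; // throws NullPointerException when row is null
       }
   
       /** Guarded variant: skips null rows; returns true only if a row was emitted. */
       static boolean collectSafely(Row row, String tablePath, List<Row> out) {
           if (row == null) {
               return false; // nothing was deserialized, so there is nothing to emit
           }
           if (tablePath != null && !tablePath.isEmpty()) {
               row.tableId = tablePath;
           }
           out.add(row);
           return true;
       }
   
       public static void main(String[] args) {
           List<Row> out = new ArrayList<>();
           // Assumed scenario: the second element could not be converted to a row.
           Row[] deserialized = {new Row(1L, "alice"), null};
           for (Row row : deserialized) {
               collectSafely(row, "http", out);
           }
           System.out.println("collected=" + out.size());
   
           try {
               setTablePathUnsafe(null, "http");
           } catch (NullPointerException e) {
               System.out.println("unguarded path threw NullPointerException");
           }
       }
   }
   ```
   
   Whether silently skipping a null row is acceptable, or whether it should be logged or reported as a parse error, is a design decision for the maintainers.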
   
   ### Zeta or Flink or Spark Version
   
   spark
   
   ### Java or Scala Version
   
   java8
   
   ### Screenshots
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   

