[jira] [Updated] (HUDI-7943) Resolve version conflict of fasterxml on spark3.2

2024-07-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7943:
-
Labels: pull-request-available  (was: )

> Resolve version conflict of fasterxml on spark3.2 
> --
>
> Key: HUDI-7943
> URL: https://issues.apache.org/jira/browse/HUDI-7943
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: dependencies
> Environment: Hudi 0.14.1, Spark 3.2
>Reporter: Jihwan Lee
>Priority: Major
>  Labels: pull-request-available
>
> When running a streaming read on Spark 3.2, an exception is raised because the 
> Jackson Scala module requires a matching jackson-databind version.
> Spark versions other than 3.2 appear to resolve the Jackson version from the 
> Spark dependencies without conflict.
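The failure can be reasoned about without Spark: jackson-module-scala 2.10.x refuses to initialize unless the jackson-databind on the classpath falls in the [2.10.0, 2.11.0) range. A minimal sketch of that compatibility check (`Ver` and `compatibleWithScalaModule210` are illustrative names, not Jackson API):

```scala
// Illustrative sketch of the range check behind the error message
// "Scala module 2.10.0 requires Jackson Databind version >= 2.10.0 and < 2.11.0".
case class Ver(major: Int, minor: Int, patch: Int)

// The scala module only accepts a databind on the same 2.10.x line.
def compatibleWithScalaModule210(databind: Ver): Boolean =
  databind.major == 2 && databind.minor == 10

// A databind pulled in from a mismatched bundle fails the check, which
// surfaces as the JsonMappingException seen in the error log.
println(compatibleWithScalaModule210(Ver(2, 10, 5)))  // true
println(compatibleWithScalaModule210(Ver(2, 12, 3)))  // false
```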
>  
> Version reference: https://github.com/apache/spark/blob/v3.2.3/pom.xml#L170
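Until the conflict is fixed in the Hudi bundle itself, one possible workaround (a sketch, assuming a Maven build and that the application should follow the Jackson line declared in the Spark 3.2 pom linked above) is to force a single jackson-databind version via dependencyManagement:

```xml
<dependencyManagement>
  <dependencies>
    <!-- Pin jackson-databind to the version Spark 3.2 ships with; check
         fasterxml.jackson.version in the Spark pom referenced above
         before using this literal version. -->
    <dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-databind</artifactId>
      <version>2.12.3</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```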
>  
> example code:
>  
> {code:java}
> import scala.collection.JavaConversions._
> import org.apache.spark.sql.SaveMode._
> import org.apache.hudi.DataSourceReadOptions._
> import org.apache.hudi.DataSourceWriteOptions._
> import org.apache.hudi.common.table.HoodieTableConfig._
> import org.apache.hudi.config.HoodieWriteConfig._
> import org.apache.hudi.keygen.constant.KeyGeneratorOptions._
> import org.apache.hudi.common.model.HoodieRecord
> import spark.implicits._
> val basePath = "hdfs:///tmp/trips_table"
> spark.readStream
>   .format("hudi")
>   .option("hoodie.datasource.query.type", "incremental")
>   .option("hoodie.datasource.query.incremental.format", "cdc")
>   .load(basePath)
>   .writeStream
>   .format("console")
>   .option("checkpointLocation", "/tmp/trips_table_checkpoint")
>   .outputMode("append")
>   .start()
>   .awaitTermination()
> {code}
>  
>  
> error log:
>  
> {code:java}
> Caused by: java.lang.ExceptionInInitializerError: com.fasterxml.jackson.databind.JsonMappingException: Scala module 2.10.0 requires Jackson Databind version >= 2.10.0 and < 2.11.0
>   at org.apache.spark.sql.hudi.streaming.HoodieSourceOffset.(HoodieSourceOffset.scala:30)
>   at org.apache.spark.sql.hudi.streaming.HoodieStreamSource.getLatestOffset(HoodieStreamSource.scala:127)
>   at org.apache.spark.sql.hudi.streaming.HoodieStreamSource.getOffset(HoodieStreamSource.scala:138)
>   at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$constructNextBatch$6(MicroBatchExecution.scala:403)
>   at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken(ProgressReporter.scala:375)
>   at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken$(ProgressReporter.scala:373)
>   at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:69)
>   at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$constructNextBatch$2(MicroBatchExecution.scala:402)
>   at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
>   at scala.collection.Iterator.foreach(Iterator.scala:943)
>   at scala.collection.Iterator.foreach$(Iterator.scala:943)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
>   at scala.collection.IterableLike.foreach(IterableLike.scala:74)
>   at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
>   at scala.collection.TraversableLike.map(TraversableLike.scala:286)
>   at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>   at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$constructNextBatch$1(MicroBatchExecution.scala:384)
>   at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
>   at org.apache.spark.sql.execution.streaming.MicroBatchExecution.withProgressLocked(MicroBatchExecution.scala:627)
>   at org.apache.spark.sql.execution.streaming.MicroBatchExecution.constructNextBatch(MicroBatchExecution.scala:380)
>   at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runActivatedStream$2(MicroBatchExecution.scala:210)
>   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>   at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken(ProgressReporter.scala:375)
>   at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken$(ProgressReporter.scala:373)
>   at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:69)
>   at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runActivatedStream$1(MicroBatchExecution.scala:193)
>   at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:57)
>   at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExec
> {code}

[jira] [Updated] (HUDI-7943) Resolve version conflict of fasterxml on spark3.2

2024-07-01 Thread Jihwan Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jihwan Lee updated HUDI-7943:
-
Summary: Resolve version conflict of fasterxml on spark3.2   (was: Fix that 
resolve version conflict of fasterxml on spark3.2 )
