devanshguptatrepp opened a new issue, #7261: URL: https://github.com/apache/hudi/issues/7261
**Describe the problem you faced**

When trying to write data to Hudi, my Spark application fails with the following error:

```
java.lang.Exception: Could not sync using the meta sync class org.apache.hudi.hive.HiveSyncTool
	at com.trepp.zone.ZoneExecutionHelper.upsert(ZoneExecutionHelper.scala:101)
	at com.trepp.zone.Presentation.$anonfun$writeHudiObject$1(Presentation.scala:92)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at scala.util.Try$.apply(Try.scala:213)
	at com.trepp.zone.Presentation.writeHudiObject(Presentation.scala:81)
```

I am reading data from an Amazon S3 bucket and doing some transformations before writing the data to Hudi. I am using the `hudi-spark3-bundle_2.12` jar available on Maven Central for Scala 2.12.

**Expected behavior**

Successfully able to write to the Hudi location.

**Environment Description**

* Scala version : 2.12.17
* Hudi version : 0.12.0
* Spark version : 3.3.0
* Hadoop version : 2.8.1
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : no
* Running on AWS EMR version : 6.8.0

**Additional context**

Spark submit commands:

1. `spark-submit --deploy-mode client --jars s3a://bucketName/binaries/hudi/etl/hudi-spark3-bundle_2.12-0.12.0.jar --class com.xyz.EntryClass --master yarn --conf spark.files.maxPartitionBytes=268435456 --conf spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2 --conf spark.dynamicAllocation.enabled=true --conf hoodie.meta.sync.enable=false s3://bucketName/application-1.0-SNAPSHOT-jar-with-dependencies.jar`

   Reason for using `hudi-spark3-bundle_2.12-0.12.0`: Hudi 0.12.x supports Spark 3.3.x, as per [the quick-start guide](https://hudi.apache.org/docs/quick-start-guide/#setup).
2. `spark-submit --deploy-mode client --jars s3a://bucketName/binaries/hudi/etl/hudi-spark-bundle_2.12-0.11.1.jar --class com.xyz.EntryClass --master yarn --conf spark.files.maxPartitionBytes=268435456 --conf spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2 --conf spark.dynamicAllocation.enabled=true --conf hoodie.meta.sync.enable=false s3://bucketName/application-1.0-SNAPSHOT-jar-with-dependencies.jar`

   Reason for using `hudi-spark-bundle_2.12-0.11.1`: AWS EMR 6.8.0 ships Hudi 0.11.1, as per [the EMR 6.8.0 release notes](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-680-release.html).

In my pom.xml, I have the following dependency for the Hudi Spark bundle:

```xml
<dependency>
    <groupId>org.apache.hudi</groupId>
    <artifactId>hudi-spark3-bundle_2.12</artifactId>
    <version>0.11.1</version>
</dependency>
```

**Stacktrace**

1. Error with spark-submit command number 1:

```
java.lang.Exception: Could not sync using the meta sync class org.apache.hudi.hive.HiveSyncTool
	at com.trepp.zone.ZoneExecutionHelper.upsert(ZoneExecutionHelper.scala:101)
	at com.trepp.zone.Presentation.$anonfun$writeHudiObject$1(Presentation.scala:92)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at scala.util.Try$.apply(Try.scala:213)
	at com.trepp.zone.Presentation.writeHudiObject(Presentation.scala:81)
	at com.trepp.process.Executor.$anonfun$writeObject$2(Executor.scala:136)
	at com.trepp.process.Executor.$anonfun$writeObject$2$adapted(Executor.scala:133)
	at scala.collection.immutable.List.foreach(List.scala:431)
	at com.trepp.process.Executor.$anonfun$writeObject$1(Executor.scala:133)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at scala.util.Try$.apply(Try.scala:213)
	at com.trepp.process.Executor.writeObject(Executor.scala:133)
	at com.trepp.process.Executor$$anon$2.$anonfun$accept$2(Executor.scala:118)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at scala.util.Try$.apply(Try.scala:213)
	at com.trepp.process.Executor$$anon$2.accept(Executor.scala:113)
	at com.trepp.process.Executor$$anon$2.accept(Executor.scala:111)
	at java.util.TreeMap.forEach(TreeMap.java:1005)
	at com.trepp.process.Executor.executeQuery(Executor.scala:111)
	at com.trepp.dataload.EtlImpl.$anonfun$executeProcess$3(EtlImpl.scala:43)
	at scala.util.Try$.apply(Try.scala:213)
	at com.trepp.dataload.EtlImpl.$anonfun$executeProcess$1(EtlImpl.scala:37)
	at com.trepp.dataload.EtlImpl.$anonfun$executeProcess$1$adapted(EtlImpl.scala:23)
	at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
	at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
	at com.trepp.dataload.EtlImpl.executeProcess(EtlImpl.scala:23)
	at com.trepp.TreppClient$.$anonfun$main$1(TreppClient.scala:46)
	at scala.util.Try$.apply(Try.scala:213)
	at com.trepp.TreppClient$.main(TreppClient.scala:40)
	at com.trepp.TreppClient.main(TreppClient.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1006)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1095)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1104)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
```
2. Error with spark-submit command number 2:

```
2022-11-21T06:59:16,079 ERROR scheduler.TaskSetManager: Task 0 in stage 22.0 failed 4 times; aborting job
java.lang.Exception: Job aborted due to stage failure: Task 0 in stage 22.0 failed 4 times, most recent failure: Lost task 0.3 in stage 22.0 (TID 297) (ip-10-73-96-169.ec2.internal executor 2): java.io.InvalidClassException: org.apache.hudi.data.HoodieJavaRDD; local class incompatible: stream classdesc serialVersionUID = -7482894453477434543, local class serialVersionUID = -9174748892468899293
	at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:699)
	at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2005)
	at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1852)
	at java.io.ObjectInputStream.readClass(ObjectInputStream.java:1815)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1640)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
	at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
	at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
	at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461)
	at scala.collection.immutable.List$SerializationProxy.readObject(List.scala:527)
	at sun.reflect.GeneratedMethodAccessor26.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1184)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2322)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461)
	at scala.collection.immutable.List$SerializationProxy.readObject(List.scala:527)
	at sun.reflect.GeneratedMethodAccessor26.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1184)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2322)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461)
	at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:87)
	at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:129)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:85)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
	at org.apache.spark.scheduler.Task.run(Task.scala:138)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1516)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
```
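For context on the second stacktrace: `InvalidClassException ... local class incompatible` means the executor deserialized `HoodieJavaRDD` bytes produced by a different build of that class than the one on its own classpath, which is consistent with two different Hudi bundles being visible at once (for example, the EMR-provided 0.11.1 bundle alongside the one shaded into the fat jar or passed via `--jars`). A minimal, self-contained Java sketch of the mechanism follows; the class names here are illustrative, not from Hudi:

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

public class SerialUidDemo {

    // Explicit UID: serialization and deserialization sides must agree on this value.
    static class WithExplicitUid implements Serializable {
        private static final long serialVersionUID = 42L;
    }

    // No explicit UID: the JVM derives one from the class structure, so two
    // differently-compiled builds of the "same" class can disagree -- the
    // "local class incompatible" situation reported in the stacktrace above.
    static class WithComputedUid implements Serializable {
        int field;
    }

    public static void main(String[] args) {
        // ObjectStreamClass.lookup exposes the UID the serialization machinery uses.
        System.out.println(ObjectStreamClass.lookup(WithExplicitUid.class).getSerialVersionUID()); // prints 42
        System.out.println(ObjectStreamClass.lookup(WithComputedUid.class).getSerialVersionUID());
    }
}
```

Because neither reported UID matches the other, a likely direction to investigate is ensuring only one version of the Hudi bundle reaches driver and executors (e.g. not shading the bundle into the application jar while also supplying a different one via `--jars`).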