Madan16 opened a new issue, #8428: URL: https://github.com/apache/hudi/issues/8428
**_Tips before filing an issue_**

- Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)? - Yes
- Join the mailing list to engage in conversations and get faster support at dev-subscr...@hudi.apache.org.
- If you have triaged this as a bug, then file an [issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.

**Describe the problem you faced**

We are trying to upsert into a non-partitioned table. The code had been working fine for every upsert (it ran once a day for almost two months), but it suddenly started failing with the following errors:

1. An error occurred while calling o167.save. org/apache/spark/sql/avro/SchemaConverters$.
2. An error occurred while calling o168.save. Failed to upsert for commit time 20230410133751.

**To Reproduce**

Steps to reproduce the behavior:

1. The upsert is performed with the following code:

```python
print('Writing to unpartitioned Hudi table.')
combinedConf = {**commonConfig, **unpartitionDataConfig, **incrementalConfig}
outputDf.write.format('org.apache.hudi').options(**combinedConf).mode('Append').save(targetPath)
```

2. Configuration details:

```python
commonConfig = {
    'className': 'org.apache.hudi',
    'hoodie.datasource.hive_sync.use_jdbc': 'false',
    'hoodie.datasource.write.precombine.field': 'ingest_dt',
    'hoodie.datasource.write.recordkey.field': primaryKey,
    'hoodie.table.name': tableName,
    'hoodie.consistency.check.enabled': 'true',
    'hoodie.datasource.hive_sync.database': dbName,
    'hoodie.datasource.hive_sync.table': tableName,
    'hoodie.datasource.hive_sync.enable': 'true'
}

unpartitionDataConfig = {
    'hoodie.datasource.hive_sync.partition_extractor_class': 'org.apache.hudi.hive.NonPartitionedExtractor',
    'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.keygen.NonpartitionedKeyGenerator'
}

incrementalConfig = {
    'hoodie.upsert.shuffle.parallelism': 20,
    'hoodie.datasource.write.operation': 'upsert',
    'hoodie.cleaner.policy': 'KEEP_LATEST_COMMITS',
    'hoodie.cleaner.commits.retained': 10
}
```

3. `targetPath` points to an S3 bucket.

**Expected behavior**

The upsert should succeed, as it had been running fine until the errors above started showing up.

**Environment Description**

* Hudi version : Apache Hudi Connector 3.0_hudi_0.9.0_glue_3.0
* AWS Glue version : Glue 3.0
* Spark version : 3.1
* Python version : 3
* Hive version :
* Hadoop version :
* Storage (HDFS/S3/GCS..) : S3 (both source and target)
* File format : Parquet with Snappy compression
* Running on Docker? (yes/no) : No

**Additional context**

Columns and data types of the source:

```
|-- pk_ABC_ky: string
|-- A: int
|-- B: long
|-- C: date
|-- D: date
|-- E: string
|-- op: string
|-- source_name: string
|-- source_schema: string
|-- source_table: string
|-- ingest_dt: string
```

Columns and data types of the target:

```
pk_ABC_ky: string
A: int
B: bigint
C: date
D: date
E: string
op: varchar(1)
source_name: varchar(24)
source_schema: varchar(24)
source_table: varchar(13)
ingest_dt: string
```
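For reference, here is a minimal, self-contained sketch of the same write path. The table name and S3 path below are hypothetical placeholders, the sample record covers only a subset of the source columns, and Hive sync is left out so the sketch does not require a catalog; the remaining options are collapsed from the three dictionaries in step 2 above.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('hudi-upsert-repro').getOrCreate()

primaryKey = 'pk_ABC_ky'                       # record key column from the source schema
tableName = 'test_table'                       # hypothetical
targetPath = 's3://my-bucket/hudi/test_table'  # hypothetical S3 target

# Collapsed from commonConfig/unpartitionDataConfig/incrementalConfig above,
# with the Hive sync options omitted.
combinedConf = {
    'className': 'org.apache.hudi',
    'hoodie.table.name': tableName,
    'hoodie.datasource.write.recordkey.field': primaryKey,
    'hoodie.datasource.write.precombine.field': 'ingest_dt',
    'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.keygen.NonpartitionedKeyGenerator',
    'hoodie.datasource.write.operation': 'upsert',
    'hoodie.upsert.shuffle.parallelism': 20,
}

# One record shaped like a subset of the source schema, for illustration only.
outputDf = spark.createDataFrame(
    [('key-1', 1, 'I', '2023-04-10 00:00:00')],
    ['pk_ABC_ky', 'A', 'op', 'ingest_dt'],
)

outputDf.write.format('org.apache.hudi').options(**combinedConf).mode('Append').save(targetPath)
```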
**Stacktrace**

Stack trace for the first error (`An error occurred while calling o168.save. org/apache/spark/sql/avro/SchemaConverters$`):

```
23/04/11 13:39:26 ERROR GlueExceptionAnalysisListener: [Glue Exception Analysis] { "Event": "GlueETLJobExceptionEvent", "Timestamp": 1681220366473, "Failure Reason": "Traceback (most recent call last):\n File \"/tmp/TEST_QA_Hudi.py\", line 216, in <module>\n outputDf.write.format('org.apache.hudi').options(**combinedConf).mode('Append').save(targetPath)\n File \"/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py\", line 1109, in save\n self._jwrite.save(path)\n File \"/opt/amazon/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py\", line 1305, in __call__\n answer, self.gateway_client, self.target_id, self.name)\n File \"/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/utils.py\", line 111, in deco\n return f(*a, **kw)\n File \"/opt/amazon/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py\", line 328, in get_return_value\n format(target_id, \".\", name), value)\npy4j.protocol.Py4JJavaError: An error occurred while calling o168.save.\n: java.lang.NoClassDefFoundError: org/apache/spark/sql/avro/SchemaConverters$\n\tat org.apache.hudi.AvroConversionUtils$.convertStructTypeToAvroSchema(AvroConversionUtils.scala:63)\n\tat org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:216)\n\tat org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:164)\n\tat org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)\n\tat org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)\n\tat org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)\n\tat org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)\n\tat org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:185)\n\tat org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:223)\n\tat org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)\n\tat org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:220)\n\tat org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:181)\n\tat org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:134)\n\tat org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:133)\n\tat org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)\n\tat org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)\n\tat org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)\n\tat org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:110)\n\tat org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:135)\n\tat org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)\n\tat org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)\n\tat org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:135)\n\tat org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:253)\n\tat org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:134)\n\tat org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)\n\tat org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)\n\tat
org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)\n\tat org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)\n\tat org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)\n\tat org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:498)\n\tat py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)\n\tat py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)\n\tat py4j.Gateway.invoke(Gateway.java:282)\n\tat py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)\n\tat py4j.commands.CallCommand.execute(CallCommand.java:79)\n\tat py4j.GatewayConnection.run(GatewayConnection.java:238)\n\tat java.lang.Thread.run(Thread.java:750)\nCaused by: java.lang.ClassNotFoundException: org.apache.spark.sql.avro.SchemaConverters$\n\tat java.net.URLClassLoader.findClass(URLClassLoader.java:387)\n\tat java.lang.ClassLoader.loadClass(ClassLoader.java:418)\n\tat sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)\n\tat java.lang.ClassLoader.loadClass(ClassLoader.java:351)\n\t... 41 more\n", "Stack Trace": [ { "Declaring Class": "get_return_value", "Method Name": "format(target_id, \".\", name), value)", "File Name": "/opt/amazon/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", "Line Number": 328 }, { "Declaring Class": "deco", "Method Name": "return f(*a, **kw)", "File Name": "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", "Line Number": 111 }, { "Declaring Class": "__call__", "Method Name": "answer, self.gateway_client, self.target_id, self.name)", "File Name": "/opt/amazon/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", "Line Number": 1305 }, { "Declaring Class": "save", "Method Name": "self._jwrite.save(path)", "File Name": "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", "Line Number": 1109 }, { "Declaring Class": "<module>", "Method Name": "outputDf.write.format('org.apache.hudi').options(**combinedConf).mode('Append').save(targetPath)", "File Name": "/tmp/TEST_QA_Hudi.py", "Line Number": 216 } ], "Last Executed Line number": 216, "script": "TEST_QA_Hudi.py" }
```

Stack trace for the second error (`An error occurred while calling o168.save. Failed to upsert for commit time 20230410133751`):

```
23/04/10 13:37:56 ERROR GlueExceptionAnalysisListener: [Glue Exception Analysis] { "Event": "GlueExceptionAnalysisStageFailed", "Timestamp": 1681133876157, "Failure Reason": "Job aborted due to stage failure: Task 31 in stage 18.0 failed 4 times, most recent failure: Lost task 31.3 in stage 18.0 (TID 352) (172.34.102.9 executor 7): java.lang.NoClassDefFoundError: org/apache/spark/sql/avro/IncompatibleSchemaException", "Stack Trace": [ { "Declaring Class": "org.apache.hudi.HoodieSparkUtils$", "Method Name": "$anonfun$createRddInternal$2", "File Name": "HoodieSparkUtils.scala", "Line Number": 137 }, { "Declaring Class": "org.apache.spark.rdd.RDD", "Method Name": "$anonfun$mapPartitions$2", "File Name": "RDD.scala", "Line Number": 863 }, { "Declaring Class": "org.apache.spark.rdd.RDD", "Method Name": "$anonfun$mapPartitions$2$adapted", "File Name": "RDD.scala", "Line Number": 863 }, { "Declaring Class": "org.apache.spark.rdd.MapPartitionsRDD", "Method Name": "compute", "File Name": "MapPartitionsRDD.scala", "Line Number": 52 }, { "Declaring Class": "org.apache.spark.rdd.RDD", "Method Name": "computeOrReadCheckpoint", "File Name": "RDD.scala", "Line Number": 373 }, { "Declaring Class": "org.apache.spark.rdd.RDD", "Method Name": "iterator", "File Name": "RDD.scala", "Line Number": 337 }, { "Declaring Class": "org.apache.spark.rdd.MapPartitionsRDD", "Method Name": "compute", "File Name": "MapPartitionsRDD.scala", "Line Number": 52 }, { "Declaring Class": "org.apache.spark.rdd.RDD", "Method Name": "computeOrReadCheckpoint", "File Name": "RDD.scala", "Line Number": 373 }, { "Declaring Class": "org.apache.spark.rdd.RDD", "Method Name": "iterator", "File Name": "RDD.scala", "Line Number": 337 }, { "Declaring Class": "org.apache.spark.rdd.MapPartitionsRDD", "Method Name": "compute", "File Name": "MapPartitionsRDD.scala", "Line Number": 52 }, { "Declaring Class": "org.apache.spark.rdd.RDD", "Method Name": "computeOrReadCheckpoint", "File Name": "RDD.scala", "Line Number": 373 }, { "Declaring Class": "org.apache.spark.rdd.RDD", "Method Name": "iterator", "File Name": "RDD.scala", "Line Number": 337 }, { "Declaring Class": "org.apache.spark.shuffle.ShuffleWriteProcessor", "Method Name": "write", "File Name": "ShuffleWriteProcessor.scala", "Line Number": 59 }, { "Declaring Class": "org.apache.spark.scheduler.ShuffleMapTask", "Method Name": "runTask", "File Name": "ShuffleMapTask.scala", "Line Number": 99 }, { "Declaring Class": "org.apache.spark.scheduler.ShuffleMapTask", "Method Name": "runTask", "File Name": "ShuffleMapTask.scala", "Line Number": 52 }, { "Declaring Class": "org.apache.spark.scheduler.Task", "Method Name": "run", "File Name": "Task.scala", "Line Number": 131 }, { "Declaring Class": "org.apache.spark.executor.Executor$TaskRunner", "Method Name": "$anonfun$run$3", "File Name": "Executor.scala", "Line Number": 497 }, { "Declaring Class": "org.apache.spark.util.Utils$", "Method Name": "tryWithSafeFinally", "File Name": "Utils.scala", "Line Number": 1439 }, { "Declaring Class": "org.apache.spark.executor.Executor$TaskRunner", "Method Name": "run", "File Name": "Executor.scala", "Line Number": 500 }, { "Declaring Class": "java.util.concurrent.ThreadPoolExecutor", "Method Name": "runWorker", "File Name": "ThreadPoolExecutor.java", "Line Number": 1149 }, { "Declaring Class": "java.util.concurrent.ThreadPoolExecutor$Worker", "Method Name": "run", "File Name": "ThreadPoolExecutor.java", "Line Number": 624 }, { "Declaring Class":
"java.lang.Thread", "Method Name": "run", "File Name": "Thread.java", "Line Number": 750 }, { "Declaring Class": " java.lang.ClassNotFoundException: org.apache.spark.sql.avro.IncompatibleSchemaException", "Method Name": "CausedBy", "File Name": "CausedBy", "Line Number": -1 }, { "Declaring Class": "java.net.URLClassLoader", "Method Name": "findClass", "File Name": "URLClassLoader.java", "Line Number": 387 }, { "Declaring Class": "java.lang.ClassLoader", "Method Name": "loadClass", "File Name": "ClassLoader.java", "Line Number": 418 }, { "Declaring Class": "sun.misc.Launcher$AppClassLoader", "Method Name": "loadClass", "File Name": "Launcher.java", "Line Number": 352 }, { "Declaring Class": "java.lang.ClassLoader", "Method Name": "loadClass", "File Name": "ClassLoader.java", "Line Number": 351 } ], "Stage ID": 18, "Stage Attempt ID": 0, "Number of Tasks": 40 } AND 23/04/10 13:37:56 ERROR GlueExceptionAnalysisListener: [Glue Exception Analysis] { "Event": "GlueETLJobExceptionEvent", "Timestamp": 1681133876669, "Failure Reason": "Traceback (most recent call last):\n File \"/tmp/TEST_QA_Hudi.py\", line 216, in <module>\n outputDf.write.format('org.apache.hudi').options(**combinedConf).mode('Append').save(targetPath)\n File \"/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py\", line 1109, in save\n self._jwrite.save(path)\n File \"/opt/amazon/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py\", line 1305, in __call__\n answer, self.gateway_client, self.target_id, self.name)\n File \"/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/utils.py\", line 111, in deco\n return f(*a, **kw)\n File \"/opt/amazon/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py\", line 328, in get_return_value\n format(target_id, \".\", name), value)\npy4j.protocol.Py4JJavaError: An error occurred while calling o168.save.\n: org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit time 20230410133751\n\tat org.apache.hudi.table.action.commit .AbstractWriteHelper.write(AbstractWriteHelper.java:62)\n\tat org.apache.hudi.table.action.commit.SparkUpsertCommitActionExecutor.execute(SparkUpsertCommitActionExecutor.java:46)\n\tat org.apache.hudi.table.HoodieSparkCopyOnWriteTable.upsert(HoodieSparkCopyOnWriteTable.java:98)\n\tat org.apache.hudi.table.HoodieSparkCopyOnWriteTable.upsert(HoodieSparkCopyOnWriteTable.java:88)\n\tat org.apache.hudi.client.SparkRDDWriteClient.upsert(SparkRDDWriteClient.java:157)\n\tat org.apache.hudi.DataSourceUtils.doWriteOperation(DataSourceUtils.java:214)\n\tat org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:265)\n\tat org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:164)\n\tat org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)\n\tat org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)\n\tat org.apache.spark.sql.execution.command.ExecutedCommandExec.s ideEffectResult(commands.scala:68)\n\tat org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)\n\tat org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:185)\n\tat org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:223)\n\tat org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)\n\tat org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:220)\n\tat org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:181)\n\tat 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:134)\n\tat org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:133)\n\tat org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)\n\tat org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)\n\tat org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)\n\tat org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:110)\n\tat org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:135)\n\tat org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)\n\tat org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)\n\tat org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:135)\n\tat org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:253)\n\tat org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:134)\n\tat org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)\n\tat org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)\n\tat org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)\n\tat org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)\n\tat org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)\n\tat org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:498)\n\tat py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)\n\tat py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)\n\tat py4j.Gateway.invoke(Gateway.java:282)\n\tat py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)\n\tat py4j.commands.CallCommand.execute(CallCommand.java:79)\n\tat py4j.GatewayConnection.run(GatewayConnection.java:238)\n\tat java.lang.Thread.run(Thread.java:750)\nCaused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 31 in stage 18.0 failed 4 times, most recent failure: Lost task 31.3 in stage 18.0 (TID 352) (172.34.102.9 executor 7): java.lang.NoClassDefFoundError: org/apache/spark/sql/avro/IncompatibleSchemaException\n\tat org.apache.hudi.HoodieSparkUtils$.$anonfun$createRddInternal$2(HoodieSparkUtils.scala:137)\n\tat org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:863)\n\tat org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:863)\n\tat org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)\n\tat org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)\n\tat org.apache.spark.rdd.RDD.iterator(RDD.scala:337)\n\tat org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)\n\tat org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)\n\tat org.apache.spark.rdd.RDD.iterator(RDD.scala:337)\n\tat org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)\n\tat org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)\n\tat org.apache.spark.rdd.RDD.iterator(RDD.scala:337)\n\tat
org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)\n\tat org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)\n\tat org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)\n\tat org.apache.spark.scheduler.Task.run(Task.scala:131)\n\tat org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)\n\tat org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)\n\tat org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\tat java.lang.Thread.run(Thread.java:750)\nCaused by: java.lang.ClassNotFoundException: org.apache.spark.sql.avro.IncompatibleSchemaException\n\tat java.net.URLClassLoader.findClass(URLClassLoader.java:387)\n\tat java.lang.ClassLoader.loadClass(ClassLoader.java:418)\n\tat sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)\n\tat java.lang.ClassLoader.loadClass(ClassLoader.java:351)\n\t... 22 more\n\nDriver stacktrace:\n\tat org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2465)\n\tat org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2414)\n\tat org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2413)\n\tat scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:58)\n\tat scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:51)\n\tat scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)\n\tat org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2413)\n\tat org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1124)\n\tat org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1124)\n\tat scala.Option.foreach(Option.scala:257)\n\tat org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1124)\n\tat org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2679)\n\tat org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2621)\n\tat org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2610)\n\tat org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)\n\tat org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:914)\n\tat org.apache.spark.SparkContext.runJob(SparkContext.scala:2238)\n\tat org.apache.spark.SparkContext.runJob(SparkContext.scala:2259)\n\tat org.apache.spark.SparkContext.runJob(SparkContext.scala:2278)\n\tat org.apache.spark.SparkContext.runJob(SparkContext.scala:2303)\n\tat org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1030)\n\tat org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)\n\tat org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)\n\tat org.apache.spark.rdd.RDD.withScope(RDD.scala:414)\n\tat org.apache.spark.rdd.RDD.collect(RDD.scala:1029)\n\tat org.apache.spark.rdd.PairRDDFunctions.$anonfun$countByKey$1(PairRDDFunctions.scala:366)\n\tat org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)\n\tat org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)\n\tat org.apache.spark.rdd.RDD.withScope(RDD.scala:414)\n\tat
org.apache.spark.rdd.PairRDDFunctions.countByKey(PairRDDFunctions.scala:366)\n\tat org.apache.spark.api.java.JavaPairRDD.countByKey(JavaPairRDD.scala:314)\n\tat org.apache.hudi.index.bloom.SparkHoodieBloomIndex.lookupIndex(SparkHoodieBloomIndex.java:114)\n\tat org.apache.hudi.index.bloom.SparkHoodieBloomIndex.tagLocation(SparkHoodieBloomIndex.java:84)\n\tat org.apache.hudi.index.bloom.SparkHoodieBloomIndex.tagLocation(SparkHoodieBloomIndex.java:60)\n\tat org.apache.hudi.table.action.commit.AbstractWriteHelper.tag(AbstractWriteHelper.java:69)\n\tat org.apache.hudi.table.action.commit.AbstractWriteHelper.write(AbstractWriteHelper.java:51)\n\t... 45 more\nCaused by: java.lang.NoClassDefFoundError: org/apache/spark/sql/avro/IncompatibleSchemaException\n\tat org.apache.hudi.HoodieSparkUtils$.$anonfun$createRddInternal$2(HoodieSparkUtils.scala:137)\n\tat org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:863)\n\tat org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:863)\n\tat org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)\n\tat org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)\n\tat org.apache.spark.rdd.RDD.iterator(RDD.scala:337)\n\tat org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)\n\tat org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)\n\tat org.apache.spark.rdd.RDD.iterator(RDD.scala:337)\n\tat org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)\n\tat org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)\n\tat org.apache.spark.rdd.RDD.iterator(RDD.scala:337)\n\tat org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)\n\tat org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)\n\tat org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)\n\tat org.apache.spark.scheduler.Task.run(Task.scala:131)\n\tat org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)\n\tat org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)\n\tat org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\t... 1 more\nCaused by: java.lang.ClassNotFoundException: org.apache.spark.sql.avro.IncompatibleSchemaException\n\tat java.net.URLClassLoader.findClass(URLClassLoader.java:387)\n\tat java.lang.ClassLoader.loadClass(ClassLoader.java:418)\n\tat sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)\n\tat java.lang.ClassLoader.loadClass(ClassLoader.java:351)\n\t...
22 more\n", "Stack Trace": [ { "Declaring Class": "get_return_value", "Method Name": "format(target_id, \".\", name), value)", "File Name": "/opt/amazon/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", "Line Number": 328 }, { "Declaring Class": "deco", "Method Name": "return f(*a, **kw)", "File Name": "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", "Line Number": 111 }, { "Declaring Class": "__call__", "Method Name": "answer, self.gateway_client, self.target_id, self.name)", "File Name": "/opt/amazon/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", "Line Number": 1305 }, { "Declaring Class": "save", "Method Name": "self._jwrite.save(path)", "File Name": "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", "Line Number": 1109 }, { "Declaring Class": "<module>", "Method Name": "outputDf.write.format('org.apache.hudi').options(**combinedConf).mode('Append').save(targetPath)", "File Name": "/tmp/TEST_QA_Hudi.py", "Line Number": 216 } ], "Last Executed Line number": 216, "script": "TEST_QA_Hudi.py" }
```
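Both failures ultimately raise `java.lang.ClassNotFoundException` for classes in the `org.apache.spark.sql.avro` package, which points at the job's classpath rather than the data. As a minimal diagnostic sketch (not a fix), the snippet below checks from the driver whether those classes are loadable; it assumes an active `SparkSession` named `spark`, and a passing check on the driver says nothing about the executors' classpath, where the second failure occurred.

```python
# Hypothetical diagnostic: try to load the spark-avro classes named in the
# stack traces above via py4j on the driver JVM. A failure here would match
# the reported NoClassDefFoundError / ClassNotFoundException.
for cls in (
    'org.apache.spark.sql.avro.SchemaConverters',
    'org.apache.spark.sql.avro.IncompatibleSchemaException',
):
    try:
        spark.sparkContext._jvm.java.lang.Class.forName(cls)
        print(f'{cls}: loadable on the driver')
    except Exception as exc:  # py4j raises Py4JJavaError on ClassNotFoundException
        print(f'{cls}: NOT loadable ({exc.__class__.__name__})')
```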