brandon-stanley edited a comment on issue #1960:
URL: https://github.com/apache/hudi/issues/1960#issuecomment-673462785


   @bhasudha Thanks for the response. Does the precombine field have to be a 
non-nullable field/column as well? My dataset may contain duplicates, but I have 
implemented custom logic to deduplicate it, because two columns in my dataset 
together determine which record is the latest: 
`COALESCE(update_date, create_date)`. I implemented it this way because it is an 
[SCD type 2 table](https://en.wikipedia.org/wiki/Slowly_changing_dimension#Type_2:_add_new_row).
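
   For illustration, the deduplication rule described above can be sketched in 
plain Python. This is only a minimal sketch, not the actual job code: the `id` 
record key, the `value` column, and the dict-based rows are hypothetical. Only 
`update_date`, `create_date`, and the COALESCE rule come from the description 
above.

```python
from datetime import date


def effective_date(row):
    """Mirror SQL COALESCE(update_date, create_date) for one row."""
    if row["update_date"] is not None:
        return row["update_date"]
    return row["create_date"]


def latest_per_key(rows, key="id"):
    """Keep, for each key, the row with the greatest effective date.

    `key` is a hypothetical record-key column. On a tie, the row seen
    first is kept.
    """
    latest = {}
    for row in rows:
        current = latest.get(row[key])
        if current is None or effective_date(row) > effective_date(current):
            latest[row[key]] = row
    return latest


# Hypothetical sample rows: two versions of id 1, one version of id 2.
rows = [
    {"id": 1, "value": "a", "create_date": date(2020, 1, 1), "update_date": None},
    {"id": 1, "value": "b", "create_date": date(2020, 1, 1), "update_date": date(2020, 3, 1)},
    {"id": 2, "value": "c", "create_date": date(2020, 2, 1), "update_date": None},
]
deduplicated = latest_per_key(rows)
```

Because the effective date falls back to `create_date`, it is never null as 
long as `create_date` is always populated, even when `update_date` is missing.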
   
   Also, how would I specify the payload class that would ignore the precombine 
field? I receive the following error when specifying the 
`hoodie.datasource.write.payload.class` configuration property as 
`org.apache.hudi.common.model.HoodieAvroPayload`:
   
   ```
   py4j.protocol.Py4JJavaError: An error occurred while calling o152.save.
   : org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 
in stage 21.0 failed 1 times, most recent failure: Lost task 1.0 in stage 21.0 
(TID 529, localhost, executor driver): java.io.IOException: Could not create 
payload for class: org.apache.hudi.common.model.HoodieAvroPayload
           at 
org.apache.hudi.DataSourceUtils.createPayload(DataSourceUtils.java:128)
           at 
org.apache.hudi.DataSourceUtils.createHoodieRecord(DataSourceUtils.java:181)
           at 
org.apache.hudi.HoodieSparkSqlWriter$$anonfun$1.apply(HoodieSparkSqlWriter.scala:103)
           at 
org.apache.hudi.HoodieSparkSqlWriter$$anonfun$1.apply(HoodieSparkSqlWriter.scala:100)
           at scala.collection.Iterator$$anon$11.next(Iterator.scala:363)
           at scala.collection.Iterator$$anon$10.next(Iterator.scala:347)
           at scala.collection.Iterator$class.foreach(Iterator.scala:743)
           at scala.collection.AbstractIterator.foreach(Iterator.scala:1174)
           at 
scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
           at 
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
           at 
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
           at 
scala.collection.TraversableOnce$class.to(TraversableOnce.scala:296)
           at scala.collection.AbstractIterator.to(Iterator.scala:1174)
           at 
scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:288)
           at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1174)
           at 
scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:275)
           at scala.collection.AbstractIterator.toArray(Iterator.scala:1174)
           at 
org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$29.apply(RDD.scala:1364)
           at 
org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$29.apply(RDD.scala:1364)
           at 
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
           at 
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
           at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
           at org.apache.spark.scheduler.Task.run(Task.scala:121)
           at 
org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
           at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
           at 
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
           at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           at java.lang.Thread.run(Thread.java:748)
   Caused by: org.apache.hudi.exception.HoodieException: Unable to instantiate 
class
           at 
org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:80)
           at 
org.apache.hudi.DataSourceUtils.createPayload(DataSourceUtils.java:125)
           ... 28 more
   Caused by: java.lang.NoSuchMethodException: 
org.apache.hudi.common.model.HoodieAvroPayload.<init>(org.apache.avro.generic.GenericRecord,
 java.lang.Comparable)
           at java.lang.Class.getConstructor0(Class.java:3082)
           at java.lang.Class.getConstructor(Class.java:1825)
           at 
org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:78)
           ... 29 more
   
   Driver stacktrace:
           at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1889)
           at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1877)
           at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1876)
           at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
           at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
           at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1876)
           at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
           at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
           at scala.Option.foreach(Option.scala:245)
           at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:926)
           at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2110)
           at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2059)
           at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2048)
           at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
           at 
org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:737)
           at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
           at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082)
           at org.apache.spark.SparkContext.runJob(SparkContext.scala:2101)
           at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1364)
           at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
           at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
           at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
           at org.apache.spark.rdd.RDD.take(RDD.scala:1337)
           at 
org.apache.spark.rdd.RDD$$anonfun$isEmpty$1.apply$mcZ$sp(RDD.scala:1472)
           at org.apache.spark.rdd.RDD$$anonfun$isEmpty$1.apply(RDD.scala:1472)
           at org.apache.spark.rdd.RDD$$anonfun$isEmpty$1.apply(RDD.scala:1472)
           at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
           at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
           at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
           at org.apache.spark.rdd.RDD.isEmpty(RDD.scala:1471)
           at 
org.apache.spark.api.java.JavaRDDLike$class.isEmpty(JavaRDDLike.scala:544)
           at 
org.apache.spark.api.java.AbstractJavaRDDLike.isEmpty(JavaRDDLike.scala:45)
           at 
org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:142)
           at 
org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:91)
           at 
org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
           at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
           at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
           at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
           at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
           at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
           at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
           at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
           at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
           at 
org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
           at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
           at 
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
           at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
           at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
           at 
org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
           at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
           at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
           at 
org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
           at 
org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
           at 
org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
           at 
org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
           at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
           at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           at java.lang.reflect.Method.invoke(Method.java:498)
           at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
           at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
           at py4j.Gateway.invoke(Gateway.java:282)
           at 
py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
           at py4j.commands.CallCommand.execute(CallCommand.java:79)
           at py4j.GatewayConnection.run(GatewayConnection.java:238)
           at java.lang.Thread.run(Thread.java:748)
   Caused by: java.io.IOException: Could not create payload for class: 
org.apache.hudi.common.model.HoodieAvroPayload
           at 
org.apache.hudi.DataSourceUtils.createPayload(DataSourceUtils.java:128)
           at 
org.apache.hudi.DataSourceUtils.createHoodieRecord(DataSourceUtils.java:181)
           at 
org.apache.hudi.HoodieSparkSqlWriter$$anonfun$1.apply(HoodieSparkSqlWriter.scala:103)
           at 
org.apache.hudi.HoodieSparkSqlWriter$$anonfun$1.apply(HoodieSparkSqlWriter.scala:100)
           at scala.collection.Iterator$$anon$11.next(Iterator.scala:363)
           at scala.collection.Iterator$$anon$10.next(Iterator.scala:347)
           at scala.collection.Iterator$class.foreach(Iterator.scala:743)
           at scala.collection.AbstractIterator.foreach(Iterator.scala:1174)
           at 
scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
           at 
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
           at 
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
           at 
scala.collection.TraversableOnce$class.to(TraversableOnce.scala:296)
           at scala.collection.AbstractIterator.to(Iterator.scala:1174)
           at 
scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:288)
           at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1174)
           at 
scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:275)
           at scala.collection.AbstractIterator.toArray(Iterator.scala:1174)
           at 
org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$29.apply(RDD.scala:1364)
           at 
org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$29.apply(RDD.scala:1364)
           at 
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
           at 
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
           at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
           at org.apache.spark.scheduler.Task.run(Task.scala:121)
           at 
org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
           at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
           at 
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
           at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           ... 1 more
   Caused by: org.apache.hudi.exception.HoodieException: Unable to instantiate 
class
           at 
org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:80)
           at 
org.apache.hudi.DataSourceUtils.createPayload(DataSourceUtils.java:125)
           ... 28 more
   Caused by: java.lang.NoSuchMethodException: 
org.apache.hudi.common.model.HoodieAvroPayload.<init>(org.apache.avro.generic.GenericRecord,
 java.lang.Comparable)
           at java.lang.Class.getConstructor0(Class.java:3082)
           at java.lang.Class.getConstructor(Class.java:1825)
           at 
org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:78)
           ... 29 more
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

