Can you open an issue and we will look into this there? To confirm the theory, can you enable INFO logging and paste the output with the line "Registered avro schema : ..."? Can you also print the schema using inputDF.printSchema()? (I have put two rough sketches at the end of this mail: one for checking the schema, one for making the new column nullable.)

Thanks,
Balaji.V

On Fri, Aug 21, 2020 at 12:53 PM selvaraj periyasamy <[email protected]> wrote:

> Thanks Balaji.
>
> Could you please provide more info on how to get it done and pass it to hudi?
>
> Thanks,
> Selva
>
> On Fri, Aug 21, 2020 at 12:33 PM Balaji Varadarajan <[email protected]> wrote:
>
> > Hi Selvaraj,
> >
> > Even though the incoming batch has non-null values for the new column, the existing data does not have this column. So you need to make sure the Avro schema marks the new column as nullable, so that it stays backwards compatible.
> >
> > Balaji.V
> >
> > On Friday, August 21, 2020, 10:06:40 AM PDT, selvaraj periyasamy <[email protected]> wrote:
> >
> > Hi,
> >
> > With Hudi 0.5.0, I am using the COW table type, partitioned by yyyymmdd. We already have a table with Array<String> columns and data populated. We are now trying to add a new column ("rule_profile_id_list") to the dataframe, and while writing we get the exception below. I am making sure that the DataFrame I pass has non-null values for this column, since it is a non-nullable column per the schema definition in the dataframe. I don't use "--conf spark.sql.hive.convertMetastoreParquet=false" because I already handle that in my code with the snippet below.
> >
> > sparkSession.sparkContext.hadoopConfiguration.setClass(
> >   "mapreduce.input.pathFilter.class",
> >   classOf[org.apache.hudi.hadoop.HoodieROTablePathFilter],
> >   classOf[org.apache.hadoop.fs.PathFilter]);
> >
> > Could someone help me resolve this error?
> >
> > 20/08/21 08:38:30 WARN TaskSetManager: Lost task 8.0 in stage 151.0 (TID 31217, sl73caehdn0811.visa.com, executor 10): org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :8
> >   at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:264)
> >   at org.apache.hudi.HoodieWriteClient.lambda$upsertRecordsInternal$507693af$1(HoodieWriteClient.java:428)
> >   at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
> >   at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
> >   at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:847)
> >   at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:847)
> >   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> >   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> >   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> >   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> >   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> >   at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:337)
> >   at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:335)
> >   at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1109)
> >   at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1083)
> >   at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1018)
> >   at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1083)
> >   at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:809)
> >   at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
> >   at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
> >   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> >   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> >   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> >   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
> >   at org.apache.spark.scheduler.Task.run(Task.scala:109)
> >   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
> >   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> >   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> >   at java.lang.Thread.run(Thread.java:748)
> > Caused by: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Null-value for required field: rule_profile_id_list
> >   at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:202)
> >   at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:178)
> >   at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:257)
> >   ... 28 more
> > Caused by: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Null-value for required field: rule_profile_id_list
> >   at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:142)
> >   at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:200)
> >   ... 30 more
> > Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Null-value for required field: rule_profile_id_list
> >   at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> >   at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> >   at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:140)
> >   ... 31 more
> > Caused by: java.lang.RuntimeException: Null-value for required field: rule_profile_id_list
> >   at org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:170)
> >   at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:142)
> >   at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:123)
> >   at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:293)
> >   at org.apache.hudi.io.storage.HoodieParquetWriter.writeAvro(HoodieParquetWriter.java:101)
> >   at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:288)
> >   at org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:432)
> >   at org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:422)
> >   at org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:38)
> >   at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:120)
> >   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> >   ... 3 more
> >
> > 20/08/21 08:38:30 INFO TaskSetManager: Starting task 8.1 in stage 151.0 (TID 31238, sl73caehdn0615.visa.com, executor 100, partition 8, PROCESS_LOCAL, 7661 bytes)
> > 20/08/21 08:38:30 INFO TaskSetManager: Finished task 14.0 in stage 151.0 (TID 31223) in 1269 ms on sl73caehdn0615.visa.com (executor 100) (1/29)
> > 20/08/21 08:38:30 INFO BlockManagerInfo: Added rdd_329_9 in memory on sl73caehdn0709.visa.com:34428 (size: 379.0 B, free: 5.1 GB)
> > 20/08/21 08:38:30 INFO TaskSetManager: Finished task 9.0 in stage 151.0 (TID 31218) in 1663 ms on sl73caehdn0709.visa.com (executor 62) (2/29)
> > 20/08/21 08:38:30 INFO BlockManagerInfo: Added rdd_329_23 in memory on sl73caehdn0716.visa.com:45986 (size: 372.0 B, free: 5.1 GB)
> > 20/08/21 08:38:30 INFO TaskSetManager: Finished task 23.0 in stage 151.0 (TID 31232) in 1754 ms on sl73caehdn0716.visa.com (executor 99) (3/29)
> > 20/08/21 08:38:31 INFO TaskSetManager: Lost task 8.1 in stage 151.0 (TID 31238) on sl73caehdn0615.visa.com, executor 100: org.apache.hudi.exception.HoodieUpsertException (Error upserting bucketType UPDATE for partition :8) [duplicate 1]
> > 20/08/21 08:38:31 INFO TaskSetManager: Starting task 8.2 in stage 151.0 (TID 31239, sl73caehdn0711.visa.com, executor 81, partition 8, PROCESS_LOCAL, 7661 bytes)
> > 20/08/21 08:38:31 INFO BlockManagerInfo: Added broadcast_130_piece0 in memory on sl73caehdn0711.visa.com:43376 (size: 89.1 KB, free: 5.1 GB)
> > 20/08/21 08:38:31 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 32 to 10.160.39.149:43212
> > 20/08/21 08:38:31 WARN TaskSetManager: Lost task 6.0 in stage 151.0 (TID 31215, sl73caehdn0423.visa.com, executor 48): org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :6
> >   at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:264)
> >   at org.apache.hudi.HoodieWriteClient.lambda$upsertRecordsInternal$507693af$1(HoodieWriteClient.java:428)
> >   at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
> >   at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
> >   at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:847)
> >   at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:847)
> >   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> >   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> >   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> >   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> >   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> >   at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:337)
> >   at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:335)
> >   at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1109)
> >   at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1083)
> >   at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1018)
> >   at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1083)
> >   at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:809)
> >   at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
> >   at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
> >   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> >   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> >   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> >   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
> >   at org.apache.spark.scheduler.Task.run(Task.scala:109)
> >   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
> >   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> >   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> >   at java.lang.Thread.run(Thread.java:748)
> > Caused by: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Null-value for required field: rule_profile_id_list
> >   at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:202)
> >   at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:178)
> >   at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:257)
> >   ... 28 more
> > Caused by: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Null-value for required field: rule_profile_id_list
> >   at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:142)
> >   at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:200)
> >   ... 30 more
> > Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Null-value for required field: rule_profile_id_list
> >   at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> >   at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> >   at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:140)
> >   ... 31 more
> > Caused by: java.lang.RuntimeException: Null-value for required field: rule_profile_id_list
> >   at org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:170)
> >   at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:142)
> >   at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:123)
> >   at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:293)
> >   at org.apache.hudi.io.storage.HoodieParquetWriter.writeAvro(HoodieParquetWriter.java:101)
> >   at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:288)
> >   at org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:432)
> >   at org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:422)
> >   at org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:38)
> >   at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:120)
> >   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> >   ... 3 more
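As referenced above: a quick, untested sketch of the schema check, assuming the incoming DataFrame is called inputDF as in your code; only the column name is taken from the error message.

    // Print the Spark schema that Hudi will convert into its Avro write schema.
    inputDF.printSchema()

    // Look at the new column's StructField: if nullable is false, the derived Avro
    // field is "required", which lines up with the "Null-value for required field"
    // error when records in existing files do not carry this column.
    val newCol = inputDF.schema("rule_profile_id_list")
    println(s"${newCol.name}: dataType=${newCol.dataType}, nullable=${newCol.nullable}")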

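And a rough, untested sketch of one way to make the new column nullable before the upsert, so that the Avro schema derived from the DataFrame declares the field as optional (a union with null). withNullableColumn is only an illustrative helper, not an existing Hudi or Spark API, and the write options shown are placeholders for whatever the job already uses.

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.types.StructType

    // Rebuild the DataFrame with the given column flagged as nullable, so the Avro
    // schema generated from it treats the field as optional rather than required.
    def withNullableColumn(df: DataFrame, colName: String): DataFrame = {
      val relaxed = StructType(df.schema.fields.map { f =>
        if (f.name == colName) f.copy(nullable = true) else f
      })
      df.sparkSession.createDataFrame(df.rdd, relaxed)
    }

    val upsertDF = withNullableColumn(inputDF, "rule_profile_id_list")

    // Illustrative write only; keep the job's existing Hudi options (record key,
    // precombine field, partition path, table name) and its real base path here.
    upsertDF.write
      .format("org.apache.hudi")
      .option("hoodie.datasource.write.operation", "upsert")
      .mode("append")
      .save("/path/to/hudi/table")  // placeholder base path

On the Avro side this corresponds to declaring the field as a union of null and the element type with a null default, which is what keeps the new schema backwards compatible with files written before the column existed.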