Thanks Balaji. Could you please provide more information on how to do this and pass it to Hudi?
Thanks,
Selva

On Fri, Aug 21, 2020 at 12:33 PM Balaji Varadarajan
<v.bal...@ymail.com.invalid> wrote:

> Hi Selvaraj,
>
> Even though the incoming batch has non-null values for the new column,
> the existing data does not have this column. So you need to make sure the
> Avro schema declares the new column as nullable, keeping it backwards
> compatible.
>
> Balaji.V
>
> On Friday, August 21, 2020, 10:06:40 AM PDT, selvaraj periyasamy <
> selvaraj.periyasamy1...@gmail.com> wrote:
>
> Hi,
>
> With Hudi 0.5.0, I am using the COW table type, partitioned by yyyymmdd.
> We already have a table with Array<String> columns and data populated.
> We are now trying to add a new column ("rule_profile_id_list") to the
> DataFrame, and while writing we get the exception below. I have made sure
> that the DataFrame I pass has non-null values for this column, since it is
> defined as non-nullable in the DataFrame schema. I don't use "--conf
> spark.sql.hive.convertMetastoreParquet=false" because I already handle
> that in code with the snippet below.
>
> sparkSession.sparkContext.hadoopConfiguration.setClass(
>   "mapreduce.input.pathFilter.class",
>   classOf[org.apache.hudi.hadoop.HoodieROTablePathFilter],
>   classOf[org.apache.hadoop.fs.PathFilter]);
>
> Could someone help me resolve this error?
> 20/08/21 08:38:30 WARN TaskSetManager: Lost task 8.0 in stage 151.0 (TID 31217, sl73caehdn0811.visa.com, executor 10):
> org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :8
>     at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:264)
>     at org.apache.hudi.HoodieWriteClient.lambda$upsertRecordsInternal$507693af$1(HoodieWriteClient.java:428)
>     at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
>     at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
>     at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:847)
>     at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:847)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>     at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:337)
>     at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:335)
>     at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1109)
>     at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1083)
>     at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1018)
>     at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1083)
>     at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:809)
>     at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>     at org.apache.spark.scheduler.Task.run(Task.scala:109)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Null-value for required field: rule_profile_id_list
>     at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:202)
>     at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:178)
>     at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:257)
>     ... 28 more
> Caused by: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Null-value for required field: rule_profile_id_list
>     at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:142)
>     at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:200)
>     ... 30 more
> Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Null-value for required field: rule_profile_id_list
>     at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>     at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>     at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:140)
>     ... 31 more
> Caused by: java.lang.RuntimeException: Null-value for required field: rule_profile_id_list
>     at org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:170)
>     at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:142)
>     at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:123)
>     at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:293)
>     at org.apache.hudi.io.storage.HoodieParquetWriter.writeAvro(HoodieParquetWriter.java:101)
>     at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:288)
>     at org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:432)
>     at org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:422)
>     at org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:38)
>     at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:120)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     ... 3 more
>
> 20/08/21 08:38:30 INFO TaskSetManager: Starting task 8.1 in stage 151.0 (TID 31238, sl73caehdn0615.visa.com, executor 100, partition 8, PROCESS_LOCAL, 7661 bytes)
> 20/08/21 08:38:30 INFO TaskSetManager: Finished task 14.0 in stage 151.0 (TID 31223) in 1269 ms on sl73caehdn0615.visa.com (executor 100) (1/29)
> 20/08/21 08:38:30 INFO BlockManagerInfo: Added rdd_329_9 in memory on sl73caehdn0709.visa.com:34428 (size: 379.0 B, free: 5.1 GB)
> 20/08/21 08:38:30 INFO TaskSetManager: Finished task 9.0 in stage 151.0 (TID 31218) in 1663 ms on sl73caehdn0709.visa.com (executor 62) (2/29)
> 20/08/21 08:38:30 INFO BlockManagerInfo: Added rdd_329_23 in memory on sl73caehdn0716.visa.com:45986 (size: 372.0 B, free: 5.1 GB)
> 20/08/21 08:38:30 INFO TaskSetManager: Finished task 23.0 in stage 151.0 (TID 31232) in 1754 ms on sl73caehdn0716.visa.com (executor 99) (3/29)
> 20/08/21 08:38:31 INFO TaskSetManager: Lost task 8.1 in stage 151.0 (TID 31238) on sl73caehdn0615.visa.com, executor 100: org.apache.hudi.exception.HoodieUpsertException (Error upserting bucketType UPDATE for partition :8) [duplicate 1]
> 20/08/21 08:38:31 INFO TaskSetManager: Starting task 8.2 in stage 151.0 (TID 31239, sl73caehdn0711.visa.com, executor 81, partition 8, PROCESS_LOCAL, 7661 bytes)
> 20/08/21 08:38:31 INFO BlockManagerInfo: Added broadcast_130_piece0 in memory on sl73caehdn0711.visa.com:43376 (size: 89.1 KB, free: 5.1 GB)
> 20/08/21 08:38:31 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 32 to 10.160.39.149:43212
> 20/08/21 08:38:31 WARN TaskSetManager: Lost task 6.0 in stage 151.0 (TID 31215, sl73caehdn0423.visa.com, executor 48):
> org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :6
>     at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:264)
>     at org.apache.hudi.HoodieWriteClient.lambda$upsertRecordsInternal$507693af$1(HoodieWriteClient.java:428)
>     at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
>     at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
>     at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:847)
>     at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:847)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>     at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:337)
>     at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:335)
>     at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1109)
>     at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1083)
>     at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1018)
>     at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1083)
>     at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:809)
>     at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>     at org.apache.spark.scheduler.Task.run(Task.scala:109)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Null-value for required field: rule_profile_id_list
>     at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:202)
>     at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:178)
>     at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:257)
>     ... 28 more
> Caused by: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Null-value for required field: rule_profile_id_list
>     at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:142)
>     at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:200)
>     ... 30 more
> Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Null-value for required field: rule_profile_id_list
>     at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>     at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>     at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:140)
>     ... 31 more
> Caused by: java.lang.RuntimeException: Null-value for required field: rule_profile_id_list
>     at org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:170)
>     at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:142)
>     at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:123)
>     at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:293)
>     at org.apache.hudi.io.storage.HoodieParquetWriter.writeAvro(HoodieParquetWriter.java:101)
>     at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:288)
>     at org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:432)
>     at org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:422)
>     at org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:38)
>     at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:120)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     ... 3 more
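For reference, the backwards-compatible evolution Balaji describes — declaring the new column as a nullable union with a null default, so pre-existing rows that lack the field can still be read and merged — could be sketched as below. The record name and the `id` field are illustrative assumptions; only `rule_profile_id_list` comes from this thread, and its array-of-string type is a guess based on the table's existing Array<String> columns.

```python
import json

# Hypothetical evolved Avro schema: the new field is a union with "null"
# first and a null default, so old files written before the column existed
# remain compatible.
new_field = {
    "name": "rule_profile_id_list",
    "type": ["null", {"type": "array", "items": "string"}],
    "default": None,
}

evolved_schema = {
    "type": "record",
    "name": "profile_record",              # illustrative record name
    "fields": [
        {"name": "id", "type": "string"},  # illustrative existing field
        new_field,
    ],
}

def is_safe_to_add(field: dict) -> bool:
    """A newly added field is backwards compatible only if rows that lack
    it can be filled in: the type must be a union containing "null" and the
    field must carry an explicit null default."""
    return (
        isinstance(field["type"], list)
        and "null" in field["type"]
        and field.get("default", "missing") is None
    )

print(json.dumps(evolved_schema, indent=2))
print(is_safe_to_add(new_field))  # True
```

Note the contrast with the failing case: a DataFrame schema that marks the column non-nullable translates to a *required* Parquet/Avro field, and the merge then fails on existing records exactly as in the "Null-value for required field" trace above.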