Hi,

I am on Hudi 0.5.0, using the COW table type, with tables partitioned by date in
yyyymmdd format. We already have a table with Array<String> columns and data
populated. We are now trying to add a new column ("rule_profile_id_list") to our
DataFrames, and the write fails with the exception below. I have verified that
the DataFrame I pass contains only non-null values for this column, since it is
declared non-nullable in the DataFrame schema. I do not pass "--conf
spark.sql.hive.convertMetastoreParquet=false" because I already handle that in
code with the snippet below.

sparkSession.sparkContext.hadoopConfiguration.setClass(
  "mapreduce.input.pathFilter.class",
  classOf[org.apache.hudi.hadoop.HoodieROTablePathFilter],
  classOf[org.apache.hadoop.fs.PathFilter])


Could someone help me resolve this error?
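My working hypothesis (I may be wrong) is that during the copy-on-write merge, records read back from the existing Parquet files have no value for the newly added column, and because the column is non-nullable in the DataFrame schema the corresponding Avro field is "required" (no null branch in the union), so the writer rejects the null. To illustrate the shape of that check — this is only a sketch, not actual Hudi or Parquet code, and all names in it are hypothetical:

```java
import java.util.Arrays;

// Illustration only (not Hudi/Parquet code): a writer rejects a null value
// for a field whose schema is "required", i.e. has no null branch.
public class RequiredFieldCheck {

    // A field is "required" when its schema does not permit null.
    static void writeField(String name, Object value, boolean nullable) {
        if (value == null && !nullable) {
            throw new RuntimeException("Null-value for required field: " + name);
        }
        // A real writer would serialize the value here.
    }

    public static void main(String[] args) {
        // New rows carry a value, so they write fine:
        writeField("rule_profile_id_list", Arrays.asList("r1", "r2"), false);

        // Existing rows predate the column, so the merge passes null:
        try {
            writeField("rule_profile_id_list", null, false);
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }

        // With a nullable (optional) schema, the same null is accepted:
        writeField("rule_profile_id_list", null, true);
        System.out.println("done");
    }
}
```

If that is indeed the cause, then declaring the new column nullable in the DataFrame schema (or backfilling the existing rows with a default value) should avoid the null reaching the writer — but I would like confirmation.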

20/08/21 08:38:30 WARN TaskSetManager: Lost task 8.0 in stage 151.0 (TID 31217, sl73caehdn0811.visa.com, executor 10): org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :8
at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:264)
at org.apache.hudi.HoodieWriteClient.lambda$upsertRecordsInternal$507693af$1(HoodieWriteClient.java:428)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:847)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:847)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:337)
at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:335)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1109)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1083)
at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1018)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1083)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:809)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Null-value for required field: rule_profile_id_list
at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:202)
at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:178)
at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:257)
... 28 more
Caused by: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Null-value for required field: rule_profile_id_list
at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:142)
at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:200)
... 30 more
Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Null-value for required field: rule_profile_id_list
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:140)
... 31 more
Caused by: java.lang.RuntimeException: Null-value for required field: rule_profile_id_list
at org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:170)
at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:142)
at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:123)
at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:293)
at org.apache.hudi.io.storage.HoodieParquetWriter.writeAvro(HoodieParquetWriter.java:101)
at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:288)
at org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:432)
at org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:422)
at org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:38)
at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:120)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
... 3 more

20/08/21 08:38:30 INFO TaskSetManager: Starting task 8.1 in stage 151.0 (TID 31238, sl73caehdn0615.visa.com, executor 100, partition 8, PROCESS_LOCAL, 7661 bytes)
20/08/21 08:38:30 INFO TaskSetManager: Finished task 14.0 in stage 151.0 (TID 31223) in 1269 ms on sl73caehdn0615.visa.com (executor 100) (1/29)
20/08/21 08:38:30 INFO BlockManagerInfo: Added rdd_329_9 in memory on sl73caehdn0709.visa.com:34428 (size: 379.0 B, free: 5.1 GB)
20/08/21 08:38:30 INFO TaskSetManager: Finished task 9.0 in stage 151.0 (TID 31218) in 1663 ms on sl73caehdn0709.visa.com (executor 62) (2/29)
20/08/21 08:38:30 INFO BlockManagerInfo: Added rdd_329_23 in memory on sl73caehdn0716.visa.com:45986 (size: 372.0 B, free: 5.1 GB)
20/08/21 08:38:30 INFO TaskSetManager: Finished task 23.0 in stage 151.0 (TID 31232) in 1754 ms on sl73caehdn0716.visa.com (executor 99) (3/29)
20/08/21 08:38:31 INFO TaskSetManager: Lost task 8.1 in stage 151.0 (TID 31238) on sl73caehdn0615.visa.com, executor 100: org.apache.hudi.exception.HoodieUpsertException (Error upserting bucketType UPDATE for partition :8) [duplicate 1]
20/08/21 08:38:31 INFO TaskSetManager: Starting task 8.2 in stage 151.0 (TID 31239, sl73caehdn0711.visa.com, executor 81, partition 8, PROCESS_LOCAL, 7661 bytes)
20/08/21 08:38:31 INFO BlockManagerInfo: Added broadcast_130_piece0 in memory on sl73caehdn0711.visa.com:43376 (size: 89.1 KB, free: 5.1 GB)
20/08/21 08:38:31 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 32 to 10.160.39.149:43212
20/08/21 08:38:31 WARN TaskSetManager: Lost task 6.0 in stage 151.0 (TID 31215, sl73caehdn0423.visa.com, executor 48): org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :6
[stack trace identical to the one for task 8.0 above]
