Thanks Balaji.

Could you please provide more info on how to make that schema change and pass
it to Hudi?

Thanks,
Selva

On Fri, Aug 21, 2020 at 12:33 PM Balaji Varadarajan
<v.bal...@ymail.com.invalid> wrote:

>  Hi Selvaraj,
> Even though the incoming batch has non-null values for the new column, the
> existing data does not have this column. So, you need to make sure the Avro
> schema declares the new column as nullable, so the schema stays backwards
> compatible.
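>
> For illustration, a minimal sketch on the Spark side (assuming your
> DataFrame is named df and the new column is rule_profile_id_list) is to
> force the column to be nullable before writing, so that the Avro schema
> derived from the DataFrame becomes a union of null and array<string>:
>
> import org.apache.spark.sql.functions.{col, when}
>
> // Hypothetical DataFrame name. Wrapping the column in when(...) without an
> // otherwise(...) makes Spark mark it as nullable, so the generated Avro
> // field becomes a ["null", array<string>] union and stays backwards
> // compatible with existing files that do not have the column.
> val dfNullable = df.withColumn("rule_profile_id_list",
>   when(col("rule_profile_id_list").isNotNull, col("rule_profile_id_list")))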
> Balaji.V
>     On Friday, August 21, 2020, 10:06:40 AM PDT, selvaraj periyasamy <
> selvaraj.periyasamy1...@gmail.com> wrote:
>
>  Hi,
>
> With Hudi version 0.5.0, I am using the COW table type, partitioned by
> yyyymmdd. We already have a table with Array<String> columns and data
> populated. We are now trying to add a new column ("rule_profile_id_list")
> to the DataFrame, and while writing we get the error message below. I am
> making sure that the DataFrame I pass has non-null values for this column,
> since it is defined as non-nullable in the DataFrame schema. I don't pass
> "--conf spark.sql.hive.convertMetastoreParquet=false" because I already set
> the equivalent path filter in my code:
>
>
> sparkSession.sparkContext.hadoopConfiguration.setClass(
>   "mapreduce.input.pathFilter.class",
>   classOf[org.apache.hudi.hadoop.HoodieROTablePathFilter],
>   classOf[org.apache.hadoop.fs.PathFilter])
>
>
> Could someone help me to resolve this error?
>
> 20/08/21 08:38:30 WARN TaskSetManager: Lost task 8.0 in stage 151.0 (TID 31217, sl73caehdn0811.visa.com, executor 10): org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :8
>   at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:264)
>   at org.apache.hudi.HoodieWriteClient.lambda$upsertRecordsInternal$507693af$1(HoodieWriteClient.java:428)
>   at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
>   at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
>   at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:847)
>   at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:847)
>   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:337)
>   at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:335)
>   at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1109)
>   at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1083)
>   at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1018)
>   at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1083)
>   at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:809)
>   at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
>   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>   at org.apache.spark.scheduler.Task.run(Task.scala:109)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Null-value for required field: rule_profile_id_list
>   at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:202)
>   at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:178)
>   at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:257)
>   ... 28 more
> Caused by: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Null-value for required field: rule_profile_id_list
>   at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:142)
>   at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:200)
>   ... 30 more
> Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Null-value for required field: rule_profile_id_list
>   at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>   at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>   at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:140)
>   ... 31 more
> Caused by: java.lang.RuntimeException: Null-value for required field: rule_profile_id_list
>   at org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:170)
>   at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:142)
>   at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:123)
>   at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:293)
>   at org.apache.hudi.io.storage.HoodieParquetWriter.writeAvro(HoodieParquetWriter.java:101)
>   at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:288)
>   at org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:432)
>   at org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:422)
>   at org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:38)
>   at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:120)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   ... 3 more
>
> 20/08/21 08:38:30 INFO TaskSetManager: Starting task 8.1 in stage 151.0 (TID 31238, sl73caehdn0615.visa.com, executor 100, partition 8, PROCESS_LOCAL, 7661 bytes)
> 20/08/21 08:38:30 INFO TaskSetManager: Finished task 14.0 in stage 151.0 (TID 31223) in 1269 ms on sl73caehdn0615.visa.com (executor 100) (1/29)
> 20/08/21 08:38:30 INFO BlockManagerInfo: Added rdd_329_9 in memory on sl73caehdn0709.visa.com:34428 (size: 379.0 B, free: 5.1 GB)
> 20/08/21 08:38:30 INFO TaskSetManager: Finished task 9.0 in stage 151.0 (TID 31218) in 1663 ms on sl73caehdn0709.visa.com (executor 62) (2/29)
> 20/08/21 08:38:30 INFO BlockManagerInfo: Added rdd_329_23 in memory on sl73caehdn0716.visa.com:45986 (size: 372.0 B, free: 5.1 GB)
> 20/08/21 08:38:30 INFO TaskSetManager: Finished task 23.0 in stage 151.0 (TID 31232) in 1754 ms on sl73caehdn0716.visa.com (executor 99) (3/29)
> 20/08/21 08:38:31 INFO TaskSetManager: Lost task 8.1 in stage 151.0 (TID 31238) on sl73caehdn0615.visa.com, executor 100: org.apache.hudi.exception.HoodieUpsertException (Error upserting bucketType UPDATE for partition :8) [duplicate 1]
> 20/08/21 08:38:31 INFO TaskSetManager: Starting task 8.2 in stage 151.0 (TID 31239, sl73caehdn0711.visa.com, executor 81, partition 8, PROCESS_LOCAL, 7661 bytes)
> 20/08/21 08:38:31 INFO BlockManagerInfo: Added broadcast_130_piece0 in memory on sl73caehdn0711.visa.com:43376 (size: 89.1 KB, free: 5.1 GB)
> 20/08/21 08:38:31 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 32 to 10.160.39.149:43212
> 20/08/21 08:38:31 WARN TaskSetManager: Lost task 6.0 in stage 151.0 (TID 31215, sl73caehdn0423.visa.com, executor 48): org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :6
> [same stack trace as above, caused by java.lang.RuntimeException: Null-value for required field: rule_profile_id_list]
>
