Can you open an issue so we can look into this there? To confirm the
theory, can you enable INFO logging and paste the output containing the line:

"Registered avro schema : ..."

Can you also print the schema using inputDF.printSchema()?
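
A minimal sketch of both checks, assuming the session and DataFrame are named
sparkSession and inputDF as in your snippet (the Hudi logger may also need its
log4j level set to INFO separately):

  // Raise the Spark log level so Hudi's "Registered avro schema : ..." INFO
  // line is emitted, then print the schema of the DataFrame being written.
  sparkSession.sparkContext.setLogLevel("INFO")
  inputDF.printSchema()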

Thanks,
Balaji.V

On Fri, Aug 21, 2020 at 12:53 PM selvaraj periyasamy <
[email protected]> wrote:

> Thanks Balaji.
>
> Could you please provide more info on how to get this done and pass it to
> Hudi?
>
> Thanks,
> Selva
>
> On Fri, Aug 21, 2020 at 12:33 PM Balaji Varadarajan
> <[email protected]> wrote:
>
> >  Hi Selvaraj,
> > Even though the incoming batch has non-null values for the new column, the
> > existing data does not have this column. So you need to make sure the Avro
> > schema marks the new column as nullable so that it stays backwards compatible.
> > Balaji.V
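
As an illustration of that suggestion, a minimal sketch of relaxing the new
field to nullable on the Spark side before handing the DataFrame to Hudi (the
names inputDF and sparkSession are assumed from the snippet further down; an
alternative is to evolve the Avro schema itself so the field is a union with
null and defaults to null):

  import org.apache.spark.sql.types.StructType

  // Rebuild the schema with the newly added column marked nullable, so the
  // Avro schema derived from it can also represent existing rows that lack
  // the field (column name taken from the error below).
  val relaxedSchema = StructType(inputDF.schema.map { f =>
    if (f.name == "rule_profile_id_list") f.copy(nullable = true) else f
  })
  val relaxedDF = sparkSession.createDataFrame(inputDF.rdd, relaxedSchema)
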
> >     On Friday, August 21, 2020, 10:06:40 AM PDT, selvaraj periyasamy <
> > [email protected]> wrote:
> >
> > Hi,
> >
> > With Hudi 0.5.0, I am using the COW table type, partitioned in yyyymmdd
> > format. We already have a table with Array<String> columns and data
> > populated. We are now trying to add a new column ("rule_profile_id_list")
> > to the DataFrame, and the write fails with the error message below. I am
> > making sure the DataFrame I pass has non-null values for it, since it is a
> > non-nullable column per the schema definition in the DataFrame. I don't
> > use "--conf spark.sql.hive.convertMetastoreParquet=false" because the
> > following code snippet is already handled in my code:
> >
> > sparkSession.sparkContext.hadoopConfiguration.setClass(
> >   "mapreduce.input.pathFilter.class",
> >   classOf[org.apache.hudi.hadoop.HoodieROTablePathFilter],
> >   classOf[org.apache.hadoop.fs.PathFilter]);
> >
> >
> > Could someone help me to resolve this error?
> >
> > 20/08/21 08:38:30 WARN TaskSetManager: Lost task 8.0 in stage 151.0 (TID 31217, sl73caehdn0811.visa.com, executor 10):
> > org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :8
> > at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:264)
> > at org.apache.hudi.HoodieWriteClient.lambda$upsertRecordsInternal$507693af$1(HoodieWriteClient.java:428)
> > at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
> > at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
> > at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:847)
> > at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:847)
> > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> > at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> > at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:337)
> > at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:335)
> > at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1109)
> > at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1083)
> > at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1018)
> > at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1083)
> > at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:809)
> > at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
> > at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
> > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> > at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
> > at org.apache.spark.scheduler.Task.run(Task.scala:109)
> > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
> > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> > at java.lang.Thread.run(Thread.java:748)
> > Caused by: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Null-value for required field: rule_profile_id_list
> > at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:202)
> > at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:178)
> > at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:257)
> > ... 28 more
> > Caused by: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Null-value for required field: rule_profile_id_list
> > at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:142)
> > at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:200)
> > ... 30 more
> > Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Null-value for required field: rule_profile_id_list
> > at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> > at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> > at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:140)
> > ... 31 more
> > Caused by: java.lang.RuntimeException: Null-value for required field: rule_profile_id_list
> > at org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:170)
> > at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:142)
> > at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:123)
> > at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:293)
> > at org.apache.hudi.io.storage.HoodieParquetWriter.writeAvro(HoodieParquetWriter.java:101)
> > at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:288)
> > at org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:432)
> > at org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:422)
> > at org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:38)
> > at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:120)
> > at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> > ... 3 more
> >
> > 20/08/21 08:38:30 INFO TaskSetManager: Starting task 8.1 in stage 151.0 (TID 31238, sl73caehdn0615.visa.com, executor 100, partition 8, PROCESS_LOCAL, 7661 bytes)
> > 20/08/21 08:38:30 INFO TaskSetManager: Finished task 14.0 in stage 151.0 (TID 31223) in 1269 ms on sl73caehdn0615.visa.com (executor 100) (1/29)
> > 20/08/21 08:38:30 INFO BlockManagerInfo: Added rdd_329_9 in memory on sl73caehdn0709.visa.com:34428 (size: 379.0 B, free: 5.1 GB)
> > 20/08/21 08:38:30 INFO TaskSetManager: Finished task 9.0 in stage 151.0 (TID 31218) in 1663 ms on sl73caehdn0709.visa.com (executor 62) (2/29)
> > 20/08/21 08:38:30 INFO BlockManagerInfo: Added rdd_329_23 in memory on sl73caehdn0716.visa.com:45986 (size: 372.0 B, free: 5.1 GB)
> > 20/08/21 08:38:30 INFO TaskSetManager: Finished task 23.0 in stage 151.0 (TID 31232) in 1754 ms on sl73caehdn0716.visa.com (executor 99) (3/29)
> > 20/08/21 08:38:31 INFO TaskSetManager: Lost task 8.1 in stage 151.0 (TID 31238) on sl73caehdn0615.visa.com, executor 100: org.apache.hudi.exception.HoodieUpsertException (Error upserting bucketType UPDATE for partition :8) [duplicate 1]
> > 20/08/21 08:38:31 INFO TaskSetManager: Starting task 8.2 in stage 151.0 (TID 31239, sl73caehdn0711.visa.com, executor 81, partition 8, PROCESS_LOCAL, 7661 bytes)
> > 20/08/21 08:38:31 INFO BlockManagerInfo: Added broadcast_130_piece0 in memory on sl73caehdn0711.visa.com:43376 (size: 89.1 KB, free: 5.1 GB)
> > 20/08/21 08:38:31 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 32 to 10.160.39.149:43212
> > 20/08/21 08:38:31 WARN TaskSetManager: Lost task 6.0 in stage 151.0 (TID 31215, sl73caehdn0423.visa.com, executor 48):
> > org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :6
> > at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:264)
> > at org.apache.hudi.HoodieWriteClient.lambda$upsertRecordsInternal$507693af$1(HoodieWriteClient.java:428)
> > at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
> > at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
> > at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:847)
> > at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:847)
> > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> > at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> > at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:337)
> > at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:335)
> > at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1109)
> > at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1083)
> > at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1018)
> > at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1083)
> > at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:809)
> > at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
> > at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
> > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> > at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
> > at org.apache.spark.scheduler.Task.run(Task.scala:109)
> > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
> > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> > at java.lang.Thread.run(Thread.java:748)
> > Caused by: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Null-value for required field: rule_profile_id_list
> > at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:202)
> > at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:178)
> > at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:257)
> > ... 28 more
> > Caused by: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Null-value for required field: rule_profile_id_list
> > at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:142)
> > at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:200)
> > ... 30 more
> > Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Null-value for required field: rule_profile_id_list
> > at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> > at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> > at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:140)
> > ... 31 more
> > Caused by: java.lang.RuntimeException: Null-value for required field: rule_profile_id_list
> > at org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:170)
> > at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:142)
> > at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:123)
> > at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:293)
> > at org.apache.hudi.io.storage.HoodieParquetWriter.writeAvro(HoodieParquetWriter.java:101)
> > at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:288)
> > at org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:432)
> > at org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:422)
> > at org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:38)
> > at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:120)
> > at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> > ... 3 more
> >
>
