Re: Null-value for required field Error
Can you open an issue, and we will look into this there? To confirm the theory, can you enable INFO logging and paste the output containing the line "Registered avro schema : ..."? Can you also print the schema using inputDF.printSchema()?

Thanks,
Balaji.V

On Fri, Aug 21, 2020 at 12:53 PM selvaraj periyasamy <selvaraj.periyasamy1...@gmail.com> wrote:

> Thanks Balaji.
>
> Could you please provide more info on how to get this done and pass it to Hudi?
>
> Thanks,
> Selva
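The two checks Balaji asks for above, plus the nullable-column fix he suggests, can be sketched in Spark as follows. This is a minimal illustration only: the DataFrame name `inputDF` comes from the thread, but the array-of-string element type for `rule_profile_id_list` is an assumption, not taken from the original schema.

```scala
import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.types.{ArrayType, StringType}

// Print the DataFrame schema to compare against the
// "Registered avro schema : ..." line in the INFO logs.
inputDF.printSchema()

// To stay backwards compatible with existing files that lack the
// column, add it as a *nullable* column rather than a required one,
// e.g. by casting a null literal to the (assumed) array type:
val withNewCol = inputDF.withColumn(
  "rule_profile_id_list",
  lit(null).cast(ArrayType(StringType))) // nullable by construction

// rule_profile_id_list should now show "nullable = true"
withNewCol.printSchema()
```

For rows where the incoming batch does have values, the real column values can of course be used; the point is that the declared type must admit null so the Avro schema generated from it is nullable.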
Re: Null-value for required field Error
Thanks Balaji.

Could you please provide more info on how to get this done and pass it to Hudi?

Thanks,
Selva

On Fri, Aug 21, 2020 at 12:33 PM Balaji Varadarajan wrote:

> Hi Selvaraj,
>
> Even though the incoming batch has non-null values for the new column, the existing data do not have this column. So you need to make sure the Avro schema has the new column as nullable and is backwards compatible.
>
> Balaji.V
Re: Null-value for required field Error
Hi Selvaraj,

Even though the incoming batch has non-null values for the new column, the existing data do not have this column. So you need to make sure the Avro schema has the new column as nullable and is backwards compatible.

Balaji.V

On Friday, August 21, 2020, 10:06:40 AM PDT, selvaraj periyasamy wrote:

Hi,

With Hudi version 0.5.0, I am using the COW table type, partitioned in mmdd format. We already have a table with Array-type columns and data populated. We are now trying to add a new column ("rule_profile_id_list") to the DataFrame, and while trying to write we get the exception below. I am making sure that the DataFrame I pass has non-null values for it, since it is a non-nullable column per the schema definition in the DataFrame. I don't use "--conf spark.sql.hive.convertMetastoreParquet=false" because the code snippet below already handles that in my code:

sparkSession.sparkContext.hadoopConfiguration.setClass(
  "mapreduce.input.pathFilter.class",
  classOf[org.apache.hudi.hadoop.HoodieROTablePathFilter],
  classOf[org.apache.hadoop.fs.PathFilter]);

Could someone help me resolve this error?
20/08/21 08:38:30 WARN TaskSetManager: Lost task 8.0 in stage 151.0 (TID 31217, sl73caehdn0811.visa.com, executor 10): org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :8
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:264)
    at org.apache.hudi.HoodieWriteClient.lambda$upsertRecordsInternal$507693af$1(HoodieWriteClient.java:428)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:847)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:847)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:337)
    at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:335)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1109)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1083)
    at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1018)
    at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1083)
    at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:809)
    at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Null-value for required field: rule_profile_id_list
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:202)
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:178)
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:257)
    ... 28 more
Caused by: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Null-value for required field: rule_profile_id_list
    at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:142)
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:200)
    ... 30 more
Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Null-value for required field: rule_profile_id_list
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:140)
    ... 31 more
Caused by: java.lang.RuntimeException: Null-value for required field: rule_profile_id_list
    at org.apache.p
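For reference, the backwards-compatible shape Balaji describes would declare the new field in the Avro schema as a nullable union with a null default, along the lines of the sketch below. The field name is from the thread; the array element type is an assumption, not taken from the actual table schema.

```json
{
  "name": "rule_profile_id_list",
  "type": ["null", {"type": "array", "items": "string"}],
  "default": null
}
```

With the union type ["null", ...] and a default of null, a reader using the new schema can still resolve records written before the column existed, which avoids the "Null-value for required field" error when Hudi rewrites existing files during the upsert.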