Re: Null-value for required field Error

2020-08-23 Thread Balaji Varadarajan
Can you open an issue? We will look into it there. To confirm the
theory, can you enable INFO logging and paste the output containing the line:

"Registered avro schema : ..."

Can you also print the schema using inputDF.printSchema()
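For reference, a minimal sketch of both checks, assuming the incoming frame is
the inputDF mentioned above and sparkSession is your active session (this is an
illustration, not code from this thread):

// Raise the driver log level so INFO lines such as
// "Registered avro schema : ..." become visible in the output.
sparkSession.sparkContext.setLogLevel("INFO")
// Print the Spark schema that Hudi will convert to Avro,
// so it can be compared against the registered Avro schema.
inputDF.printSchema()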

Thanks,
Balaji.V

On Fri, Aug 21, 2020 at 12:53 PM selvaraj periyasamy <
selvaraj.periyasamy1...@gmail.com> wrote:

> Thanks Balaji.
>
> Could you please provide more info on how to get this done and pass it to
> Hudi?
>
> Thanks,
> Selva
>
> On Fri, Aug 21, 2020 at 12:33 PM Balaji Varadarajan
>  wrote:
>
> >  Hi Selvaraj,
> > Even though the incoming batch has non-null values for the new column, the
> > existing data does not have this column. So you need to make sure the Avro
> > schema declares the new column as nullable so that it is backwards compatible.
> > Balaji.V
> > On Friday, August 21, 2020, 10:06:40 AM PDT, selvaraj periyasamy <
> > selvaraj.periyasamy1...@gmail.com> wrote:
> >
> >  Hi,
> >
> > With Hudi 0.5.0, I am using the COW table type, partitioned by mmdd format.
> > We already have a table with Array-type columns and data populated. We are
> > now trying to add a new column ("rule_profile_id_list") to the DataFrame,
> > and while writing it we get the error message below. I am making sure the
> > DataFrame I pass has a non-null value for it, since it is a non-nullable
> > column per the DataFrame's schema definition. I don't use "--conf
> > spark.sql.hive.convertMetastoreParquet=false" because I already handle that
> > with the code snippet below.
> >
> >
> > sparkSession.sparkContext.hadoopConfiguration.setClass("mapreduce.input.pathFilter.class",
> >   classOf[org.apache.hudi.hadoop.HoodieROTablePathFilter],
> >   classOf[org.apache.hadoop.fs.PathFilter]);
> >
> >
> > Could someone help me to resolve this error?
> >
> > 20/08/21 08:38:30 WARN TaskSetManager: Lost task 8.0 in stage 151.0 (TID
> > 31217, sl73caehdn0811.visa.com, executor 10):
> > org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType
> > UPDATE for partition :8
> > at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:264)
> > at org.apache.hudi.HoodieWriteClient.lambda$upsertRecordsInternal$507693af$1(HoodieWriteClient.java:428)
> > at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
> > at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
> > at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:847)
> > at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:847)
> > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> > at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> > at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:337)
> > at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:335)
> > at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1109)
> > at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1083)
> > at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1018)
> > at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1083)
> > at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:809)
> > at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
> > at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
> > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> > at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
> > at org.apache.spark.scheduler.Task.run(Task.scala:109)
> > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
> > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> > at java.lang.Thread.run(Thread.java:748)
> > Caused by: org.apache.hudi.exception.HoodieException:
> > org.apache.hudi.exception.HoodieException:
> > java.util.concurrent.ExecutionException: java.lang.RuntimeException:
> > Null-value for required field: rule_profile_id_list
> > at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:202)
> > at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:17

Re: Null-value for required field Error

2020-08-21 Thread selvaraj periyasamy
Thanks Balaji.

Could you please provide more info on how to get this done and pass it to
Hudi?

Thanks,
Selva

On Fri, Aug 21, 2020 at 12:33 PM Balaji Varadarajan
 wrote:

>  Hi Selvaraj,
> Even though the incoming batch has non-null values for the new column, the
> existing data does not have this column. So you need to make sure the Avro
> schema declares the new column as nullable so that it is backwards compatible.
> Balaji.V
> On Friday, August 21, 2020, 10:06:40 AM PDT, selvaraj periyasamy <
> selvaraj.periyasamy1...@gmail.com> wrote:
>
>  Hi,
>
> With Hudi 0.5.0, I am using the COW table type, partitioned by mmdd format.
> We already have a table with Array-type columns and data populated. We are
> now trying to add a new column ("rule_profile_id_list") to the DataFrame,
> and while writing it we get the error message below. I am making sure the
> DataFrame I pass has a non-null value for it, since it is a non-nullable
> column per the DataFrame's schema definition. I don't use "--conf
> spark.sql.hive.convertMetastoreParquet=false" because I already handle that
> with the code snippet below.
>
>
> sparkSession.sparkContext.hadoopConfiguration.setClass("mapreduce.input.pathFilter.class",
> classOf[org.apache.hudi.hadoop.HoodieROTablePathFilter],
> classOf[org.apache.hadoop.fs.PathFilter]);
>
>
> Could someone help me to resolve this error?
>
> 20/08/21 08:38:30 WARN TaskSetManager: Lost task 8.0 in stage 151.0 (TID
> 31217, sl73caehdn0811.visa.com, executor 10):
> org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType
> UPDATE for partition :8
> at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:264)
> at org.apache.hudi.HoodieWriteClient.lambda$upsertRecordsInternal$507693af$1(HoodieWriteClient.java:428)
> at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
> at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
> at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:847)
> at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:847)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:337)
> at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:335)
> at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1109)
> at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1083)
> at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1018)
> at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1083)
> at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:809)
> at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
> at org.apache.spark.scheduler.Task.run(Task.scala:109)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hudi.exception.HoodieException:
> org.apache.hudi.exception.HoodieException:
> java.util.concurrent.ExecutionException: java.lang.RuntimeException:
> Null-value for required field: rule_profile_id_list
> at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:202)
> at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:178)
> at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:257)
> ... 28 more
> Caused by: org.apache.hudi.exception.HoodieException:
> java.util.concurrent.ExecutionException: java.lang.RuntimeException:
> Null-value for required field: rule_profile_id_list
> at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:142)
> at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:200)
> ... 30 more
> Caused by: java.util.concurrent.ExecutionException:
> java.la

Re: Null-value for required field Error

2020-08-21 Thread Balaji Varadarajan
 Hi Selvaraj,
Even though the incoming batch has non-null values for the new column, the
existing data does not have this column. So you need to make sure the Avro
schema declares the new column as nullable so that it is backwards compatible.
Balaji.V
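
For illustration, a minimal Scala sketch of one way to do this on the Spark
side, relaxing the new column's nullability before the write so the Avro schema
Hudi derives from the DataFrame allows nulls. The names inputDF and
sparkSession follow the earlier messages; the rest is an assumption about the
pipeline, not code from this thread:

import org.apache.spark.sql.types.{StructField, StructType}

// Rebuild the schema with rule_profile_id_list marked nullable, so the derived
// Avro type becomes a union with null and stays readable against older files
// written before the column existed.
val relaxedSchema = StructType(inputDF.schema.map {
  case StructField("rule_profile_id_list", dataType, _, metadata) =>
    StructField("rule_profile_id_list", dataType, nullable = true, metadata)
  case other => other
})
val evolvedDF = sparkSession.createDataFrame(inputDF.rdd, relaxedSchema)

Writing evolvedDF instead of inputDF keeps the incoming values intact while
letting rows read back from existing files carry null for the new column.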
On Friday, August 21, 2020, 10:06:40 AM PDT, selvaraj periyasamy 
 wrote:  
 
 Hi,

With Hudi 0.5.0, I am using the COW table type, partitioned by mmdd format.
We already have a table with Array-type columns and data populated. We are now
trying to add a new column ("rule_profile_id_list") to the DataFrame, and while
writing it we get the error message below. I am making sure the DataFrame I
pass has a non-null value for it, since it is a non-nullable column per the
DataFrame's schema definition. I don't use "--conf
spark.sql.hive.convertMetastoreParquet=false" because I already handle that
with the code snippet below.

sparkSession.sparkContext.hadoopConfiguration.setClass("mapreduce.input.pathFilter.class",
classOf[org.apache.hudi.hadoop.HoodieROTablePathFilter],
classOf[org.apache.hadoop.fs.PathFilter]);
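
For context, a purely hypothetical sketch of the kind of COW upsert that hits
this error; the table name, base path, and the record-key, partition-path, and
precombine field names below are made up for illustration, and only the option
keys are standard Hudi datasource write configs:

import org.apache.spark.sql.SaveMode

// Upsert the incoming batch into the existing COW table; the new column rides
// along in inputDF's schema, which Hudi converts to an Avro schema on write.
inputDF.write
  .format("org.apache.hudi")
  .option("hoodie.table.name", "my_table")
  .option("hoodie.datasource.write.operation", "upsert")
  .option("hoodie.datasource.write.recordkey.field", "record_id")
  .option("hoodie.datasource.write.partitionpath.field", "partition_date")
  .option("hoodie.datasource.write.precombine.field", "updated_at")
  .mode(SaveMode.Append)
  .save("/path/to/hudi/table")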


Could someone help me to resolve this error?

20/08/21 08:38:30 WARN TaskSetManager: Lost task 8.0 in stage 151.0 (TID
31217, sl73caehdn0811.visa.com, executor 10):
org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType
UPDATE for partition :8
at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:264)
at org.apache.hudi.HoodieWriteClient.lambda$upsertRecordsInternal$507693af$1(HoodieWriteClient.java:428)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:847)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:847)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:337)
at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:335)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1109)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1083)
at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1018)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1083)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:809)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hudi.exception.HoodieException:
org.apache.hudi.exception.HoodieException:
java.util.concurrent.ExecutionException: java.lang.RuntimeException:
Null-value for required field: rule_profile_id_list
at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:202)
at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:178)
at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:257)
... 28 more
Caused by: org.apache.hudi.exception.HoodieException:
java.util.concurrent.ExecutionException: java.lang.RuntimeException:
Null-value for required field: rule_profile_id_list
at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:142)
at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:200)
... 30 more
Caused by: java.util.concurrent.ExecutionException:
java.lang.RuntimeException: Null-value for required field:
rule_profile_id_list
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:140)
... 31 more
Caused by: java.lang.RuntimeException: Null-value for required field:
rule_profile_id_list
at org.apache.p