[ https://issues.apache.org/jira/browse/HUDI-7874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated HUDI-7874:
---------------------------------
    Labels: pull-request-available  (was: )

> Fail to read 2-level structure Parquet
> --------------------------------------
>
>                 Key: HUDI-7874
>                 URL: https://issues.apache.org/jira/browse/HUDI-7874
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Vitali Makarevich
>            Priority: Major
>              Labels: pull-request-available
>
> If I have {{"spark.hadoop.parquet.avro.write-old-list-structure", "false"}}
> explicitly set - to be able to write nulls inside arrays (the only way) -
> Hudi starts to write Parquet files with the following schema inside:
>
>   required group internal_list (LIST) {
>     repeated group list {
>       required int64 element;
>     }
>   }
>
> But files produced before setting
> {{"spark.hadoop.parquet.avro.write-old-list-structure", "false"}} have the
> following schema inside:
>
>   required group internal_list (LIST) {
>     repeated int64 array;
>   }
>
> And Hudi 0.14.x (at least) fails to read records from such a file, failing
> with the exception:
>
>   Caused by: java.lang.RuntimeException: Null-value for required field:
>
> This happens even though the contents of the arrays are {{not null}} (in
> fact they cannot be null, since Avro requires
> {{spark.hadoop.parquet.avro.write-old-list-structure}} = {{false}} to write
> {{null}}s).
>
> h3. Expected behavior
>
> Taken from Hudi 0.12.1 (not sure what exactly broke this):
> # If I have a file with the 2-level structure and an update arrives (no
> matter whether it has nulls inside arrays or not - both produce the same
> result) with {{"spark.hadoop.parquet.avro.write-old-list-structure",
> "false"}}, rewrite it into the 3-level structure. (*fails in 0.14.1*)
> # If I have the 3-level structure with nulls and an update comes (no matter
> with nulls or without), read and write correctly.
>
> A simple reproduction of the issue can be found here:
> https://github.com/VitoMakarevich/hudi-issue-014
>
> Most likely the problem appeared after Hudi made some changes, so values
> from the Hadoop conf started to propagate into the Reader instance (likely
> they were not propagated before).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
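For context, the write-side configuration the report describes can be sketched as follows. This is a minimal sketch, not taken from the linked reproduction repo; the app name is hypothetical, and only the one config key quoted in the issue is assumed:

```scala
// Minimal sketch (hypothetical app name): building a Spark session so that
// parquet-avro writes the newer 3-level Parquet list layout, which is the
// only way to allow nulls inside arrays.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("hudi-list-structure-repro")
  // Without this flag, parquet-avro writes the old 2-level list layout
  // (repeated int64 array) and cannot write nulls inside arrays.
  .config("spark.hadoop.parquet.avro.write-old-list-structure", "false")
  .getOrCreate()
```

Per the report, a table containing files written both before and after this flag was set mixes the 2-level and 3-level schemas, which is what triggers the read failure in 0.14.x.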