lewyh commented on issue #7145: URL: https://github.com/apache/hudi/issues/7145#issuecomment-1304789269
Thanks, I can confirm that adding a dummy field to the struct avoids this issue. However, two things to note:

- The dummy field needs to exist when the table is created. Adding it later via schema evolution does not help: the same error persists, since it is triggered by reading the existing data. So anyone creating a table with this kind of structure needs to be aware of this issue _before_ they create their table.
- I need to set `config("spark.hadoop.parquet.avro.write-old-list-structure", False)` to resolve a separate issue with arrays containing NULL values. When this config is set, the dummy-field fix no longer works: regardless of the presence of a dummy field, the error appears, except that instead of `Can't redefine: array` it reads `Can't redefine: list`.

I believe this means that Hudi is currently unusable for users who need to support NULL values in arrays _and_ have Structs within Arrays within Structs. Perhaps this is worth a note in the Hudi docs? The comprehensive schema evolution documentation is what originally attracted us to Hudi, so a warning about these situations might help others avoid this pitfall.
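For anyone trying to reproduce this, a minimal sketch of the schema shape involved (pure Python, no Spark required). The JSON layout below matches what PySpark's `StructType.fromJson()` accepts; the field names (`outer`, `items`, `dummy`) and the table config are illustrative assumptions, not taken from our actual table:

```python
# Helpers to build a Spark-style schema as plain JSON-compatible dicts.
def struct(*fields):
    return {"type": "struct", "fields": list(fields)}

def field(name, dtype, nullable=True):
    return {"name": name, "type": dtype, "nullable": nullable, "metadata": {}}

def array(element_type, contains_null=True):
    return {"type": "array", "elementType": element_type,
            "containsNull": contains_null}

# The problematic shape: Struct > Array > Struct, with the dummy sibling
# field added at table-creation time (the workaround discussed above).
schema = struct(
    field("outer", struct(
        field("items", array(struct(field("name", "string")))),
        field("dummy", "string"),  # dummy-field workaround
    )),
)

# The session config that re-triggers the error ("Can't redefine: list")
# even when the dummy field is present.
spark_conf = {
    "spark.hadoop.parquet.avro.write-old-list-structure": "false",
}
```

In a real session you would pass `spark_conf` entries via `SparkSession.builder.config(...)` and load `schema` with `StructType.fromJson(schema)` before writing to the Hudi table.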