emkornfield commented on pull request #7319: URL: https://github.com/apache/arrow/pull/7319#issuecomment-670968236
> @emkornfield not sure if I understand this part, I'll try create a nested batch with a few levels, and have one record have the top level be nested.

There are two bugs in C++ (one with an open PR).

The first bug: if you have a schema like `nullable struct<list<nullable struct<nullable struct<int>>>>`, you need to include all null values from the leaf up to the list. The bug we had in C++ is that we would only include the first level of nulls and drop the other ones (leading to an inconsistent list size).

The second bug has no PR yet. If you have a schema `nullable struct<nullable int>`, then the struct's validity buffer could look like `[null, null, null]` while the underlying int vector still has valid values `[1, 2, 3]`. For the purposes of writing to Parquet, the values should all be considered null. The only way to determine this is to re-walk the tree or use the already generated levels to generate a new bitmap for the leaf.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
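To illustrate the second bug from the comment above, here is a minimal sketch for the `nullable struct<nullable int>` case. It is not the Arrow C++ implementation: plain Python lists stand in for Arrow validity bitmaps, and the helper names `definition_levels` and `leaf_bitmap_from_levels` are hypothetical. It assumes the usual Parquet convention that each nullable level that is present adds one to the definition level, so a leaf value is non-null only when its level equals the maximum.

```python
from typing import List, Optional

# Hypothetical helper: definition levels for `nullable struct<nullable int>`.
# 0 = struct is null, 1 = struct present but int null, 2 = int present.
def definition_levels(struct_valid: List[bool], int_valid: List[bool]) -> List[int]:
    if len(struct_valid) != len(int_valid):
        raise ValueError("validity bitmaps must have the same length")
    levels = []
    for parent_ok, leaf_ok in zip(struct_valid, int_valid):
        if not parent_ok:
            levels.append(0)   # a null parent hides whatever the leaf holds
        elif not leaf_ok:
            levels.append(1)
        else:
            levels.append(2)
    return levels

# Hypothetical helper: rebuild the leaf bitmap from already generated levels.
def leaf_bitmap_from_levels(levels: List[int], max_level: int = 2) -> List[bool]:
    return [level == max_level for level in levels]

def masked_values(values: List[int], bitmap: List[bool]) -> List[Optional[int]]:
    return [v if ok else None for v, ok in zip(values, bitmap)]

# The situation from the comment: every struct slot is null, but the
# child int vector still carries "valid" values.
struct_valid = [False, False, False]
int_valid = [True, True, True]
int_values = [1, 2, 3]

levels = definition_levels(struct_valid, int_valid)
bitmap = leaf_bitmap_from_levels(levels)
print(levels)                             # [0, 0, 0]
print(masked_values(int_values, bitmap))  # [None, None, None]
```

Using the child bitmap on its own would report `[1, 2, 3]` as present; deriving the bitmap from the levels folds the parent's nulls into the leaf.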
