emkornfield commented on pull request #7319:
URL: https://github.com/apache/arrow/pull/7319#issuecomment-670968236


   > @emkornfield not sure if I understand this part, I'll try create a nested batch with a few levels, and have one record have the top level be nested.
   
   There are two bugs in C++ (one with an open PR).  
   
   The first bug: if you have a schema like `nullable struct<list<nullable struct<nullable struct<int>>>>`, you need to include all null values from the leaf up to the list.  The bug we had in C++ was that we would only include the first level of nulls and drop the others (leading to an inconsistent list size).
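
To make the first bug concrete, here is a minimal pure-Python sketch (not the Arrow C++ code) of how definition and repetition levels could be produced for the int leaf of that schema. The field names `items` and `inner` and the helper names are hypothetical, and the level numbering assumes the list and the int are non-nullable, as in the schema above. The point is that every list element, null or not, must emit exactly one level entry, otherwise the list length cannot be recovered.

```python
from typing import List, Optional, Tuple


def leaf_levels(records: List[Optional[dict]]) -> Tuple[List[int], List[int]]:
    """Return (def_levels, rep_levels) for the int leaf.

    Assumed level meanings for this schema:
      0 = outer struct is null
      1 = outer struct present, list is empty
      2 = list element (struct) is null
      3 = inner struct is null
      4 = int value is present
    """
    def_levels: List[int] = []
    rep_levels: List[int] = []
    for record in records:
        if record is None:
            def_levels.append(0)
            rep_levels.append(0)
            continue
        items = record["items"]  # hypothetical field name
        if not items:
            def_levels.append(1)
            rep_levels.append(0)
            continue
        for i, elem in enumerate(items):
            # The first element starts a new record; later ones repeat the list.
            rep_levels.append(0 if i == 0 else 1)
            if elem is None:
                def_levels.append(2)  # null at the element-struct level
            elif elem["inner"] is None:  # hypothetical field name
                def_levels.append(3)  # null at the inner-struct level
            else:
                def_levels.append(4)
    return def_levels, rep_levels


def list_lengths(def_levels: List[int], rep_levels: List[int],
                 list_def_level: int = 2) -> List[int]:
    """Recover per-record list lengths from the levels."""
    lengths: List[int] = []
    for d, r in zip(def_levels, rep_levels):
        if r == 0:
            lengths.append(0)  # a new top-level record begins
        if d >= list_def_level:
            lengths[-1] += 1  # this entry corresponds to a list element
    return lengths


records = [
    {"items": [{"inner": {"x": 1}}, None, {"inner": None}]},
    None,
    {"items": []},
]
defs, reps = leaf_levels(records)
lengths = list_lengths(defs, reps)
```

If the entries for the deeper nulls were dropped, the first record would appear to hold fewer than three elements, which is the inconsistent list size described above.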
   
   The second bug has no PR yet.  If you have a schema `nullable struct<nullable int>`, the struct's validity buffer could look like `[null, null, null]` while the underlying int vector still has valid values `[1, 2, 3]`.  For the purposes of writing to Parquet, the values should all be considered null.  The only way to determine this is to re-walk the tree or to use the already generated levels to build a new bitmap for the leaf.
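
A small pure-Python sketch of the two options for the second bug, under the assumption of struct-only nesting (no lists) and with hypothetical helper names; this is not the Arrow API. The first helper re-walks the tree by AND-ing the leaf validity with each ancestor's validity. The second derives the bitmap from the definition levels that were already generated.

```python
from typing import List, Optional


def effective_leaf_validity(ancestor_validity: List[List[bool]],
                            leaf_validity: List[bool]) -> List[bool]:
    """A leaf slot is valid only if it and every ancestor struct are valid."""
    result = list(leaf_validity)
    for validity in ancestor_validity:
        if len(validity) != len(result):
            raise ValueError("validity buffers must have the same length")
        result = [leaf_ok and parent_ok
                  for leaf_ok, parent_ok in zip(result, validity)]
    return result


def validity_from_def_levels(def_levels: List[int],
                             max_def_level: int) -> List[bool]:
    """A leaf slot is valid only when its definition level is the maximum."""
    return [level == max_def_level for level in def_levels]


# nullable struct<nullable int>: the struct is null in every slot,
# but the child int vector still claims valid values.
struct_validity = [False, False, False]
int_validity = [True, True, True]
int_values = [1, 2, 3]

effective = effective_leaf_validity([struct_validity], int_validity)
written: List[Optional[int]] = [
    value if ok else None for value, ok in zip(int_values, effective)
]

# Same answer from the levels: a null struct gives definition level 0,
# while the maximum for this schema is 2 (one per nullable level).
from_levels = validity_from_def_levels([0, 0, 0], max_def_level=2)
```

Both approaches mark every slot as null here, so none of the values `[1, 2, 3]` would be written.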
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


