[ 
https://issues.apache.org/jira/browse/ARROW-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17185332#comment-17185332
 ] 

Antoine Pitrou commented on ARROW-8494:
---------------------------------------

If I understand correctly, for a non-list nullable field, we only need to 
update the null bitmap:
* if def level >= field's def level, append non-null
* otherwise, append null
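
To make the non-list case concrete, here is a minimal sketch. It assumes a field with no repeated ancestor, so every level entry maps to exactly one slot. DefLevelsToBitmap and its parameters are hypothetical names for illustration, not existing Arrow or Parquet APIs; the bitmap is LSB-first with a set bit meaning non-null, as in Arrow.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not an Arrow API): builds an LSB-first validity bitmap
// from definition levels. Assumes no repeated ancestor, so each def level
// entry corresponds to exactly one array slot.
std::vector<uint8_t> DefLevelsToBitmap(const std::vector<int16_t>& def_levels,
                                       int16_t field_def_level) {
  std::vector<uint8_t> bitmap((def_levels.size() + 7) / 8, 0);
  for (std::size_t i = 0; i < def_levels.size(); ++i) {
    // def level >= field's def level: the slot is non-null, so set its bit.
    // Otherwise the bit stays 0 (null).
    if (def_levels[i] >= field_def_level) {
      bitmap[i / 8] |= static_cast<uint8_t>(1u << (i % 8));
    }
  }
  return bitmap;
}
```

The loop body has no dependency between iterations, which is presumably why this case lends itself to vectorization.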

For a non-nullable list field, we must update the offsets:
* if rep level < field's rep level and def level < field's def level, append 
current_offset (empty list)
* if rep level < field's rep level and def level >= field's def level, append 
current_offset++ (first item in new list)
* otherwise, just current_offset++ (next item in same list)
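
The three offset rules above could be sketched as follows. This is only an illustration under stated assumptions: a single, non-nullable list level with no list ancestors, and rep/def level arrays of equal length. LevelsToOffsets and its parameter names are hypothetical, not existing Arrow or Parquet APIs. The final offset is appended after the loop so the result has the usual n + 1 entries.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not an Arrow API): derives list offsets from rep/def
// levels for a single non-nullable list level. Assumes rep_levels and
// def_levels have the same length.
std::vector<int32_t> LevelsToOffsets(const std::vector<int16_t>& rep_levels,
                                     const std::vector<int16_t>& def_levels,
                                     int16_t field_rep_level,
                                     int16_t field_def_level) {
  std::vector<int32_t> offsets;
  int32_t current_offset = 0;
  for (std::size_t i = 0; i < rep_levels.size(); ++i) {
    const bool starts_new_list = rep_levels[i] < field_rep_level;
    const bool has_item = def_levels[i] >= field_def_level;
    if (starts_new_list) {
      // New list: record where it starts. An empty list contributes no item,
      // so the offset only advances when the entry carries a first item.
      offsets.push_back(current_offset);
      if (has_item) ++current_offset;
    } else {
      // Next item in the same list.
      ++current_offset;
    }
  }
  // Closing offset, so that list i spans [offsets[i], offsets[i + 1]).
  offsets.push_back(current_offset);
  return offsets;
}
```

For the lists [[a, b], [], [c]] with field rep level 1 and field def level 1, the levels are rep = {0, 1, 0, 0} and def = {1, 1, 0, 1}, and the sketch yields offsets {0, 2, 2, 3}. Unlike the bitmap case, each iteration depends on the running current_offset.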

For a nullable list field, does the ancestor_def_level also need to be taken 
into account?

So non-list fields are easy, but list fields need more sophisticated logic 
that might be harder to implement efficiently.

> [C++] Implement vectorized array reassembly logic
> -------------------------------------------------
>
>                 Key: ARROW-8494
>                 URL: https://issues.apache.org/jira/browse/ARROW-8494
>             Project: Apache Arrow
>          Issue Type: Sub-task
>          Components: C++
>            Reporter: Micah Kornfield
>            Assignee: Micah Kornfield
>            Priority: Major
>
> This logic would attempt to create the data necessary for each field by 
> passing through the levels once for each field.  It is expected that due to 
> SIMD this will perform better for nested data with shallow nesting, but due 
> to repetitive computation it might perform worse for deeply nested data that 
> includes List-types.
>  
> At a high level the logic would be structured as:
> {{for each field:}}
> {{   for each rep/def level entry:}}
> {{           update null bitmask and offsets.}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)