[ https://issues.apache.org/jira/browse/ARROW-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17185332#comment-17185332 ]
Antoine Pitrou commented on ARROW-8494:
---------------------------------------

If I understand correctly, for a non-list nullable field, we only need to update the null bitmap:
* if def level >= field's def level, append non-null
* otherwise, append null

For a non-nullable list field, we must update the offsets:
* if rep level < field's rep level and def level < field's def level, append current_offset (empty list)
* if rep level < field's rep level and def level >= field's def level, append current_offset++ (first item in new list)
* otherwise, just current_offset++ (next item in same list)

For a nullable list field, the ancestor_def_level must also be taken into account?

So non-list fields are easy; list fields have more sophisticated logic that might be less easy to do efficiently.

> [C++] Implement vectorized array reassembly logic
> -------------------------------------------------
>
>                 Key: ARROW-8494
>                 URL: https://issues.apache.org/jira/browse/ARROW-8494
>             Project: Apache Arrow
>          Issue Type: Sub-task
>          Components: C++
>            Reporter: Micah Kornfield
>            Assignee: Micah Kornfield
>            Priority: Major
>
> This logic would attempt to create the data necessary for each field by
> passing through the levels once for each field. It is expected that due to
> SIMD this will perform better for nested data with shallow nesting, but due
> to repetitive computation it might perform worse for deeply nested data that
> includes List-types.
>
> At a high level the logic would be structured as:
> {{for each field:}}
> {{  for each rep/def level entry:}}
> {{    update null bitmask and offsets.}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
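
For illustration, the per-field rules stated in the comment (null bitmap for a non-list nullable field, offsets for a non-nullable list field) could be sketched as below. This is a plain scalar loop, not the vectorized implementation the issue asks for, and it is not taken from the Arrow code base. The names BuildValidity and BuildListOffsets, the use of std::vector<bool> for the bitmap, and the final trailing offset (added so that N lists yield N+1 offsets, as in Arrow's list layout) are assumptions made for this example. The nullable-list case, which the comment leaves as an open question, is not covered.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: validity for a non-list nullable field.
// A slot is non-null when its def level reaches the field's def level.
std::vector<bool> BuildValidity(const std::vector<int16_t>& def_levels,
                                int16_t field_def_level) {
  std::vector<bool> valid;
  valid.reserve(def_levels.size());
  for (int16_t def : def_levels) {
    valid.push_back(def >= field_def_level);
  }
  return valid;
}

// Hypothetical helper: offsets for a non-nullable list field.
std::vector<int32_t> BuildListOffsets(const std::vector<int16_t>& def_levels,
                                      const std::vector<int16_t>& rep_levels,
                                      int16_t field_def_level,
                                      int16_t field_rep_level) {
  assert(def_levels.size() == rep_levels.size());
  std::vector<int32_t> offsets;
  int32_t current_offset = 0;
  for (std::size_t i = 0; i < def_levels.size(); ++i) {
    const bool starts_new_list = rep_levels[i] < field_rep_level;
    const bool has_item = def_levels[i] >= field_def_level;
    if (starts_new_list) {
      // A new list begins here: record where it starts.
      offsets.push_back(current_offset);
      // An empty list contributes no item; otherwise count its first item.
      if (has_item) ++current_offset;
    } else {
      // Next item in the same list.
      ++current_offset;
    }
  }
  // Assumption: close the last list so the result has one more entry
  // than there are lists.
  offsets.push_back(current_offset);
  return offsets;
}
```

As a usage example, take a required list of required int32 (field def level 1, field rep level 1) holding [[1, 2], [], [3]]. Its level entries are (rep 0, def 1), (rep 1, def 1), (rep 0, def 0), (rep 0, def 1), and BuildListOffsets returns the offsets 0, 2, 2, 3.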