[ 
https://issues.apache.org/jira/browse/ARROW-8495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17185320#comment-17185320
 ] 

Antoine Pitrou commented on ARROW-8495:
---------------------------------------

Are you expecting to switch between both approaches (vectorized / 
non-vectorized) depending on heuristics?

It seems to me that in most/all cases, the vectorized approach should be 
faster, perhaps by operating on limited-size chunks, such that we make better 
use of the CPU cache:
{code:java}
for each cache-sized chunk (e.g. 1K levels):
  for each field:
    for each rep/def level entry in chunk:
      update null bitmask and offsets
{code}
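
To make the loop order above concrete, here is a minimal C++ sketch. It is not the Arrow implementation: {{FieldState}}, {{ReconstructChunked}} and the "definition level >= threshold" rule are hypothetical names and simplifications introduced for illustration. It only handles definition levels and validity (nested optional structs); repetition levels and list offsets are left out, and validity is stored as one byte per entry rather than as a packed bitmap.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-field state (not an Arrow type). One instance per nesting
// level above the leaf.
struct FieldState {
  // Assumption: an entry is non-null at this field when its definition
  // level is >= this threshold.
  int16_t def_level_threshold = 0;
  // One byte per entry for readability; a real implementation would use
  // a packed bitmap.
  std::vector<uint8_t> validity;
  int64_t null_count = 0;
};

// Chunked, field-at-a-time pass over the definition levels of one leaf:
//   for each chunk -> for each field -> for each level entry in the chunk.
// Each chunk of levels is re-read once per field while it is (hopefully)
// still in cache.
void ReconstructChunked(const std::vector<int16_t>& def_levels,
                        std::vector<FieldState>& fields,
                        std::size_t chunk_size = 1024) {
  chunk_size = std::max<std::size_t>(chunk_size, 1);  // avoid a zero stride
  for (FieldState& field : fields) {
    field.validity.assign(def_levels.size(), 0);
    field.null_count = 0;
  }
  for (std::size_t start = 0; start < def_levels.size(); start += chunk_size) {
    const std::size_t end = std::min(start + chunk_size, def_levels.size());
    for (FieldState& field : fields) {
      // Tight inner loop over a single field: simple enough for the
      // compiler to auto-vectorize.
      for (std::size_t i = start; i < end; ++i) {
        const bool valid = def_levels[i] >= field.def_level_threshold;
        field.validity[i] = valid ? 1 : 0;
        field.null_count += valid ? 0 : 1;
      }
    }
  }
}
```

Swapping the two inner loops (entries outside, fields inside) gives the non-vectorized order from the issue description; the results are identical, only the memory access pattern differs. The 1024 default simply mirrors the "1K levels" figure above and would need benchmarking.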

Also, I assume this is for a single Parquet leaf node, right?

> [C++] Implement non-vectorized array reconstruction logic.
> ----------------------------------------------------------
>
>                 Key: ARROW-8495
>                 URL: https://issues.apache.org/jira/browse/ARROW-8495
>             Project: Apache Arrow
>          Issue Type: Sub-task
>            Reporter: Micah Kornfield
>            Priority: Major
>
> In contrast to the "Vectorized" reassembly this would scan:
>  
> {{for each rep/def level entry:}}
> {{     for each field:}}
> {{           update null bitmask and offsets.}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
