Siddharth Teotia created ARROW-1943:
---------------------------------------

             Summary: Handle setInitialCapacity() for deeply nested lists of lists
                 Key: ARROW-1943
                 URL: https://issues.apache.org/jira/browse/ARROW-1943
             Project: Apache Arrow
          Issue Type: Bug
            Reporter: Siddharth Teotia
            Assignee: Siddharth Teotia


The current implementation of setInitialCapacity() multiplies the capacity by a 
factor of 5 for every level of list nesting.

So if the schema is LIST (LIST (LIST (LIST (LIST (LIST (LIST (BIGINT))))))) and 
we start with an initial capacity of 128, we end up throwing 
OversizedAllocationException from the BigIntVector: at every level we 
multiplied the capacity by 5, so by the time we reached the inner scalar 
vector that actually stores the data, we were well over the max size limit per 
vector (1MB).
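
To make the growth concrete, here is a minimal sketch of the arithmetic only. It 
is not Arrow's actual code: the class and method names are hypothetical, and it 
assumes 8 bytes per BIGINT value, ignoring validity buffers and any rounding 
the allocator applies.

```java
// Illustrative arithmetic only; NOT Arrow's implementation.
// All names here are hypothetical and exist just for this sketch.
final class NestedCapacitySketch {
    // Per-level multiplier described in this issue.
    static final long LIST_FACTOR = 5L;
    // Assumed width of one BIGINT value, in bytes.
    static final long BIGINT_WIDTH_BYTES = 8L;

    /** Capacity requested from the innermost vector after descending listLevels list levels. */
    static long innerCapacity(long initialCapacity, int listLevels) {
        long capacity = initialCapacity;
        for (int level = 0; level < listLevels; level++) {
            // multiplyExact throws ArithmeticException instead of silently overflowing.
            capacity = Math.multiplyExact(capacity, LIST_FACTOR);
        }
        return capacity;
    }

    /** Approximate data-buffer size, in bytes, for the innermost BIGINT vector. */
    static long innerBytes(long initialCapacity, int listLevels) {
        return Math.multiplyExact(innerCapacity(initialCapacity, listLevels), BIGINT_WIDTH_BYTES);
    }

    public static void main(String[] args) {
        // Print the requested capacity at each nesting depth for an initial capacity of 128.
        for (int levels = 0; levels <= 7; levels++) {
            System.out.printf("levels=%d values=%d bytes=%d%n",
                    levels, innerCapacity(128, levels), innerBytes(128, levels));
        }
    }
}
```

With seven list levels and an initial capacity of 128, this gives 128 * 5^7 = 
10,000,000 values, or roughly 80 MB of BIGINT data, far beyond a 1MB limit.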

We saw this problem in Dremio when we failed to read deeply nested JSON data.



