Github user parthchandra commented on a diff in the pull request:

    https://github.com/apache/drill/pull/781#discussion_r106792968
  
    --- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java 
---
    @@ -502,11 +502,15 @@ public void setSafe(int index, byte[] bytes) {
           assert index >= 0;
     
           final int currentOffset = offsetVector.getAccessor().get(index);
    -      while (data.capacity() < currentOffset + bytes.length) {
    -        reAlloc();
    -      }
           offsetVector.getMutator().setSafe(index + 1, currentOffset + 
bytes.length);
    -      data.setBytes(currentOffset, bytes, 0, bytes.length);
    +      try {
    +        data.setBytes(currentOffset, bytes, 0, bytes.length);
    +      } catch (IndexOutOfBoundsException e) {
    --- End diff --
    
    The check for a DrillBuf exceeding bounds is already being done once in 
Netty (which also throws the Exception). What we want to do  is to avoid having 
to do more than one bounds check for every vector write operation. By adding 
the suggested check, we simply move the check from every vector setSafe method 
to every Drillbuf write. This would possibly impact performance in other parts 
of the code. 
    I particularly changed only var len vectors to address a hotspot in the 
Parquet reader performance, because as you are suggesting, the right way to fix 
is to address the low density batch problem at the same time as the vector 
overflow problem. Perhaps a longer discussion may be required here.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

Reply via email to