Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/781#discussion_r106792968 --- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java --- @@ -502,11 +502,15 @@ public void setSafe(int index, byte[] bytes) { assert index >= 0; final int currentOffset = offsetVector.getAccessor().get(index); - while (data.capacity() < currentOffset + bytes.length) { - reAlloc(); - } offsetVector.getMutator().setSafe(index + 1, currentOffset + bytes.length); - data.setBytes(currentOffset, bytes, 0, bytes.length); + try { + data.setBytes(currentOffset, bytes, 0, bytes.length); + } catch (IndexOutOfBoundsException e) { --- End diff -- The check for a DrillBuf exceeding bounds is already being done once in Netty (which also throws the Exception). What we want to do is to avoid having to do more than one bounds check for every vector write operation. By adding the suggested check, we simply move the check from every vector setSafe method to every Drillbuf write. This would possibly impact performance in other parts of the code. I particularly changed only var len vectors to address a hotspot in the Parquet reader performance, because as you are suggesting, the right way to fix is to address the low density batch problem at the same time as the vector overflow problem. Perhaps a longer discussion may be required here.
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---