[ https://issues.apache.org/jira/browse/DRILL-6307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453245#comment-16453245 ]
ASF GitHub Bot commented on DRILL-6307: --------------------------------------- Github user ppadma commented on a diff in the pull request: https://github.com/apache/drill/pull/1228#discussion_r184236170 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java --- @@ -277,18 +286,29 @@ public boolean isRepeatedList() { /** * This is the average per entry width, used for vector allocation. */ - public int getEntryWidth() { + private int getEntryWidthForAlloc() { int width = 0; if (isVariableWidth) { - width = getNetSizePerEntry() - OFFSET_VECTOR_WIDTH; + width = getAllocSizePerEntry() - OFFSET_VECTOR_WIDTH; // Subtract out the bits (is-set) vector width - if (metadata.getDataMode() == DataMode.OPTIONAL) { + if (isOptional) { width -= BIT_VECTOR_WIDTH; } + + if (isRepeated && getValueCount() == 0) { + return (safeDivide(width, STD_REPETITION_FACTOR)); --- End diff -- You are right. row count can be non-zero and valueCount can still be zero. Since this is intended for empty batch case, changed this to check for row count instead. > Handle empty batches in record batch sizer correctly > ---------------------------------------------------- > > Key: DRILL-6307 > URL: https://issues.apache.org/jira/browse/DRILL-6307 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow > Affects Versions: 1.13.0 > Reporter: Padma Penumarthy > Assignee: Padma Penumarthy > Priority: Major > Fix For: 1.14.0 > > > when we get empty batch, record batch sizer calculates row width as zero. In > that case, we do not do accounting and memory allocation correctly for > outgoing batches. > For example, in merge join, for outer left join, if right side batch is > empty, we still have to include the right side columns as null in outgoing > batch. > Say first batch is empty. Then, for outgoing, we allocate empty vectors with > zero capacity. When we read the next batch with data, we will end up going > through realloc loop. If we use right side row width as 0 in outgoing row > width calculation, number of rows we will calculate will be higher and later > when we get a non empty batch, we might exceed the memory limits. > One possible workaround/solution : Allocate memory based on std size for > empty input batch. Use allocation width as width of the batch in number of > rows calculation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)