[ 
https://issues.apache.org/jira/browse/DRILL-6307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16444985#comment-16444985
 ] 

ASF GitHub Bot commented on DRILL-6307:
---------------------------------------

GitHub user ppadma opened a pull request:

    https://github.com/apache/drill/pull/1228

    DRILL-6307: Handle empty batches in record batch sizer correctly

    When we get an empty batch, the record batch sizer calculates the row 
width as zero. In that case, we do not do accounting and memory allocation 
correctly for outgoing batches. 
    
    For example, for a left outer join, if the right side batch is empty, we 
still have to include the right side columns as nulls in the outgoing batch. 
Say the first batch is empty. Then, for the outgoing batch, we allocate empty 
vectors with zero capacity. When we read the next batch with data, we end up 
going through the realloc loop as we write values. Also, if we use a right 
side row width of 0 in the outgoing row width calculation, the number of rows 
we calculate for the outgoing batch will be too high, and later, when we get 
a non-empty batch, we might exceed the memory limits. 
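
To make the realloc cost concrete, here is a minimal sketch that counts how many times a vector's buffer has to grow when it starts at zero capacity. It assumes a capacity-doubling growth policy. The class and method names are hypothetical and are not part of Drill's vector API.

```java
// Hypothetical illustration only; not Drill's actual vector code.
class ReallocLoopSketch {

    // Counts how many reallocations are needed before a buffer that starts at
    // initialCapacity values can hold valueCount values, assuming the capacity
    // doubles on every reallocation (an assumed growth policy).
    static int reallocCount(long initialCapacity, long valueCount) {
        long capacity = initialCapacity;
        int reallocs = 0;
        while (capacity < valueCount) {
            // A zero-capacity buffer has to grow to at least one value first.
            capacity = Math.max(1, capacity * 2);
            reallocs++;
        }
        return reallocs;
    }

    public static void main(String[] args) {
        System.out.println("from zero capacity: " + reallocCount(0, 4096));
        System.out.println("pre-sized:          " + reallocCount(4096, 4096));
    }
}
```

Under these assumptions, writing 4096 values into a zero-capacity buffer takes 13 reallocations, while a buffer sized up front for 4096 values takes none.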
    
    This PR tries to address these problems by allocating memory based on the 
standard (std) size when the input batch is empty. It also uses the allocation 
width as the width of the batch in the number-of-rows calculation for binary 
operators. For unary operators, this is not a problem since we drop empty 
batches without doing any processing. 
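
The effect on the row count calculation can be sketched as follows. This is an illustrative model, not the code in this PR: `ColumnSize`, `allocWidth`, `rowWidth`, and `outputRowCount` are hypothetical names, and the "std" widths (4 bytes for an INT, 50 bytes for a VARCHAR) and the 16 MiB limit are assumed values chosen for the example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not Drill's RecordBatchSizer API.
class EmptyBatchSizingSketch {

    static final class ColumnSize {
        final int observedWidth; // bytes per value measured from data; 0 for an empty batch
        final int stdWidth;      // assumed type-based estimate of bytes per value

        ColumnSize(int observedWidth, int stdWidth) {
            this.observedWidth = observedWidth;
            this.stdWidth = stdWidth;
        }

        // Width used for allocation: fall back to the std estimate when no data was seen.
        int allocWidth() {
            return observedWidth > 0 ? observedWidth : stdWidth;
        }
    }

    // Sums per-column widths, using either observed widths or allocation widths.
    static int rowWidth(List<ColumnSize> columns, boolean useAllocWidth) {
        int width = 0;
        for (ColumnSize c : columns) {
            width += useAllocWidth ? c.allocWidth() : c.observedWidth;
        }
        return width;
    }

    // Number of outgoing rows that fit in memoryLimit bytes for a binary operator.
    static long outputRowCount(long memoryLimit, int leftWidth, int rightWidth) {
        int outgoingWidth = Math.max(1, leftWidth + rightWidth);
        return memoryLimit / outgoingWidth;
    }

    public static void main(String[] args) {
        long limit = 16L * 1024 * 1024; // assumed 16 MiB output batch limit
        int leftWidth = 50;             // assumed left side row width in bytes
        List<ColumnSize> emptyRight = Arrays.asList(
            new ColumnSize(0, 4),       // INT column, no data seen yet
            new ColumnSize(0, 50));     // VARCHAR column, no data seen yet

        System.out.println(outputRowCount(limit, leftWidth, rowWidth(emptyRight, false)));
        System.out.println(outputRowCount(limit, leftWidth, rowWidth(emptyRight, true)));
    }
}
```

With these assumed numbers, treating the empty right side as width 0 yields 335544 rows, while using the allocation width (54 bytes) yields 161319 rows, which leaves room for the right side columns once a non-empty batch arrives.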
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ppadma/drill DRILL-6307

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/1228.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1228
    
----
commit cd78209e9f75a59edc68df3e416f3936fb00f917
Author: Padma Penumarthy <ppenumar97@...>
Date:   2018-04-06T19:56:06Z

    DRILL-6307: Handle empty batches in record batch sizer correctly

----


> Handle empty batches in record batch sizer correctly
> ----------------------------------------------------
>
>                 Key: DRILL-6307
>                 URL: https://issues.apache.org/jira/browse/DRILL-6307
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.13.0
>            Reporter: Padma Penumarthy
>            Assignee: Padma Penumarthy
>            Priority: Major
>             Fix For: 1.14.0
>
>
> When we get an empty batch, the record batch sizer calculates the row width 
> as zero. In that case, we do not do accounting and memory allocation 
> correctly for outgoing batches. 
> For example, in merge join, for a left outer join, if the right side batch 
> is empty, we still have to include the right side columns as nulls in the 
> outgoing batch. 
> Say the first batch is empty. Then, for the outgoing batch, we allocate 
> empty vectors with zero capacity. When we read the next batch with data, we 
> end up going through the realloc loop. If we use a right side row width of 0 
> in the outgoing row width calculation, the number of rows we calculate will 
> be too high, and later, when we get a non-empty batch, we might exceed the 
> memory limits. 
> One possible workaround/solution: allocate memory based on the std size for 
> an empty input batch, and use the allocation width as the width of the batch 
> in the number-of-rows calculation. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
