[ 
https://issues.apache.org/jira/browse/DRILL-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-6071:
---------------------------------
    Reviewer: Paul Rogers

> Limit batch size for flatten operator
> -------------------------------------
>
>                 Key: DRILL-6071
>                 URL: https://issues.apache.org/jira/browse/DRILL-6071
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.12.0
>            Reporter: Padma Penumarthy
>            Assignee: Padma Penumarthy
>            Priority: Major
>              Labels: ready-to-commit
>             Fix For: 1.13.0
>
>
> flatten currently uses an adaptive algorithm to control the outgoing batch 
> size. 
>  While processing the input batch, it adjusts the number of records in 
> outgoing batch based on memory usage so far. Once memory usage exceeds the 
> configured limit for a batch, the algorithm becomes more proactive and 
> adjusts the limit half way through and end of every batch. All this periodic 
> checking of memory usage is unnecessary overhead and impacts performance. 
> Also, we will know only after the fact.
> Instead, figure out how many rows should be there in the outgoing batch from 
> incoming batch.
>  The way to do that would be to figure out average row size of the outgoing 
> batch and based on that figure out how many rows can be there for a given 
> amount of memory. value vectors provide us the necessary information to be 
> able to figure this out.
> Row count in output batch should be decided based on memory (with min 1 and 
> max 64k rows) and not hard coded (to 4K) in code. Memory for output batch 
> should be configurable system option.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to