Re: Contributing to Impala

Quanlong Huang Thu, 03 Dec 2020 23:46:03 -0800

Welcome! I've assigned the JIRA to you. Let me know if you run into any
problems.


Cheers,
Quanlong

On Fri, Dec 4, 2020 at 2:39 PM 刘垚 <54liu...@163.com> wrote:

> I would like to contribute code to Impala.
>
>
> IMPALA-10377 Improve the accuracy of resource estimation
>
>
> PlanNode ignores several factors when estimating memory, which leads to
> large estimation errors:
>
>
> AggregationNode
> 1. The memory used by the hash table's own data structure is not
> considered. Each new value inserted into the hash table adds a bucket,
> and each bucket is 16 bytes.
> 2. When estimating the NDV of a merge aggregation with multiple grouping
> exprs, the NDV may be divided by the number of fragment instances
> several times; it should be divided only once.
> 3. When estimating the NDV of a merge aggregation with multiple grouping
> exprs, the estimated memory is much smaller than the actual usage.
> 4. If there are no grouping exprs, the estimated memory is much larger
> than the actual usage.
> 5. If the NDV of the grouping exprs is very small, the estimated memory
> is much larger than the actual usage.
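A minimal Java sketch of AggregationNode items 1 and 2 under a simplified cost model. The class and method names (AggEstimateSketch, estimateAggMemBytes, perInstanceNdvDividedPerExpr, perInstanceNdvDividedOnce) are hypothetical and are not Impala's planner API, and avgRowSizeBytes stands in for whatever per-row size the planner already computes. Only the 16-byte bucket size comes from the proposal.

```java
// Hypothetical sketch of two AggregationNode estimation fixes; not Impala's planner code.
public class AggEstimateSketch {
  // Per-bucket size quoted in the proposal.
  static final long BUCKET_SIZE_BYTES = 16;

  // Item 1: each distinct grouping key occupies one bucket, so the hash
  // table's own structure adds ndv * 16 bytes on top of the row data.
  static long estimateAggMemBytes(long ndv, long avgRowSizeBytes) {
    long rowData = ndv * avgRowSizeBytes;
    long bucketOverhead = ndv * BUCKET_SIZE_BYTES;
    return rowData + bucketOverhead;
  }

  // Item 2, buggy shape: dividing each expr's NDV by the instance count
  // divides the combined NDV by numInstances once per grouping expr.
  static double perInstanceNdvDividedPerExpr(long[] exprNdvs, int numInstances) {
    double ndv = 1.0;
    for (long n : exprNdvs) {
      ndv *= (double) n / numInstances;
    }
    return ndv;
  }

  // Item 2, fixed shape: combine the per-expr NDVs first, then divide once.
  static double perInstanceNdvDividedOnce(long[] exprNdvs, int numInstances) {
    double ndv = 1.0;
    for (long n : exprNdvs) {
      ndv *= n;
    }
    return ndv / numInstances;
  }

  public static void main(String[] args) {
    long[] ndvs = {100, 50};
    int instances = 10;
    System.out.println(estimateAggMemBytes(5000, 24));
    System.out.println(perInstanceNdvDividedPerExpr(ndvs, instances));
    System.out.println(perInstanceNdvDividedOnce(ndvs, instances));
  }
}
```

With two grouping exprs of NDV 100 and 50 spread over 10 instances, dividing per expr gives a per-instance NDV of 50, while dividing once gives 500, so in this example the per-expr version underestimates by a factor of 10. A real fix would presumably also cap the combined NDV by the input cardinality.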
>
>
> SortNode
> 1. Estimate the memory usage of an external sort. Currently the estimated
> memory is much smaller than the actual usage.
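One way to reason about the external sort case is the textbook external merge sort model, sketched here in Java. This is a generic model, not Impala's Sorter: it assumes run generation fills a fixed run buffer before each spill, and that the spilled runs are merged in a single pass using one page per run plus one output page. The class name and all parameters are hypothetical.

```java
// Generic textbook external merge sort model; not Impala's Sorter.
public class ExternalSortEstimateSketch {
  // Peak memory for sorting inputBytes with a fixed run buffer and a single merge pass.
  static long estimateSortMemBytes(long inputBytes, long runBufferBytes, long pageBytes) {
    if (inputBytes <= runBufferBytes) {
      // Everything fits in one in-memory run; nothing is spilled.
      return inputBytes;
    }
    // Run generation: fill the run buffer, sort it, spill it; repeat.
    long numRuns = (inputBytes + runBufferBytes - 1) / runBufferBytes; // ceiling division
    // Merge: one input page per spilled run plus one output page.
    long mergeBytes = (numRuns + 1) * pageBytes;
    // Peak usage is the larger of the two phases.
    return Math.max(runBufferBytes, mergeBytes);
  }

  public static void main(String[] args) {
    System.out.println(estimateSortMemBytes(50, 100, 8));
    System.out.println(estimateSortMemBytes(1000, 100, 8));
    System.out.println(estimateSortMemBytes(10000, 100, 8));
  }
}
```

With 10,000 bytes of input, a 100-byte run buffer and 8-byte pages, the model produces 100 runs, so the merge phase needs 808 bytes, more than the 100-byte run buffer. This illustrates how an estimate that only counts the in-memory run can come out too small once the input spills.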
>
>
> HashJoinNode
> 1. The memory used by the hash table's own data structure is not
> considered. The hash table keeps duplicate rows, so the size of
> DuplicateNode should be taken into account.
> 2. The hash table allocates multiple buckets in advance. The size of these
> buckets should be taken into account.
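A hedged Java sketch of the two HashJoinNode terms. The 16-byte bucket size comes from the proposal. The 16-byte DuplicateNode size, the 0.75 maximum fill factor, the 1024-bucket initial size, and the rule that each build row beyond the first row of a key needs one DuplicateNode are all assumptions for illustration, not values taken from Impala's source. The class and method names are hypothetical.

```java
// Hypothetical sketch; constants marked "assumed" are not taken from Impala's source.
public class HashJoinEstimateSketch {
  static final long BUCKET_SIZE_BYTES = 16;     // from the proposal
  static final long DUPLICATE_NODE_BYTES = 16;  // assumed
  static final double MAX_FILL_FACTOR = 0.75;   // assumed
  static final long INITIAL_BUCKETS = 1024;     // assumed

  // Buckets are pre-allocated in powers of two and doubled until the
  // distinct keys fit under the maximum fill factor.
  static long numBuckets(long distinctKeys) {
    long buckets = INITIAL_BUCKETS;
    while (distinctKeys > buckets * MAX_FILL_FACTOR) {
      buckets <<= 1;
    }
    return buckets;
  }

  static long estimateJoinMemBytes(long buildRows, long distinctKeys, long avgRowBytes) {
    long rowData = buildRows * avgRowBytes;
    long bucketBytes = numBuckets(distinctKeys) * BUCKET_SIZE_BYTES;
    // Assumed: one DuplicateNode per build row beyond the first row of each key.
    long duplicateBytes = Math.max(0, buildRows - distinctKeys) * DUPLICATE_NODE_BYTES;
    return rowData + bucketBytes + duplicateBytes;
  }

  public static void main(String[] args) {
    System.out.println(estimateJoinMemBytes(3000, 1000, 32));
  }
}
```

For 3,000 build rows with 1,000 distinct keys and 32-byte rows, the sketch counts 96,000 bytes of row data, 32,768 bytes for 2,048 pre-allocated buckets and 32,000 bytes of duplicate nodes, so the two overhead terms together add roughly two thirds on top of the raw row data.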
>
>
> KuduScanNode
> 1. Memory is estimated as if all columns were scanned, so the estimated
> memory is much larger than the actual usage.
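A minimal Java sketch of the KuduScanNode fix: sum the average widths of only the materialized columns instead of every column in the table. The class name, the column names and widths, and the rowsPerBatch parameter are hypothetical.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; not Impala's KuduScanNode code.
public class KuduScanEstimateSketch {
  // Sum the average widths of only the columns the query materializes.
  static long rowBytesForColumns(Map<String, Integer> avgColumnWidths, Set<String> materialized) {
    long total = 0;
    for (Map.Entry<String, Integer> e : avgColumnWidths.entrySet()) {
      if (materialized.contains(e.getKey())) {
        total += e.getValue();
      }
    }
    return total;
  }

  // Memory for one batch of rows; rowsPerBatch is a hypothetical parameter.
  static long estimateScanMemBytes(
      Map<String, Integer> avgColumnWidths, Set<String> materialized, long rowsPerBatch) {
    return rowBytesForColumns(avgColumnWidths, materialized) * rowsPerBatch;
  }

  public static void main(String[] args) {
    Map<String, Integer> widths = Map.of("id", 8, "name", 20, "payload", 1000);
    // Current behaviour: every column is counted.
    System.out.println(estimateScanMemBytes(widths, widths.keySet(), 1024));
    // Proposed: only the columns the query actually reads.
    System.out.println(estimateScanMemBytes(widths, Set.of("id", "name"), 1024));
  }
}
```

With a 1,000-byte payload column that the query never reads, restricting the estimate to id and name drops the per-row size from 1,028 to 28 bytes.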
