Welcome! I've assigned the JIRA to you. Let me know if you run into any trouble.
Cheers,
Quanlong

On Fri, Dec 4, 2020 at 2:39 PM 刘垚 <54liu...@163.com> wrote:

> I want to contribute code to Impala.
>
> IMPALA-10377 Improve the accuracy of resource estimation
>
> PlanNode does not consider some factors when estimating memory, which
> causes a large error rate.
>
> AggregationNode
> 1. The memory occupied by the hash table's own data structure is not
> considered. Inserting a new value into the hash table adds a bucket, and
> the size of a bucket is 16 bytes.
> 2. When estimating the NDV of a merge aggregation with multiple grouping
> exprs, the NDV may be divided by the number of fragment instances several
> times; it should be divided only once.
> 3. When estimating the NDV of a merge aggregation with multiple grouping
> exprs, the estimated memory is much smaller than the actual usage.
> 4. If there are no grouping exprs, the estimated memory is much larger
> than the actual usage.
> 5. If the NDV of the grouping exprs is very small, the estimated memory is
> much larger than the actual usage.
>
> SortNode
> 1. When estimating the memory usage of an external sort, the estimated
> memory is much smaller than the actual usage.
>
> HashJoinNode
> 1. The memory occupied by the hash table's own data structure is not
> considered. The hash table keeps duplicate data, so the size of
> DuplicateNode should be considered.
> 2. The hash table creates multiple buckets in advance. The size of these
> buckets should be considered.
>
> KuduScanNode
> 1. Memory is estimated as if all columns were scanned, so the estimated
> memory is much larger than the actual usage.
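
AggregationNode items 1 and 2 in the quoted message can be illustrated with a small sketch. This is not Impala's planner code. The function names, the cap at the input row count, and the assumption that groups spread evenly across fragment instances are all hypothetical. Only the 16-byte bucket size comes from the message.

```python
import math

BUCKET_BYTES = 16  # bucket size stated in the quoted message


def per_instance_ndv(grouping_ndvs, num_instances, input_rows):
    """Hypothetical estimate of distinct groups seen by one fragment instance.

    Multiplies the per-expression NDVs first, caps the product at the input
    row count, and divides by the instance count exactly once (item 2).
    Assumes groups are spread evenly across instances.
    """
    combined = 1
    for ndv in grouping_ndvs:
        combined *= ndv
    combined = min(combined, input_rows)
    return max(1, math.ceil(combined / num_instances))


def agg_mem_estimate(groups, avg_row_bytes):
    """Row payload plus one 16-byte bucket per distinct group (item 1)."""
    return groups * (avg_row_bytes + BUCKET_BYTES)


# Two grouping exprs with NDVs 100 and 50, spread over 10 instances.
groups = per_instance_ndv([100, 50], num_instances=10, input_rows=1_000_000)
print(groups)                        # 500
print(agg_mem_estimate(groups, 32))  # 24000
```

Dividing each expression's NDV by the instance count before multiplying would give (100 / 10) * (50 / 10) = 50 groups for the same inputs, ten times fewer than the 500 obtained by dividing once. That is the kind of underestimate items 2 and 3 describe.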