[jira] [Updated] (IMPALA-10377) Improve the accuracy of resource estimation
[ https://issues.apache.org/jira/browse/IMPALA-10377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang updated IMPALA-10377:
------------------------------------
    Fix Version/s: Impala 4.0.0

> Improve the accuracy of resource estimation
> -------------------------------------------
>
>                 Key: IMPALA-10377
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10377
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Impala 3.4.0
>            Reporter: liuyao
>            Assignee: liuyao
>            Priority: Major
>              Labels: estimate, memory, statistics
>             Fix For: Impala 4.0.0
>
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> PlanNode does not account for several factors when estimating memory, which
> leads to large estimation errors.
>
> AggregationNode
> 1. The memory occupied by the hash table's own data structure is not
> considered. Each new value inserted into the hash table adds a bucket, and a
> bucket is 16 bytes.
> 2. When estimating the NDV of a merge aggregation with multiple grouping
> exprs, the NDV may be divided by the number of fragment instances several
> times. It should be divided only once.
> 3. When estimating the NDV of a merge aggregation with multiple grouping
> exprs, the estimated memory is much smaller than actual usage.
> 4. If there are no grouping exprs, the estimated memory is much larger than
> actual usage.
> 5. If the NDV of the grouping exprs is very small, the estimated memory is
> much larger than actual usage.
>
> SortNode
> 1. When estimating the memory usage of an external sort, the estimated memory
> is much smaller than actual usage.
>
> HashJoinNode
> 1. The memory occupied by the hash table's own data structure is not
> considered. The hash table keeps duplicate data, so the size of DuplicateNode
> should be considered.
> 2. The hash table creates multiple buckets in advance. The size of these
> buckets should be considered.
>
> KuduScanNode
> 1. Memory is estimated as if all columns were scanned, so the estimated
> memory is much larger than actual usage.
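
To make AggregationNode items 1, 2 and 4 concrete, here is a rough Python sketch of the arithmetic. It is not Impala's planner code. The function names, the use of a product of per-expression NDVs capped by the input row count, and the treatment of the no-grouping case as a single row without a hash table are all assumptions made for illustration. Only the 16-byte bucket size and the "divide once" rule come from the issue description.

```python
from math import prod

BUCKET_BYTES = 16  # bucket size stated in the issue description


def per_instance_ndv(expr_ndvs, num_instances, input_rows):
    """Estimate the distinct groups handled by one fragment instance.

    Assumption: the combined NDV is the product of the per-expression NDVs,
    capped by the input row count. The division by the instance count is
    applied once to the combined value, not once per grouping expression.
    """
    if not expr_ndvs:
        return 1  # no grouping exprs: a single output group
    combined = min(prod(expr_ndvs), input_rows)
    return max(1, -(-combined // num_instances))  # ceiling division


def repeated_division_ndv(expr_ndvs, num_instances, input_rows):
    """Illustrates item 2: every expression's NDV is divided separately."""
    if not expr_ndvs:
        return 1
    combined = prod(max(1, n // num_instances) for n in expr_ndvs)
    return min(combined, input_rows)


def agg_mem_estimate(expr_ndvs, num_instances, input_rows, row_bytes):
    """Bytes for grouped rows plus one 16-byte bucket per distinct group.

    Assumption: with no grouping exprs there is one intermediate row and
    no hash table, so no bucket overhead is charged.
    """
    if not expr_ndvs:
        return row_bytes
    groups = per_instance_ndv(expr_ndvs, num_instances, input_rows)
    return groups * (row_bytes + BUCKET_BYTES)


if __name__ == "__main__":
    ndvs, instances, rows, row_bytes = [1000, 100], 10, 10**9, 32
    print(per_instance_ndv(ndvs, instances, rows))
    print(repeated_division_ndv(ndvs, instances, rows))
    print(agg_mem_estimate(ndvs, instances, rows, row_bytes))
```

With two grouping exprs and 10 instances, dividing each NDV separately shrinks the group count by a factor of 100 instead of 10, which matches the underestimation described in item 3.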
--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-10377) Improve the accuracy of resource estimation
[ https://issues.apache.org/jira/browse/IMPALA-10377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong updated IMPALA-10377:
-----------------------------------
    Target Version: Impala 4.0
[jira] [Updated] (IMPALA-10377) Improve the accuracy of resource estimation
[ https://issues.apache.org/jira/browse/IMPALA-10377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong updated IMPALA-10377:
-----------------------------------
    Fix Version/s:     (was: Impala 4.0)