[jira] [Updated] (IMPALA-10377) Improve the accuracy of resource estimation

2022-04-17 Thread Quanlong Huang (Jira)



 [ https://issues.apache.org/jira/browse/IMPALA-10377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang updated IMPALA-10377:

Fix Version/s: Impala 4.0.0

> Improve the accuracy of resource estimation
> ---
>
> Key: IMPALA-10377
> URL: https://issues.apache.org/jira/browse/IMPALA-10377
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Affects Versions: Impala 3.4.0
> Reporter: liuyao
> Assignee: liuyao
> Priority: Major
> Labels: estimate, memory, statistics
> Fix For: Impala 4.0.0
>
> Original Estimate: 120h
> Remaining Estimate: 120h
>
> PlanNode does not account for several factors when estimating memory, which
> leads to a large estimation error.
>
> AggregationNode
>
> 1. The memory occupied by the hash table's own data structures is not
> considered. Each new value inserted into the hash table adds a bucket, and a
> bucket is 16 bytes.
> 2. When estimating the NDV of a merge aggregation with multiple grouping
> exprs, the NDV may be divided by the number of fragment instances several
> times; it should be divided only once.
> 3. When estimating the NDV of a merge aggregation with multiple grouping
> exprs, the estimated memory is much smaller than the actual usage.
> 4. If there are no grouping exprs, the estimated memory is much larger than
> the actual usage.
> 5. If the NDV of the grouping exprs is very small, the estimated memory is
> much larger than the actual usage.
>
> SortNode
>
> 1. When estimating the memory usage of an external sort, the estimated
> memory is much smaller than the actual usage.
>
> HashJoinNode
>
> 1. The memory occupied by the hash table's own data structures is not
> considered. The hash table keeps duplicate entries, so the size of
> DuplicateNode should be included.
> 2. The hash table creates multiple buckets in advance. The size of these
> buckets should be included.
>
> KuduScanNode
>
> 1. Memory is estimated as if all columns were scanned, so the estimated
> memory is much larger than the actual usage.
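
To make AggregationNode items 1, 2 and 4 concrete, here is a minimal sketch in Python. It is not Impala's planner code: the function names, the use of a simple product to combine per-expression NDVs, and the `row_bytes` parameter are all assumptions made for illustration. Only the 16-byte bucket size and the "divide only once" rule come from the description above.

```python
import math

BUCKET_BYTES = 16  # per-bucket size quoted in the issue description


def per_instance_ndv(grouping_ndvs, num_instances):
    """Hypothetical helper: combine the per-expression NDVs first
    (simple product, an assumption), then divide by the number of
    fragment instances exactly once."""
    combined = math.prod(grouping_ndvs)  # prod([]) == 1: no grouping exprs
    return max(1, math.ceil(combined / num_instances))


def per_instance_ndv_overdivided(grouping_ndvs, num_instances):
    """The failure mode from item 2: each expression's NDV is divided
    separately, so the instance count is applied once per expression."""
    product = math.prod(math.ceil(n / num_instances) for n in grouping_ndvs)
    return max(1, product)


def agg_mem_estimate(grouping_ndvs, num_instances, row_bytes):
    """One bucket per distinct group, added on top of the row data."""
    ndv = per_instance_ndv(grouping_ndvs, num_instances)
    return ndv * (row_bytes + BUCKET_BYTES)
```

With two grouping exprs of NDV 1000 and 100 spread over 10 instances, dividing once gives 10000 groups per instance, while dividing per expression gives 1000, a tenfold underestimate. With no grouping exprs the sketch falls back to a single group, so the estimate stays small rather than scaling with input size.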



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

[jira] [Updated] (IMPALA-10377) Improve the accuracy of resource estimation

2020-12-04 Thread Tim Armstrong (Jira)



 [ https://issues.apache.org/jira/browse/IMPALA-10377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong updated IMPALA-10377:
---
Target Version: Impala 4.0


[jira] [Updated] (IMPALA-10377) Improve the accuracy of resource estimation

2020-12-04 Thread Tim Armstrong (Jira)



 [ https://issues.apache.org/jira/browse/IMPALA-10377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong updated IMPALA-10377:
---
Fix Version/s: (was: Impala 4.0)

