[jira] [Updated] (FLINK-15004) Choose two-phase Aggregate if the statistics is unknown
[ https://issues.apache.org/jira/browse/FLINK-15004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flink Jira Bot updated FLINK-15004:
---
Labels: auto-deprioritized-major auto-deprioritized-minor auto-unassigned pull-request-available (was: auto-deprioritized-major auto-unassigned pull-request-available stale-minor)
Priority: Not a Priority (was: Minor)

This issue was labeled "stale-minor" 7 days ago and has not received any updates, so it is being deprioritized. If this ticket is actually Minor, please raise the priority and ask a committer to assign you the issue or revive the public discussion.

> Choose two-phase Aggregate if the statistics is unknown
> ---
>
> Key: FLINK-15004
> URL: https://issues.apache.org/jira/browse/FLINK-15004
> Project: Flink
> Issue Type: Improvement
> Components: Table SQL / Planner
> Affects Versions: 1.9.1, 1.10.0
> Reporter: godfrey he
> Priority: Not a Priority
> Labels: auto-deprioritized-major, auto-deprioritized-minor, auto-unassigned, pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Currently, the blink planner uses the default rowCount value (defined in
> {{FlinkPreparingTableBase#DEFAULT_ROWCOUNT}}) when the statistics are unknown,
> and may choose a one-phase Aggregate. The job will hang if the data is skewed.
> So it's better to use a two-phase Aggregate for execution stability when the
> statistics are unknown.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
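For readers unfamiliar with the trade-off in the issue description, here is a minimal sketch of why a two-phase aggregate tolerates skew better than a one-phase aggregate. It is a plain-Python simulation, not Flink planner code: the function names, the `zlib.crc32` partitioner, and the synthetic "hot"/"cold" data are all assumptions made for illustration. It counts how many rows each downstream task would receive after the shuffle.

```python
import zlib
from collections import Counter


def partition(key, parallelism):
    # Deterministic hash partitioning (hypothetical stand-in for a real shuffle).
    # Python's built-in str hash is salted per process, so crc32 is used instead.
    return zlib.crc32(key.encode("utf-8")) % parallelism


def one_phase_count(splits, parallelism):
    """Shuffle every raw row by key, then aggregate once downstream."""
    shuffled_rows = [0] * parallelism
    totals = Counter()
    for split in splits:
        for key in split:
            shuffled_rows[partition(key, parallelism)] += 1
            totals[key] += 1
    return totals, shuffled_rows


def two_phase_count(splits, parallelism):
    """Pre-aggregate inside each split, then shuffle only the partial results."""
    shuffled_rows = [0] * parallelism
    totals = Counter()
    for split in splits:
        partial = Counter(split)  # local (first-phase) aggregate
        for key, count in partial.items():
            shuffled_rows[partition(key, parallelism)] += 1
            totals[key] += count  # global (second-phase) merge
    return totals, shuffled_rows


def skewed_splits(num_splits=4, hot_rows=9000, cold_keys=10, cold_rows=100):
    # Synthetic skew: one key accounts for 90% of all rows.
    rows = ["hot"] * hot_rows
    for i in range(cold_keys):
        rows += [f"cold-{i}"] * cold_rows
    # Distribute rows round-robin over the upstream splits.
    return [rows[i::num_splits] for i in range(num_splits)]


if __name__ == "__main__":
    splits = skewed_splits()
    totals_1, load_1 = one_phase_count(splits, parallelism=4)
    totals_2, load_2 = two_phase_count(splits, parallelism=4)
    print("one-phase rows per task:", load_1)
    print("two-phase rows per task:", load_2)
```

Both strategies produce the same totals. In the one-phase case every row for the hot key lands on a single downstream task, while in the two-phase case each upstream split sends at most one partial row per key, so the shuffled volume is bounded by splits times distinct keys.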
[jira] [Updated] (FLINK-15004) Choose two-phase Aggregate if the statistics is unknown
[ https://issues.apache.org/jira/browse/FLINK-15004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flink Jira Bot updated FLINK-15004:
---
Labels: auto-deprioritized-major auto-unassigned pull-request-available stale-minor (was: auto-deprioritized-major auto-unassigned pull-request-available)

I am the [Flink Jira Bot|https://github.com/apache/flink-jira-bot/] and I help the community manage its development. I see this issue has been marked as Minor but is unassigned, and neither it nor its Sub-Tasks have been updated for 180 days. I have gone ahead and marked it "stale-minor". If this ticket is still Minor, please either assign yourself or give an update. Afterwards, please remove the label, or in 7 days the issue will be deprioritized.

> Choose two-phase Aggregate if the statistics is unknown
> ---
>
> Key: FLINK-15004
> URL: https://issues.apache.org/jira/browse/FLINK-15004
> Project: Flink
> Issue Type: Improvement
> Components: Table SQL / Planner
> Affects Versions: 1.9.1, 1.10.0
> Reporter: godfrey he
> Priority: Minor
> Labels: auto-deprioritized-major, auto-unassigned, pull-request-available, stale-minor
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Currently, the blink planner uses the default rowCount value (defined in
> {{FlinkPreparingTableBase#DEFAULT_ROWCOUNT}}) when the statistics are unknown,
> and may choose a one-phase Aggregate. The job will hang if the data is skewed.
> So it's better to use a two-phase Aggregate for execution stability when the
> statistics are unknown.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Updated] (FLINK-15004) Choose two-phase Aggregate if the statistics is unknown
[ https://issues.apache.org/jira/browse/FLINK-15004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flink Jira Bot updated FLINK-15004:
---
Labels: auto-deprioritized-major auto-unassigned pull-request-available (was: auto-unassigned pull-request-available stale-major)
Priority: Minor (was: Major)

This issue was labeled "stale-major" 7 days ago and has not received any updates, so it is being deprioritized. If this ticket is actually Major, please raise the priority and ask a committer to assign you the issue or revive the public discussion.

> Choose two-phase Aggregate if the statistics is unknown
> ---
>
> Key: FLINK-15004
> URL: https://issues.apache.org/jira/browse/FLINK-15004
> Project: Flink
> Issue Type: Improvement
> Components: Table SQL / Planner
> Affects Versions: 1.9.1, 1.10.0
> Reporter: godfrey he
> Priority: Minor
> Labels: auto-deprioritized-major, auto-unassigned, pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Currently, the blink planner uses the default rowCount value (defined in
> {{FlinkPreparingTableBase#DEFAULT_ROWCOUNT}}) when the statistics are unknown,
> and may choose a one-phase Aggregate. The job will hang if the data is skewed.
> So it's better to use a two-phase Aggregate for execution stability when the
> statistics are unknown.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (FLINK-15004) Choose two-phase Aggregate if the statistics is unknown
[ https://issues.apache.org/jira/browse/FLINK-15004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flink Jira Bot updated FLINK-15004:
---
Labels: auto-unassigned pull-request-available stale-major (was: auto-unassigned pull-request-available)

I am the [Flink Jira Bot|https://github.com/apache/flink-jira-bot/] and I help the community manage its development. I see this issue has been marked as Major but is unassigned, and neither it nor its Sub-Tasks have been updated for 30 days. I have gone ahead and added a "stale-major" label to the issue. If this ticket is a Major, please either assign yourself or give an update. Afterwards, please remove the label, or in 7 days the issue will be deprioritized.

> Choose two-phase Aggregate if the statistics is unknown
> ---
>
> Key: FLINK-15004
> URL: https://issues.apache.org/jira/browse/FLINK-15004
> Project: Flink
> Issue Type: Improvement
> Components: Table SQL / Planner
> Affects Versions: 1.9.1, 1.10.0
> Reporter: godfrey he
> Priority: Major
> Labels: auto-unassigned, pull-request-available, stale-major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Currently, the blink planner uses the default rowCount value (defined in
> {{FlinkPreparingTableBase#DEFAULT_ROWCOUNT}}) when the statistics are unknown,
> and may choose a one-phase Aggregate. The job will hang if the data is skewed.
> So it's better to use a two-phase Aggregate for execution stability when the
> statistics are unknown.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (FLINK-15004) Choose two-phase Aggregate if the statistics is unknown
[ https://issues.apache.org/jira/browse/FLINK-15004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flink Jira Bot updated FLINK-15004:
---
Labels: auto-unassigned pull-request-available (was: pull-request-available stale-assigned)

> Choose two-phase Aggregate if the statistics is unknown
> ---
>
> Key: FLINK-15004
> URL: https://issues.apache.org/jira/browse/FLINK-15004
> Project: Flink
> Issue Type: Improvement
> Components: Table SQL / Planner
> Affects Versions: 1.9.1, 1.10.0
> Reporter: godfrey he
> Assignee: godfrey he
> Priority: Major
> Labels: auto-unassigned, pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Currently, the blink planner uses the default rowCount value (defined in
> {{FlinkPreparingTableBase#DEFAULT_ROWCOUNT}}) when the statistics are unknown,
> and may choose a one-phase Aggregate. The job will hang if the data is skewed.
> So it's better to use a two-phase Aggregate for execution stability when the
> statistics are unknown.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (FLINK-15004) Choose two-phase Aggregate if the statistics is unknown
[ https://issues.apache.org/jira/browse/FLINK-15004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flink Jira Bot updated FLINK-15004:
---
Labels: pull-request-available stale-assigned (was: pull-request-available)

> Choose two-phase Aggregate if the statistics is unknown
> ---
>
> Key: FLINK-15004
> URL: https://issues.apache.org/jira/browse/FLINK-15004
> Project: Flink
> Issue Type: Improvement
> Components: Table SQL / Planner
> Affects Versions: 1.9.1, 1.10.0
> Reporter: godfrey he
> Assignee: godfrey he
> Priority: Major
> Labels: pull-request-available, stale-assigned
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Currently, the blink planner uses the default rowCount value (defined in
> {{FlinkPreparingTableBase#DEFAULT_ROWCOUNT}}) when the statistics are unknown,
> and may choose a one-phase Aggregate. The job will hang if the data is skewed.
> So it's better to use a two-phase Aggregate for execution stability when the
> statistics are unknown.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (FLINK-15004) Choose two-phase Aggregate if the statistics is unknown
[ https://issues.apache.org/jira/browse/FLINK-15004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kurt Young updated FLINK-15004:
---
Affects Version/s: (was: 1.9.0)
                   1.10.0

> Choose two-phase Aggregate if the statistics is unknown
> ---
>
> Key: FLINK-15004
> URL: https://issues.apache.org/jira/browse/FLINK-15004
> Project: Flink
> Issue Type: Improvement
> Components: Table SQL / Planner
> Affects Versions: 1.10.0, 1.9.1
> Reporter: godfrey he
> Assignee: godfrey he
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Currently, the blink planner uses the default rowCount value (defined in
> {{FlinkPreparingTableBase#DEFAULT_ROWCOUNT}}) when the statistics are unknown,
> and may choose a one-phase Aggregate. The job will hang if the data is skewed.
> So it's better to use a two-phase Aggregate for execution stability when the
> statistics are unknown.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (FLINK-15004) Choose two-phase Aggregate if the statistics is unknown
[ https://issues.apache.org/jira/browse/FLINK-15004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kurt Young updated FLINK-15004:
---
Parent: (was: FLINK-14133)
Issue Type: Improvement (was: Sub-task)

> Choose two-phase Aggregate if the statistics is unknown
> ---
>
> Key: FLINK-15004
> URL: https://issues.apache.org/jira/browse/FLINK-15004
> Project: Flink
> Issue Type: Improvement
> Components: Table SQL / Planner
> Affects Versions: 1.9.0, 1.9.1
> Reporter: godfrey he
> Assignee: godfrey he
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Currently, the blink planner uses the default rowCount value (defined in
> {{FlinkPreparingTableBase#DEFAULT_ROWCOUNT}}) when the statistics are unknown,
> and may choose a one-phase Aggregate. The job will hang if the data is skewed.
> So it's better to use a two-phase Aggregate for execution stability when the
> statistics are unknown.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (FLINK-15004) Choose two-phase Aggregate if the statistics is unknown
[ https://issues.apache.org/jira/browse/FLINK-15004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kurt Young updated FLINK-15004:
---
Fix Version/s: (was: 1.10.0)

> Choose two-phase Aggregate if the statistics is unknown
> ---
>
> Key: FLINK-15004
> URL: https://issues.apache.org/jira/browse/FLINK-15004
> Project: Flink
> Issue Type: Sub-task
> Components: Table SQL / Planner
> Affects Versions: 1.9.0, 1.9.1
> Reporter: godfrey he
> Assignee: godfrey he
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Currently, the blink planner uses the default rowCount value (defined in
> {{FlinkPreparingTableBase#DEFAULT_ROWCOUNT}}) when the statistics are unknown,
> and may choose a one-phase Aggregate. The job will hang if the data is skewed.
> So it's better to use a two-phase Aggregate for execution stability when the
> statistics are unknown.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (FLINK-15004) Choose two-phase Aggregate if the statistics is unknown
[ https://issues.apache.org/jira/browse/FLINK-15004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

godfrey he updated FLINK-15004:
---
Description: Currently, the blink planner uses the default rowCount value (defined in {{FlinkPreparingTableBase#DEFAULT_ROWCOUNT}}) when the statistics are unknown, and may choose a one-phase Aggregate. The job will hang if the data is skewed. So it's better to use a two-phase Aggregate for execution stability when the statistics are unknown.

(was: Currently, the blink planner uses the default rowCount value (defined in {{FlinkPreparingTableBase#DEFAULT_ROWCOUNT}}) when the statistics are unknown, and may choose {{HashJoin}} instead of {{SortMergeJoin}}. The job will hang if the build side of {{HashJoin}} has a huge input size. So it's better to use {{SortMergeJoin}} for execution stability when the statistics are unknown.)

> Choose two-phase Aggregate if the statistics is unknown
> ---
>
> Key: FLINK-15004
> URL: https://issues.apache.org/jira/browse/FLINK-15004
> Project: Flink
> Issue Type: Sub-task
> Components: Table SQL / Planner
> Affects Versions: 1.9.0, 1.9.1
> Reporter: godfrey he
> Assignee: godfrey he
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.10.0
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Currently, the blink planner uses the default rowCount value (defined in
> {{FlinkPreparingTableBase#DEFAULT_ROWCOUNT}}) when the statistics are unknown,
> and may choose a one-phase Aggregate. The job will hang if the data is skewed.
> So it's better to use a two-phase Aggregate for execution stability when the
> statistics are unknown.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (FLINK-15004) Choose two-phase Aggregate if the statistics is unknown
[ https://issues.apache.org/jira/browse/FLINK-15004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

godfrey he updated FLINK-15004:
---
Summary: Choose two-phase Aggregate if the statistics is unknown (was: Choose SortMergeJoin instead of HashJoin if the statistics is unknown)

> Choose two-phase Aggregate if the statistics is unknown
> ---
>
> Key: FLINK-15004
> URL: https://issues.apache.org/jira/browse/FLINK-15004
> Project: Flink
> Issue Type: Sub-task
> Components: Table SQL / Planner
> Affects Versions: 1.9.0, 1.9.1
> Reporter: godfrey he
> Assignee: godfrey he
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.10.0
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Currently, the blink planner uses the default rowCount value (defined in
> {{FlinkPreparingTableBase#DEFAULT_ROWCOUNT}}) when the statistics are unknown,
> and may choose {{HashJoin}} instead of {{SortMergeJoin}}. The job will hang
> if the build side of {{HashJoin}} has a huge input size. So it's better to
> use {{SortMergeJoin}} for execution stability when the statistics are unknown.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)