[jira] [Work logged] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table
[ https://issues.apache.org/jira/browse/HIVE-25335?focusedWorklogId=796720=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-796720 ]

ASF GitHub Bot logged work on HIVE-25335:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 01/Aug/22 00:25
Start Date: 01/Aug/22 00:25
Worklog Time Spent: 10m
Work Description: github-actions[bot] closed pull request #3292: HIVE-25335: Unreasonable setting reduce number, when join big size ta…
URL: https://github.com/apache/hive/pull/3292

Issue Time Tracking
-------------------

Worklog Id: (was: 796720)
Time Spent: 3h 40m (was: 3.5h)

> Unreasonable setting reduce number, when join big size table(but small row count) and small size table
> --
>
> Key: HIVE-25335
> URL: https://issues.apache.org/jira/browse/HIVE-25335
> Project: Hive
> Issue Type: Improvement
> Reporter: zhengchenyu
> Assignee: zhengchenyu
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-25335.001.patch
>
> Time Spent: 3h 40m
> Remaining Estimate: 0h
>
> I found an application that is slow in our cluster because each reducer has to process a huge number of bytes, yet there are only two reducers.
> When I debugged it, I found the reason: in this SQL, one table is big in size (about 30G) but has few rows (about 3.5M), while another table is small in size (about 100M) but has more rows (about 3.6M). So JoinStatsRule.process uses only the 100M to estimate the number of reducers, when in fact we need to process 30G bytes.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Work logged] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table
[ https://issues.apache.org/jira/browse/HIVE-25335?focusedWorklogId=794663=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794663 ]

ASF GitHub Bot logged work on HIVE-25335:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 25/Jul/22 00:23
Start Date: 25/Jul/22 00:23
Worklog Time Spent: 10m
Work Description: github-actions[bot] commented on PR #3292:
URL: https://github.com/apache/hive/pull/3292#issuecomment-1193429305

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Feel free to reach out on the d...@hive.apache.org list if the patch is in need of reviews.

Issue Time Tracking
-------------------

Worklog Id: (was: 794663)
Time Spent: 3.5h (was: 3h 20m)
[jira] [Work logged] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table
[ https://issues.apache.org/jira/browse/HIVE-25335?focusedWorklogId=774386=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774386 ]

ASF GitHub Bot logged work on HIVE-25335:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 25/May/22 07:28
Start Date: 25/May/22 07:28
Worklog Time Spent: 10m
Work Description: zabetak commented on PR #3292:
URL: https://github.com/apache/hive/pull/3292#issuecomment-1136889593

@zhengchenyu I suspect that the error you see on Jenkins has to do with the fact that there are a lot of errors in the tests. If you run locally with `-Dtest.output.overwrite`, you will not see any errors because you are automatically updating the "reference files". If you want to see all the errors locally, you must remove this parameter. Having said that, if you commit all the changes in the reference files then the tests will most likely pass and the Jenkins pipeline may run fine.

Issue Time Tracking
-------------------

Worklog Id: (was: 774386)
Time Spent: 3h 20m (was: 3h 10m)
[jira] [Work logged] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table
[ https://issues.apache.org/jira/browse/HIVE-25335?focusedWorklogId=774336=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774336 ]

ASF GitHub Bot logged work on HIVE-25335:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 25/May/22 03:22
Start Date: 25/May/22 03:22
Worklog Time Spent: 10m
Work Description: zhengchenyu commented on PR #3292:
URL: https://github.com/apache/hive/pull/3292#issuecomment-1136682080

> @zhengchenyu I am not sure what exactly you mean by saying the unit tests are working in your environment. If you check the failed tests you will see a lot related to the `TestMiniLlapLocalCliDriver`. If you want to run these tests and update the plans you don't need Jenkins or anything else. You can do it by following the steps below:
>
> ```
> mvn clean install -DskipTests -Pitests
> cd itests/qtest
> mvn test -Dtest=TestMiniLlapLocalCliDriver -Dtest.output.overwrite
> ```
>
> If you want to run specific tests then you can use the `-Dqfile` option. For more info have a look here: https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ#HiveDeveloperFAQ-HowdoIruntheclientpositive/clientnegativeunittests?

@zabetak On our compile server, I ran the unit tests in exactly this way, and they all pass. It seems Jenkins fails on this script:

```
# removes all stdout and err for passed tests
xmlstarlet ed -L -d 'testsuite/testcase/system-out[count(../failure)=0]' -d 'testsuite/testcase/system-err[count(../failure)=0]' `find . -name 'TEST*xml' -path '*/surefire-reports/*'`
# remove all output.txt files
find . -name '*output.txt' -path '*/surefire-reports/*' -exec unlink "{}" \;
```

I also executed this script, and it passes as well. I don't know what differs between my compile server and this Jenkins pipeline. Maybe I should set up a whole Jenkins pipeline to reproduce this error, but the Jenkinsfile fails to run on my Jenkins server.

Issue Time Tracking
-------------------

Worklog Id: (was: 774336)
Time Spent: 3h 10m (was: 3h)
[jira] [Work logged] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table
[ https://issues.apache.org/jira/browse/HIVE-25335?focusedWorklogId=771879=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-771879 ]

ASF GitHub Bot logged work on HIVE-25335:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 18/May/22 13:40
Start Date: 18/May/22 13:40
Worklog Time Spent: 10m
Work Description: zabetak commented on PR #3292:
URL: https://github.com/apache/hive/pull/3292#issuecomment-1130032347

@zhengchenyu I am not sure what exactly you mean by saying the unit tests are working in your environment. If you check the failed tests you will see a lot related to the `TestMiniLlapLocalCliDriver`. If you want to run these tests and update the plans you don't need Jenkins or anything else. You can do it by following the steps below:

```
mvn clean install -DskipTests -Pitests
cd itests/qtest
mvn test -Dtest=TestMiniLlapLocalCliDriver -Dtest.output.overwrite
```

If you want to run specific tests then you can use the `-Dqfile` option. For more info have a look here: https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ#HiveDeveloperFAQ-HowdoIruntheclientpositive/clientnegativeunittests?

Issue Time Tracking
-------------------

Worklog Id: (was: 771879)
Time Spent: 3h (was: 2h 50m)
[jira] [Work logged] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table
[ https://issues.apache.org/jira/browse/HIVE-25335?focusedWorklogId=771137=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-771137 ]

ASF GitHub Bot logged work on HIVE-25335:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 17/May/22 04:25
Start Date: 17/May/22 04:25
Worklog Time Spent: 10m
Work Description: zhengchenyu commented on PR #3292:
URL: https://github.com/apache/hive/pull/3292#issuecomment-1128396054

@zabetak The unit tests pass in my environment. It seems the error happens in the post stage. Because I changed the logic of maxDataSize, some explain output may have changed. Many explain outputs may need to be updated, so I need to set up a Jenkins pipeline. Is there any introduction to the Hive Jenkins pipeline? I ran into many problems when setting up the pipeline in my dev environment.

Issue Time Tracking
-------------------

Worklog Id: (was: 771137)
Time Spent: 2h 50m (was: 2h 40m)
[jira] [Work logged] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table
[ https://issues.apache.org/jira/browse/HIVE-25335?focusedWorklogId=770442=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-770442 ]

ASF GitHub Bot logged work on HIVE-25335:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 14/May/22 00:25
Start Date: 14/May/22 00:25
Worklog Time Spent: 10m
Work Description: zhengchenyu opened a new pull request, #3292:
URL: https://github.com/apache/hive/pull/3292

I found an application that is slow in our cluster because each reducer has to process a huge number of bytes, yet there are only two reducers.
When I debugged it, I found the reason: in this SQL, one table is big in size (about 30G) but has few rows (about 3.5M), while another table is small in size (about 100M) but has more rows (about 3.6M). So JoinStatsRule.process uses only the 100M to estimate the number of reducers, when in fact we need to process 30G bytes.

https://issues.apache.org/jira/browse/HIVE-25335

Issue Time Tracking
-------------------

Worklog Id: (was: 770442)
Time Spent: 2h 40m (was: 2.5h)
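
For readers following the numbers in the description, here is a minimal sketch of the arithmetic, not Hive's actual code. It assumes the reducer count is derived by dividing an estimated data size by a per-reducer byte target and clamping the result. The function name `estimate_reducers`, the 64 MiB target, and the cap of 1009 are illustrative assumptions chosen so the small table yields two reducers, as reported in the issue.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def estimate_reducers(data_size_bytes: int, bytes_per_reducer: int, max_reducers: int) -> int:
    """Ceil-divide the estimated data size by the per-reducer target, clamped to [1, max_reducers]."""
    reducers = (data_size_bytes + bytes_per_reducer - 1) // bytes_per_reducer
    return max(1, min(max_reducers, reducers))


# Sizes from the issue description; the target and cap are hypothetical values.
small_table = 100 * MIB        # about 100M, but more rows (about 3.6M)
big_table = 30 * GIB           # about 30G, but fewer rows (about 3.5M)
bytes_per_reducer = 64 * MIB   # assumed target, not a Hive default
max_reducers = 1009            # assumed cap

# Estimate driven only by the small table's size, as the issue reports.
from_small = estimate_reducers(small_table, bytes_per_reducer, max_reducers)
print(from_small)  # -> 2

# Estimate driven by the bytes that actually have to be processed.
from_big = estimate_reducers(big_table, bytes_per_reducer, max_reducers)
print(from_big)  # -> 480

# Load per reducer if only two reducers share both inputs.
per_reducer_gib = (big_table + small_table) / from_small / GIB
print(f"{per_reducer_gib:.2f} GiB per reducer")
```

Under these assumed values, sizing from the 100M input leaves each of the two reducers with roughly 15 GiB, which is the slowdown the issue describes.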
[jira] [Work logged] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table
[ https://issues.apache.org/jira/browse/HIVE-25335?focusedWorklogId=769681=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-769681 ]

ASF GitHub Bot logged work on HIVE-25335:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 12/May/22 14:22
Start Date: 12/May/22 14:22
Worklog Time Spent: 10m
Work Description: zabetak commented on PR #2490:
URL: https://github.com/apache/hive/pull/2490#issuecomment-1125059076

@zhengchenyu I don't think it's possible to reopen a closed PR. You can create a new one instead.

Issue Time Tracking
-------------------

Worklog Id: (was: 769681)
Time Spent: 2.5h (was: 2h 20m)
[jira] [Work logged] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table
[ https://issues.apache.org/jira/browse/HIVE-25335?focusedWorklogId=754531=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754531 ]

ASF GitHub Bot logged work on HIVE-25335:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 08/Apr/22 11:28
Start Date: 08/Apr/22 11:28
Worklog Time Spent: 10m
Work Description: zhengchenyu commented on PR #2490:
URL: https://github.com/apache/hive/pull/2490#issuecomment-1092763873

@zabetak Sorry for missing this issue for so long. Can you help me reopen this PR?

Issue Time Tracking
-------------------

Worklog Id: (was: 754531)
Time Spent: 2h 20m (was: 2h 10m)
[jira] [Work logged] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table
[ https://issues.apache.org/jira/browse/HIVE-25335?focusedWorklogId=754525=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754525 ]

ASF GitHub Bot logged work on HIVE-25335:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 08/Apr/22 11:22
Start Date: 08/Apr/22 11:22
Worklog Time Spent: 10m
Work Description: zhengchenyu commented on PR #2490:
URL: https://github.com/apache/hive/pull/2490#issuecomment-1092759199

Sorry for missing it; reopening this PR.

Issue Time Tracking
-------------------

Worklog Id: (was: 754525)
Time Spent: 2h 10m (was: 2h)
[jira] [Work logged] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table
[ https://issues.apache.org/jira/browse/HIVE-25335?focusedWorklogId=725903=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-725903 ]

ASF GitHub Bot logged work on HIVE-25335:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 14/Feb/22 02:27
Start Date: 14/Feb/22 02:27
Worklog Time Spent: 10m
Work Description: github-actions[bot] closed pull request #2490:
URL: https://github.com/apache/hive/pull/2490

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 725903)
Time Spent: 2h (was: 1h 50m)
[jira] [Work logged] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table
[ https://issues.apache.org/jira/browse/HIVE-25335?focusedWorklogId=725860=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-725860 ]

ASF GitHub Bot logged work on HIVE-25335:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 14/Feb/22 00:13
Start Date: 14/Feb/22 00:13
Worklog Time Spent: 10m
Work Description: github-actions[bot] closed pull request #2490:
URL: https://github.com/apache/hive/pull/2490

Issue Time Tracking
-------------------

Worklog Id: (was: 725860)
Time Spent: 1h 50m (was: 1h 40m)
[jira] [Work logged] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table
[ https://issues.apache.org/jira/browse/HIVE-25335?focusedWorklogId=721717=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-721717 ]

ASF GitHub Bot logged work on HIVE-25335:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 07/Feb/22 00:12
Start Date: 07/Feb/22 00:12
Worklog Time Spent: 10m
Work Description: github-actions[bot] commented on pull request #2490:
URL: https://github.com/apache/hive/pull/2490#issuecomment-1030948557

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Feel free to reach out on the d...@hive.apache.org list if the patch is in need of reviews.

Issue Time Tracking
-------------------

Worklog Id: (was: 721717)
Time Spent: 1h 40m (was: 1.5h)
[jira] [Work logged] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table
[ https://issues.apache.org/jira/browse/HIVE-25335?focusedWorklogId=692426=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-692426 ]

ASF GitHub Bot logged work on HIVE-25335:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 08/Dec/21 10:45
Start Date: 08/Dec/21 10:45
Worklog Time Spent: 10m
Work Description: zabetak commented on pull request #2490:
URL: https://github.com/apache/hive/pull/2490#issuecomment-988700164

Hey @zhengchenyu the change looks reasonable to me. Are you planning to push this forward (update/check) the tests?

Issue Time Tracking
-------------------

Worklog Id: (was: 692426)
Time Spent: 1.5h (was: 1h 20m)
[jira] [Work logged] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table
[ https://issues.apache.org/jira/browse/HIVE-25335?focusedWorklogId=692166=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-692166 ]

ASF GitHub Bot logged work on HIVE-25335:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 08/Dec/21 00:13
Start Date: 08/Dec/21 00:13
Worklog Time Spent: 10m
Work Description: github-actions[bot] commented on pull request #2490:
URL: https://github.com/apache/hive/pull/2490#issuecomment-988363846

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Feel free to reach out on the d...@hive.apache.org list if the patch is in need of reviews.

Issue Time Tracking
-------------------

Worklog Id: (was: 692166)
Time Spent: 1h 20m (was: 1h 10m)
[jira] [Work logged] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table
[ https://issues.apache.org/jira/browse/HIVE-25335?focusedWorklogId=662491=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-662491 ]

ASF GitHub Bot logged work on HIVE-25335:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 08/Oct/21 03:55
Start Date: 08/Oct/21 03:55
Worklog Time Spent: 10m
Work Description: zhengchenyu commented on pull request #2490:
URL: https://github.com/apache/hive/pull/2490#issuecomment-938322395

> @zhengchenyu I will try to review this next week. I see many test failures by the way. In order to merge this the tests should be green. Did you verify that all plan changes appearing there are beneficial?

Okay, let me make the tests green.

Issue Time Tracking
-------------------

Worklog Id: (was: 662491)
Time Spent: 1h 10m (was: 1h)
[jira] [Work logged] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table
[ https://issues.apache.org/jira/browse/HIVE-25335?focusedWorklogId=658831=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-658831 ]

ASF GitHub Bot logged work on HIVE-25335:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 01/Oct/21 09:20
Start Date: 01/Oct/21 09:20
Worklog Time Spent: 10m
Work Description: zabetak commented on pull request #2490:
URL: https://github.com/apache/hive/pull/2490#issuecomment-932065576

@zhengchenyu I will try to review this next week. I see many test failures by the way. In order to merge this the tests should be green. Did you verify that all plan changes appearing there are beneficial?

Issue Time Tracking
-------------------

Worklog Id: (was: 658831)
Time Spent: 1h (was: 50m)
[jira] [Work logged] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table
[ https://issues.apache.org/jira/browse/HIVE-25335?focusedWorklogId=657675=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-657675 ] ASF GitHub Bot logged work on HIVE-25335: - Author: ASF GitHub Bot Created on: 30/Sep/21 00:14 Start Date: 30/Sep/21 00:14 Worklog Time Spent: 10m Work Description: zhengchenyu edited a comment on pull request #2490: URL: https://github.com/apache/hive/pull/2490#issuecomment-927529430 @jcamachor @zabetak Can you help me review it, or give me some suggestion? Issue Time Tracking --- Worklog Id: (was: 657675) Time Spent: 50m (was: 40m)
[jira] [Work logged] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table
[ https://issues.apache.org/jira/browse/HIVE-25335?focusedWorklogId=657061=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-657061 ] ASF GitHub Bot logged work on HIVE-25335: - Author: ASF GitHub Bot Created on: 29/Sep/21 08:41 Start Date: 29/Sep/21 08:41 Worklog Time Spent: 10m Work Description: zhengchenyu edited a comment on pull request #2490: URL: https://github.com/apache/hive/pull/2490#issuecomment-927529430 @jcamachor @zabetak Can you help me review it, or give me some suggestion? Issue Time Tracking --- Worklog Id: (was: 657061) Time Spent: 40m (was: 0.5h)
[jira] [Work logged] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table
[ https://issues.apache.org/jira/browse/HIVE-25335?focusedWorklogId=655346=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-655346 ] ASF GitHub Bot logged work on HIVE-25335: - Author: ASF GitHub Bot Created on: 27/Sep/21 04:54 Start Date: 27/Sep/21 04:54 Worklog Time Spent: 10m Work Description: zhengchenyu commented on pull request #2490: URL: https://github.com/apache/hive/pull/2490#issuecomment-927529430 @jcamachor Can you help me review it, or give me some suggestion? Issue Time Tracking --- Worklog Id: (was: 655346) Time Spent: 0.5h (was: 20m)
[jira] [Work logged] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table
[ https://issues.apache.org/jira/browse/HIVE-25335?focusedWorklogId=652507=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-652507 ] ASF GitHub Bot logged work on HIVE-25335: - Author: ASF GitHub Bot Created on: 18/Sep/21 00:09 Start Date: 18/Sep/21 00:09 Worklog Time Spent: 10m Work Description: github-actions[bot] commented on pull request #2490: URL: https://github.com/apache/hive/pull/2490#issuecomment-922142340 This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the d...@hive.apache.org list if the patch is in need of reviews. Issue Time Tracking --- Worklog Id: (was: 652507) Time Spent: 20m (was: 10m)
[jira] [Work logged] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table
[ https://issues.apache.org/jira/browse/HIVE-25335?focusedWorklogId=624216=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-624216 ] ASF GitHub Bot logged work on HIVE-25335: - Author: ASF GitHub Bot Created on: 19/Jul/21 08:11 Start Date: 19/Jul/21 08:11 Worklog Time Spent: 10m Work Description: zhengchenyu opened a new pull request #2490: URL: https://github.com/apache/hive/pull/2490 …ble(but small row count) and small size table ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? Issue Time Tracking --- Worklog Id: (was: 624216) Remaining Estimate: 0h Time Spent: 10m