[jira] [Work logged] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table

2022-07-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25335?focusedWorklogId=796720=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-796720
 ]

ASF GitHub Bot logged work on HIVE-25335:
-

Author: ASF GitHub Bot
Created on: 01/Aug/22 00:25
Start Date: 01/Aug/22 00:25
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #3292: 
HIVE-25335: Unreasonable setting reduce number, when join big size ta…
URL: https://github.com/apache/hive/pull/3292




Issue Time Tracking
---

Worklog Id: (was: 796720)
Time Spent: 3h 40m  (was: 3.5h)

> Unreasonable setting reduce number, when join big size table(but small row 
> count) and small size table
> --
>
> Key: HIVE-25335
> URL: https://issues.apache.org/jira/browse/HIVE-25335
> Project: Hive
>  Issue Type: Improvement
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-25335.001.patch
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> I found an application that runs slowly in our cluster: each reducer has to 
> process a huge number of bytes, yet the job gets only two reducers. 
> While debugging, I found the reason. In this SQL, one table is large in size 
> (about 30G) but has few rows (about 3.5M), while the other is small in size 
> (about 100M) but has more rows (about 3.6M). So JoinStatsRule.process uses 
> only the 100M to estimate the number of reducers, although in fact we need to 
> process 30G bytes.
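
To make the numbers in the description concrete, here is a rough sketch of a size-based reducer estimate. It is not the logic of `JoinStatsRule.process`; the helper name `estimate_reducers` is hypothetical, and the 256 MB of input per reducer is an assumed target chosen only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reducers needed for a given input size,
# using ceiling division with a floor of one reducer.
estimate_reducers() {
  total_bytes=$1
  bytes_per_reducer=$2
  n=$(( (total_bytes + bytes_per_reducer - 1) / bytes_per_reducer ))
  [ "$n" -lt 1 ] && n=1
  echo "$n"
}

MB=$((1024 * 1024))
GB=$((1024 * MB))
PER_REDUCER=$((256 * MB))   # assumed target, for illustration only

# Estimate driven by the 100M side versus the 30G side of the join.
small=$(estimate_reducers $((100 * MB)) "$PER_REDUCER")
big=$(estimate_reducers $((30 * GB)) "$PER_REDUCER")
echo "estimate from 100M side: $small"
echo "estimate from 30G side:  $big"
```

Under these assumptions the 100M side yields 1 reducer while the 30G side yields 120, which shows how far apart the two estimates can be when the smaller input drives the calculation.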



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table

2022-07-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25335?focusedWorklogId=794663=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794663
 ]

ASF GitHub Bot logged work on HIVE-25335:
-

Author: ASF GitHub Bot
Created on: 25/Jul/22 00:23
Start Date: 25/Jul/22 00:23
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on PR #3292:
URL: https://github.com/apache/hive/pull/3292#issuecomment-1193429305

   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.




Issue Time Tracking
---

Worklog Id: (was: 794663)
Time Spent: 3.5h  (was: 3h 20m)



[jira] [Work logged] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table

2022-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25335?focusedWorklogId=774386=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774386
 ]

ASF GitHub Bot logged work on HIVE-25335:
-

Author: ASF GitHub Bot
Created on: 25/May/22 07:28
Start Date: 25/May/22 07:28
Worklog Time Spent: 10m 
  Work Description: zabetak commented on PR #3292:
URL: https://github.com/apache/hive/pull/3292#issuecomment-1136889593

   @zhengchenyu I suspect that the error you see on Jenkins has to do with the 
fact that there are a lot of errors in the tests. 
   
   If you run locally and you use the `-Dtest.output.overwrite` option then you 
will not see any errors because you are automatically updating the "reference 
files". If you want to see all the errors locally you must remove this parameter.
   
   Having said that, if you commit all the changes in the reference files then 
the tests will most likely pass and the Jenkins pipeline may run fine.
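
The two modes described in this comment can be sketched as a pair of commands. The driver name and the bare overwrite flag are the ones used elsewhere in this thread; the wrapper `qtest_cmd` is a hypothetical helper that only prints the command, so it can be inspected before being run from `itests/qtest`.

```shell
#!/bin/sh
# Hypothetical helper: print the Maven command for one of the two modes.
#   check     -> compare against the reference files and report failures
#   overwrite -> regenerate the reference files instead of failing
qtest_cmd() {
  mode=$1
  cmd="mvn test -Dtest=TestMiniLlapLocalCliDriver"
  if [ "$mode" = "overwrite" ]; then
    cmd="$cmd -Dtest.output.overwrite"
  fi
  echo "$cmd"
}

qtest_cmd check
qtest_cmd overwrite
```

After a run in overwrite mode, `git status` and `git diff` show which reference files changed, so they can be reviewed before being committed.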




Issue Time Tracking
---

Worklog Id: (was: 774386)
Time Spent: 3h 20m  (was: 3h 10m)



[jira] [Work logged] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table

2022-05-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25335?focusedWorklogId=774336=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774336
 ]

ASF GitHub Bot logged work on HIVE-25335:
-

Author: ASF GitHub Bot
Created on: 25/May/22 03:22
Start Date: 25/May/22 03:22
Worklog Time Spent: 10m 
  Work Description: zhengchenyu commented on PR #3292:
URL: https://github.com/apache/hive/pull/3292#issuecomment-1136682080

   > @zhengchenyu I am not sure what exactly you mean by saying the unit 
tests are working in your environment. If you check the failed tests you will 
see a lot related to the `TestMiniLlapLocalCliDriver`. If you want to run these 
tests and update the plans you don't need Jenkins or anything else. You can do 
it by following the steps below:
   > 
   > ```
   > mvn clean install -DskipTests -Pitests
   > cd itests/qtest
   > mvn test -Dtest=TestMiniLlapLocalCliDriver -Dtest.output.overwrite
   > ```
   > 
   > If you want to run specific tests then you can use the `-Dqfile` option. 
For more info have a look here: 
https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ#HiveDeveloperFAQ-HowdoIruntheclientpositive/clientnegativeunittests?
   
   @zabetak 
   
   On our compile server, I ran the unit tests exactly this way, and they all 
passed. It seems Jenkins fails in this script:
   ```
   # removes all stdout and err for passed tests
   xmlstarlet ed -L -d 'testsuite/testcase/system-out[count(../failure)=0]' -d 'testsuite/testcase/system-err[count(../failure)=0]' `find . -name 'TEST*xml' -path '*/surefire-reports/*'`
   # remove all output.txt files
   find . -name '*output.txt' -path '*/surefire-reports/*' -exec unlink "{}" \\;
   ```
   
   I also executed this script, and it passed. I don't know what differs 
between my compile server and this Jenkins pipeline.
   
   Maybe I should set up a whole Jenkins pipeline to reproduce this error, but 
the Jenkinsfile fails to run on my Jenkins server. 
   




Issue Time Tracking
---

Worklog Id: (was: 774336)
Time Spent: 3h 10m  (was: 3h)



[jira] [Work logged] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table

2022-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25335?focusedWorklogId=771879=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-771879
 ]

ASF GitHub Bot logged work on HIVE-25335:
-

Author: ASF GitHub Bot
Created on: 18/May/22 13:40
Start Date: 18/May/22 13:40
Worklog Time Spent: 10m 
  Work Description: zabetak commented on PR #3292:
URL: https://github.com/apache/hive/pull/3292#issuecomment-1130032347

   @zhengchenyu I am not sure what exactly you mean by saying the unit tests 
are working in your environment. If you check the failed tests you will see a 
lot related to the `TestMiniLlapLocalCliDriver`. If you want to run these tests 
and update the plans you don't need Jenkins or anything else. You can do it by 
following the steps below:
   
   ```
   mvn clean install -DskipTests -Pitests
   cd itests/qtest
   mvn test -Dtest=TestMiniLlapLocalCliDriver -Dtest.output.overwrite
   ```
   If you want to run specific tests then you can use the `-Dqfile` option. For 
more info have a look here: 
https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ#HiveDeveloperFAQ-HowdoIruntheclientpositive/clientnegativeunittests?
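
As a small illustration of the `-Dqfile` option mentioned above, the sketch below prints the command for running a single q-file. The helper `qfile_cmd` and the file name `join_example.q` are made up for this example; substitute the name of the q-file you actually want to run.

```shell
#!/bin/sh
# Hypothetical helper: print the Maven command that runs a single q-file.
# The file name passed in is a placeholder chosen by the caller.
qfile_cmd() {
  qfile=$1
  echo "mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile=$qfile"
}

qfile_cmd join_example.q   # join_example.q is a made-up name
```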




Issue Time Tracking
---

Worklog Id: (was: 771879)
Time Spent: 3h  (was: 2h 50m)



[jira] [Work logged] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table

2022-05-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25335?focusedWorklogId=771137=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-771137
 ]

ASF GitHub Bot logged work on HIVE-25335:
-

Author: ASF GitHub Bot
Created on: 17/May/22 04:25
Start Date: 17/May/22 04:25
Worklog Time Spent: 10m 
  Work Description: zhengchenyu commented on PR #3292:
URL: https://github.com/apache/hive/pull/3292#issuecomment-1128396054

   @zabetak The unit tests pass in my environment. It seems the error happens 
in the post stage.
   Because I changed the logic of maxDataSize, some explain output may have 
changed. 
   Many explain outputs may need to be updated, so I need to set up a Jenkins 
pipeline. Is there any introduction to the Hive Jenkins pipeline? I ran into 
many problems when setting up the pipeline in my dev environment.
   




Issue Time Tracking
---

Worklog Id: (was: 771137)
Time Spent: 2h 50m  (was: 2h 40m)



[jira] [Work logged] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table

2022-05-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25335?focusedWorklogId=770442=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-770442
 ]

ASF GitHub Bot logged work on HIVE-25335:
-

Author: ASF GitHub Bot
Created on: 14/May/22 00:25
Start Date: 14/May/22 00:25
Worklog Time Spent: 10m 
  Work Description: zhengchenyu opened a new pull request, #3292:
URL: https://github.com/apache/hive/pull/3292

   I found an application that runs slowly in our cluster: each reducer has to 
process a huge number of bytes, yet the job gets only two reducers.
   While debugging, I found the reason. In this SQL, one table is large in size 
(about 30G) but has few rows (about 3.5M), while the other is small in size 
(about 100M) but has more rows (about 3.6M). So JoinStatsRule.process uses only 
the 100M to estimate the number of reducers, although in fact we need to 
process 30G bytes.
   
   https://issues.apache.org/jira/browse/HIVE-25335




Issue Time Tracking
---

Worklog Id: (was: 770442)
Time Spent: 2h 40m  (was: 2.5h)



[jira] [Work logged] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table

2022-05-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25335?focusedWorklogId=769681=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-769681
 ]

ASF GitHub Bot logged work on HIVE-25335:
-

Author: ASF GitHub Bot
Created on: 12/May/22 14:22
Start Date: 12/May/22 14:22
Worklog Time Spent: 10m 
  Work Description: zabetak commented on PR #2490:
URL: https://github.com/apache/hive/pull/2490#issuecomment-1125059076

   @zhengchenyu I don't think it's possible to reopen a closed PR. You can 
create a new one instead.




Issue Time Tracking
---

Worklog Id: (was: 769681)
Time Spent: 2.5h  (was: 2h 20m)



[jira] [Work logged] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table

2022-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25335?focusedWorklogId=754531=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754531
 ]

ASF GitHub Bot logged work on HIVE-25335:
-

Author: ASF GitHub Bot
Created on: 08/Apr/22 11:28
Start Date: 08/Apr/22 11:28
Worklog Time Spent: 10m 
  Work Description: zhengchenyu commented on PR #2490:
URL: https://github.com/apache/hive/pull/2490#issuecomment-1092763873

   @zabetak Sorry for missing this for so long. Can you help me reopen this PR?




Issue Time Tracking
---

Worklog Id: (was: 754531)
Time Spent: 2h 20m  (was: 2h 10m)



[jira] [Work logged] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table

2022-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25335?focusedWorklogId=754525=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754525
 ]

ASF GitHub Bot logged work on HIVE-25335:
-

Author: ASF GitHub Bot
Created on: 08/Apr/22 11:22
Start Date: 08/Apr/22 11:22
Worklog Time Spent: 10m 
  Work Description: zhengchenyu commented on PR #2490:
URL: https://github.com/apache/hive/pull/2490#issuecomment-1092759199

   Sorry for missing it. Reopening this PR.




Issue Time Tracking
---

Worklog Id: (was: 754525)
Time Spent: 2h 10m  (was: 2h)



[jira] [Work logged] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table

2022-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25335?focusedWorklogId=725903=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-725903
 ]

ASF GitHub Bot logged work on HIVE-25335:
-

Author: ASF GitHub Bot
Created on: 14/Feb/22 02:27
Start Date: 14/Feb/22 02:27
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #2490:
URL: https://github.com/apache/hive/pull/2490


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 725903)
Time Spent: 2h  (was: 1h 50m)



[jira] [Work logged] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table

2022-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25335?focusedWorklogId=725860=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-725860
 ]

ASF GitHub Bot logged work on HIVE-25335:
-

Author: ASF GitHub Bot
Created on: 14/Feb/22 00:13
Start Date: 14/Feb/22 00:13
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #2490:
URL: https://github.com/apache/hive/pull/2490


   




Issue Time Tracking
---

Worklog Id: (was: 725860)
Time Spent: 1h 50m  (was: 1h 40m)



[jira] [Work logged] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table

2022-02-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25335?focusedWorklogId=721717=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-721717
 ]

ASF GitHub Bot logged work on HIVE-25335:
-

Author: ASF GitHub Bot
Created on: 07/Feb/22 00:12
Start Date: 07/Feb/22 00:12
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #2490:
URL: https://github.com/apache/hive/pull/2490#issuecomment-1030948557


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.




Issue Time Tracking
---

Worklog Id: (was: 721717)
Time Spent: 1h 40m  (was: 1.5h)



[jira] [Work logged] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table

2021-12-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25335?focusedWorklogId=692426=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-692426
 ]

ASF GitHub Bot logged work on HIVE-25335:
-

Author: ASF GitHub Bot
Created on: 08/Dec/21 10:45
Start Date: 08/Dec/21 10:45
Worklog Time Spent: 10m 
  Work Description: zabetak commented on pull request #2490:
URL: https://github.com/apache/hive/pull/2490#issuecomment-988700164


   Hey @zhengchenyu, the change looks reasonable to me. Are you planning to 
push this forward (update/check the tests)?




Issue Time Tracking
---

Worklog Id: (was: 692426)
Time Spent: 1.5h  (was: 1h 20m)



[jira] [Work logged] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table

2021-12-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25335?focusedWorklogId=692166=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-692166
 ]

ASF GitHub Bot logged work on HIVE-25335:
-

Author: ASF GitHub Bot
Created on: 08/Dec/21 00:13
Start Date: 08/Dec/21 00:13
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #2490:
URL: https://github.com/apache/hive/pull/2490#issuecomment-988363846


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.




Issue Time Tracking
---

Worklog Id: (was: 692166)
Time Spent: 1h 20m  (was: 1h 10m)



[jira] [Work logged] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table

2021-10-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25335?focusedWorklogId=662491=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-662491
 ]

ASF GitHub Bot logged work on HIVE-25335:
-

Author: ASF GitHub Bot
Created on: 08/Oct/21 03:55
Start Date: 08/Oct/21 03:55
Worklog Time Spent: 10m 
  Work Description: zhengchenyu commented on pull request #2490:
URL: https://github.com/apache/hive/pull/2490#issuecomment-938322395


   > @zhengchenyu I will try to review this next week. I see many test failures 
by the way. In order to merge this the tests should be green. Did you verify 
that all plan changes appearing there are beneficial?
   
   Okay, let me make the tests green.




Issue Time Tracking
---

Worklog Id: (was: 662491)
Time Spent: 1h 10m  (was: 1h)




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table

2021-10-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25335?focusedWorklogId=658831=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-658831
 ]

ASF GitHub Bot logged work on HIVE-25335:
-

Author: ASF GitHub Bot
Created on: 01/Oct/21 09:20
Start Date: 01/Oct/21 09:20
Worklog Time Spent: 10m 
  Work Description: zabetak commented on pull request #2490:
URL: https://github.com/apache/hive/pull/2490#issuecomment-932065576


   @zhengchenyu I will try to review this next week. I see many test failures 
by the way. In order to merge this the tests should be green. Did you verify 
that all plan changes appearing there are beneficial?




Issue Time Tracking
---

Worklog Id: (was: 658831)
Time Spent: 1h  (was: 50m)



[jira] [Work logged] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table

2021-09-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25335?focusedWorklogId=657675=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-657675
 ]

ASF GitHub Bot logged work on HIVE-25335:
-

Author: ASF GitHub Bot
Created on: 30/Sep/21 00:14
Start Date: 30/Sep/21 00:14
Worklog Time Spent: 10m 
  Work Description: zhengchenyu edited a comment on pull request #2490:
URL: https://github.com/apache/hive/pull/2490#issuecomment-927529430


   @jcamachor @zabetak Can you help me review it, or give me some suggestions?




Issue Time Tracking
---

Worklog Id: (was: 657675)
Time Spent: 50m  (was: 40m)



[jira] [Work logged] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table

2021-09-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25335?focusedWorklogId=657061=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-657061
 ]

ASF GitHub Bot logged work on HIVE-25335:
-

Author: ASF GitHub Bot
Created on: 29/Sep/21 08:41
Start Date: 29/Sep/21 08:41
Worklog Time Spent: 10m 
  Work Description: zhengchenyu edited a comment on pull request #2490:
URL: https://github.com/apache/hive/pull/2490#issuecomment-927529430


   @jcamachor @zabetak Can you help me review it, or give me some suggestions?




Issue Time Tracking
---

Worklog Id: (was: 657061)
Time Spent: 40m  (was: 0.5h)



[jira] [Work logged] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table

2021-09-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25335?focusedWorklogId=655346=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-655346
 ]

ASF GitHub Bot logged work on HIVE-25335:
-

Author: ASF GitHub Bot
Created on: 27/Sep/21 04:54
Start Date: 27/Sep/21 04:54
Worklog Time Spent: 10m 
  Work Description: zhengchenyu commented on pull request #2490:
URL: https://github.com/apache/hive/pull/2490#issuecomment-927529430


   @jcamachor Can you help me review it, or give me some suggestions?




Issue Time Tracking
---

Worklog Id: (was: 655346)
Time Spent: 0.5h  (was: 20m)



[jira] [Work logged] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table

2021-09-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25335?focusedWorklogId=652507=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-652507
 ]

ASF GitHub Bot logged work on HIVE-25335:
-

Author: ASF GitHub Bot
Created on: 18/Sep/21 00:09
Start Date: 18/Sep/21 00:09
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #2490:
URL: https://github.com/apache/hive/pull/2490#issuecomment-922142340


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.




Issue Time Tracking
---

Worklog Id: (was: 652507)
Time Spent: 20m  (was: 10m)



[jira] [Work logged] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table

2021-07-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25335?focusedWorklogId=624216=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-624216
 ]

ASF GitHub Bot logged work on HIVE-25335:
-

Author: ASF GitHub Bot
Created on: 19/Jul/21 08:11
Start Date: 19/Jul/21 08:11
Worklog Time Spent: 10m 
  Work Description: zhengchenyu opened a new pull request #2490:
URL: https://github.com/apache/hive/pull/2490


   …ble(but small row count) and small size table
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   




Issue Time Tracking
---

Worklog Id: (was: 624216)
Remaining Estimate: 0h
Time Spent: 10m
