[jira] [Commented] (HIVE-20491) Fix mapjoin size estimations for Fast implementation

2018-09-04 Thread Ashutosh Chauhan (JIRA)


[ https://issues.apache.org/jira/browse/HIVE-20491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16604008#comment-16604008 ]

Ashutosh Chauhan commented on HIVE-20491:
-

OK. As it currently stands, we always estimate assuming the fast hashtable and
always use it.
What we miss out on: if the estimate is high, we turn off BJ altogether
instead of going with the more memory-efficient optimized version of the
hashtable. I agree we can take this improvement in a follow-up.
+1
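A minimal Java sketch of the trade-off described in this comment, assuming a single memory budget is compared against one size estimate and treating the table's values as bytes (an assumption). The class, enum, and budget are hypothetical illustrations, not Hive's actual ConvertJoinMapJoin API; the two sizes are the 25M-key runtime measurements quoted in the issue description.

```java
// Hypothetical sketch: names and the budget are illustrative, not Hive's API.
final class CurrentMapJoinDecision {
    enum Decision { BROADCAST_JOIN_FAST, NO_BROADCAST_JOIN }

    // Current behavior as described: only the fast hashtable size is estimated,
    // so an oversized fast estimate disables the broadcast join outright.
    static Decision decide(long fastEstimateBytes, long budgetBytes) {
        return fastEstimateBytes <= budgetBytes
                ? Decision.BROADCAST_JOIN_FAST
                : Decision.NO_BROADCAST_JOIN;
    }

    public static void main(String[] args) {
        long budget = 1_300_000_000L;     // hypothetical memory budget
        long fast = 1_513_584_984L;       // FAST runtime measurement (25M keys)
        long optimized = 1_168_439_664L;  // OPTIMIZED runtime measurement (25M keys)
        System.out.println(decide(fast, budget));   // prints NO_BROADCAST_JOIN
        System.out.println(optimized <= budget);    // prints true: optimized would have fit
    }
}
```

With this hypothetical budget the join loses the broadcast path even though the optimized hashtable would have fit, which is the follow-up improvement the comment refers to.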

> Fix mapjoin size estimations for Fast implementation
> 
>
> Key: HIVE-20491
> URL: https://issues.apache.org/jira/browse/HIVE-20491
> Project: Hive
>  Issue Type: Improvement
>  Components: Statistics
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-20491.01.patch, HIVE-20491.01wip02.patch, 
> HIVE-20491.02.patch
>
>
> HIVE-19824 fixed the estimations, but it calculated them for the "optimized"
> implementation; the "fast" one has a somewhat larger footprint.
> It also seems that fast is somewhat overestimated at runtime; that should be
> taken care of as well.
> | numkeys | implementation | compiler estimation | runtime estimation | runtime measurement | ce / rm | re / rm |
> | 25M | FAST | 1168435456 | 2189433712 | 1513584984 | 0.77 | 1.44 |
> | 25M | OPTIMIZED | 1168435456 | 1191203764 | 1168439664 | 1.00 | 1.01 |
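The ratio columns of the quoted table can be rechecked from its raw numbers. A small Java sketch; the class name is illustrative, and "ce / rm" and "re / rm" are read as compiler estimation and runtime estimation divided by runtime measurement.

```java
import java.util.Locale;

// Recomputes the ratio columns of the quoted table from its raw numbers.
// ce/rm = compiler estimation / runtime measurement
// re/rm = runtime estimation / runtime measurement
final class EstimateRatios {
    static double ratio(long estimate, long measured) {
        return (double) estimate / measured;
    }

    public static void main(String[] args) {
        String[] names = {"FAST", "OPTIMIZED"};
        long[][] rows = {
            // compiler estimation, runtime estimation, runtime measurement
            {1168435456L, 2189433712L, 1513584984L},
            {1168435456L, 1191203764L, 1168439664L},
        };
        for (int i = 0; i < rows.length; i++) {
            System.out.printf(Locale.ROOT, "%-9s ce/rm=%.3f re/rm=%.3f%n",
                    names[i], ratio(rows[i][0], rows[i][2]), ratio(rows[i][1], rows[i][2]));
        }
    }
}
```

This prints about 0.772 and 1.447 for FAST and about 1.000 and 1.019 for OPTIMIZED. The table's two-decimal values (0.77, 1.44, 1.01) are consistent with truncation rather than rounding.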



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20491) Fix mapjoin size estimations for Fast implementation

2018-09-04 Thread Zoltan Haindrich (JIRA)


[ https://issues.apache.org/jira/browse/HIVE-20491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603259#comment-16603259 ]

Zoltan Haindrich commented on HIVE-20491:
-

Yes, I was planning to do that. Right now this patch only fixes the
estimations and sets the most conservative estimate (fast3) by default.

I'm still working on making that decision and forwarding it; right now I think
that will not be in this patch.
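One way to read "sets the most conservative estimate by default" is to compute a size per candidate hashtable implementation and keep the largest, so the memory check only passes if every candidate would fit. A minimal sketch under that assumption; the class, method, map keys, and values are hypothetical, and the patch's "fast3" label is not modeled.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: returns the largest per-implementation size estimate,
// i.e. the most conservative value to compare against the memory budget.
final class ConservativeEstimate {
    static long mostConservative(Map<String, Long> estimatesByImpl) {
        return estimatesByImpl.values().stream()
                .mapToLong(Long::longValue)
                .max()
                .orElseThrow(() -> new IllegalArgumentException("no estimates given"));
    }

    public static void main(String[] args) {
        Map<String, Long> estimates = new HashMap<>();
        estimates.put("optimized", 1_200_000_000L); // hypothetical value
        estimates.put("fast", 1_500_000_000L);      // hypothetical value
        System.out.println(mostConservative(estimates)); // prints 1500000000
    }
}
```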



[jira] [Commented] (HIVE-20491) Fix mapjoin size estimations for Fast implementation

2018-09-04 Thread Ashutosh Chauhan (JIRA)


[ https://issues.apache.org/jira/browse/HIVE-20491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603246#comment-16603246 ]

Ashutosh Chauhan commented on HIVE-20491:
-

[~kgyrtkirk] The kind of hashtable is selected by the Vectorizer, which runs
*after* ConvertJoinMapJoin, which does the algorithm selection. I see you have
updated the size computation assuming the fast hashtable, but wouldn't it be
better to first do the memory computation using the optimized version and then
using fast? If fast qualifies, set that in the Join so that the Vectorizer can
pick the correct hashtable type.
That said, since fast hashtables are bigger, the current approach also works,
though it's more conservative than needed.
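The two-step check proposed in this comment can be sketched as follows, assuming (as the comment states) that the optimized hashtable is always the smaller of the two and that a single memory budget applies. The class, enum, and budgets are hypothetical; the returned value stands in for whatever flag would be set on the Join for the Vectorizer to read.

```java
import java.util.Optional;

// Hypothetical sketch of the proposal; names are illustrative, not Hive's API.
final class TwoStageSelection {
    enum HashTableKind { FAST, OPTIMIZED }

    // Step 1: check the smaller (optimized) estimate; if even that does not fit,
    // do not convert to a broadcast join at all (empty result).
    // Step 2: if the larger (fast) estimate also fits, choose FAST; otherwise OPTIMIZED.
    static Optional<HashTableKind> select(long optimizedEstimate, long fastEstimate, long budget) {
        if (optimizedEstimate > budget) {
            return Optional.empty();
        }
        return Optional.of(fastEstimate <= budget ? HashTableKind.FAST : HashTableKind.OPTIMIZED);
    }

    public static void main(String[] args) {
        long optimized = 1_168_439_664L; // OPTIMIZED runtime measurement (25M keys)
        long fast = 1_513_584_984L;      // FAST runtime measurement (25M keys)
        System.out.println(select(optimized, fast, 2_000_000_000L)); // Optional[FAST]
        System.out.println(select(optimized, fast, 1_300_000_000L)); // Optional[OPTIMIZED]
        System.out.println(select(optimized, fast, 1_000_000_000L)); // Optional.empty
    }
}
```

Compared with estimating only for fast, the middle case keeps the broadcast join by falling back to the optimized hashtable.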



[jira] [Commented] (HIVE-20491) Fix mapjoin size estimations for Fast implementation

2018-09-04 Thread Hive QA (JIRA)


[ https://issues.apache.org/jira/browse/HIVE-20491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603216#comment-16603216 ]

Hive QA commented on HIVE-20491:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12938251/HIVE-20491.02.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 14923 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/13583/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/13583/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-13583/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12938251 - PreCommit-HIVE-Build



[jira] [Commented] (HIVE-20491) Fix mapjoin size estimations for Fast implementation

2018-09-04 Thread Hive QA (JIRA)


[ https://issues.apache.org/jira/browse/HIVE-20491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603168#comment-16603168 ]

Hive QA commented on HIVE-20491:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 52s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  7s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 43s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m  8s{color} | {color:blue} ql in master has 2310 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  1s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  6s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 41s{color} | {color:red} ql: The patch generated 3 new + 35 unchanged - 3 fixed = 38 total (was 38) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 57s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 14s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 25m  1s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-13583/dev-support/hive-personality.sh |
| git revision | master / 3287a09 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-13583/yetus/diff-checkstyle-ql.txt |
| modules | C: ql U: ql |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-13583/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.





[jira] [Commented] (HIVE-20491) Fix mapjoin size estimations for Fast implementation

2018-09-03 Thread Hive QA (JIRA)


[ https://issues.apache.org/jira/browse/HIVE-20491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16602318#comment-16602318 ]

Hive QA commented on HIVE-20491:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12938137/HIVE-20491.01.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 14922 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucket_map_join_tez2] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[orc_llap] (batchId=169)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main] (batchId=164)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/13570/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/13570/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-13570/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12938137 - PreCommit-HIVE-Build



[jira] [Commented] (HIVE-20491) Fix mapjoin size estimations for Fast implementation

2018-09-03 Thread Hive QA (JIRA)


[ https://issues.apache.org/jira/browse/HIVE-20491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16602296#comment-16602296 ]

Hive QA commented on HIVE-20491:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 31s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  3s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 42s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 11s{color} | {color:blue} ql in master has 2310 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 59s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  8s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 41s{color} | {color:red} ql: The patch generated 3 new + 35 unchanged - 3 fixed = 38 total (was 38) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 55s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 14s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 24m 45s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-13570/dev-support/hive-personality.sh |
| git revision | master / a4dd84b |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-13570/yetus/diff-checkstyle-ql.txt |
| modules | C: ql U: ql |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-13570/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.





[jira] [Commented] (HIVE-20491) Fix mapjoin size estimations for Fast implementation

2018-08-31 Thread Zoltan Haindrich (JIRA)


[ https://issues.apache.org/jira/browse/HIVE-20491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16598822#comment-16598822 ]

Zoltan Haindrich commented on HIVE-20491:
-

I've looked into this a bit more, and there seem to be 2 core fast
implementations and 1 optimized one.

There is a fast implementation for long keys and another for string/other
keys; for these complex keys, the optimized implementation handles memory
better.
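The three implementations described here can be summarized as a small dispatch on key type. A minimal sketch; the enum names and the memoryTight policy are hypothetical and only encode this comment's observation (two fast variants, one optimized, with optimized being more memory-efficient for non-long keys). They are not Hive's actual classes or selection logic.

```java
// Hypothetical sketch: illustrative names, not Hive's actual hashtable classes.
final class HashTableChoice {
    enum KeyKind { LONG, STRING_OR_OTHER }
    enum Impl { FAST_LONG, FAST_STRING_OR_OTHER, OPTIMIZED }

    static Impl choose(KeyKind key, boolean memoryTight) {
        if (key == KeyKind.LONG) {
            // Long keys have their own fast implementation.
            return Impl.FAST_LONG;
        }
        // For string/other (complex) keys, prefer the more memory-efficient
        // optimized implementation when memory is constrained.
        return memoryTight ? Impl.OPTIMIZED : Impl.FAST_STRING_OR_OTHER;
    }

    public static void main(String[] args) {
        System.out.println(choose(KeyKind.LONG, true));             // prints FAST_LONG
        System.out.println(choose(KeyKind.STRING_OR_OTHER, true));  // prints OPTIMIZED
        System.out.println(choose(KeyKind.STRING_OR_OTHER, false)); // prints FAST_STRING_OR_OTHER
    }
}
```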
