[jira] [Commented] (HIVE-20491) Fix mapjoin size estimations for Fast implementation
[ https://issues.apache.org/jira/browse/HIVE-20491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16604008#comment-16604008 ]

Ashutosh Chauhan commented on HIVE-20491:
-----------------------------------------

Ok. As it currently stands, we always estimate assuming the fast hashtable and always use it. What we will miss out on is that, if the estimate is high, we will turn off BJ altogether instead of going with the more memory-efficient optimized version of the hashtable. I agree we can take this improvement in a follow-up. +1

> Fix mapjoin size estimations for Fast implementation
> ----------------------------------------------------
>
>                 Key: HIVE-20491
>                 URL: https://issues.apache.org/jira/browse/HIVE-20491
>             Project: Hive
>          Issue Type: Improvement
>          Components: Statistics
>            Reporter: Zoltan Haindrich
>            Assignee: Zoltan Haindrich
>            Priority: Major
>         Attachments: HIVE-20491.01.patch, HIVE-20491.01wip02.patch, HIVE-20491.02.patch
>
> HIVE-19824 has fixed the estimations, but it calculated them for the "optimized" impl; the "fast" one has a slightly bigger footprint.
> It also seems that fast is a bit overestimated at runtime; that should also be taken care of.
>
> | numkeys | implementation | compiler estimation | runtime estimation | runtime measurement | ce / rm | re / rm |
> | 25M | FAST | 1168435456 | 2189433712 | 1513584984 | .77 | 1.44 |
> | 25M | OPTIMIZED | 1168435456 | 1191203764 | 1168439664 | 100% | 1.01 |

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
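For readers following the numbers, the two ratio columns in the table above (ce / rm and re / rm) can be recomputed from the raw byte counts. The snippet below is only an illustrative sketch: the class and method names are made up for this note and are not part of Hive, and it assumes "ce", "re" and "rm" stand for compiler estimation, runtime estimation and runtime measurement.

```java
// Illustrative only: recomputes the ratio columns of the table in the issue
// description. EstimateRatios and ratio() are hypothetical names, not Hive code.
final class EstimateRatios {

    /** Ratio of an estimated size to the measured size, both in bytes. */
    static double ratio(long estimateBytes, long measuredBytes) {
        return (double) estimateBytes / measuredBytes;
    }

    public static void main(String[] args) {
        // Values copied from the table (25M keys).
        long compilerEstimate = 1168435456L;   // same for both implementations
        long fastRuntimeEstimate = 2189433712L;
        long fastMeasured = 1513584984L;
        long optimizedRuntimeEstimate = 1191203764L;
        long optimizedMeasured = 1168439664L;

        System.out.printf("FAST      ce/rm=%.4f re/rm=%.4f%n",
            ratio(compilerEstimate, fastMeasured),
            ratio(fastRuntimeEstimate, fastMeasured));
        System.out.printf("OPTIMIZED ce/rm=%.4f re/rm=%.4f%n",
            ratio(compilerEstimate, optimizedMeasured),
            ratio(optimizedRuntimeEstimate, optimizedMeasured));
    }
}
```

A ratio below 1 means the estimate is smaller than what was measured (the compiler estimate for FAST), while a ratio above 1 means the estimate is larger (the runtime estimate for FAST). The two-decimal values in the table appear to be truncated rather than rounded, so recomputed values can differ in the last digit.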
[jira] [Commented] (HIVE-20491) Fix mapjoin size estimations for Fast implementation
[ https://issues.apache.org/jira/browse/HIVE-20491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603259#comment-16603259 ]

Zoltan Haindrich commented on HIVE-20491:
-----------------------------------------

Yes, I was planning to do that. Right now this patch only fixes the estimations and sets the most conservative estimate (fast3) by default. I'm still working on making that decision and forwarding it; right now I think that will not be in this patch.
[jira] [Commented] (HIVE-20491) Fix mapjoin size estimations for Fast implementation
[ https://issues.apache.org/jira/browse/HIVE-20491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603246#comment-16603246 ]

Ashutosh Chauhan commented on HIVE-20491:
-----------------------------------------

[~kgyrtkirk] Selection of the kind of hashtable is done by the Vectorizer, which runs *after* ConvertJoinMapJoin, which does the algorithm selection. I see you have updated the size computation assuming the fast hashtable, but wouldn't it be better to first do the memory computation using the optimized version and then using fast? If fast qualifies, set that in the Join so that the vectorizer can pick the correct hashtable type. That said, since fast hashtables are bigger, the current approach also works, though it is more conservative than needed.
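The two-step check suggested in this comment could look roughly like the sketch below. It is not Hive code: `HashTableChoice`, `HashTableKind`, `choose` and the `memoryBudgetBytes` parameter are hypothetical names introduced here, and the size estimates are assumed to be computed elsewhere and passed in as plain byte counts.

```java
import java.util.Optional;

// Hypothetical sketch of the suggested two-step check; none of these names
// exist in Hive, and the estimates are supplied by the caller.
final class HashTableChoice {

    enum HashTableKind { FAST, OPTIMIZED }

    /**
     * First checks whether the (smaller) optimized estimate fits the budget;
     * if it does not, no map join is chosen at all. If it does, the (larger)
     * fast estimate is checked, and FAST is recorded only when it also fits.
     */
    static Optional<HashTableKind> choose(long optimizedEstimateBytes,
                                          long fastEstimateBytes,
                                          long memoryBudgetBytes) {
        if (optimizedEstimateBytes > memoryBudgetBytes) {
            return Optional.empty(); // even the compact table is too big
        }
        if (fastEstimateBytes <= memoryBudgetBytes) {
            return Optional.of(HashTableKind.FAST);
        }
        return Optional.of(HashTableKind.OPTIMIZED);
    }
}
```

As an example with the runtime measurements from the issue description (1168439664 bytes for OPTIMIZED, 1513584984 bytes for FAST), a budget of 1300000000 bytes would select OPTIMIZED under this sketch, whereas estimating only for the fast hashtable would rule out the map join for the same budget.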
[jira] [Commented] (HIVE-20491) Fix mapjoin size estimations for Fast implementation
[ https://issues.apache.org/jira/browse/HIVE-20491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603216#comment-16603216 ]

Hive QA commented on HIVE-20491:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12938251/HIVE-20491.02.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 14923 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/13583/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/13583/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-13583/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12938251 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-20491) Fix mapjoin size estimations for Fast implementation
[ https://issues.apache.org/jira/browse/HIVE-20491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603168#comment-16603168 ]

Hive QA commented on HIVE-20491:
--------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 52s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 7s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 43s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 4m 8s{color} | {color:blue} ql in master has 2310 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 1s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 6s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 41s{color} | {color:red} ql: The patch generated 3 new + 35 unchanged - 3 fixed = 38 total (was 38) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 57s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 14s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 25m 1s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-13583/dev-support/hive-personality.sh |
| git revision | master / 3287a09 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-13583/yetus/diff-checkstyle-ql.txt |
| modules | C: ql U: ql |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-13583/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HIVE-20491) Fix mapjoin size estimations for Fast implementation
[ https://issues.apache.org/jira/browse/HIVE-20491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16602318#comment-16602318 ]

Hive QA commented on HIVE-20491:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12938137/HIVE-20491.01.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 14922 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucket_map_join_tez2] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[orc_llap] (batchId=169)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main] (batchId=164)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/13570/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/13570/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-13570/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12938137 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-20491) Fix mapjoin size estimations for Fast implementation
[ https://issues.apache.org/jira/browse/HIVE-20491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16602296#comment-16602296 ]

Hive QA commented on HIVE-20491:
--------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 31s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 3s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 42s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 4m 11s{color} | {color:blue} ql in master has 2310 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 59s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 8s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 41s{color} | {color:red} ql: The patch generated 3 new + 35 unchanged - 3 fixed = 38 total (was 38) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 14s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 24m 45s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-13570/dev-support/hive-personality.sh |
| git revision | master / a4dd84b |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-13570/yetus/diff-checkstyle-ql.txt |
| modules | C: ql U: ql |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-13570/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HIVE-20491) Fix mapjoin size estimations for Fast implementation
[ https://issues.apache.org/jira/browse/HIVE-20491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16598822#comment-16598822 ]

Zoltan Haindrich commented on HIVE-20491:
-----------------------------------------

I've looked into this a little more, and there seem to be two core fast implementations and one optimized implementation. There is a Fast implementation for long keys and another for string/other keys; in the case of these complex keys, the optimized implementation handles memory better.