[jira] [Commented] (HIVE-20032) Don't serialize hashCode when groupByShuffle and RDD cacheing is disabled
[ https://issues.apache.org/jira/browse/HIVE-20032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16553213#comment-16553213 ] Hive QA commented on HIVE-20032: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12932725/HIVE-20032.8.patch {color:green}SUCCESS:{color} +1 due to 5 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 14681 tests executed *Failed tests:* {noformat} org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testCancelRenewTokenFlow (batchId=264) org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testConnection (batchId=264) org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testIsValid (batchId=264) org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testIsValidNeg (batchId=264) org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testNegativeProxyAuth (batchId=264) org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testNegativeTokenAuth (batchId=264) org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testProxyAuth (batchId=264) org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testRenewDelegationToken (batchId=264) org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testTokenAuth (batchId=264) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/12793/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/12793/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-12793/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.YetusPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 9 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12932725 - PreCommit-HIVE-Build > Don't serialize hashCode when groupByShuffle and RDD cacheing is disabled > - > > Key: HIVE-20032 > URL: https://issues.apache.org/jira/browse/HIVE-20032 > Project: Hive > Issue Type: Improvement > Components: Spark >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Attachments: HIVE-20032.1.patch, HIVE-20032.2.patch, > HIVE-20032.3.patch, HIVE-20032.4.patch, HIVE-20032.5.patch, > HIVE-20032.6.patch, HIVE-20032.7.patch, HIVE-20032.8.patch > > > Follow up on HIVE-15104, if we don't enable RDD cacheing or groupByShuffles, > then we don't need to serialize the hashCode when shuffling data in HoS. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
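As a rough illustration of the saving described in the issue summary, the sketch below serializes a toy shuffle key with and without its cached hash code. The `Key` type, the length-prefixed layout, and the `needsHashCode` helper are assumptions made for this example and are not Hive's actual key serializer; only the rule encoded in `needsHashCode` comes from the issue description.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class ShuffleKeySketch {

    /** Hypothetical stand-in for a shuffle key: raw bytes plus a precomputed hash code. */
    static final class Key {
        final byte[] bytes;
        final int hashCode;

        Key(byte[] bytes, int hashCode) {
            this.bytes = bytes;
            this.hashCode = hashCode;
        }
    }

    /** Per the issue description: the hash code is only needed for groupBy shuffles or RDD caching. */
    static boolean needsHashCode(boolean groupByShuffle, boolean rddCaching) {
        return groupByShuffle || rddCaching;
    }

    /** Writes a 4-byte length, the key bytes, and (optionally) the 4-byte hash code. */
    static byte[] serialize(Key key, boolean includeHashCode) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(key.bytes.length);
            out.write(key.bytes);
            if (includeHashCode) {
                out.writeInt(key.hashCode);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }
}
```

In this toy layout, skipping the hash code saves four bytes per shuffled key; the saving in a real serializer depends on its encoding.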
[jira] [Commented] (HIVE-20032) Don't serialize hashCode when groupByShuffle and RDD cacheing is disabled
[ https://issues.apache.org/jira/browse/HIVE-20032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16553135#comment-16553135 ] Hive QA commented on HIVE-20032: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 29s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 55s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 29s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 33s{color} | {color:green} master passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 36s{color} | {color:blue} common in master has 64 extant Findbugs warnings. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 40s{color} | {color:blue} itests/hive-unit in master has 2 extant Findbugs warnings. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 4m 18s{color} | {color:blue} ql in master has 2280 extant Findbugs warnings. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 26s{color} | {color:blue} spark-client in master has 10 extant Findbugs warnings. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 0s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 14s{color} | {color:red} kryo-registrator in the patch failed. {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 1m 1s{color} | {color:red} ql in the patch failed. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 14s{color} | {color:red} kryo-registrator in the patch failed. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 14s{color} | {color:red} kryo-registrator in the patch failed. {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 17s{color} | {color:red} itests/hive-unit: The patch generated 2 new + 20 unchanged - 0 fixed = 22 total (was 20) {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 10s{color} | {color:red} kryo-registrator: The patch generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 42s{color} | {color:red} ql: The patch generated 2 new + 16 unchanged - 0 fixed = 18 total (was 16) {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 10s{color} | {color:red} spark-client: The patch generated 1 new + 27 unchanged - 0 fixed = 28 total (was 27) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 14s{color} | {color:red} kryo-registrator in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 1s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 14s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 37m 44s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Optional Tests | asflicense javac javadoc findbugs checkstyle compile | | uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux | | Build tool | maven | | Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-12793/dev-support/hive-personality.sh | | git revision | master / bed17e5 | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | mvninstall | http://104.198.109.242/logs//PreCommit-HIVE-Build-12793/yetus/patch-mvninstall-kryo-registrator.txt | | mvninstall | http://104.198.109.242/logs//PreCommit-HIVE-Build-12793/yetus/patch-mvninstall-ql.txt | | compile | http://104.198.109.242/logs//PreCommit-HIVE-Build-12793/yetus/patch-c
[jira] [Commented] (HIVE-20032) Don't serialize hashCode when groupByShuffle and RDD cacheing is disabled
[ https://issues.apache.org/jira/browse/HIVE-20032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16552984#comment-16552984 ] Hive QA commented on HIVE-20032: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12932706/HIVE-20032.7.patch {color:green}SUCCESS:{color} +1 due to 4 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 14681 tests executed *Failed tests:* {noformat} org.apache.hive.jdbc.TestJdbcWithMiniHS2ErasureCoding.testDescribeErasureCoding (batchId=251) org.apache.hive.jdbc.TestJdbcWithMiniHS2ErasureCoding.testExplainErasureCoding (batchId=251) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/12792/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/12792/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-12792/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.YetusPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12932706 - PreCommit-HIVE-Build > Don't serialize hashCode when groupByShuffle and RDD cacheing is disabled > - > > Key: HIVE-20032 > URL: https://issues.apache.org/jira/browse/HIVE-20032 > Project: Hive > Issue Type: Improvement > Components: Spark >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Attachments: HIVE-20032.1.patch, HIVE-20032.2.patch, > HIVE-20032.3.patch, HIVE-20032.4.patch, HIVE-20032.5.patch, > HIVE-20032.6.patch, HIVE-20032.7.patch > > > Follow up on HIVE-15104, if we don't enable RDD cacheing or groupByShuffles, > then we don't need to serialize the hashCode when shuffling data in HoS. 
[jira] [Commented] (HIVE-20032) Don't serialize hashCode when groupByShuffle and RDD cacheing is disabled
[ https://issues.apache.org/jira/browse/HIVE-20032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16552957#comment-16552957 ] Hive QA commented on HIVE-20032: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 1s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 47s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 55s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 40s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 32s{color} | {color:green} master passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 35s{color} | {color:blue} common in master has 64 extant Findbugs warnings. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 41s{color} | {color:blue} itests/hive-unit in master has 2 extant Findbugs warnings. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 4m 5s{color} | {color:blue} ql in master has 2280 extant Findbugs warnings. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 25s{color} | {color:blue} spark-client in master has 10 extant Findbugs warnings. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 59s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 14s{color} | {color:red} kryo-registrator in the patch failed. {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 54s{color} | {color:red} ql in the patch failed. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 13s{color} | {color:red} kryo-registrator in the patch failed. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 13s{color} | {color:red} kryo-registrator in the patch failed. {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 17s{color} | {color:red} itests/hive-unit: The patch generated 2 new + 20 unchanged - 0 fixed = 22 total (was 20) {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 9s{color} | {color:red} kryo-registrator: The patch generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 42s{color} | {color:red} ql: The patch generated 2 new + 16 unchanged - 0 fixed = 18 total (was 16) {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 11s{color} | {color:red} spark-client: The patch generated 1 new + 27 unchanged - 0 fixed = 28 total (was 27) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 13s{color} | {color:red} kryo-registrator in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 0s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 13s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 37m 54s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Optional Tests | asflicense javac javadoc findbugs checkstyle compile | | uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux | | Build tool | maven | | Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-12792/dev-support/hive-personality.sh | | git revision | master / bed17e5 | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | mvninstall | http://104.198.109.242/logs//PreCommit-HIVE-Build-12792/yetus/patch-mvninstall-kryo-registrator.txt | | mvninstall | http://104.198.109.242/logs//PreCommit-HIVE-Build-12792/yetus/patch-mvninstall-ql.txt | | compile | http://104.198.109.242/logs//PreCommit-HIVE-Build-12792/yetus/patch-c
[jira] [Commented] (HIVE-20032) Don't serialize hashCode when groupByShuffle and RDD cacheing is disabled
[ https://issues.apache.org/jira/browse/HIVE-20032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551524#comment-16551524 ] Hive QA commented on HIVE-20032: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 41s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 72m 42s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 55s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 37s{color} | {color:green} master passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 44s{color} | {color:red} branch/common cannot run setBugDatabaseInfo from findbugs {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 22s{color} | {color:red} branch/kryo-registrator cannot run setBugDatabaseInfo from findbugs {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 5m 55s{color} | {color:red} branch/ql cannot run setBugDatabaseInfo from findbugs {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 42s{color} | {color:red} branch/spark-client cannot run setBugDatabaseInfo from findbugs {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 19s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | 
{color:blue} 0m 16s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 17s{color} | {color:red} kryo-registrator in the patch failed. {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 1m 37s{color} | {color:red} ql in the patch failed. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 20s{color} | {color:red} kryo-registrator in the patch failed. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 20s{color} | {color:red} kryo-registrator in the patch failed. {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 11s{color} | {color:red} kryo-registrator: The patch generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 54s{color} | {color:red} ql: The patch generated 2 new + 16 unchanged - 0 fixed = 18 total (was 16) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 31s{color} | {color:red} patch/common cannot run setBugDatabaseInfo from findbugs {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 13s{color} | {color:red} kryo-registrator in the patch failed. 
{color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 4m 2s{color} | {color:red} patch/ql cannot run setBugDatabaseInfo from findbugs {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 24s{color} | {color:red} patch/spark-client cannot run setBugDatabaseInfo from findbugs {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 31s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 13s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}103m 28s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Optional Tests | asflicense javac javadoc findbugs checkstyle compile | | uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux | | Build tool | maven | | Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-12750/dev-support/hive-personality.sh | | git revision | master / e569ef0 | | Default Java | 1.8.0_111 | | findbugs | http://104.198.109.242/logs//PreCommit-HIVE-Build-12750/yetus/branch-findbugs-common.txt | | findbugs | http://104.198.109.242/logs//PreCommit-HIVE-Build-12750/yetus/branch-findbugs-kryo-re
[jira] [Commented] (HIVE-20032) Don't serialize hashCode when groupByShuffle and RDD cacheing is disabled
[ https://issues.apache.org/jira/browse/HIVE-20032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551517#comment-16551517 ] Hive QA commented on HIVE-20032: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12932436/HIVE-20032.6.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 14680 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.ql.exec.spark.TestSparkStatistics.testSparkStatistics (batchId=242) org.apache.hive.jdbc.TestJdbcWithLocalClusterSpark.testSparkQuery (batchId=253) org.apache.hive.jdbc.TestJdbcWithLocalClusterSpark.testTempTable (batchId=253) org.apache.hive.jdbc.TestJdbcWithMiniHS2ErasureCoding.testDescribeErasureCoding (batchId=251) org.apache.hive.jdbc.TestJdbcWithMiniHS2ErasureCoding.testExplainErasureCoding (batchId=251) org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.testSparkQuery (batchId=253) org.apache.hive.spark.client.TestSparkClient.testAddJarsAndFiles (batchId=316) org.apache.hive.spark.client.TestSparkClient.testCounters (batchId=316) org.apache.hive.spark.client.TestSparkClient.testErrorJob (batchId=316) org.apache.hive.spark.client.TestSparkClient.testErrorJobNotSerializable (batchId=316) org.apache.hive.spark.client.TestSparkClient.testJobSubmission (batchId=316) org.apache.hive.spark.client.TestSparkClient.testMetricsCollection (batchId=316) org.apache.hive.spark.client.TestSparkClient.testSimpleSparkJob (batchId=316) org.apache.hive.spark.client.TestSparkClient.testSyncRpc (batchId=316) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/12750/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/12750/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-12750/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing 
org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.YetusPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 14 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12932436 - PreCommit-HIVE-Build > Don't serialize hashCode when groupByShuffle and RDD cacheing is disabled > - > > Key: HIVE-20032 > URL: https://issues.apache.org/jira/browse/HIVE-20032 > Project: Hive > Issue Type: Improvement > Components: Spark >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Attachments: HIVE-20032.1.patch, HIVE-20032.2.patch, > HIVE-20032.3.patch, HIVE-20032.4.patch, HIVE-20032.5.patch, HIVE-20032.6.patch > > > Follow up on HIVE-15104, if we don't enable RDD cacheing or groupByShuffles, > then we don't need to serialize the hashCode when shuffling data in HoS. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-20032) Don't serialize hashCode when groupByShuffle and RDD cacheing is disabled
[ https://issues.apache.org/jira/browse/HIVE-20032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550959#comment-16550959 ] Sahil Takiar commented on HIVE-20032: - Attached an updated patch that should fix the unit tests. Since the new approach uses reflection to create a class in the {{kryo-registrator}} module, we have to make sure that the {{hive-kryo-registrator}} jar file is on the current classpath. Apparently the {{--jars}} option in {{spark-submit}} does not guarantee that for the Spark driver, so I'm using {{spark.driver.extraClassPath}} to make sure it's present when running the {{SparkPlanGenerator}}. When {{spark.master}} is {{local}}, things are a bit trickier because there is no separate driver process, so instead I use {{SparkClientUtilities#addJarToContextLoader}} to add the jar to the current classpath. The constructor for {{LocalHiveSparkClient}} already does this, but for whatever reason the classloader is getting reset by the time {{SparkPlanGenerator}} runs. I tried debugging it but didn't have any luck, and given that it's just a test infra issue, it doesn't seem worth resolving unless anyone has suggestions on how to fix it. > Don't serialize hashCode when groupByShuffle and RDD cacheing is disabled > - > > Key: HIVE-20032 > URL: https://issues.apache.org/jira/browse/HIVE-20032 > Project: Hive > Issue Type: Improvement > Components: Spark >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Attachments: HIVE-20032.1.patch, HIVE-20032.2.patch, > HIVE-20032.3.patch, HIVE-20032.4.patch, HIVE-20032.5.patch, HIVE-20032.6.patch > > > Follow up on HIVE-15104, if we don't enable RDD cacheing or groupByShuffles, > then we don't need to serialize the hashCode when shuffling data in HoS. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
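The local-mode workaround in the comment above amounts to putting a jar on the thread's context class loader before the reflective lookup runs. A minimal sketch of that general technique follows; `withJarOnContextLoader` and `isVisible` are hypothetical helpers written for this illustration and are not the Hive utility named in the comment.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;

final class ContextLoaderSketch {

    /**
     * Wraps the current context class loader in a child loader that can also
     * see the given jar, and installs the child on the current thread.
     */
    static ClassLoader withJarOnContextLoader(Path jar) {
        ClassLoader parent = Thread.currentThread().getContextClassLoader();
        URL jarUrl;
        try {
            jarUrl = jar.toUri().toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a usable jar path: " + jar, e);
        }
        ClassLoader child = new URLClassLoader(new URL[] {jarUrl}, parent);
        Thread.currentThread().setContextClassLoader(child);
        return child;
    }

    /** Reports whether a class name resolves through the current context class loader. */
    static boolean isVisible(String className) {
        try {
            Class.forName(className, false, Thread.currentThread().getContextClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }
}
```

Because the loader is installed per thread, code that later replaces the context class loader, or that runs on a different thread, will not see the jar. That would be consistent with the reset described in the comment, though the comment itself leaves the cause undiagnosed.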
[jira] [Commented] (HIVE-20032) Don't serialize hashCode when groupByShuffle and RDD cacheing is disabled
[ https://issues.apache.org/jira/browse/HIVE-20032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16549654#comment-16549654 ] Hive QA commented on HIVE-20032: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12932183/HIVE-20032.5.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 631 failed/errored test(s), 14672 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestLocalSparkCliDriver.testCliDriver[spark_local_queries] (batchId=264) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[add_part_multiple] (batchId=140) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[alter_merge_orc] (batchId=138) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[alter_merge_stats_orc] (batchId=126) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[annotate_stats_join] (batchId=132) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join0] (batchId=148) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join10] (batchId=124) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join11] (batchId=112) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join12] (batchId=119) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join13] (batchId=145) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join14] (batchId=114) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join15] (batchId=115) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join16] (batchId=126) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join17] (batchId=146) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join18] (batchId=113) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join18_multi_distinct] (batchId=120) 
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join19] (batchId=137) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join1] (batchId=144) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join20] (batchId=149) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join21] (batchId=146) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join22] (batchId=133) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join23] (batchId=116) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join24] (batchId=142) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join26] (batchId=114) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join27] (batchId=149) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join28] (batchId=140) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join29] (batchId=133) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join2] (batchId=137) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join30] (batchId=122) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join31] (batchId=129) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join32] (batchId=147) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join3] (batchId=146) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join4] (batchId=140) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join5] (batchId=142) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join6] (batchId=147) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join7] (batchId=120) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join8] (batchId=147) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join9] (batchId=143) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join_filters] (batchId=135) 
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join_nulls] (batchId=139) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join_reordering_values] (batchId=111) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join_stats2] (batchId=148) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join_stats] (batchId=130) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join_without_localtask] (batchId=108) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_smb_mapjoin_14] (batchId=135) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_sortmerge_join_12] (batchId=123) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_sortmerge_join_13] (batchId=137) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_sortmerge_join_14] (batchId=113) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_sortmerge_
[jira] [Commented] (HIVE-20032) Don't serialize hashCode when groupByShuffle and RDD cacheing is disabled
[ https://issues.apache.org/jira/browse/HIVE-20032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16549629#comment-16549629 ]

Hive QA commented on HIVE-20032:

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 53s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 52s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 53s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 16s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 35s{color} | {color:blue} common in master has 64 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 28s{color} | {color:blue} spark-client in master has 10 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 4m 14s{color} | {color:blue} ql in master has 2273 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 34s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 25s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 56s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 11s{color} | {color:red} spark-client: The patch generated 1 new + 12 unchanged - 0 fixed = 13 total (was 12) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 40s{color} | {color:red} ql: The patch generated 2 new + 12 unchanged - 0 fixed = 14 total (was 12) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 9s{color} | {color:red} kryo-registrator: The patch generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 36s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 14s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 34m 1s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-12700/dev-support/hive-personality.sh |
| git revision | master / 6d15ce4 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-12700/yetus/diff-checkstyle-spark-client.txt |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-12700/yetus/diff-checkstyle-ql.txt |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-12700/yetus/diff-checkstyle-kryo-registrator.txt |
| modules | C: common spark-client ql kryo-registrator U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-12700/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |

This message was automatically generated.

> Don't serialize hashCode when groupByShuffle and RDD cacheing is disabled > - > > Key: HIVE-20032 > URL: https://issues.apache.org/jira/browse/HIVE-20032 >
[jira] [Commented] (HIVE-20032) Don't serialize hashCode when groupByShuffle and RDD cacheing is disabled
[ https://issues.apache.org/jira/browse/HIVE-20032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16548704#comment-16548704 ] Sahil Takiar commented on HIVE-20032: - [~lirui] attached an updated patch that preserves the Kryo shading. It moves the new serializer to the {{kryo-registrator}} module and uses reflection to instantiate the class. > Don't serialize hashCode when groupByShuffle and RDD cacheing is disabled > - > > Key: HIVE-20032 > URL: https://issues.apache.org/jira/browse/HIVE-20032 > Project: Hive > Issue Type: Improvement > Components: Spark >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Attachments: HIVE-20032.1.patch, HIVE-20032.2.patch, > HIVE-20032.3.patch, HIVE-20032.4.patch, HIVE-20032.5.patch > > > Follow up on HIVE-15104, if we don't enable RDD cacheing or groupByShuffles, > then we don't need to serialize the hashCode when shuffling data in HoS. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
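The reflection step mentioned in the comment above can be illustrated with a minimal, self-contained sketch. It is not taken from the patch: the names {{ReflectiveLoaderSketch}}, {{KeySerializer}} and {{NoHashSerializer}} are hypothetical, and the sketch only assumes that the caller knows the implementation by class name and therefore needs no compile-time reference to it.

```java
// Hypothetical sketch: instantiate a serializer by class name only, so the
// calling module needs no compile-time reference to the implementation.
final class ReflectiveLoaderSketch {

    /** Hypothetical interface shared by the caller and the implementation. */
    interface KeySerializer {
        String describe();
    }

    /** Hypothetical implementation that would live in a separate module. */
    static final class NoHashSerializer implements KeySerializer {
        public NoHashSerializer() {
        }

        @Override
        public String describe() {
            return "no-hash";
        }
    }

    /** Loads the named class and calls its no-argument constructor. */
    static KeySerializer load(String className) {
        try {
            Class<?> clazz = Class.forName(className);
            return (KeySerializer) clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        // In real use the name would be a plain string constant; getName() is
        // used here only so the sketch works wherever it is compiled.
        String name = NoHashSerializer.class.getName();
        System.out.println(load(name).describe());
    }
}
```

The trade-off of this style is that a wrong class name only surfaces at runtime, so the sketch wraps the reflective failure in an exception that names the class.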
[jira] [Commented] (HIVE-20032) Don't serialize hashCode when groupByShuffle and RDD cacheing is disabled
[ https://issues.apache.org/jira/browse/HIVE-20032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1654#comment-1654 ] Rui Li commented on HIVE-20032: --- Hi [~stakiar], Kryo was relocated not just because Spark uses a different version. My concern is that if we remove the relocation, it may break users' applications that depend on Hive. > Don't serialize hashCode when groupByShuffle and RDD cacheing is disabled > - > > Key: HIVE-20032 > URL: https://issues.apache.org/jira/browse/HIVE-20032 > Project: Hive > Issue Type: Improvement > Components: Spark >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Attachments: HIVE-20032.1.patch, HIVE-20032.2.patch, > HIVE-20032.3.patch, HIVE-20032.4.patch > > > Follow up on HIVE-15104, if we don't enable RDD cacheing or groupByShuffles, > then we don't need to serialize the hashCode when shuffling data in HoS. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-20032) Don't serialize hashCode when groupByShuffle and RDD cacheing is disabled
[ https://issues.apache.org/jira/browse/HIVE-20032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16543633#comment-16543633 ]

Hive QA commented on HIVE-20032:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12931417/HIVE-20032.4.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 14650 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.ql.exec.spark.TestSparkStatistics.testSparkStatistics (batchId=241)
org.apache.hive.jdbc.TestJdbcWithLocalClusterSpark.testSparkQuery (batchId=252)
org.apache.hive.jdbc.TestJdbcWithLocalClusterSpark.testTempTable (batchId=252)
org.apache.hive.jdbc.TestJdbcWithMiniHS2ErasureCoding.testDescribeErasureCoding (batchId=250)
org.apache.hive.jdbc.TestJdbcWithMiniHS2ErasureCoding.testExplainErasureCoding (batchId=250)
org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.testSparkQuery (batchId=252)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/12590/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/12590/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-12590/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12931417 - PreCommit-HIVE-Build

> Don't serialize hashCode when groupByShuffle and RDD cacheing is disabled > - > > Key: HIVE-20032 > URL: https://issues.apache.org/jira/browse/HIVE-20032 > Project: Hive > Issue Type: Improvement > Components: Spark >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Attachments: HIVE-20032.1.patch, HIVE-20032.2.patch, > HIVE-20032.3.patch, HIVE-20032.4.patch > > > Follow up on HIVE-15104, if we don't enable RDD cacheing or groupByShuffles, > then we don't need to serialize the hashCode when shuffling data in HoS. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-20032) Don't serialize hashCode when groupByShuffle and RDD cacheing is disabled
[ https://issues.apache.org/jira/browse/HIVE-20032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16543615#comment-16543615 ]

Hive QA commented on HIVE-20032:

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 2m 31s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 46s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 14s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 41s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 1m 0s{color} | {color:blue} common in master has 64 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 5m 57s{color} | {color:blue} ql in master has 2289 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 37s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 30s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 32s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 37s{color} | {color:red} ql: The patch generated 1 new + 12 unchanged - 0 fixed = 13 total (was 12) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 17s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 14s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 38m 15s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile xml |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-12590/dev-support/hive-personality.sh |
| git revision | master / d8306cf |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-12590/yetus/diff-checkstyle-ql.txt |
| modules | C: common ql kryo-registrator U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-12590/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |

This message was automatically generated.

> Don't serialize hashCode when groupByShuffle and RDD cacheing is disabled > - > > Key: HIVE-20032 > URL: https://issues.apache.org/jira/browse/HIVE-20032 > Project: Hive > Issue Type: Improvement > Components: Spark >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Attachments: HIVE-20032.1.patch, HIVE-20032.2.patch, > HIVE-20032.3.patch, HIVE-20032.4.patch > > > Follow up on HIVE-15104, if we don't enable RDD cacheing or groupByShuffles, > then we don't need to serialize the hashCode when shuffling data in HoS. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-20032) Don't serialize hashCode when groupByShuffle and RDD cacheing is disabled
[ https://issues.apache.org/jira/browse/HIVE-20032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16542413#comment-16542413 ] Sahil Takiar commented on HIVE-20032: - As for benchmarking, I have done a lot of TPC-DS benchmarking, and I don't consistently get better performance. However, the amount of shuffled data is significantly reduced (as well as the amount of data spilled to disk). My guess is that latency doesn't improve much because I'm running my tests on an unloaded cluster. However, I expect cluster throughput to be better with this patch since fewer I/O resources are being used. I'll need to run some concurrent TPC-DS workloads to confirm this, though. > Don't serialize hashCode when groupByShuffle and RDD cacheing is disabled > - > > Key: HIVE-20032 > URL: https://issues.apache.org/jira/browse/HIVE-20032 > Project: Hive > Issue Type: Improvement > Components: Spark >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Attachments: HIVE-20032.1.patch, HIVE-20032.2.patch, > HIVE-20032.3.patch, HIVE-20032.4.patch > > > Follow up on HIVE-15104, if we don't enable RDD cacheing or groupByShuffles, > then we don't need to serialize the hashCode when shuffling data in HoS. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-20032) Don't serialize hashCode when groupByShuffle and RDD cacheing is disabled
[ https://issues.apache.org/jira/browse/HIVE-20032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16542409#comment-16542409 ] Sahil Takiar commented on HIVE-20032: - [~lirui] thanks for taking a look. So I took a closer look at this, and I think there might be a way to specify custom serializers just for shuffles. However, it requires accessing some lower-level Spark APIs. The idea is that RDD operations such as {{sortByKey}} and {{repartitionAndSortWithinPartitions}} return a {{ShuffledRDD}}. The {{ShuffledRDD}} object has a method called {{setSerializer}} that allows users to set a custom serializer for that RDD. Certain RDD APIs such as {{combineByKey}} expose setting a custom serializer by invoking the {{ShuffledRDD#setSerializer}} method; however, it doesn't look like {{sortByKey}} or {{repartitionAndSortWithinPartitions}} do. I think this is probably better than my original approach. The other issue is that specifying a custom serializer doesn't work with the way we currently shade Kryo in {{hive-exec}} (I think you found similar issues while working on HIVE-15104). So I had to remove the relocation for Kryo (which was added in HIVE-5915). Hopefully that's OK since Spark and Hive use the same version of Kryo. I attached an updated patch (still a WIP) that implements this approach. > Don't serialize hashCode when groupByShuffle and RDD cacheing is disabled > - > > Key: HIVE-20032 > URL: https://issues.apache.org/jira/browse/HIVE-20032 > Project: Hive > Issue Type: Improvement > Components: Spark >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Attachments: HIVE-20032.1.patch, HIVE-20032.2.patch, > HIVE-20032.3.patch, HIVE-20032.4.patch > > > Follow up on HIVE-15104, if we don't enable RDD cacheing or groupByShuffles, > then we don't need to serialize the hashCode when shuffling data in HoS. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-20032) Don't serialize hashCode when groupByShuffle and RDD cacheing is disabled
[ https://issues.apache.org/jira/browse/HIVE-20032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16541466#comment-16541466 ]

Hive QA commented on HIVE-20032:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12931183/HIVE-20032.3.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 14649 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/12549/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/12549/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-12549/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12931183 - PreCommit-HIVE-Build

> Don't serialize hashCode when groupByShuffle and RDD cacheing is disabled > - > > Key: HIVE-20032 > URL: https://issues.apache.org/jira/browse/HIVE-20032 > Project: Hive > Issue Type: Improvement > Components: Spark >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Attachments: HIVE-20032.1.patch, HIVE-20032.2.patch, > HIVE-20032.3.patch > > > Follow up on HIVE-15104, if we don't enable RDD cacheing or groupByShuffles, > then we don't need to serialize the hashCode when shuffling data in HoS. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-20032) Don't serialize hashCode when groupByShuffle and RDD cacheing is disabled
[ https://issues.apache.org/jira/browse/HIVE-20032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16541444#comment-16541444 ]

Hive QA commented on HIVE-20032:

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 24s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 58s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 40s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 9s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 30s{color} | {color:blue} common in master has 64 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 23s{color} | {color:blue} spark-client in master has 10 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 3m 41s{color} | {color:blue} ql in master has 2287 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 26s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 22s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 42s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 8s{color} | {color:red} kryo-registrator: The patch generated 1 new + 2 unchanged - 0 fixed = 3 total (was 2) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 23s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 12s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 29m 55s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-12549/dev-support/hive-personality.sh |
| git revision | master / 5ade740 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-12549/yetus/diff-checkstyle-kryo-registrator.txt |
| modules | C: common spark-client ql kryo-registrator U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-12549/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |

This message was automatically generated.

> Don't serialize hashCode when groupByShuffle and RDD cacheing is disabled > - > > Key: HIVE-20032 > URL: https://issues.apache.org/jira/browse/HIVE-20032 > Project: Hive > Issue Type: Improvement > Components: Spark >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Attachments: HIVE-20032.1.patch, HIVE-20032.2.patch, > HIVE-20032.3.patch > > > Follow up on HIVE-15104, if we don't enable RDD cacheing or groupByShuffles, > then we don't need to serialize the hashCode when shuffling data in HoS. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-20032) Don't serialize hashCode when groupByShuffle and RDD cacheing is disabled
[ https://issues.apache.org/jira/browse/HIVE-20032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16541076#comment-16541076 ] Rui Li commented on HIVE-20032: --- Hi [~stakiar], thanks for working on this. There are other cases where an RDD is cached, e.g. parallel order by. So you need to serialize the hash code in all these cases (maybe multi-insert is another one). Having separate SerDes for caching and shuffling would be good. But I guess that needs help from the Spark side. And btw, have you run benchmarks to measure the improvement from this change? > Don't serialize hashCode when groupByShuffle and RDD cacheing is disabled > - > > Key: HIVE-20032 > URL: https://issues.apache.org/jira/browse/HIVE-20032 > Project: Hive > Issue Type: Improvement > Components: Spark >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Attachments: HIVE-20032.1.patch, HIVE-20032.2.patch, > HIVE-20032.3.patch > > > Follow up on HIVE-15104, if we don't enable RDD cacheing or groupByShuffles, > then we don't need to serialize the hashCode when shuffling data in HoS. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-20032) Don't serialize hashCode when groupByShuffle and RDD cacheing is disabled
[ https://issues.apache.org/jira/browse/HIVE-20032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16540257#comment-16540257 ] Sahil Takiar commented on HIVE-20032: - [~lirui] could you take a look? This patch also turns {{hive.spark.optimize.shuffle.serde}} on by default. I think we should try to get to a point where we never have to serialize the hashCode. It's confusing to users migrating from Hive-on-MR to HoS when they see a query that requires more shuffle data in HoS than Hive-on-MR. This is the first step towards achieving that. Doing it completely will be tricky. Off the top of my head, we will need a way to specify separate serializers for cacheing RDDs vs. shuffling them. We will also need a way to preserve the hashCode for {{groupByKey}}. > Don't serialize hashCode when groupByShuffle and RDD cacheing is disabled > - > > Key: HIVE-20032 > URL: https://issues.apache.org/jira/browse/HIVE-20032 > Project: Hive > Issue Type: Improvement > Components: Spark >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Attachments: HIVE-20032.1.patch, HIVE-20032.2.patch, > HIVE-20032.3.patch > > > Follow up on HIVE-15104, if we don't enable RDD cacheing or groupByShuffles, > then we don't need to serialize the hashCode when shuffling data in HoS. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
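To make the shuffle-size argument in the comment above concrete, here is a minimal sketch of the idea using only {{java.io}}. It is an illustration under stated assumptions, not the serializer from the patch: {{ShuffleKeySketch}} and its {{Key}} class are hypothetical stand-ins for a shuffle key that carries payload bytes plus a precomputed hash, and fixed-width integers are used for simplicity, so the byte counts will not match what Kryo produces.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch: the same key serialized with and without its hash code.
final class ShuffleKeySketch {

    /** Hypothetical stand-in for a shuffle key: payload bytes plus a precomputed hash. */
    static final class Key {
        final byte[] bytes;
        final int hashCode;

        Key(byte[] bytes, int hashCode) {
            this.bytes = bytes;
            this.hashCode = hashCode;
        }
    }

    /** Writes length and payload, and the hash code only when requested. */
    static byte[] serialize(Key key, boolean includeHashCode) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(key.bytes.length);
            out.write(key.bytes);
            if (includeHashCode) {
                out.writeInt(key.hashCode);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        Key key = new Key(new byte[] {1, 2, 3, 4, 5, 6, 7, 8}, 42);
        System.out.println("with hashCode: " + serialize(key, true).length + " bytes");
        System.out.println("without hashCode: " + serialize(key, false).length + " bytes");
    }
}
```

In this sketch the saving is a fixed four bytes per key. A reader that skips the hash code can no longer recover it, which is why the comment above singles out RDD caching and {{groupByKey}} as cases that still need it.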
[jira] [Commented] (HIVE-20032) Don't serialize hashCode when groupByShuffle and RDD cacheing is disabled
[ https://issues.apache.org/jira/browse/HIVE-20032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16535919#comment-16535919 ]

Hive QA commented on HIVE-20032:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12930601/HIVE-20032.2.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/12451/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/12451/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-12451/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Tests exited with: Exception: Patch URL https://issues.apache.org/jira/secure/attachment/12930601/HIVE-20032.2.patch was found in seen patch url's cache and a test was probably run already on it. Aborting...
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12930601 - PreCommit-HIVE-Build

> Don't serialize hashCode when groupByShuffle and RDD cacheing is disabled > - > > Key: HIVE-20032 > URL: https://issues.apache.org/jira/browse/HIVE-20032 > Project: Hive > Issue Type: Improvement > Components: Spark >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Attachments: HIVE-20032.1.patch, HIVE-20032.2.patch > > > Follow up on HIVE-15104, if we don't enable RDD cacheing or groupByShuffles, > then we don't need to serialize the hashCode when shuffling data in HoS. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-20032) Don't serialize hashCode when groupByShuffle and RDD cacheing is disabled
[ https://issues.apache.org/jira/browse/HIVE-20032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16535918#comment-16535918 ]

Hive QA commented on HIVE-20032:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12930601/HIVE-20032.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 361 failed/errored test(s), 14628 tests executed
*Failed tests:*
{noformat}
TestAutoPurgeTables - did not produce a TEST-*.xml file (likely timed out) (batchId=240)
TestLocationQueries - did not produce a TEST-*.xml file (likely timed out) (batchId=240)
TestSemanticAnalyzerHookLoading - did not produce a TEST-*.xml file (likely timed out) (batchId=240)
TestSparkStatistics - did not produce a TEST-*.xml file (likely timed out) (batchId=240)
org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver[druidmini_dynamic_partition] (batchId=190)
org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver[druidmini_expressions] (batchId=190)
org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver[druidmini_test1] (batchId=190)
org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver[druidmini_test_alter] (batchId=190)
org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver[druidmini_test_insert] (batchId=190)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[dynamic_rdd_cache] (batchId=185)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[infer_bucket_sort_map_operators] (batchId=186)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[infer_bucket_sort_num_buckets] (batchId=185)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_constprog_dpp] (batchId=186)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=184)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_2] (batchId=186)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_3] (batchId=186)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_4] (batchId=186)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_6] (batchId=185)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_7] (batchId=185)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_explainuser_1] (batchId=185)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_in_process_launcher] (batchId=184)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_multi_insert_parallel_orderby] (batchId=186)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_opt_shuffle_serde] (batchId=185)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_use_op_stats] (batchId=184)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=184)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_outer_join1] (batchId=185)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_outer_join2] (batchId=184)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join0] (batchId=147)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join10] (batchId=123)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join11] (batchId=111)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join12] (batchId=118)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join13] (batchId=144)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join15] (batchId=114)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join16] (batchId=125)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join18] (batchId=112)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join18_multi_distinct] (batchId=119)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join20] (batchId=148)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join22] (batchId=132)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join24] (batchId=141)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join26] (batchId=113)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join27] (batchId=148)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join30] (batchId=121)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join31] (batchId=128)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_smb_mapjoi
[jira] [Commented] (HIVE-20032) Don't serialize hashCode when groupByShuffle and RDD cacheing is disabled
[ https://issues.apache.org/jira/browse/HIVE-20032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16535902#comment-16535902 ]

Hive QA commented on HIVE-20032:

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 2m 5s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 32s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 28s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 52s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 56s{color} | {color:blue} common in master has 64 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 40s{color} | {color:blue} spark-client in master has 10 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 6m 13s{color} | {color:blue} ql in master has 2287 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 9s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 29s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 35s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 10s{color} | {color:red} kryo-registrator: The patch generated 1 new + 2 unchanged - 0 fixed = 3 total (was 2) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 39s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 14s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 44m 10s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-12450/dev-support/hive-personality.sh |
| git revision | master / 406cde9 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-12450/yetus/diff-checkstyle-kryo-registrator.txt |
| modules | C: common spark-client ql kryo-registrator U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-12450/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |

This message was automatically generated.

> Don't serialize hashCode when groupByShuffle and RDD cacheing is disabled > - > > Key: HIVE-20032 > URL: https://issues.apache.org/jira/browse/HIVE-20032 > Project: Hive > Issue Type: Improvement > Components: Spark >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Attachments: HIVE-20032.1.patch, HIVE-20032.2.patch > > > Follow up on HIVE-15104, if we don't enable RDD cacheing or groupByShuffles, > then we don't need to serialize the hashCode when shuffling data in HoS. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-20032) Don't serialize hashCode when groupByShuffle and RDD cacheing is disabled
[ https://issues.apache.org/jira/browse/HIVE-20032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16535353#comment-16535353 ]

Sahil Takiar commented on HIVE-20032:

Attaching dummy patch with {{hive.spark.optimize.shuffle.serde}} set to true by default and {{hive.combine.equivalent.work.optimization}} to false by default, just to see if there are any HoS test failures (besides explain plan diffs).

> Don't serialize hashCode when groupByShuffle and RDD cacheing is disabled
> -
>
> Key: HIVE-20032
> URL: https://issues.apache.org/jira/browse/HIVE-20032
> Project: Hive
> Issue Type: Improvement
> Components: Spark
> Reporter: Sahil Takiar
> Assignee: Sahil Takiar
> Priority: Major
> Attachments: HIVE-20032.1.patch, HIVE-20032.2.patch
>
>
> Follow up on HIVE-15104, if we don't enable RDD cacheing or groupByShuffles,
> then we don't need to serialize the hashCode when shuffling data in HoS.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-20032) Don't serialize hashCode when groupByShuffle and RDD cacheing is disabled
[ https://issues.apache.org/jira/browse/HIVE-20032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16528971#comment-16528971 ]

Hive QA commented on HIVE-20032:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12929651/HIVE-20032.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 14635 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/12290/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/12290/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-12290/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12929651 - PreCommit-HIVE-Build

> Don't serialize hashCode when groupByShuffle and RDD cacheing is disabled
> -
>
> Key: HIVE-20032
> URL: https://issues.apache.org/jira/browse/HIVE-20032
> Project: Hive
> Issue Type: Improvement
> Components: Spark
> Reporter: Sahil Takiar
> Assignee: Sahil Takiar
> Priority: Major
> Attachments: HIVE-20032.1.patch
>
>
> Follow up on HIVE-15104, if we don't enable RDD cacheing or groupByShuffles,
> then we don't need to serialize the hashCode when shuffling data in HoS.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-20032) Don't serialize hashCode when groupByShuffle and RDD cacheing is disabled
[ https://issues.apache.org/jira/browse/HIVE-20032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16528965#comment-16528965 ]

Hive QA commented on HIVE-20032:

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 51s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 3s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 26s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 56s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 23s{color} | {color:blue} spark-client in master has 10 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 3m 47s{color} | {color:blue} ql in master has 2287 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 12s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 24s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 25s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 9s{color} | {color:red} kryo-registrator: The patch generated 1 new + 2 unchanged - 0 fixed = 3 total (was 2) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 13s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 12s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 27m 36s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-12290/dev-support/hive-personality.sh |
| git revision | master / 80c3bb5 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-12290/yetus/diff-checkstyle-kryo-registrator.txt |
| modules | C: spark-client ql kryo-registrator U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-12290/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |

This message was automatically generated.

> Don't serialize hashCode when groupByShuffle and RDD cacheing is disabled
> -
>
> Key: HIVE-20032
> URL: https://issues.apache.org/jira/browse/HIVE-20032
> Project: Hive
> Issue Type: Improvement
> Components: Spark
> Reporter: Sahil Takiar
> Assignee: Sahil Takiar
> Priority: Major
> Attachments: HIVE-20032.1.patch
>
>
> Follow up on HIVE-15104, if we don't enable RDD cacheing or groupByShuffles,
> then we don't need to serialize the hashCode when shuffling data in HoS.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)