[jira] [Commented] (HIVE-20032) Don't serialize hashCode for repartitionAndSortWithinPartitions

2018-07-23 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16553485#comment-16553485
 ] 

Sahil Takiar commented on HIVE-20032:
-

[~lirui] HoS tests are passing now, and I've managed to preserve the Kryo 
shading. Created an RB: https://reviews.apache.org/r/68026/ - could you take a 
look?

> Don't serialize hashCode for repartitionAndSortWithinPartitions
> ---
>
> Key: HIVE-20032
> URL: https://issues.apache.org/jira/browse/HIVE-20032
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HIVE-20032.1.patch, HIVE-20032.2.patch, 
> HIVE-20032.3.patch, HIVE-20032.4.patch, HIVE-20032.5.patch, 
> HIVE-20032.6.patch, HIVE-20032.7.patch, HIVE-20032.8.patch
>
>
> Follow-up to HIVE-15104: if we don't enable RDD caching or groupByShuffles, 
> then we don't need to serialize the hashCode when shuffling data in HoS.
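
As a rough illustration of the idea in the description, here is a minimal sketch. It assumes a key type that carries a precomputed hash code next to its raw bytes; the class and method names are hypothetical and are not Hive's HiveKey or its Kryo serializer. It only shows that skipping the hash code saves four bytes per serialized key.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class ShuffleKeySketch {

    /** Hypothetical stand-in for a shuffle key: raw bytes plus a precomputed hash code. */
    public static final class Key {
        final byte[] bytes;
        final int hashCode;

        public Key(byte[] bytes, int hashCode) {
            this.bytes = bytes;
            this.hashCode = hashCode;
        }
    }

    /** Writes the length-prefixed bytes, and the hash code only when requested. */
    public static byte[] serialize(Key key, boolean includeHashCode) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(key.bytes.length);
            out.write(key.bytes);
            if (includeHashCode) {
                // Only needed if the receiving side reuses the hash code.
                out.writeInt(key.hashCode);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }
}
```

For a 3-byte key, the first form takes 11 bytes and the second 7.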



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20032) Don't serialize hashCode for repartitionAndSortWithinPartitions

2018-07-24 Thread Rui Li (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554183#comment-16554183
 ] 

Rui Li commented on HIVE-20032:
---

Hi [~stakiar], I left some comments on RB. Meanwhile, could you explain why we 
need to put the registrator jar in the driver's extra class path? Won't {{--jars}} 
add the jar to both the driver's and the executors' class paths? And IIUC, the 
driver extra class path only takes jars on the local FS. So will that be a 
problem for cluster mode?



[jira] [Commented] (HIVE-20032) Don't serialize hashCode for repartitionAndSortWithinPartitions

2018-07-24 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554502#comment-16554502
 ] 

Sahil Takiar commented on HIVE-20032:
-

I originally thought that {{\-\-jars}} would add the specified jars to the 
executor and driver class paths, but apparently that's not the case. According to 
https://stackoverflow.com/questions/37132559/add-jars-to-a-spark-job-spark-submit?answertab=votes#tab-top
 you have to add the jars manually. It seems that {{\-\-jars}} will just copy 
the jar file to the local FS of each executor (similar to MapReduce's 
DistributedCache), but the jar has to be explicitly added to the driver / 
executor classpath.

I think HIVE-15104 works because there isn't any Hive code that calls a 
class inside {{hive-kryo-registrator}}. Only the Spark code does. If you look 
at the code in Spark's {{KryoSerializer}}, you see that Spark sets up a 
special classloader before calling the class specified in 
{{spark.kryo.registrator}}.
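
The general pattern of installing a classloader before running some code can be sketched as follows. This is not Spark's actual code, just a minimal illustration; the class and method names are made up. The loader is set as the thread context classloader for the duration of the call and the previous one is restored afterwards.

```java
import java.util.function.Supplier;

public class ContextClassLoaderSketch {

    /**
     * Runs the body with the given loader installed as the thread context
     * classloader, then restores whatever loader was installed before.
     */
    public static <T> T withContextClassLoader(ClassLoader loader, Supplier<T> body) {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        current.setContextClassLoader(loader);
        try {
            return body.get();
        } finally {
            // Restore even if the body throws.
            current.setContextClassLoader(previous);
        }
    }
}
```

Code that runs inside the body and resolves classes through the context classloader sees the jars known to that loader, even if the caller's own classloader does not.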



[jira] [Commented] (HIVE-20032) Don't serialize hashCode for repartitionAndSortWithinPartitions

2018-07-24 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554954#comment-16554954
 ] 

Hive QA commented on HIVE-20032:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
47s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
17s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
38s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
36s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
34s{color} | {color:blue} common in master has 64 extant Findbugs warnings. 
{color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
42s{color} | {color:blue} itests/hive-unit in master has 2 extant Findbugs 
warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 
26s{color} | {color:blue} ql in master has 2280 extant Findbugs warnings. 
{color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
25s{color} | {color:blue} spark-client in master has 10 extant Findbugs 
warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
4s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
10s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
13s{color} | {color:red} kryo-registrator in the patch failed. {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
57s{color} | {color:red} ql in the patch failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 
13s{color} | {color:red} kryo-registrator in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 13s{color} 
| {color:red} kryo-registrator in the patch failed. {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
16s{color} | {color:red} itests/hive-unit: The patch generated 2 new + 20 
unchanged - 0 fixed = 22 total (was 20) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m  
9s{color} | {color:red} kryo-registrator: The patch generated 1 new + 2 
unchanged - 0 fixed = 3 total (was 2) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
41s{color} | {color:red} ql: The patch generated 10 new + 16 unchanged - 0 
fixed = 26 total (was 16) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
10s{color} | {color:red} spark-client: The patch generated 1 new + 27 unchanged 
- 0 fixed = 28 total (was 27) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
14s{color} | {color:red} kryo-registrator in the patch failed. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  4m 
22s{color} | {color:red} ql generated 1 new + 2280 unchanged - 0 fixed = 2281 
total (was 2280) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
0s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
13s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 38m 49s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:ql |
|  |  Possible doublecheck on 
org.apache.hadoop.hive.ql.exec.spark.ShuffleKryoSerializer.INSTANCE in 
org.apache.hadoop.hive.ql.exec.spark.ShuffleKryoSerializer.getInstance(JavaSparkContext,
 Configuration)  At 
ShuffleKryoSerializer.java:org.apache.hadoop.hive.ql.exec.spark.ShuffleKryoSerializer.getInstance(JavaSparkContext,
 Configuration)  At ShuffleKryoSerializer.java:[lines 42-44] |
\\
\\
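
The "Possible doublecheck" FindBugs warning above typically points at double-checked locking on a field that is not declared volatile. The actual ShuffleKryoSerializer code is not shown in this thread, so the following is only a generic sketch of a lazily initialized singleton that uses a volatile field; the class name is hypothetical.

```java
public class LazySingletonSketch {

    // volatile makes the fully constructed instance visible to other threads.
    private static volatile LazySingletonSketch instance;

    private LazySingletonSketch() {
    }

    public static LazySingletonSketch getInstance() {
        LazySingletonSketch local = instance; // single volatile read on the fast path
        if (local == null) {
            synchronized (LazySingletonSketch.class) {
                local = instance;
                if (local == null) {
                    local = new LazySingletonSketch();
                    instance = local;
                }
            }
        }
        return local;
    }
}
```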
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hivep

[jira] [Commented] (HIVE-20032) Don't serialize hashCode for repartitionAndSortWithinPartitions

2018-07-24 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554977#comment-16554977
 ] 

Hive QA commented on HIVE-20032:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12932914/HIVE-20032.9.patch

{color:green}SUCCESS:{color} +1 due to 5 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 14685 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver[druidmini_dynamic_partition]
 (batchId=192)
org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver[druidmini_expressions]
 (batchId=192)
org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver[druidmini_test1]
 (batchId=192)
org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver[druidmini_test_alter]
 (batchId=192)
org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver[druidmini_test_insert]
 (batchId=192)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/12834/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/12834/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-12834/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12932914 - PreCommit-HIVE-Build



[jira] [Commented] (HIVE-20032) Don't serialize hashCode for repartitionAndSortWithinPartitions

2018-07-25 Thread Rui Li (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1612#comment-1612
 ] 

Rui Li commented on HIVE-20032:
---

I ran a simple query in yarn-cluster mode with patch v8 and hit an issue:
{noformat}
2018-07-25T17:58:05,859 ERROR [6f7f3077-05bf-45cc-bf32-4c65132ccf48 main] 
status.SparkJobMonitor: Spark job[-1] failed
java.lang.NoClassDefFoundError: org/apache/hadoop/hive/ql/io/HiveKey
at 
org.apache.hive.spark.HiveKryoRegistrator.registerClasses(HiveKryoRegistrator.java:37)
 ~[hive-kryo-registrator-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.spark.serializer.KryoSerializer$$anonfun$newKryo$6.apply(KryoSerializer.scala:136)
 ~[spark-core_2.11-2.3.0.jar:2.3.0]
at 
org.apache.spark.serializer.KryoSerializer$$anonfun$newKryo$6.apply(KryoSerializer.scala:136)
 ~[spark-core_2.11-2.3.0.jar:2.3.0]
at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
 ~[scala-library-2.11.8.jar:?]
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186) 
~[scala-library-2.11.8.jar:?]
at 
org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:136) 
~[spark-core_2.11-2.3.0.jar:2.3.0]
at 
org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:324)
 ~[spark-core_2.11-2.3.0.jar:2.3.0]
at 
org.apache.spark.serializer.KryoSerializerInstance.<init>(KryoSerializer.scala:309)
 ~[spark-core_2.11-2.3.0.jar:2.3.0]
at 
org.apache.spark.serializer.KryoSerializer.newInstance(KryoSerializer.scala:218)
 ~[spark-core_2.11-2.3.0.jar:2.3.0]
at 
org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:288)
 ~[spark-core_2.11-2.3.0.jar:2.3.0]
at 
org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:127)
 ~[spark-core_2.11-2.3.0.jar:2.3.0]
at 
org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:88) 
~[spark-core_2.11-2.3.0.jar:2.3.0]
at 
org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
 ~[spark-core_2.11-2.3.0.jar:2.3.0]
at 
org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
 ~[spark-core_2.11-2.3.0.jar:2.3.0]
at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1481) 
~[spark-core_2.11-2.3.0.jar:2.3.0]
at org.apache.spark.rdd.HadoopRDD.<init>(HadoopRDD.scala:117) 
~[spark-core_2.11-2.3.0.jar:2.3.0]
at 
org.apache.spark.SparkContext$$anonfun$hadoopRDD$1.apply(SparkContext.scala:997)
 ~[spark-core_2.11-2.3.0.jar:2.3.0]
at 
org.apache.spark.SparkContext$$anonfun$hadoopRDD$1.apply(SparkContext.scala:988)
 ~[spark-core_2.11-2.3.0.jar:2.3.0]
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) 
~[spark-core_2.11-2.3.0.jar:2.3.0]
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) 
~[spark-core_2.11-2.3.0.jar:2.3.0]
at org.apache.spark.SparkContext.withScope(SparkContext.scala:692) 
~[spark-core_2.11-2.3.0.jar:2.3.0]
at org.apache.spark.SparkContext.hadoopRDD(SparkContext.scala:988) 
~[spark-core_2.11-2.3.0.jar:2.3.0]
at 
org.apache.spark.api.java.JavaSparkContext.hadoopRDD(JavaSparkContext.scala:416)
 ~[spark-core_2.11-2.3.0.jar:2.3.0]
at 
org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generateMapInput(SparkPlanGenerator.java:239)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generateParentTran(SparkPlanGenerator.java:176)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generate(SparkPlanGenerator.java:127)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient$JobStatusJob.call(RemoteHiveSparkClient.java:361)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:400)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:365)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
~[?:1.8.0_151]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
~[?:1.8.0_151]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
~[?:1.8.0_151]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]
Caused by: java.lang.ClassNotFoundException: 
org.apache.hadoop.hive.ql.io.HiveKey
at java.net.URLClassLoader.findClass(URLClassLoader.java:381) 
~[?:1.8.0_151]
at java.lang.ClassLoader.loadClass(ClassLoader.java:424) ~[?:1.8.0_151]
   

[jira] [Commented] (HIVE-20032) Don't serialize hashCode for repartitionAndSortWithinPartitions

2018-07-25 Thread Rui Li (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16555466#comment-16555466
 ] 

Rui Li commented on HIVE-20032:
---

bq. I originally thought that --jars would add the specified jars to the 
executor and driver class paths, but apparently that's not the case.
Is this a Spark issue? Because according to the 
[docs|https://github.com/apache/spark/blob/v2.3.0/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala#L525],
 the jars should be added to CP. It seems ApplicationMaster uses a [custom 
class 
loader|https://github.com/apache/spark/blob/v2.3.0/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L152]
 for the user class, which should load the jars added by {{--jars}}.
A possible cause is that the jars are usually [not added to system class 
loader|https://github.com/apache/spark/blob/v2.3.0/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L1273].
 Sometimes that can give you ClassNotFoundException even when the jars are 
there -- you just need to use the correct class loader.
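
A small self-contained way to see this effect, without Spark or YARN, is to ask two different loaders for the same class. In the sketch below (names are made up for illustration), a loader with no parent delegates only to the bootstrap loader, so it cannot see application classes even though they are loaded in the same JVM.

```java
public class ClassLoaderVisibilitySketch {

    /** Returns true if the named class can be resolved through the given loader. */
    public static boolean isVisible(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    /** A loader without a parent: it only delegates to the bootstrap loader. */
    public static ClassLoader bootstrapOnlyLoader() {
        return new ClassLoader((ClassLoader) null) {
        };
    }
}
```

Here `isVisible("java.lang.String", bootstrapOnlyLoader())` holds, while the same check for an application class fails with that loader and succeeds with the loader that defined the class.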



[jira] [Commented] (HIVE-20032) Don't serialize hashCode for repartitionAndSortWithinPartitions

2018-07-25 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16555989#comment-16555989
 ] 

Sahil Takiar commented on HIVE-20032:
-

Yeah, looks like the issue was with using the correct classloader. I think 
the single-argument {{Class.forName}} uses the calling class's classloader 
rather than the {{Thread.currentThread().getContextClassLoader()}} classloader. 
After switching to the thread context class loader, things started to work and 
we don't have to manually add the jar to the driver classpath anymore. Attached 
an updated patch.
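
A minimal sketch of that kind of lookup is below. It is not the code from the patch, and the helper name is hypothetical. For context, the single-argument {{Class.forName(String)}} is documented to resolve through the defining classloader of the calling class, while the three-argument form lets the caller pick the loader.

```java
public class ContextAwareLoading {

    /**
     * Loads a class by name through the thread context classloader, falling
     * back to this class's own loader when no context loader is set.
     * The checked exception is wrapped so callers get an unchecked failure.
     */
    public static Class<?> loadClass(String className) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ContextAwareLoading.class.getClassLoader();
        }
        try {
            return Class.forName(className, true, loader);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Class not visible to " + loader + ": " + className, e);
        }
    }
}
```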

Can you re-run your test with the latest patch (assuming Hive QA comes back 
with a clean run)? I removed all the driver classpath logic I had added; 
hopefully that fixed the issue.



[jira] [Commented] (HIVE-20032) Don't serialize hashCode for repartitionAndSortWithinPartitions

2018-07-25 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556287#comment-16556287
 ] 

Hive QA commented on HIVE-20032:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
1s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
45s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
32s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
26s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
29s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
31s{color} | {color:blue} common in master has 64 extant Findbugs warnings. 
{color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
42s{color} | {color:blue} itests/hive-unit in master has 2 extant Findbugs 
warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m  
3s{color} | {color:blue} ql in master has 2296 extant Findbugs warnings. 
{color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
25s{color} | {color:blue} spark-client in master has 10 extant Findbugs 
warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
59s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
9s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
13s{color} | {color:red} kryo-registrator in the patch failed. {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 
33s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
17s{color} | {color:red} itests/hive-unit: The patch generated 2 new + 20 
unchanged - 0 fixed = 22 total (was 20) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m  
9s{color} | {color:red} kryo-registrator: The patch generated 1 new + 2 
unchanged - 0 fixed = 3 total (was 2) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
39s{color} | {color:red} ql: The patch generated 13 new + 12 unchanged - 0 
fixed = 25 total (was 12) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  4m 
18s{color} | {color:red} ql generated 1 new + 2296 unchanged - 0 fixed = 2297 
total (was 2296) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
57s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
13s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 37m 28s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:ql |
|  |  Possible doublecheck on 
org.apache.hadoop.hive.ql.exec.spark.ShuffleKryoSerializer.INSTANCE in 
org.apache.hadoop.hive.ql.exec.spark.ShuffleKryoSerializer.getInstance(JavaSparkContext,
 Configuration)  At 
ShuffleKryoSerializer.java:org.apache.hadoop.hive.ql.exec.spark.ShuffleKryoSerializer.getInstance(JavaSparkContext,
 Configuration)  At ShuffleKryoSerializer.java:[lines 42-44] |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-12859/dev-support/hive-personality.sh
 |
| git revision | master / 68bdf9e |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| mvninstall | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-12859/yetus/patch-mvninstall-kryo-registrator.txt
 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Buil

[jira] [Commented] (HIVE-20032) Don't serialize hashCode for repartitionAndSortWithinPartitions

2018-07-25 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556313#comment-16556313
 ] 

Hive QA commented on HIVE-20032:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12933072/HIVE-20032.91.patch

{color:green}SUCCESS:{color} +1 due to 5 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 14811 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.ql.exec.spark.TestHiveSparkClient.testSetJobGroupAndDescription
 (batchId=305)
org.apache.hadoop.hive.ql.exec.spark.TestSparkPlan.testSetRDDCallSite 
(batchId=306)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/12859/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/12859/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-12859/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12933072 - PreCommit-HIVE-Build



[jira] [Commented] (HIVE-20032) Don't serialize hashCode for repartitionAndSortWithinPartitions

2018-07-26 Thread Rui Li (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16558085#comment-16558085
 ] 

Rui Li commented on HIVE-20032:
---

[~stakiar], thanks for the update. I can run my queries with the latest patch.
For the failed tests, I think you can manually add the registrator jar, or 
disable the feature since it's already tested in qtest.



[jira] [Commented] (HIVE-20032) Don't serialize hashCode for repartitionAndSortWithinPartitions

2018-07-26 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16558538#comment-16558538
 ] 

Hive QA commented on HIVE-20032:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
46s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 39m 34s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 30s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 29s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 33s{color} | {color:blue} common in master has 64 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 38s{color} | {color:blue} itests/hive-unit in master has 2 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  3m 55s{color} | {color:blue} ql in master has 2296 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 26s{color} | {color:blue} spark-client in master has 10 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 54s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  8s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 12s{color} | {color:red} kryo-registrator in the patch failed. {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 27s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 16s{color} | {color:red} itests/hive-unit: The patch generated 2 new + 20 unchanged - 0 fixed = 22 total (was 20) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m  9s{color} | {color:red} kryo-registrator: The patch generated 1 new + 2 unchanged - 0 fixed = 3 total (was 2) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 41s{color} | {color:red} ql: The patch generated 15 new + 14 unchanged - 0 fixed = 29 total (was 14) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  4m  6s{color} | {color:red} ql generated 1 new + 2296 unchanged - 0 fixed = 2297 total (was 2296) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 59s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 13s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 68m 59s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:ql |
|  |  Possible doublecheck on org.apache.hadoop.hive.ql.exec.spark.ShuffleKryoSerializer.INSTANCE in org.apache.hadoop.hive.ql.exec.spark.ShuffleKryoSerializer.getInstance(JavaSparkContext, Configuration)  At ShuffleKryoSerializer.java:org.apache.hadoop.hive.ql.exec.spark.ShuffleKryoSerializer.getInstance(JavaSparkContext, Configuration)  At ShuffleKryoSerializer.java:[lines 42-44] |
\\
\\
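For context on the "Possible doublecheck" warning above: FindBugs reports this pattern when a lazily initialized static field is checked once without a lock and again inside a synchronized block. The commonly recommended safe form declares the field volatile. The following is only a minimal sketch of that idiom; the class name LazySerializerHolder, the label field, and the String argument are hypothetical stand-ins and are not taken from the Hive ShuffleKryoSerializer source or from the patch.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical lazy singleton illustrating double-checked locking.
 * This is NOT the Hive ShuffleKryoSerializer; names are assumptions.
 */
final class LazySerializerHolder {
  // Counts constructor calls so single construction can be verified.
  static final AtomicInteger CONSTRUCTED = new AtomicInteger();

  // volatile is what makes double-checked locking safe under the
  // Java memory model (Java 5 and later).
  private static volatile LazySerializerHolder instance;

  private final String label;

  private LazySerializerHolder(String label) {
    this.label = label;
    CONSTRUCTED.incrementAndGet();
  }

  static LazySerializerHolder getInstance(String label) {
    // First check without the lock: one volatile read on the fast path.
    LazySerializerHolder local = instance;
    if (local == null) {
      synchronized (LazySerializerHolder.class) {
        // Second check under the lock: another thread may have won the race.
        local = instance;
        if (local == null) {
          local = new LazySerializerHolder(label);
          instance = local;
        }
      }
    }
    return local;
  }

  String label() {
    return label;
  }

  public static void main(String[] args) {
    LazySerializerHolder first = getInstance("first-conf");
    LazySerializerHolder second = getInstance("second-conf");
    // Both calls return the same object; the second argument is ignored.
    System.out.println(first == second);
    System.out.println(second.label());
  }
}
```

Note that in this sketch the argument passed on later calls is silently ignored once the instance exists, which is a general property of argument-taking lazy singletons and may or may not matter for a given use.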
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-12877/dev-support/hive-personality.sh |
| git revision | master / 2820fc4 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| mvninstall | http://104.198.109.242/logs//PreCommit-HIVE-Build-12877/yetus/patch-mvninstall-kryo-registrator.txt |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Buil

[jira] [Commented] (HIVE-20032) Don't serialize hashCode for repartitionAndSortWithinPartitions

2018-07-26 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16558556#comment-16558556
 ] 

Hive QA commented on HIVE-20032:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12933198/HIVE-20032.92.patch

{color:green}SUCCESS:{color} +1 due to 7 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 14812 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/12877/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/12877/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-12877/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12933198 - PreCommit-HIVE-Build

> Don't serialize hashCode for repartitionAndSortWithinPartitions
> ---
>
> Key: HIVE-20032
> URL: https://issues.apache.org/jira/browse/HIVE-20032
> Project: Hive
> Issue Type: Improvement
> Components: Spark
> Reporter: Sahil Takiar
> Assignee: Sahil Takiar
> Priority: Major
> Attachments: HIVE-20032.1.patch, HIVE-20032.2.patch, 
> HIVE-20032.3.patch, HIVE-20032.4.patch, HIVE-20032.5.patch, 
> HIVE-20032.6.patch, HIVE-20032.7.patch, HIVE-20032.8.patch, 
> HIVE-20032.9.patch, HIVE-20032.91.patch, HIVE-20032.92.patch
>
>
> Follow-up to HIVE-15104: if we don't enable RDD caching or groupByShuffles, 
> then we don't need to serialize the hashCode when shuffling data in HoS.





[jira] [Commented] (HIVE-20032) Don't serialize hashCode for repartitionAndSortWithinPartitions

2018-07-26 Thread Rui Li (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559162#comment-16559162
 ] 

Rui Li commented on HIVE-20032:
---

+1



