Re: [VOTE] Apache Hive 2.3.10 Release Candidate 1

2024-05-07 Thread Rui Li
+1 (binding)

- Built from source
- Verified signature and checksum
- Ran some basic SQL statements
- Verified the fix made in https://github.com/apache/hive/pull/5204

Thanks Chao for driving this.

On Tue, May 7, 2024 at 2:26 AM Dongjoon Hyun  wrote:

> +1 (non-binding)
>
> Thank you so much all!
>
> Dongjoon.
>
> On 2024/05/06 13:57:09 Cheng Pan wrote:
> > +1 (non-binding)
> >
> > Pass integration test with Apache Spark[1] and Apache Kyuubi[2].
> >
> > [1] https://github.com/apache/spark/pull/45372
> > [2] https://github.com/apache/kyuubi/pull/6328
> >
> > Thanks,
> > Cheng Pan
> >
> >
> >
>


-- 
Cheers,
Rui Li


Review Request 25774: Support merging small files

2014-09-18 Thread Rui Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25774/
---

Review request for hive and Xuefu Zhang.


Bugs: HIVE-8043
https://issues.apache.org/jira/browse/HIVE-8043


Repository: hive-git


Description
---

Support merging small files for Spark.
For non-RC files, the merging task is simply a MapWork.
For RC/ORC files, the merging task is a MergeFileWork, and 
SparkMergeFileRecordHandler is added to handle it.
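Since the description is terse, here is a minimal sketch of the dispatch it outlines (hypothetical names, not the actual Hive classes): text-like formats are merged by an ordinary map task that rewrites rows, while RC/ORC files get a dedicated merge work with its own record handler:

```java
import java.util.Locale;

public class MergeTaskDispatch {
    enum TaskKind { MAP_WORK, MERGE_FILE_WORK }

    // Hypothetical helper: pick the merge task kind from the file format name.
    static TaskKind taskFor(String fileFormat) {
        String fmt = fileFormat.toLowerCase(Locale.ROOT);
        if (fmt.equals("rcfile") || fmt.equals("orc")) {
            // RC/ORC files are merged at the block level by a dedicated
            // merge work, handled by a special record handler.
            return TaskKind.MERGE_FILE_WORK;
        }
        // Other formats are merged by a plain map-only task.
        return TaskKind.MAP_WORK;
    }

    public static void main(String[] args) {
        System.out.println(taskFor("orc"));      // prints MERGE_FILE_WORK
        System.out.println(taskFor("textfile")); // prints MAP_WORK
    }
}
```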


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveMapFunction.java 5078a3a 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveMapFunctionResultList.java 
c54bffe 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMapRecordHandler.java 
2537789 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMergeFileRecordHandler.java
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 
9b11fe4 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java 
3eea26a 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java 
94ebcdd 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java b0a9407 
  ql/src/test/queries/clientpositive/disable_merge_for_bucketing.q 471d296 
  ql/src/test/queries/clientpositive/merge1.q c7249af 
  ql/src/test/queries/clientpositive/merge2.q bb86dc2 
  ql/src/test/results/clientpositive/spark/merge1.q.out 772984d 
  ql/src/test/results/clientpositive/spark/merge2.q.out 8d8dcb8 

Diff: https://reviews.apache.org/r/25774/diff/


Testing
---


Thanks,

Rui Li



Re: Review Request 25774: Support merging small files

2014-09-18 Thread Rui Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25774/
---

(Updated Sept. 18, 2014, 12:33 p.m.)


Review request for hive and Xuefu Zhang.


Changes
---

Update golden files for failed tests


Bugs: HIVE-8043
https://issues.apache.org/jira/browse/HIVE-8043


Repository: hive-git


Description
---

Support merging small files for Spark.
For non-RC files, the merging task is simply a MapWork.
For RC/ORC files, the merging task is a MergeFileWork, and 
SparkMergeFileRecordHandler is added to handle it.


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveMapFunction.java 5078a3a 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveMapFunctionResultList.java 
c54bffe 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMapRecordHandler.java 
2537789 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMergeFileRecordHandler.java
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 
9b11fe4 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java 
3eea26a 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java 
94ebcdd 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java b0a9407 
  ql/src/test/queries/clientpositive/disable_merge_for_bucketing.q 471d296 
  ql/src/test/queries/clientpositive/merge1.q c7249af 
  ql/src/test/queries/clientpositive/merge2.q bb86dc2 
  ql/src/test/results/clientpositive/spark/merge1.q.out 772984d 
  ql/src/test/results/clientpositive/spark/merge2.q.out 8d8dcb8 
  ql/src/test/results/clientpositive/spark/union_remove_10.q.out f561fdf 
  ql/src/test/results/clientpositive/spark/union_remove_11.q.out 10b0c9c 
  ql/src/test/results/clientpositive/spark/union_remove_16.q.out a59a352 
  ql/src/test/results/clientpositive/spark/union_remove_4.q.out 518dc24 
  ql/src/test/results/clientpositive/spark/union_remove_5.q.out f7f9627 
  ql/src/test/results/clientpositive/spark/union_remove_9.q.out 0ec55de 

Diff: https://reviews.apache.org/r/25774/diff/


Testing
---


Thanks,

Rui Li



Re: Review Request 25774: Support merging small files

2014-09-19 Thread Rui Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25774/
---

(Updated Sept. 20, 2014, 3:37 a.m.)


Review request for hive and Xuefu Zhang.


Changes
---

Fix a bug in getting file system for the output dir


Bugs: HIVE-8043
https://issues.apache.org/jira/browse/HIVE-8043


Repository: hive-git


Description
---

Support merging small files for Spark.
For non-RC files, the merging task is simply a MapWork.
For RC/ORC files, the merging task is a MergeFileWork, and 
SparkMergeFileRecordHandler is added to handle it.


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveMapFunction.java 5078a3a 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveMapFunctionResultList.java 
c54bffe 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMapRecordHandler.java 
2537789 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMergeFileRecordHandler.java
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 
9b11fe4 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java 
3eea26a 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java 
94ebcdd 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java b0a9407 
  ql/src/test/queries/clientpositive/disable_merge_for_bucketing.q 471d296 
  ql/src/test/queries/clientpositive/merge1.q c7249af 
  ql/src/test/queries/clientpositive/merge2.q bb86dc2 
  ql/src/test/results/clientpositive/spark/merge1.q.out 772984d 
  ql/src/test/results/clientpositive/spark/merge2.q.out 8d8dcb8 
  ql/src/test/results/clientpositive/spark/union_remove_10.q.out f561fdf 
  ql/src/test/results/clientpositive/spark/union_remove_11.q.out 10b0c9c 
  ql/src/test/results/clientpositive/spark/union_remove_16.q.out a59a352 
  ql/src/test/results/clientpositive/spark/union_remove_4.q.out 518dc24 
  ql/src/test/results/clientpositive/spark/union_remove_5.q.out f7f9627 
  ql/src/test/results/clientpositive/spark/union_remove_9.q.out 0ec55de 

Diff: https://reviews.apache.org/r/25774/diff/


Testing
---


Thanks,

Rui Li



Review Request 27283: Compile time skew join optimization doesn't work with auto map join

2014-10-28 Thread Rui Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27283/
---

Review request for hive, Szehon Ho and Xuefu Zhang.


Bugs: HIVE-8610
https://issues.apache.org/jira/browse/HIVE-8610


Repository: hive-git


Description
---

This patch adds the QBJoinTree and colExprMap for the cloned join operator tree in 
SkewJoinOptimizer, so that CommonJoinResolver can properly convert the cloned 
join to a map join.
The added tests are copied from skewjoinopt*.q, except that auto map join is 
enabled.
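To make the idea concrete, here is a hedged sketch (hypothetical names and types, not Hive's actual ParseContext) of why the fix is needed: a cloned operator must be registered with its own copy of the column-expression map, or later resolvers won't find anything for it:

```java
import java.util.HashMap;
import java.util.Map;

public class CloneMetadata {
    // Stand-in for the parse context: operator name -> column expression map.
    final Map<String, Map<String, String>> colExprMaps = new HashMap<>();

    // Register a cloned operator with a copy of the original's map, so a
    // resolver (CommonJoinResolver in the patch) can find it for the clone too.
    void registerClone(String originalOp, String clonedOp) {
        Map<String, String> orig = colExprMaps.get(originalOp);
        if (orig != null) {
            colExprMaps.put(clonedOp, new HashMap<>(orig));
        }
    }

    public static void main(String[] args) {
        CloneMetadata ctx = new CloneMetadata();
        Map<String, String> m = new HashMap<>();
        m.put("_col0", "key");
        ctx.colExprMaps.put("JOIN_1", m);
        ctx.registerClone("JOIN_1", "JOIN_1_clone");
        System.out.println(ctx.colExprMaps.get("JOIN_1_clone")); // prints {_col0=key}
    }
}
```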


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/optimizer/SkewJoinOptimizer.java 
e87c41b 
  ql/src/test/queries/clientpositive/skewjoin_mapjoin1.q PRE-CREATION 
  ql/src/test/queries/clientpositive/skewjoin_mapjoin10.q PRE-CREATION 
  ql/src/test/queries/clientpositive/skewjoin_mapjoin11.q PRE-CREATION 
  ql/src/test/queries/clientpositive/skewjoin_mapjoin2.q PRE-CREATION 
  ql/src/test/queries/clientpositive/skewjoin_mapjoin3.q PRE-CREATION 
  ql/src/test/queries/clientpositive/skewjoin_mapjoin4.q PRE-CREATION 
  ql/src/test/queries/clientpositive/skewjoin_mapjoin5.q PRE-CREATION 
  ql/src/test/queries/clientpositive/skewjoin_mapjoin6.q PRE-CREATION 
  ql/src/test/queries/clientpositive/skewjoin_mapjoin7.q PRE-CREATION 
  ql/src/test/queries/clientpositive/skewjoin_mapjoin8.q PRE-CREATION 
  ql/src/test/queries/clientpositive/skewjoin_mapjoin9.q PRE-CREATION 
  ql/src/test/results/clientpositive/skewjoin_mapjoin1.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/skewjoin_mapjoin10.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/skewjoin_mapjoin11.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/skewjoin_mapjoin2.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/skewjoin_mapjoin3.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/skewjoin_mapjoin4.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/skewjoin_mapjoin5.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/skewjoin_mapjoin6.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/skewjoin_mapjoin7.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/skewjoin_mapjoin8.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/skewjoin_mapjoin9.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/27283/diff/


Testing
---


Thanks,

Rui Li



Re: Review Request 27283: Compile time skew join optimization doesn't work with auto map join

2014-10-28 Thread Rui Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27283/
---

(Updated Oct. 29, 2014, 2:59 a.m.)


Review request for hive, Szehon Ho and Xuefu Zhang.


Bugs: HIVE-8610
https://issues.apache.org/jira/browse/HIVE-8610


Repository: hive-git


Description
---

This patch adds the QBJoinTree and colExprMap for the cloned join operator tree in 
SkewJoinOptimizer, so that CommonJoinResolver can properly convert the cloned 
join to a map join.
The added tests are copied from skewjoinopt*.q, except that auto map join is 
enabled.


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/optimizer/SkewJoinOptimizer.java 
e87c41b 
  ql/src/test/queries/clientpositive/skewjoin_mapjoin1.q PRE-CREATION 
  ql/src/test/queries/clientpositive/skewjoin_mapjoin10.q PRE-CREATION 
  ql/src/test/queries/clientpositive/skewjoin_mapjoin11.q PRE-CREATION 
  ql/src/test/queries/clientpositive/skewjoin_mapjoin2.q PRE-CREATION 
  ql/src/test/queries/clientpositive/skewjoin_mapjoin3.q PRE-CREATION 
  ql/src/test/queries/clientpositive/skewjoin_mapjoin4.q PRE-CREATION 
  ql/src/test/queries/clientpositive/skewjoin_mapjoin5.q PRE-CREATION 
  ql/src/test/queries/clientpositive/skewjoin_mapjoin6.q PRE-CREATION 
  ql/src/test/queries/clientpositive/skewjoin_mapjoin7.q PRE-CREATION 
  ql/src/test/queries/clientpositive/skewjoin_mapjoin8.q PRE-CREATION 
  ql/src/test/queries/clientpositive/skewjoin_mapjoin9.q PRE-CREATION 
  ql/src/test/results/clientpositive/skewjoin_mapjoin1.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/skewjoin_mapjoin10.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/skewjoin_mapjoin11.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/skewjoin_mapjoin2.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/skewjoin_mapjoin3.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/skewjoin_mapjoin4.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/skewjoin_mapjoin5.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/skewjoin_mapjoin6.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/skewjoin_mapjoin7.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/skewjoin_mapjoin8.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/skewjoin_mapjoin9.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/27283/diff/


Testing
---


Thanks,

Rui Li



Re: Review Request 27283: Compile time skew join optimization doesn't work with auto map join

2014-10-29 Thread Rui Li


> On Oct. 29, 2014, 5:09 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/SkewJoinOptimizer.java, 
> > line 626
> > <https://reviews.apache.org/r/27283/diff/2/?file=739757#file739757line626>
> >
> > static method? Maybe we can create such clone() method in QBJoinTree 
> > class.

Yes, I was thinking of adding such a method to QBJoinTree. Do you think we 
should make QBJoinTree cloneable and implement a deep copy?
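For illustration, a minimal sketch of the deep-copy pattern under discussion (hypothetical JoinTreeNode class, not the real QBJoinTree): mutable state is duplicated so changes to the clone don't leak back into the original:

```java
import java.util.ArrayList;
import java.util.List;

public class JoinTreeNode {
    final List<String> aliases;

    JoinTreeNode(List<String> aliases) { this.aliases = aliases; }

    // Deep copy: duplicate the mutable collections instead of sharing them.
    JoinTreeNode deepCopy() {
        return new JoinTreeNode(new ArrayList<>(aliases));
    }

    public static void main(String[] args) {
        JoinTreeNode original = new JoinTreeNode(new ArrayList<>(List.of("t1")));
        JoinTreeNode copy = original.deepCopy();
        copy.aliases.add("t2");
        System.out.println(original.aliases); // prints [t1] -- the original is unchanged
    }
}
```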


> On Oct. 29, 2014, 5:09 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/SkewJoinOptimizer.java, 
> > line 694
> > <https://reviews.apache.org/r/27283/diff/2/?file=739757#file739757line694>
> >
> > The method seems to be recursive, but I'm not sure it's correct. We 
> > call the same method for the parents and children of the current operator. 
> > When a parent or child operator of the current operator is processed, 
> > wouldn't the process go back to the current operator again? Recursion 
> > should be done in one direction and should have clear end condition.

Thanks a lot for the catch. The method works for the way it's currently used, 
but you're right; I'll make it more robust.


- Rui


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27283/#review59001
---


On Oct. 29, 2014, 2:59 a.m., Rui Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27283/
> ---
> 
> (Updated Oct. 29, 2014, 2:59 a.m.)
> 
> 
> Review request for hive, Szehon Ho and Xuefu Zhang.
> 
> 
> Bugs: HIVE-8610
> https://issues.apache.org/jira/browse/HIVE-8610
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> This patch adds QBJoinTree and colExprMap for the cloned join operator tree 
> in SkewJoinOptimizer, so that CommonJoinResolver can properly convert the 
> cloned join to map join.
> The added tests are copied from skewjoinopt*.q, except that auto map join is 
> enabled.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/SkewJoinOptimizer.java 
> e87c41b 
>   ql/src/test/queries/clientpositive/skewjoin_mapjoin1.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/skewjoin_mapjoin10.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/skewjoin_mapjoin11.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/skewjoin_mapjoin2.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/skewjoin_mapjoin3.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/skewjoin_mapjoin4.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/skewjoin_mapjoin5.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/skewjoin_mapjoin6.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/skewjoin_mapjoin7.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/skewjoin_mapjoin8.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/skewjoin_mapjoin9.q PRE-CREATION 
>   ql/src/test/results/clientpositive/skewjoin_mapjoin1.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/skewjoin_mapjoin10.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/skewjoin_mapjoin11.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/skewjoin_mapjoin2.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/skewjoin_mapjoin3.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/skewjoin_mapjoin4.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/skewjoin_mapjoin5.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/skewjoin_mapjoin6.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/skewjoin_mapjoin7.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/skewjoin_mapjoin8.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/skewjoin_mapjoin9.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/27283/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Rui Li
> 
>



Re: Review Request 27283: Compile time skew join optimization doesn't work with auto map join

2014-10-29 Thread Rui Li


> On Oct. 30, 2014, 2:29 a.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/SkewJoinOptimizer.java, 
> > line 694
> > <https://reviews.apache.org/r/27283/diff/2/?file=739757#file739757line694>
> >
> > Maybe we should have a different operator clone. Check.
> > 
> > There is a method (though not called by Operator.clone()) in 
> > OperatorFactory, which takes care of colExprMap:
> > 
> >   public static <T extends OperatorDesc> Operator<T> getAndMakeChild(T conf,
> >       RowSchema rwsch, Map<String, ExprNodeDesc> colExprMap,
> >       List<Operator<? extends OperatorDesc>> oplist)

Thanks Xuefu. I can try changing Operator.clone to call this method.


- Rui


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27283/#review59116
-------


On Oct. 29, 2014, 2:59 a.m., Rui Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27283/
> ---
> 
> (Updated Oct. 29, 2014, 2:59 a.m.)
> 
> 
> Review request for hive, Szehon Ho and Xuefu Zhang.
> 
> 
> Bugs: HIVE-8610
> https://issues.apache.org/jira/browse/HIVE-8610
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> This patch adds QBJoinTree and colExprMap for the cloned join operator tree 
> in SkewJoinOptimizer, so that CommonJoinResolver can properly convert the 
> cloned join to map join.
> The added tests are copied from skewjoinopt*.q, except that auto map join is 
> enabled.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/SkewJoinOptimizer.java 
> e87c41b 
>   ql/src/test/queries/clientpositive/skewjoin_mapjoin1.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/skewjoin_mapjoin10.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/skewjoin_mapjoin11.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/skewjoin_mapjoin2.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/skewjoin_mapjoin3.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/skewjoin_mapjoin4.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/skewjoin_mapjoin5.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/skewjoin_mapjoin6.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/skewjoin_mapjoin7.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/skewjoin_mapjoin8.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/skewjoin_mapjoin9.q PRE-CREATION 
>   ql/src/test/results/clientpositive/skewjoin_mapjoin1.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/skewjoin_mapjoin10.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/skewjoin_mapjoin11.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/skewjoin_mapjoin2.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/skewjoin_mapjoin3.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/skewjoin_mapjoin4.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/skewjoin_mapjoin5.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/skewjoin_mapjoin6.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/skewjoin_mapjoin7.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/skewjoin_mapjoin8.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/skewjoin_mapjoin9.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/27283/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Rui Li
> 
>



Re: Review Request 27283: Compile time skew join optimization doesn't work with auto map join

2014-10-30 Thread Rui Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27283/
---

(Updated Oct. 30, 2014, 8:47 a.m.)


Review request for hive, Szehon Ho and Xuefu Zhang.


Bugs: HIVE-8610
https://issues.apache.org/jira/browse/HIVE-8610


Repository: hive-git


Description
---

This patch adds the QBJoinTree and colExprMap for the cloned join operator tree in 
SkewJoinOptimizer, so that CommonJoinResolver can properly convert the cloned 
join to a map join.
The added tests are copied from skewjoinopt*.q, except that auto map join is 
enabled.


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 273691e 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/SkewJoinOptimizer.java 
e87c41b 
  ql/src/java/org/apache/hadoop/hive/ql/parse/JoinCond.java 96df830 
  ql/src/java/org/apache/hadoop/hive/ql/parse/QBJoinTree.java 94c563f 
  ql/src/test/queries/clientpositive/skewjoin_mapjoin1.q PRE-CREATION 
  ql/src/test/queries/clientpositive/skewjoin_mapjoin10.q PRE-CREATION 
  ql/src/test/queries/clientpositive/skewjoin_mapjoin11.q PRE-CREATION 
  ql/src/test/queries/clientpositive/skewjoin_mapjoin2.q PRE-CREATION 
  ql/src/test/queries/clientpositive/skewjoin_mapjoin3.q PRE-CREATION 
  ql/src/test/queries/clientpositive/skewjoin_mapjoin4.q PRE-CREATION 
  ql/src/test/queries/clientpositive/skewjoin_mapjoin5.q PRE-CREATION 
  ql/src/test/queries/clientpositive/skewjoin_mapjoin6.q PRE-CREATION 
  ql/src/test/queries/clientpositive/skewjoin_mapjoin7.q PRE-CREATION 
  ql/src/test/queries/clientpositive/skewjoin_mapjoin8.q PRE-CREATION 
  ql/src/test/queries/clientpositive/skewjoin_mapjoin9.q PRE-CREATION 
  ql/src/test/results/clientpositive/skewjoin_mapjoin1.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/skewjoin_mapjoin10.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/skewjoin_mapjoin11.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/skewjoin_mapjoin2.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/skewjoin_mapjoin3.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/skewjoin_mapjoin4.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/skewjoin_mapjoin5.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/skewjoin_mapjoin6.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/skewjoin_mapjoin7.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/skewjoin_mapjoin8.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/skewjoin_mapjoin9.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/27283/diff/


Testing
---


Thanks,

Rui Li



Re: Review Request 27283: Compile time skew join optimization doesn't work with auto map join

2014-10-30 Thread Rui Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27283/
---

(Updated Oct. 30, 2014, 1:07 p.m.)


Review request for hive, Szehon Ho and Xuefu Zhang.


Bugs: HIVE-8610
https://issues.apache.org/jira/browse/HIVE-8610


Repository: hive-git


Description
---

This patch adds the QBJoinTree and colExprMap for the cloned join operator tree in 
SkewJoinOptimizer, so that CommonJoinResolver can properly convert the cloned 
join to a map join.
The added tests are copied from skewjoinopt*.q, except that auto map join is 
enabled.


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 273691e 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/SkewJoinOptimizer.java 
e87c41b 
  ql/src/java/org/apache/hadoop/hive/ql/parse/JoinCond.java 96df830 
  ql/src/java/org/apache/hadoop/hive/ql/parse/QBJoinTree.java 94c563f 
  ql/src/test/queries/clientpositive/skewjoin_mapjoin1.q PRE-CREATION 
  ql/src/test/queries/clientpositive/skewjoin_mapjoin10.q PRE-CREATION 
  ql/src/test/queries/clientpositive/skewjoin_mapjoin11.q PRE-CREATION 
  ql/src/test/queries/clientpositive/skewjoin_mapjoin2.q PRE-CREATION 
  ql/src/test/queries/clientpositive/skewjoin_mapjoin3.q PRE-CREATION 
  ql/src/test/queries/clientpositive/skewjoin_mapjoin4.q PRE-CREATION 
  ql/src/test/queries/clientpositive/skewjoin_mapjoin5.q PRE-CREATION 
  ql/src/test/queries/clientpositive/skewjoin_mapjoin6.q PRE-CREATION 
  ql/src/test/queries/clientpositive/skewjoin_mapjoin7.q PRE-CREATION 
  ql/src/test/queries/clientpositive/skewjoin_mapjoin8.q PRE-CREATION 
  ql/src/test/queries/clientpositive/skewjoin_mapjoin9.q PRE-CREATION 
  ql/src/test/results/clientpositive/skewjoin_mapjoin1.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/skewjoin_mapjoin10.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/skewjoin_mapjoin11.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/skewjoin_mapjoin2.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/skewjoin_mapjoin3.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/skewjoin_mapjoin4.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/skewjoin_mapjoin5.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/skewjoin_mapjoin6.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/skewjoin_mapjoin7.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/skewjoin_mapjoin8.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/skewjoin_mapjoin9.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/27283/diff/


Testing
---


Thanks,

Rui Li



Re: Review Request 50787: Add a timezone-aware timestamp

2017-04-26 Thread Rui Li
/org/apache/hadoop/hive/serde2/io/TimestampTZWritable.java 
PRE-CREATION 
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyFactory.java 23dbe6a 
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyTimestampTZ.java 
PRE-CREATION 
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyUtils.java 73c72e1 
  
serde/src/java/org/apache/hadoop/hive/serde2/lazy/objectinspector/primitive/LazyPrimitiveObjectInspectorFactory.java
 5601734 
  
serde/src/java/org/apache/hadoop/hive/serde2/lazy/objectinspector/primitive/LazyTimestampTZObjectInspector.java
 PRE-CREATION 
  
serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryFactory.java 
52f3527 
  serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinarySerDe.java 
56b4ca3 
  
serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryTimestampTZ.java
 PRE-CREATION 
  serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryUtils.java 
8237b64 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorConverters.java
 24b3d4e 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java
 ba44bae 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/PrimitiveObjectInspector.java
 70633f3 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/JavaTimestampTZObjectInspector.java
 PRE-CREATION 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorConverter.java
 e08ad43 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorFactory.java
 2ed0843 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorUtils.java
 9642a7e 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/SettableTimestampTZObjectInspector.java
 PRE-CREATION 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/TimestampTZObjectInspector.java
 PRE-CREATION 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/WritableConstantTimestampTZObjectInspector.java
 PRE-CREATION 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/WritableTimestampTZObjectInspector.java
 PRE-CREATION 
  serde/src/java/org/apache/hadoop/hive/serde2/thrift/Type.java 0ad8c02 
  serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/TypeInfoFactory.java 
43c4819 
  serde/src/test/org/apache/hadoop/hive/serde2/io/TestTimestampTZWritable.java 
PRE-CREATION 
  service-rpc/if/TCLIService.thrift 824b049 
  service-rpc/src/gen/thrift/gen-cpp/TCLIService_constants.cpp 991cb2e 
  service-rpc/src/gen/thrift/gen-cpp/TCLIService_types.h 8accf66 
  service-rpc/src/gen/thrift/gen-cpp/TCLIService_types.cpp b6995c4 
  
service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TCLIServiceConstants.java
 930bed7 
  
service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TProtocolVersion.java
 18a7825 
  
service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TTypeId.java
 a3735eb 
  service-rpc/src/gen/thrift/gen-php/Types.php ee5acd2 
  service-rpc/src/gen/thrift/gen-py/TCLIService/constants.py c8d4f8f 
  service-rpc/src/gen/thrift/gen-py/TCLIService/ttypes.py e9faa2a 
  service-rpc/src/gen/thrift/gen-rb/t_c_l_i_service_constants.rb 25adbb4 
  service-rpc/src/gen/thrift/gen-rb/t_c_l_i_service_types.rb 714309c 
  service/src/java/org/apache/hive/service/cli/ColumnValue.java 76e8c03 
  service/src/java/org/apache/hive/service/cli/TypeDescriptor.java d634bef 


Diff: https://reviews.apache.org/r/50787/diff/6/

Changes: https://reviews.apache.org/r/50787/diff/5-6/


Testing
---


Thanks,

Rui Li



Re: Review Request 58865: HIVE-16552: Limit the number of tasks a Spark job may contain

2017-05-01 Thread Rui Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58865/#review173556
---




ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTask.java
Lines 135 (patched)
<https://reviews.apache.org/r/58865/#comment246543>

The log is incorrect because cancelling the job doesn't mean killing the 
application.



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/RemoteSparkJobMonitor.java
Lines 106 (patched)
<https://reviews.apache.org/r/58865/#comment246544>

I think the total task count only needs to be computed once. It shouldn't 
change during the execution of the job, assuming we don't count failed/retried 
tasks.


- Rui Li


On May 1, 2017, 5:13 p.m., Xuefu Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58865/
> ---
> 
> (Updated May 1, 2017, 5:13 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-16552
> https://issues.apache.org/jira/browse/HIVE-16552
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> See JIRA description
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java d3ea824 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTask.java 32a7730 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/RemoteSparkJobMonitor.java
>  dd73f3e 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkJobMonitor.java 
> 0b224f2 
> 
> 
> Diff: https://reviews.apache.org/r/58865/diff/2/
> 
> 
> Testing
> ---
> 
> Test locally
> 
> 
> Thanks,
> 
> Xuefu Zhang
> 
>



Re: Review Request 58865: HIVE-16552: Limit the number of tasks a Spark job may contain

2017-05-02 Thread Rui Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58865/#review173689
---




ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTask.java
Lines 132 (patched)
<https://reviews.apache.org/r/58865/#comment246728>

I think the log is unnecessary because the failure should already be logged 
in the monitor.



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTask.java
Lines 135 (patched)
<https://reviews.apache.org/r/58865/#comment246729>

Same as above. Can we consolidate the logs a bit?



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/RemoteSparkJobMonitor.java
Lines 104 (patched)
<https://reviews.apache.org/r/58865/#comment246731>

Maybe I was being misleading. I mean we can compute the total task count just 
once, when the job first reaches the RUNNING state, i.e. in the "if (!running)" 
branch. At that point, the total count is determined and won't change.
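A sketch of what that suggestion might look like (hypothetical monitor class, not the actual RemoteSparkJobMonitor): the total is summed once at the transition into RUNNING and cached afterwards:

```java
public class JobMonitorSketch {
    private int totalTasks = -1;   // -1 means "not yet computed"
    private boolean running = false;

    // stageTaskCounts: hypothetical per-stage task counts reported by the job.
    void onPoll(int[] stageTaskCounts, boolean jobRunning) {
        if (jobRunning && !running) {
            running = true;        // first time we see the job RUNNING
            int sum = 0;
            for (int c : stageTaskCounts) {
                sum += c;
            }
            totalTasks = sum;      // fixed for the rest of the job
        }
    }

    int totalTasks() { return totalTasks; }

    public static void main(String[] args) {
        JobMonitorSketch monitor = new JobMonitorSketch();
        monitor.onPoll(new int[] {3, 5}, true);
        monitor.onPoll(new int[] {3, 5, 2}, true); // later polls don't recompute
        System.out.println(monitor.totalTasks());  // prints 8
    }
}
```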


- Rui Li


On May 2, 2017, 6:49 p.m., Xuefu Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58865/
> ---
> 
> (Updated May 2, 2017, 6:49 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-16552
> https://issues.apache.org/jira/browse/HIVE-16552
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> See JIRA description
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 84398c6 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTask.java 32a7730 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/RemoteSparkJobMonitor.java
>  dd73f3e 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkJobMonitor.java 
> 0b224f2 
> 
> 
> Diff: https://reviews.apache.org/r/58865/diff/3/
> 
> 
> Testing
> ---
> 
> Test locally
> 
> 
> Thanks,
> 
> Xuefu Zhang
> 
>



Re: [VOTE] Apache Hive 2.3.0 Release Candidate 0

2017-05-02 Thread Rui Li
The patch has been reverted in master and branch-2.3.

On Wed, May 3, 2017 at 3:01 AM, Sergio Pena 
wrote:

> Hi Pengcheng,
>
> There is a request from the HDFS team to revert the patch committed on
> HIVE-16047 from our code, because it uses a private Hadoop API and might
> cause problems when future Hadoop versions are released. This API's method
> signature has changed between releases, and we don't want to add shims to
> support future Hadoop versions just for this method.
>
> I'd like to revert it from the 2.3.0 release before doing the release. It is
> marked as fixed in 2.2, but it was cherry-picked to branch-2.3 rather than
> branch-2.2.
>
> Do you agree?
>
> - Sergio
>
> On Fri, Apr 28, 2017 at 1:40 PM, Pengcheng Xiong 
> wrote:
>
> > Withdraw the VOTE on candidate 0. Will propose candidate 1 soon. Thanks.
> >
> > On Thu, Apr 27, 2017 at 8:10 PM, Owen O'Malley 
> > wrote:
> >
> > > -1 you need a release of storage-API first.
> > >
> > > .. Owen
> > >
> > > > On Apr 27, 2017, at 17:43, Pengcheng Xiong 
> wrote:
> > > >
> > > > Apache Hive 2.3.0 Release Candidate 0 is available here:
> > > > http://home.apache.org/~pxiong/apache-hive-2.3.0-rc0/
> > > >
> > > >
> > > > Maven artifacts are available here:
> > > > https://repository.apache.org/content/repositories/
> orgapachehive-1073/
> > > >
> > > >
> > > > Source tag for RC0 is at:
> > > >
> > > > https://github.com/apache/hive/releases/tag/release-2.3.0-rc0
> > > >
> > > > Voting will conclude in 72 hours.
> > > >
> > > > Hive PMC Members: Please test and vote.
> > > >
> > > > Thanks.
> > >
> >
>



-- 
Best regards!
Rui Li
Cell: (+86) 13564950210


Re: Review Request 50787: Add a timezone-aware timestamp

2017-05-02 Thread Rui Li
/apache/hadoop/hive/serde2/io/TimestampTZWritable.java 
PRE-CREATION 
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyFactory.java 23dbe6a 
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyTimestampTZ.java 
PRE-CREATION 
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyUtils.java 73c72e1 
  
serde/src/java/org/apache/hadoop/hive/serde2/lazy/objectinspector/primitive/LazyPrimitiveObjectInspectorFactory.java
 5601734 
  
serde/src/java/org/apache/hadoop/hive/serde2/lazy/objectinspector/primitive/LazyTimestampTZObjectInspector.java
 PRE-CREATION 
  
serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryFactory.java 
52f3527 
  serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinarySerDe.java 
56b4ca3 
  
serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryTimestampTZ.java
 PRE-CREATION 
  serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryUtils.java 
8237b64 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorConverters.java
 24b3d4e 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java
 ba44bae 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/PrimitiveObjectInspector.java
 70633f3 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/JavaTimestampTZObjectInspector.java
 PRE-CREATION 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorConverter.java
 e08ad43 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorFactory.java
 2ed0843 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorUtils.java
 9642a7e 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/SettableTimestampTZObjectInspector.java
 PRE-CREATION 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/TimestampTZObjectInspector.java
 PRE-CREATION 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/WritableConstantTimestampTZObjectInspector.java
 PRE-CREATION 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/WritableTimestampTZObjectInspector.java
 PRE-CREATION 
  serde/src/java/org/apache/hadoop/hive/serde2/thrift/Type.java 0ad8c02 
  serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/TypeInfoFactory.java 
43c4819 
  serde/src/test/org/apache/hadoop/hive/serde2/io/TestTimestampTZWritable.java 
PRE-CREATION 
  service-rpc/if/TCLIService.thrift 824b049 
  service-rpc/src/gen/thrift/gen-cpp/TCLIService_constants.cpp 991cb2e 
  service-rpc/src/gen/thrift/gen-cpp/TCLIService_types.h 8accf66 
  service-rpc/src/gen/thrift/gen-cpp/TCLIService_types.cpp b6995c4 
  
service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TCLIServiceConstants.java
 930bed7 
  
service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TProtocolVersion.java
 18a7825 
  
service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TTypeId.java
 a3735eb 
  service-rpc/src/gen/thrift/gen-php/Types.php ee5acd2 
  service-rpc/src/gen/thrift/gen-py/TCLIService/constants.py c8d4f8f 
  service-rpc/src/gen/thrift/gen-py/TCLIService/ttypes.py e9faa2a 
  service-rpc/src/gen/thrift/gen-rb/t_c_l_i_service_constants.rb 25adbb4 
  service-rpc/src/gen/thrift/gen-rb/t_c_l_i_service_types.rb 714309c 
  service/src/java/org/apache/hive/service/cli/ColumnValue.java 76e8c03 
  service/src/java/org/apache/hive/service/cli/TypeDescriptor.java d634bef 


Diff: https://reviews.apache.org/r/50787/diff/7/

Changes: https://reviews.apache.org/r/50787/diff/6-7/


Testing
---


Thanks,

Rui Li



Re: Review Request 58865: HIVE-16552: Limit the number of tasks a Spark job may contain

2017-05-03 Thread Rui Li


> On May 3, 2017, 3:35 a.m., Rui Li wrote:
> >

Xuefu, the patch looks good to me overall. Thanks for the work. Do you think we 
should add some negative test case for it?


> On May 3, 2017, 3:35 a.m., Rui Li wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTask.java
> > Lines 132 (patched)
> > <https://reviews.apache.org/r/58865/diff/3/?file=1705971#file1705971line132>
> >
> > I think the log is unnecessary because the failure should already be 
> > logged in the monitor
> 
> Xuefu Zhang wrote:
> This is not new code.

Do you mean "LOG.info("Failed to submit Spark job " + sparkJobID);" is not new 
code? I don't find it in the current SparkTask.java.


> On May 3, 2017, 3:35 a.m., Rui Li wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTask.java
> > Lines 135 (patched)
> > <https://reviews.apache.org/r/58865/diff/3/?file=1705971#file1705971line135>
> >
> > Same as above. Can we consolidate the logs a bit?
> 
> Xuefu Zhang wrote:
> Jobmonitor prints it on console, while the log here is written to 
> hive.log.

The console.printInfo method does both printing and logging:

public void printInfo(String info, String detail, boolean isSilent) {
  if (!isSilent) {
    getInfoStream().println(info);
  }
  LOG.info(info + StringUtils.defaultString(detail));
}


> On May 3, 2017, 3:35 a.m., Rui Li wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/RemoteSparkJobMonitor.java
> > Lines 104 (patched)
> > <https://reviews.apache.org/r/58865/diff/3/?file=1705972#file1705972line104>
> >
> > Maybe I was being misleading. I meant we can compute the total task count 
> > only once, when the job first reaches the RUNNING state, i.e. in the 
> > "if (!running)" block. At that point, the total count is determined and 
> > won't change.
> 
> Xuefu Zhang wrote:
> Yeah. However, I'd like to keep the state transition to RUNNING first 
> before breaking out and returning rc=4. In fact, if we lose the transition, 
> Hive actually goes into an unstable state. What you said was what I tried in 
> the first place.

I see. Thanks for the explanation.
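To make the "compute once" idea concrete, here is a toy sketch. It is not the actual RemoteSparkJobMonitor code: the per-stage task counts, the limit value, and the rc=4 message are made-up stand-ins for the hive.spark.job.max.tasks handling discussed above.

```java
import java.util.Arrays;

public class TaskLimitSketch {
    public static void main(String[] args) {
        int[] stageTaskCounts = {100, 250, 50}; // made-up per-stage task counts
        int maxTasks = 300;                     // hypothetical task limit
        boolean running = false;
        int totalTaskCount = -1;

        // On the first poll that sees the job RUNNING, the stage list is
        // final, so the total is computed exactly once and cached.
        if (!running) {
            running = true;
            totalTaskCount = Arrays.stream(stageTaskCounts).sum();
        }
        // Later polls only compare the cached total against the limit.
        if (maxTasks >= 0 && totalTaskCount > maxTasks) {
            System.out.println("rc=4: " + totalTaskCount
                + " tasks exceed the limit of " + maxTasks);
        }
    }
}
```

With the numbers above this prints "rc=4: 400 tasks exceed the limit of 300"; the point is simply that the sum is taken inside the state-transition branch, not on every poll.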


- Rui


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58865/#review173689
---


On May 2, 2017, 6:49 p.m., Xuefu Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58865/
> ---
> 
> (Updated May 2, 2017, 6:49 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-16552
> https://issues.apache.org/jira/browse/HIVE-16552
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> See JIRA description
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 84398c6 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTask.java 32a7730 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/RemoteSparkJobMonitor.java
>  dd73f3e 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkJobMonitor.java 
> 0b224f2 
> 
> 
> Diff: https://reviews.apache.org/r/58865/diff/3/
> 
> 
> Testing
> ---
> 
> Test locally
> 
> 
> Thanks,
> 
> Xuefu Zhang
> 
>



Re: Review Request 50787: Add a timezone-aware timestamp

2017-05-03 Thread Rui Li


> On May 3, 2017, 9:57 p.m., pengcheng xiong wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g
> > Lines 132 (patched)
> > <https://reviews.apache.org/r/50787/diff/7/?file=1706787#file1706787line132>
> >
> > I think Identifier["timestamptz"] and Identifier["zone"] may be 
> > sufficient. It is not necessary to make them as key words and then add them 
> > back as identifiers. You can have a try and see if it works. Thanks..

Hi Pengcheng, sorry I'm quite ignorant about antlr. Could you please be more 
specific about how to add the Identifiers? Let me explain what I intend to do. 
The new data type is named "timestamp with time zone", and "timestamptz" is 
added as a type alias. I thought it was required to add keywords for type 
names. And according to the PostgreSQL doc we referenced 
(https://www.postgresql.org/docs/9.5/static/sql-keywords-appendix.html), "zone" 
is a non-reserved SQL keyword and "timestamptz" is not a keyword. So I added 
them in IdentifiersParser.g as nonReserved.


- Rui


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50787/#review173833
---


On May 3, 2017, 6:34 a.m., Rui Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50787/
> ---
> 
> (Updated May 3, 2017, 6:34 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-14412
> https://issues.apache.org/jira/browse/HIVE-14412
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> The 1st patch to add timezone-aware timestamp.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/common/type/TimestampTZ.java 
> PRE-CREATION 
>   common/src/test/org/apache/hadoop/hive/common/type/TestTimestampTZ.java 
> PRE-CREATION 
>   contrib/src/test/queries/clientnegative/serde_regex.q a676338 
>   contrib/src/test/queries/clientpositive/serde_regex.q d75d607 
>   contrib/src/test/results/clientnegative/serde_regex.q.out 58b1c02 
>   contrib/src/test/results/clientpositive/serde_regex.q.out 2984293 
>   hbase-handler/src/test/queries/positive/hbase_timestamp.q 0350afe 
>   hbase-handler/src/test/results/positive/hbase_timestamp.q.out 3918121 
>   itests/hive-blobstore/src/test/queries/clientpositive/orc_format_part.q 
> 358eccd 
>   
> itests/hive-blobstore/src/test/queries/clientpositive/orc_nonstd_partitions_loc.q
>  c462538 
>   itests/hive-blobstore/src/test/queries/clientpositive/rcfile_format_part.q 
> c563d3a 
>   
> itests/hive-blobstore/src/test/queries/clientpositive/rcfile_nonstd_partitions_loc.q
>  d17c281 
>   itests/hive-blobstore/src/test/results/clientpositive/orc_format_part.q.out 
> 5d1319f 
>   
> itests/hive-blobstore/src/test/results/clientpositive/orc_nonstd_partitions_loc.q.out
>  70e72f7 
>   
> itests/hive-blobstore/src/test/results/clientpositive/rcfile_format_part.q.out
>  bed10ab 
>   
> itests/hive-blobstore/src/test/results/clientpositive/rcfile_nonstd_partitions_loc.q.out
>  c6442f9 
>   jdbc/src/java/org/apache/hive/jdbc/HiveBaseResultSet.java ade1900 
>   jdbc/src/java/org/apache/hive/jdbc/JdbcColumn.java 38918f0 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 8dc5f2e 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java f8b55da 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/SerializationUtilities.java 
> 01a652d 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/TypeConverter.java
>  38308c9 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 
> 0cf9205 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g 0721b92 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g d98a663 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g 8598fae 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java 
> 8f8eab0 
>   ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java bda2050 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToBoolean.java 7cdf2c3 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToString.java 5cacd59 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java 68d98f5 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFDate.java 
> 5a31e61 
>   
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFToTimestampTZ.java
>  PRE-CREATION 
>   
> ql/src/test/org/apache/hadoop/hive/ql/parse/Te

Re: Welcome new Hive committer, Zhihai Xu

2017-05-07 Thread Rui Li
Congrats Zhihai :)

On Sun, May 7, 2017 at 3:41 PM, Lefty Leverenz 
wrote:

> Congratulations Zhihai!
>
> -- Lefty
>
>
> On Sat, May 6, 2017 at 2:06 AM, Zoltan Haindrich <
> zhaindr...@hortonworks.com
> > wrote:
>
> > Congratulations Zhihai!
> >
> > On 6 May 2017 9:52 a.m., Mohammad Islam 
> > wrote:
> > Congrats Zhihai!!
> >
> > On Friday, May 5, 2017 9:52 AM, Xuefu Zhang 
> wrote:
> >
> >
> >  Hi all,
> >
> > I'm very pleased to announce that the Hive PMC has recently voted to offer
> > Zhihai a committership, which he accepted. Please join me in congratulating
> > him on this recognition and thanking him for his contributions to Hive.
> >
> > Regards,
> > Xuefu
> >
> >
> >
> >
> >
>



-- 
Best regards!
Rui Li
Cell: (+86) 13564950210


Re: Review Request 50787: Add a timezone-aware timestamp

2017-05-07 Thread Rui Li


> On May 7, 2017, 11:22 p.m., Xuefu Zhang wrote:
> > common/src/java/org/apache/hadoop/hive/common/type/TimestampTZ.java
> > Lines 138 (patched)
> > <https://reviews.apache.org/r/50787/diff/7/?file=1706764#file1706764line138>
> >
> > Not sure if I understand this, but why can't we get seconds/nanos from 
> > date/timestamp and then convert to TimestampTZ? I assume this is a faster 
> > way.

Hi Xuefu, the reasons why I did this:

1. As Ashutosh suggested, we will use LocalDate and LocalDateTime for Date and 
Timestamp in the future. When that happens, date/timestamp won't have a 
seconds/nanos part; instead they're only descriptions of time. So the 
conversion should be done based on the text format.
2. At the moment, the seconds/nanos of date/timestamp are computed using the 
system timezone. So the conversion can have different results on different 
systems.

I noted Carter also suggested that the SQL standard requires the session zone 
be taken into consideration in the conversion.
Consolidating your suggestions with Carter's, I think we can make the 
conversion text-wise, and append the system zone (Hive currently doesn't have 
a session zone). For example, a date of '2017-01-01' in LA will be converted to 
timestamptz as '2017-01-01 00:00:00 America/Los_Angeles', which in turn is 
converted to '2017-01-01 08:00:00.0 Z'. Does this make sense?
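A minimal java.time sketch of the conversion described above. This is purely illustrative: the class and variable names are mine, and Hive's actual TimestampTZ implementation may differ.

```java
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

public class DateToTimestampTZSketch {
    public static void main(String[] args) {
        // Text-wise conversion: interpret the DATE literal at midnight in the
        // appended zone (here LA stands in for the system zone).
        LocalDate date = LocalDate.parse("2017-01-01");
        ZonedDateTime inZone = date.atStartOfDay(ZoneId.of("America/Los_Angeles"));
        // Normalize to UTC ("Z") for the internal representation.
        ZonedDateTime utc = inZone.withZoneSameInstant(ZoneOffset.UTC);
        System.out.println(inZone); // 2017-01-01T00:00-08:00[America/Los_Angeles]
        System.out.println(utc);    // 2017-01-01T08:00Z
    }
}
```

Because the conversion is done on the text form plus an explicit zone, the result no longer depends on whatever timezone the executing JVM happens to run in.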


> On May 7, 2017, 11:22 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/TypeConverter.java
> > Lines 204 (patched)
> > <https://reviews.apache.org/r/50787/diff/7/?file=1706785#file1706785line204>
> >
> > What does this imply?

The method converts our primitive type to a SqlTypeName in Calcite. But 
SqlTypeName currently doesn't have timestamp with time zone. This will have 
some impact when Calcite does optimizations, e.g. computing average value 
sizes. But I think we have to live with it until SqlTypeName supports 
timestamp with time zone.


> On May 7, 2017, 11:22 p.m., Xuefu Zhang wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyTimestampTZ.java
> > Lines 32 (patched)
> > <https://reviews.apache.org/r/50787/diff/7/?file=1706827#file1706827line32>
> >
> > Can you also make a note about the source of the code, like 
> > TimeStampTZWritable?

sure, will do


- Rui


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50787/#review174136
---


On May 8, 2017, 6:51 a.m., Rui Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50787/
> ---
> 
> (Updated May 8, 2017, 6:51 a.m.)
> 
> 
> Review request for hive, pengcheng xiong and Xuefu Zhang.
> 
> 
> Bugs: HIVE-14412
> https://issues.apache.org/jira/browse/HIVE-14412
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> The 1st patch to add timezone-aware timestamp.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/common/type/TimestampTZ.java 
> PRE-CREATION 
>   common/src/test/org/apache/hadoop/hive/common/type/TestTimestampTZ.java 
> PRE-CREATION 
>   contrib/src/test/queries/clientnegative/serde_regex.q a676338 
>   contrib/src/test/queries/clientpositive/serde_regex.q d75d607 
>   contrib/src/test/results/clientnegative/serde_regex.q.out 58b1c02 
>   contrib/src/test/results/clientpositive/serde_regex.q.out 2984293 
>   hbase-handler/src/test/queries/positive/hbase_timestamp.q 0350afe 
>   hbase-handler/src/test/results/positive/hbase_timestamp.q.out 3918121 
>   itests/hive-blobstore/src/test/queries/clientpositive/orc_format_part.q 
> 358eccd 
>   
> itests/hive-blobstore/src/test/queries/clientpositive/orc_nonstd_partitions_loc.q
>  c462538 
>   itests/hive-blobstore/src/test/queries/clientpositive/rcfile_format_part.q 
> c563d3a 
>   
> itests/hive-blobstore/src/test/queries/clientpositive/rcfile_nonstd_partitions_loc.q
>  d17c281 
>   itests/hive-blobstore/src/test/results/clientpositive/orc_format_part.q.out 
> 5d1319f 
>   
> itests/hive-blobstore/src/test/results/clientpositive/orc_nonstd_partitions_loc.q.out
>  70e72f7 
>   
> itests/hive-blobstore/src/test/results/clientpositive/rcfile_format_part.q.out
>  bed10ab 
>   
> itests/hive-blobstore/src/test/results/clientpositive/rcfile_nonstd_partitions_loc.q.out
>  c6442f9 
>   jdbc/src/java/org/apache/hive/jdbc/HiveBaseResultSet.java ade1900 
>   jdbc/src/java/org/apa

Re: Review Request 50787: Add a timezone-aware timestamp

2017-05-08 Thread Rui Li
/BinarySortableSerDe.java
 89e15c3 
  serde/src/java/org/apache/hadoop/hive/serde2/io/TimestampTZWritable.java 
PRE-CREATION 
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyFactory.java 23dbe6a 
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyTimestampTZ.java 
PRE-CREATION 
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyUtils.java 73c72e1 
  
serde/src/java/org/apache/hadoop/hive/serde2/lazy/objectinspector/primitive/LazyPrimitiveObjectInspectorFactory.java
 5601734 
  
serde/src/java/org/apache/hadoop/hive/serde2/lazy/objectinspector/primitive/LazyTimestampTZObjectInspector.java
 PRE-CREATION 
  
serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryFactory.java 
52f3527 
  serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinarySerDe.java 
56b4ca3 
  
serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryTimestampTZ.java
 PRE-CREATION 
  serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryUtils.java 
8237b64 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorConverters.java
 24b3d4e 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java
 ba44bae 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/PrimitiveObjectInspector.java
 70633f3 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/JavaTimestampTZObjectInspector.java
 PRE-CREATION 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorConverter.java
 e08ad43 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorFactory.java
 2ed0843 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorUtils.java
 9642a7e 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/SettableTimestampTZObjectInspector.java
 PRE-CREATION 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/TimestampTZObjectInspector.java
 PRE-CREATION 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/WritableConstantTimestampTZObjectInspector.java
 PRE-CREATION 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/WritableTimestampTZObjectInspector.java
 PRE-CREATION 
  serde/src/java/org/apache/hadoop/hive/serde2/thrift/Type.java 0ad8c02 
  serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/TypeInfoFactory.java 
43c4819 
  serde/src/test/org/apache/hadoop/hive/serde2/io/TestTimestampTZWritable.java 
PRE-CREATION 
  service-rpc/if/TCLIService.thrift 824b049 
  service-rpc/src/gen/thrift/gen-cpp/TCLIService_constants.cpp 991cb2e 
  service-rpc/src/gen/thrift/gen-cpp/TCLIService_types.h 8accf66 
  service-rpc/src/gen/thrift/gen-cpp/TCLIService_types.cpp b6995c4 
  
service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TCLIServiceConstants.java
 930bed7 
  
service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TProtocolVersion.java
 18a7825 
  
service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TTypeId.java
 a3735eb 
  service-rpc/src/gen/thrift/gen-php/Types.php ee5acd2 
  service-rpc/src/gen/thrift/gen-py/TCLIService/constants.py c8d4f8f 
  service-rpc/src/gen/thrift/gen-py/TCLIService/ttypes.py e9faa2a 
  service-rpc/src/gen/thrift/gen-rb/t_c_l_i_service_constants.rb 25adbb4 
  service-rpc/src/gen/thrift/gen-rb/t_c_l_i_service_types.rb 714309c 
  service/src/java/org/apache/hive/service/cli/ColumnValue.java 76e8c03 
  service/src/java/org/apache/hive/service/cli/TypeDescriptor.java d634bef 


Diff: https://reviews.apache.org/r/50787/diff/8/

Changes: https://reviews.apache.org/r/50787/diff/7-8/


Testing
---


Thanks,

Rui Li



Re: Review Request 50787: Add a timezone-aware timestamp

2017-05-09 Thread Rui Li


> On May 9, 2017, 11:05 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/TypeConverter.java
> > Lines 204 (patched)
> > <https://reviews.apache.org/r/50787/diff/8/?file=1710538#file1710538line204>
> >
> > Can you file a bug in Calcite that it should have sql type to represent 
> > TS w TZ?

Filed CALCITE-1784 for it


> On May 9, 2017, 11:05 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToString.java
> > Lines 160 (patched)
> > <https://reviews.apache.org/r/50787/diff/8/?file=1710546#file1710546line160>
> >
> > Add a comment that the string representation will return the TS in the UTC 
> > zone and not in the original TZ.

Here we convert timestamptz to string, which means TZ is already in UTC. I will 
add the comment when we convert string to timestamptz.


- Rui


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50787/#review174383
-------


On May 8, 2017, 3:17 p.m., Rui Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50787/
> ---
> 
> (Updated May 8, 2017, 3:17 p.m.)
> 
> 
> Review request for hive, pengcheng xiong and Xuefu Zhang.
> 
> 
> Bugs: HIVE-14412
> https://issues.apache.org/jira/browse/HIVE-14412
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> The 1st patch to add timezone-aware timestamp.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/common/type/TimestampTZ.java 
> PRE-CREATION 
>   common/src/test/org/apache/hadoop/hive/common/type/TestTimestampTZ.java 
> PRE-CREATION 
>   contrib/src/test/queries/clientnegative/serde_regex.q a676338 
>   contrib/src/test/queries/clientpositive/serde_regex.q d75d607 
>   contrib/src/test/results/clientnegative/serde_regex.q.out 58b1c02 
>   contrib/src/test/results/clientpositive/serde_regex.q.out 2984293 
>   hbase-handler/src/test/queries/positive/hbase_timestamp.q 0350afe 
>   hbase-handler/src/test/results/positive/hbase_timestamp.q.out 3918121 
>   itests/hive-blobstore/src/test/queries/clientpositive/orc_format_part.q 
> 358eccd 
>   
> itests/hive-blobstore/src/test/queries/clientpositive/orc_nonstd_partitions_loc.q
>  c462538 
>   itests/hive-blobstore/src/test/queries/clientpositive/rcfile_format_part.q 
> c563d3a 
>   
> itests/hive-blobstore/src/test/queries/clientpositive/rcfile_nonstd_partitions_loc.q
>  d17c281 
>   itests/hive-blobstore/src/test/results/clientpositive/orc_format_part.q.out 
> 5d1319f 
>   
> itests/hive-blobstore/src/test/results/clientpositive/orc_nonstd_partitions_loc.q.out
>  70e72f7 
>   
> itests/hive-blobstore/src/test/results/clientpositive/rcfile_format_part.q.out
>  bed10ab 
>   
> itests/hive-blobstore/src/test/results/clientpositive/rcfile_nonstd_partitions_loc.q.out
>  c6442f9 
>   jdbc/src/java/org/apache/hive/jdbc/HiveBaseResultSet.java ade1900 
>   jdbc/src/java/org/apache/hive/jdbc/JdbcColumn.java 38918f0 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 1b556ac 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java f8b55da 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/SerializationUtilities.java 
> 01a652d 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/TypeConverter.java
>  38308c9 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 
> 0cf9205 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g 190b66b 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g ca639d3 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g 645ced9 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java 
> c3227c9 
>   ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java bda2050 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToBoolean.java 7cdf2c3 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToString.java 5cacd59 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java 68d98f5 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFDate.java 
> 5a31e61 
>   
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFToTimestampTZ.java
>  PRE-CREATION 
>   
> ql/src/test/org/apache/hadoop/hive/ql/parse/TestSQL11ReservedKeyWordsNegative.java
>  0dc6b19 
>   ql/src/test/queries/clientnegative/serde_regex.q c9cfc7d 
>   ql/src/test/queries/clientnegative/serde_regex2.q a

Re: Review Request 50787: Add a timezone-aware timestamp

2017-05-09 Thread Rui Li
/BinarySortableSerDe.java
 89e15c3 
  serde/src/java/org/apache/hadoop/hive/serde2/io/TimestampTZWritable.java 
PRE-CREATION 
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyFactory.java 23dbe6a 
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyTimestampTZ.java 
PRE-CREATION 
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyUtils.java 73c72e1 
  
serde/src/java/org/apache/hadoop/hive/serde2/lazy/objectinspector/primitive/LazyPrimitiveObjectInspectorFactory.java
 5601734 
  
serde/src/java/org/apache/hadoop/hive/serde2/lazy/objectinspector/primitive/LazyTimestampTZObjectInspector.java
 PRE-CREATION 
  
serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryFactory.java 
52f3527 
  serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinarySerDe.java 
56b4ca3 
  
serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryTimestampTZ.java
 PRE-CREATION 
  serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryUtils.java 
8237b64 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorConverters.java
 24b3d4e 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java
 ba44bae 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/PrimitiveObjectInspector.java
 70633f3 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/JavaTimestampTZObjectInspector.java
 PRE-CREATION 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorConverter.java
 e08ad43 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorFactory.java
 2ed0843 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorUtils.java
 9642a7e 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/SettableTimestampTZObjectInspector.java
 PRE-CREATION 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/TimestampTZObjectInspector.java
 PRE-CREATION 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/WritableConstantTimestampTZObjectInspector.java
 PRE-CREATION 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/WritableTimestampTZObjectInspector.java
 PRE-CREATION 
  serde/src/java/org/apache/hadoop/hive/serde2/thrift/Type.java 0ad8c02 
  serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/TypeInfoFactory.java 
43c4819 
  serde/src/test/org/apache/hadoop/hive/serde2/io/TestTimestampTZWritable.java 
PRE-CREATION 
  service-rpc/if/TCLIService.thrift 824b049 
  service-rpc/src/gen/thrift/gen-cpp/TCLIService_constants.cpp 991cb2e 
  service-rpc/src/gen/thrift/gen-cpp/TCLIService_types.h 8accf66 
  service-rpc/src/gen/thrift/gen-cpp/TCLIService_types.cpp b6995c4 
  
service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TCLIServiceConstants.java
 930bed7 
  
service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TProtocolVersion.java
 18a7825 
  
service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TTypeId.java
 a3735eb 
  service-rpc/src/gen/thrift/gen-php/Types.php ee5acd2 
  service-rpc/src/gen/thrift/gen-py/TCLIService/constants.py c8d4f8f 
  service-rpc/src/gen/thrift/gen-py/TCLIService/ttypes.py e9faa2a 
  service-rpc/src/gen/thrift/gen-rb/t_c_l_i_service_constants.rb 25adbb4 
  service-rpc/src/gen/thrift/gen-rb/t_c_l_i_service_types.rb 714309c 
  service/src/java/org/apache/hive/service/cli/ColumnValue.java 76e8c03 
  service/src/java/org/apache/hive/service/cli/TypeDescriptor.java d634bef 


Diff: https://reviews.apache.org/r/50787/diff/9/

Changes: https://reviews.apache.org/r/50787/diff/8-9/


Testing
---


Thanks,

Rui Li



Re: [Announce] New committer: Vineet Garg

2017-05-10 Thread Rui Li
Congrats :)

On Wed, May 10, 2017 at 2:06 PM, Zoltan Haindrich <
zhaindr...@hortonworks.com> wrote:

> Congratulations!
>
>
> On 10 May 2017 7:57 a.m., Prasanth Jayachandran <
> pjayachand...@hortonworks.com> wrote:
Congratulations Vineet!!
>
> Thanks
> Prasanth
>
>
>
> On Tue, May 9, 2017 at 10:52 PM -0700, "Jesus Camacho Rodriguez" <
> jcama...@apache.org<mailto:jcama...@apache.org>> wrote:
>
>
> Congrats Vineet! Well deserved!
>
> --
> Jesús
>
>
>
>
>
> On 5/10/17, 6:45 AM, "Peter Vary"  wrote:
>
> >Congratulations Vineet! :)
> >
> >2017. máj. 9. 22:25 ezt írta ("Ashutosh Chauhan" ):
> >
> >> The Project Management Committee (PMC) for Apache Hive has invited
> Vineet
> >> Garg to become a committer and we are pleased to announce that he has
> >> accepted.
> >>
> >> Welcome, Vineet!
> >>
> >> Thanks,
> >> Ashutosh
> >>
>
>
>
>
>


-- 
Best regards!
Rui Li
Cell: (+86) 13564950210


Re: Welcome Rui Li to Hive PMC

2017-05-25 Thread Rui Li
Thank you guys :)

On Thu, May 25, 2017 at 3:29 PM, Peter Vary  wrote:

> Congratulations Rui!
>
> > On May 25, 2017, at 6:19 AM, Xuefu Zhang  wrote:
> >
> > Hi all,
> >
> > It's an honor to announce that the Apache Hive PMC has recently voted to
> > invite Rui Li as a new Hive PMC member. Rui is a long-time Hive contributor
> > and committer, and has made significant contributions to Hive, especially
> > Hive on Spark. Please join me in congratulating him and looking forward to
> > a bigger role that he will play in the Apache Hive project.
> >
> > Thanks,
> > Xuefu
>
>


-- 
Best regards!
Rui Li
Cell: (+86) 13564950210


Re: Review Request 60632: HIVE-16659: Query plan should reflect hive.spark.use.groupby.shuffle

2017-07-04 Thread Rui Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60632/#review179554
---




ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java
Lines 68 (patched)
<https://reviews.apache.org/r/60632/#comment254315>

Please avoid * import



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java
Lines 432 (patched)
<https://reviews.apache.org/r/60632/#comment254316>

it's preferable to use HiveConf::getBoolVar



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java
Line 438 (original), 441 (patched)
<https://reviews.apache.org/r/60632/#comment254317>

nit: extra space before !useSparkGroupBy



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java
Line 471 (original), 477 (patched)
<https://reviews.apache.org/r/60632/#comment254319>

let's delete this comment


- Rui Li


On July 4, 2017, 8:48 a.m., Bing Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60632/
> ---
> 
> (Updated July 4, 2017, 8:48 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-16659: Query plan should reflect hive.spark.use.groupby.shuffle
> 
> 
> Diffs
> -
> 
>   itests/src/test/resources/testconfiguration.properties 19ff316 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RepartitionShuffler.java 
> d0c708c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 
> 5f85f9e 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 
> b9901da 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java afbeccb 
>   ql/src/test/queries/clientpositive/spark_explain_groupbyshuffle.q 
> PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/spark_explain_groupbyshuffle.q.out 
> PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/60632/diff/1/
> 
> 
> Testing
> ---
> 
> set hive.spark.use.groupby.shuffle=true;
> explain select key, count(val) from t1 group by key;
> 
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> 
> STAGE PLANS:
>   Stage: Stage-1
> Spark
>   Edges:
> Reducer 2 <- Map 1 (GROUP, 2)
>   DagName: root_20170625202742_58335619-7107-4026-9911-43d2ec449088:2
>   Vertices:
> Map 1
> Map Operator Tree:
> TableScan
>   alias: t1
>   Statistics: Num rows: 20 Data size: 140 Basic stats: 
> COMPLETE Column stats: NONE
>   Select Operator
> expressions: key (type: int), val (type: string)
> outputColumnNames: key, val
> Statistics: Num rows: 20 Data size: 140 Basic stats: 
> COMPLETE Column stats: NONE
> Group By Operator
>   aggregations: count(val)
>   keys: key (type: int)
>   mode: hash
>   outputColumnNames: _col0, _col1
>   Statistics: Num rows: 20 Data size: 140 Basic stats: 
> COMPLETE Column stats: NONE
>   Reduce Output Operator
> key expressions: _col0 (type: int)
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 20 Data size: 140 Basic stats: 
> COMPLETE Column stats: NONE
> value expressions: _col1 (type: bigint)
> Reducer 2
> Reduce Operator Tree:
>   Group By Operator
> aggregations: count(VALUE._col0)
> keys: KEY._col0 (type: int)
> mode: mergepartial
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE 
> Column stats: NONE
> File Output Operator
>   compressed: false
>   Statistics: Num rows: 10 Data size: 70 Basic stats: 
> COMPLETE Column stats: NONE
>   table:
>   input format: 
> org.apache.hadoop.mapred.SequenceFileInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>   serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>

Re: Review Request 60632: HIVE-16659: Query plan should reflect hive.spark.use.groupby.shuffle

2017-07-04 Thread Rui Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60632/#review179595
---


Ship it!




Ship It!

- Rui Li


On July 5, 2017, 4:07 a.m., Bing Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60632/
> ---
> 
> (Updated July 5, 2017, 4:07 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-16659: Query plan should reflect hive.spark.use.groupby.shuffle
> 
> 
> Diffs
> -
> 
>   itests/src/test/resources/testconfiguration.properties 19ff316 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RepartitionShuffler.java 
> d0c708c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 
> 5f85f9e 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 
> b9901da 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java afbeccb 
>   ql/src/test/queries/clientpositive/spark_explain_groupbyshuffle.q 
> PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/spark_explain_groupbyshuffle.q.out 
> PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/60632/diff/2/
> 
> 
> Testing
> ---
> 
> set hive.spark.use.groupby.shuffle=true;
> explain select key, count(val) from t1 group by key;
> 
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> 
> STAGE PLANS:
>   Stage: Stage-1
> Spark
>   Edges:
> Reducer 2 <- Map 1 (GROUP, 2)
>   DagName: root_20170625202742_58335619-7107-4026-9911-43d2ec449088:2
>   Vertices:
> Map 1
> Map Operator Tree:
> TableScan
>   alias: t1
>   Statistics: Num rows: 20 Data size: 140 Basic stats: 
> COMPLETE Column stats: NONE
>   Select Operator
> expressions: key (type: int), val (type: string)
> outputColumnNames: key, val
> Statistics: Num rows: 20 Data size: 140 Basic stats: 
> COMPLETE Column stats: NONE
> Group By Operator
>   aggregations: count(val)
>   keys: key (type: int)
>   mode: hash
>   outputColumnNames: _col0, _col1
>   Statistics: Num rows: 20 Data size: 140 Basic stats: 
> COMPLETE Column stats: NONE
>   Reduce Output Operator
> key expressions: _col0 (type: int)
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 20 Data size: 140 Basic stats: 
> COMPLETE Column stats: NONE
> value expressions: _col1 (type: bigint)
> Reducer 2
> Reduce Operator Tree:
>   Group By Operator
> aggregations: count(VALUE._col0)
> keys: KEY._col0 (type: int)
> mode: mergepartial
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE 
> Column stats: NONE
> File Output Operator
>   compressed: false
>   Statistics: Num rows: 10 Data size: 70 Basic stats: 
> COMPLETE Column stats: NONE
>   table:
>   input format: 
> org.apache.hadoop.mapred.SequenceFileInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>   serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> 
>   Stage: Stage-0
> Fetch Operator
>   limit: -1
>   Processor Tree:
> ListSink
> 
> 
> set hive.spark.use.groupby.shuffle=false;
> explain select key, count(val) from t1 group by key;
> 
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> 
> STAGE PLANS:
>   Stage: Stage-1
> Spark
>   Edges:
> Reducer 2 <- Map 1 (GROUP, 2)
>   DagName: root_20170625203122_3afe01dd-41cc-477e-9098-ddd58b37ad4e:3
>   Vertices:
> Map 1
> Map Operator Tree:
> TableScan
>   alias: t1
>   Statistics: Num rows: 20 Data size: 140 Basic stats: 
> COMPLETE Colum

Re: checkstyle changes

2017-12-07 Thread Rui Li
I also believe 140 is a little too long.

BTW, do we use 2 or 4 chars for continuation indent? I personally prefer 4,
but I do find both cases in our code.
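For reference, a checkstyle Indentation module set up along the lines Peter suggests below might look like this — an untested sketch based on the checkstyle docs, with illustrative values:

```xml
<!-- Untested sketch: 2-space continuation indent, non-strict wrapping.
     See http://checkstyle.sourceforge.net/config_misc.html#Indentation -->
<module name="Indentation">
  <property name="basicOffset" value="2"/>
  <property name="lineWrappingIndentation" value="2"/>
  <property name="forceStrictCondition" value="false"/>
</module>
```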

On Fri, Dec 8, 2017 at 6:20 AM, Alexander Kolbasov 
wrote:

> Problem with 140-wide code isn't just laptops - in many cases we need to do
> side-by-side diffs (e.g. for code reviews) and this doubles the required
> size.
>
> - Alex.
>
> On Thu, Dec 7, 2017 at 1:38 PM, Sergey Shelukhin 
> wrote:
>
> > I think the 140-character change will make the code hard to use on a
> > laptop without a monitor.
> >
> >
> > On 17/12/7, 02:43, "Peter Vary"  wrote:
> >
> > >Disclaimer: I did not have time to test it out, but according to
> > >http://checkstyle.sourceforge.net/config_misc.html#Indentation
> > ><http://checkstyle.sourceforge.net/config_misc.html#Indentation>
> > >Maybe the indentation could be solved by:
> > >lineWrappingIndentation=2 (default 4)
> > >forceStrictCondition=false (default false)
> > >
> > >http://checkstyle.sourceforge.net/config_misc.html#TrailingComment
> > ><http://checkstyle.sourceforge.net/config_misc.html#TrailingComment>
> > >might help with the comments
> > >
> > >Sorry for not being more helpful. Maybe sometime later I will have time
> > >to check these out.
> > >
> > >Thanks,
> > >Peter
> > >
> > >> On Dec 7, 2017, at 10:26 AM, Zoltan Haindrich
> > >> wrote:
> > >>
> > >> Hello Eugene!
> > >>
> > >> I've looked into doing something with these; but I was not able to
> > >>relieve the warnings you've mentioned:
> > >>
> > >> * the ;// seems to be not configurable
> > >>   It seems like its handled by the whitespaceafter module; I'm not
> sure
> > >>how to allow / after ;
> > >> * I think that indentation of 4 for many method arguments makes it
> more
> > >>readable; so I think it would be the best to just drop this check...but
> > >>I've not seen any way to do this(w/o disabling the whole indentation
> > >>module...)
> > >>
> > >> maybe someone else should take a look at it... I find it pretty hard
> to
> > >>get docs about specific chechkstyle configurations; since the search
> > >>keywords mostly contain keywords like: semicolon, whitespace,
> > >>comment...which tends to pull in all kind of garbage results :)
> > >>
> > >> cheers,
> > >> Zoltan
> > >>
> > >> On 6 Dec 2017 8:53 p.m., Eugene Koifman 
> > >>wrote:
> > >> It currently complains about no space between ; and // as in
> “…);//foo”
> > >>
> > >> And also about indentation when a single method call is split into
> > >>multiple lines.
> > >> It insists on 4 chars in this case, though we use 2 in (all?) other
> > >>cases.
> > >>
> > >> Could this be dialed down as well?
> > >>
> > >>
> > >> On 12/5/17, 7:26 AM, "Peter Vary"  wrote:
> > >>
> > >>+1 for the changes
> > >>
> > >>> On Dec 5, 2017, at 1:02 PM, Zoltan Haindrich  wrote:
> > >>>
> > >>> Hello,
> > >>>
> > >>> I've filed a ticket to make the checkstyle warnings less noisy
> > >>>(https://issues.apache.org/jira/browse/HIVE-18222)
> > >>>
> > >>> * set maxlinelength to 140
> > >>>   I think everyone is working with big-enough displays to handle this
> > >>>:)
> > >>>   There are many methods which have complicated names / arguments /
> > >>>etc ; breaking the lines more frequently hurts readability...
> > >>> * disabled some restrictions like: declaration&hiding via get/set
> > >>>methods for protected/package fields are not mandatory
> > >>>
> > >>> If you don't feel comfortable with these changes, please share your
> > >>>point of view.
> > >>>
> > >>> cheers,
> > >>> Zoltan
> > >>>
> > >>>
> > >>
> > >>
> > >>
> > >>
> > >
> >
> >
>



-- 
Best regards!
Rui Li


Re: Review Request 30739: HIVE-9574 Lazy computing in HiveBaseFunctionResultList may hurt performance [Spark Branch]

2015-02-08 Thread Rui Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30739/#review71597
---



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveKVResultCache.java
<https://reviews.apache.org/r/30739/#comment117412>

If I understand correctly, this can be renamed to something like 
IN_MEMORY_NUM_ROWS?



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveKVResultCache.java
<https://reviews.apache.org/r/30739/#comment117419>

Do we need a parameter here? Seems it can just use writeCursor?



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveKVResultCache.java
<https://reviews.apache.org/r/30739/#comment117420>

Also close input and output here?



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveKVResultCache.java
<https://reviews.apache.org/r/30739/#comment117425>

I suppose this is to avoid frequently switching buffers? But why the magic number 
1?


- Rui Li


On Feb. 7, 2015, 3:09 a.m., Jimmy Xiang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30739/
> ---
> 
> (Updated Feb. 7, 2015, 3:09 a.m.)
> 
> 
> Review request for hive, Rui Li and Xuefu Zhang.
> 
> 
> Bugs: HIVE-9574
> https://issues.apache.org/jira/browse/HIVE-9574
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Result KV cache doesn't use RowContainer any more since it has logic we don't 
> need, which is some overhead. We don't do lazy computing right away, instead 
> we wait a little till the cache is close to spill.
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveBaseFunctionResultList.java
>  78ab680 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveKVResultCache.java 
> 8ead0cb 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveMapFunction.java 
> 7a09b4d 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveMapFunctionResultList.java
>  e92e299 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 
> 070ea4d 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunctionResultList.java
>  d4ff37c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/KryoSerializer.java 
> 286816b 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/spark/TestHiveKVResultCache.java 
> 0df4598 
> 
> Diff: https://reviews.apache.org/r/30739/diff/
> 
> 
> Testing
> ---
> 
> Unit test, test on cluster
> 
> 
> Thanks,
> 
> Jimmy Xiang
> 
>



Re: Review Request 30739: HIVE-9574 Lazy computing in HiveBaseFunctionResultList may hurt performance [Spark Branch]

2015-02-08 Thread Rui Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30739/#review71604
---



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveKVResultCache.java
<https://reviews.apache.org/r/30739/#comment117428>

What happens if input!=null and we're creating the temp file again?


- Rui Li


On Feb. 7, 2015, 3:09 a.m., Jimmy Xiang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30739/
> ---
> 
> (Updated Feb. 7, 2015, 3:09 a.m.)
> 
> 
> Review request for hive, Rui Li and Xuefu Zhang.
> 
> 
> Bugs: HIVE-9574
> https://issues.apache.org/jira/browse/HIVE-9574
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Result KV cache doesn't use RowContainer any more since it has logic we don't 
> need, which is some overhead. We don't do lazy computing right away, instead 
> we wait a little till the cache is close to spill.
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveBaseFunctionResultList.java
>  78ab680 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveKVResultCache.java 
> 8ead0cb 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveMapFunction.java 
> 7a09b4d 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveMapFunctionResultList.java
>  e92e299 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 
> 070ea4d 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunctionResultList.java
>  d4ff37c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/KryoSerializer.java 
> 286816b 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/spark/TestHiveKVResultCache.java 
> 0df4598 
> 
> Diff: https://reviews.apache.org/r/30739/diff/
> 
> 
> Testing
> ---
> 
> Unit test, test on cluster
> 
> 
> Thanks,
> 
> Jimmy Xiang
> 
>



Re: Review Request 30739: HIVE-9574 Lazy computing in HiveBaseFunctionResultList may hurt performance [Spark Branch]

2015-02-08 Thread Rui Li


> On Feb. 9, 2015, 2:51 a.m., Rui Li wrote:
> >

Some high level question, do we still need two buffers? And does it make sense 
to use something like a queue instead of an array as the buffer?


- Rui


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30739/#review71597
---


On Feb. 7, 2015, 3:09 a.m., Jimmy Xiang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30739/
> ---
> 
> (Updated Feb. 7, 2015, 3:09 a.m.)
> 
> 
> Review request for hive, Rui Li and Xuefu Zhang.
> 
> 
> Bugs: HIVE-9574
> https://issues.apache.org/jira/browse/HIVE-9574
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Result KV cache doesn't use RowContainer any more since it has logic we don't 
> need, which is some overhead. We don't do lazy computing right away, instead 
> we wait a little till the cache is close to spill.
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveBaseFunctionResultList.java
>  78ab680 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveKVResultCache.java 
> 8ead0cb 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveMapFunction.java 
> 7a09b4d 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveMapFunctionResultList.java
>  e92e299 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 
> 070ea4d 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunctionResultList.java
>  d4ff37c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/KryoSerializer.java 
> 286816b 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/spark/TestHiveKVResultCache.java 
> 0df4598 
> 
> Diff: https://reviews.apache.org/r/30739/diff/
> 
> 
> Testing
> ---
> 
> Unit test, test on cluster
> 
> 
> Thanks,
> 
> Jimmy Xiang
> 
>



Re: Review Request 30739: HIVE-9574 Lazy computing in HiveBaseFunctionResultList may hurt performance [Spark Branch]

2015-02-09 Thread Rui Li


> On Feb. 9, 2015, 2:51 a.m., Rui Li wrote:
> >
> 
> Rui Li wrote:
> Some high level question, do we still need two buffers? And does it make 
> sense to use something like a queue instead of an array as the buffer?
> 
> Jimmy Xiang wrote:
> Queue should work too. Using two buffers makes it easier to switch 
> between read and write. Switching itself is cheap here. For RowContainer, it 
> is expensive to switch because of first()/clear(), etc.

Thanks for the explanation, Jimmy. I was just wondering if we can use a single 
queue as the buffer and avoid switching between two arrays and managing the 
cursors. That should make it less complicated, right?


> On Feb. 9, 2015, 2:51 a.m., Rui Li wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveKVResultCache.java, 
> > line 54
> > <https://reviews.apache.org/r/30739/diff/4/?file=853475#file853475line54>
> >
> > If I understand correctly, this can be renamed to something like 
> > IN_MEMORY_NUM_ROWS?
> 
> Jimmy Xiang wrote:
> Yes, you are right. Both are ok. Any strong reason for renaming it?

No, I just feel cache size is more like some size in bytes.


> On Feb. 9, 2015, 2:51 a.m., Rui Li wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveKVResultCache.java, 
> > line 236
> > <https://reviews.apache.org/r/30739/diff/4/?file=853475#file853475line236>
> >
> > I suppose this is to avoid frequent switch buffer? But why the magic 
> > number 1?
> 
> Jimmy Xiang wrote:
> Right. If it is 1, there is no need to switch buffers. For other numbers, 
> we need to switch anyway. I assume there are many scenarios where there is 
> just one row.

I see thanks.


- Rui


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30739/#review71597
---


On Feb. 9, 2015, 7:41 p.m., Jimmy Xiang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30739/
> ---
> 
> (Updated Feb. 9, 2015, 7:41 p.m.)
> 
> 
> Review request for hive, Rui Li and Xuefu Zhang.
> 
> 
> Bugs: HIVE-9574
> https://issues.apache.org/jira/browse/HIVE-9574
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Result KV cache doesn't use RowContainer any more since it has logic we don't 
> need, which is some overhead. We don't do lazy computing right away, instead 
> we wait a little till the cache is close to spill.
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveBaseFunctionResultList.java
>  78ab680 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveKVResultCache.java 
> 8ead0cb 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveMapFunction.java 
> 7a09b4d 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveMapFunctionResultList.java
>  e92e299 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 
> 070ea4d 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunctionResultList.java
>  d4ff37c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/KryoSerializer.java 
> 286816b 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/spark/TestHiveKVResultCache.java 
> 0df4598 
> 
> Diff: https://reviews.apache.org/r/30739/diff/
> 
> 
> Testing
> ---
> 
> Unit test, test on cluster
> 
> 
> Thanks,
> 
> Jimmy Xiang
> 
>



Re: Review Request 30739: HIVE-9574 Lazy computing in HiveBaseFunctionResultList may hurt performance [Spark Branch]

2015-02-09 Thread Rui Li


> On Feb. 9, 2015, 2:51 a.m., Rui Li wrote:
> >
> 
> Rui Li wrote:
> Some high level question, do we still need two buffers? And does it make 
> sense to use something like a queue instead of an array as the buffer?
> 
> Jimmy Xiang wrote:
> Queue should work too. Using two buffers makes it easier to switch 
> between read and write. Switching itself is cheap here. For RowContainer, it 
> is expensive to switch because of first()/clear(), etc.
> 
> Rui Li wrote:
> Thanks for the explanation, Jimmy. I was just wondering if we can use a 
> single queue as the buffer and avoid switching between two arrays and 
> managing the cursors. That should make it less complicated, right?
> 
> Jimmy Xiang wrote:
> You are right. As to using a single queue, we could do so if not for the 
> thread safety issue. Since we need to make it thread safe, with one queue, it 
> is hard to maintain the states in case some data are flushed to disk.

OK, thanks for the clarification.
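To make the two-buffer scheme discussed above concrete, here is a minimal, hypothetical sketch — my own simplification for illustration, not the actual HiveKVResultCache, which additionally spills to a temp file when the buffers fill up:

```java
/**
 * Simplified sketch of a two-buffer result cache. Writes go to writeBuffer;
 * reads drain readBuffer. When the read buffer is exhausted, the two buffers
 * swap roles under the lock. Spill-to-disk is omitted.
 */
public class TwoBufferCache<T> {
  private static final int IN_MEMORY_NUM_ROWS = 1024; // capacity per buffer

  private Object[] writeBuffer = new Object[IN_MEMORY_NUM_ROWS];
  private Object[] readBuffer = new Object[IN_MEMORY_NUM_ROWS];
  private int writeCursor = 0; // next free slot in writeBuffer
  private int readCursor = 0;  // next unread slot in readBuffer
  private int readSize = 0;    // number of valid rows in readBuffer

  public synchronized boolean add(T row) {
    if (writeCursor == IN_MEMORY_NUM_ROWS) {
      return false; // full -- the real cache would spill to disk here
    }
    writeBuffer[writeCursor++] = row;
    return true;
  }

  public synchronized boolean hasNext() {
    return readCursor < readSize || writeCursor > 0;
  }

  @SuppressWarnings("unchecked")
  public synchronized T next() {
    if (readCursor == readSize) {
      // read buffer drained: swap the roles of the two buffers
      Object[] tmp = readBuffer;
      readBuffer = writeBuffer;
      writeBuffer = tmp;
      readSize = writeCursor;
      writeCursor = 0;
      readCursor = 0;
    }
    return (T) readBuffer[readCursor++];
  }

  public static void main(String[] args) {
    TwoBufferCache<String> cache = new TwoBufferCache<>();
    cache.add("k1=v1");
    cache.add("k2=v2");
    System.out.println(cache.next());    // k1=v1
    System.out.println(cache.next());    // k2=v2
    System.out.println(cache.hasNext()); // false
  }
}
```

With a single queue, producer and consumer would contend on the same structure and the spill bookkeeping gets tangled, which matches Jimmy's point about thread safety.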


- Rui


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30739/#review71597
---


On Feb. 9, 2015, 7:41 p.m., Jimmy Xiang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30739/
> -----------
> 
> (Updated Feb. 9, 2015, 7:41 p.m.)
> 
> 
> Review request for hive, Rui Li and Xuefu Zhang.
> 
> 
> Bugs: HIVE-9574
> https://issues.apache.org/jira/browse/HIVE-9574
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Result KV cache doesn't use RowContainer any more since it has logic we don't 
> need, which is some overhead. We don't do lazy computing right away, instead 
> we wait a little till the cache is close to spill.
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveBaseFunctionResultList.java
>  78ab680 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveKVResultCache.java 
> 8ead0cb 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveMapFunction.java 
> 7a09b4d 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveMapFunctionResultList.java
>  e92e299 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 
> 070ea4d 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunctionResultList.java
>  d4ff37c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/KryoSerializer.java 
> 286816b 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/spark/TestHiveKVResultCache.java 
> 0df4598 
> 
> Diff: https://reviews.apache.org/r/30739/diff/
> 
> 
> Testing
> ---
> 
> Unit test, test on cluster
> 
> 
> Thanks,
> 
> Jimmy Xiang
> 
>



Re: [DISCUSS] Unsustainable situation with ptests

2018-05-14 Thread Rui Li
 > important tests which are consistent. We can apply some label that
> > pre-checkin-local tests have run successfully and only then we submit
> > for the
> > full-suite.
> >
> > More thoughts are welcome. Thanks for starting this conversation.
> >
> > On Fri, May 11, 2018 at 1:27 PM, Jesus Camacho Rodriguez <
> > jcama...@apache.org<mailto:jcama...@apache.org>> wrote:
> >
> > I believe we have reached a state (maybe we did reach it a while ago)
> > that
> > is not sustainable anymore, as there are so many tests failing /
> > timing out
> > that it is not possible to verify whether a patch is breaking some
> > critical
> > parts of the system or not. It also seems to me that due to the
> > timeouts
> > (maybe due to infra, maybe not), ptest runs are taking even longer
> than
> > usual, which in turn creates even longer queue of patches.
> >
> > There is an ongoing effort to improve ptests usability (
> > https://issues.apache.org/jira/browse/HIVE-19425), but apart from
> > that,
> > we need to make an effort to stabilize existing tests and bring that
> > failure count to zero.
> >
> > Hence, I am suggesting *we stop committing any patch before we get a
> > green
> > run*. If someone thinks this proposal is too radical, please come up
> > with
> > an alternative, because I do not think it is OK to have the ptest
> runs
> > in
> > their current state. Other projects of certain size (e.g., Hadoop,
> > Spark)
> > are always green, we should be able to do the same.
> >
> > Finally, once we get to zero failures, I suggest we are less tolerant
> > with
> > committing without getting a clean ptests run. If there is a failure,
> > we
> > need to fix it or revert the patch that caused it, then we continue
> > developing.
> >
> > Please, let’s all work together as a community to fix this issue,
> that
> > is
> > the only way to get to zero quickly.
> >
> > Thanks,
> > Jesús
> >
> > PS. I assume the flaky tests will come into the discussion. Let´s see
> > first how many of those we have, then we can work to find a fix.
> >
> >
> >
> >
> >
> >
> >
> >
>



-- 
Best regards!
Rui Li


Re: [VOTE] Stricter commit guidelines

2018-05-15 Thread Rui Li
+1

On Tue, May 15, 2018 at 2:24 PM, Prasanth Jayachandran <
pjayachand...@hortonworks.com> wrote:

> +1
>
>
>
> Thanks
> Prasanth
>
>
>
> On Mon, May 14, 2018 at 10:44 PM -0700, "Jesus Camacho Rodriguez" <
> jcama...@apache.org<mailto:jcama...@apache.org>> wrote:
>
>
> After work has been done to ignore most of the tests that were failing
> consistently/intermittently [1], I wanted to start this vote to gather
> support from the community to be stricter wrt committing patches to Hive.
> The committers guide [2] already specifies that a +1 should be obtained
> before committing, but there is another clause that allows committing under
> the presence of flaky tests (clause 4). Flaky tests are as good as having
> no tests, hence I propose to remove clause 4 and enforce the +1 from
> testing infra before committing.
>
>
>
> As I see it, by enforcing that we always get a +1 from the testing infra
> before committing, 1) we will have a more stable project, and 2) we will
> have another incentive as a community to create a more robust testing
> infra, e.g., replacing flaky tests for similar unit tests that are not
> flaky, trying to decrease running time for tests, etc.
>
>
>
> Please, share your thoughts about this.
>
>
>
> Here is my +1.
>
>
>
> Thanks,
>
> Jes?s
>
>
>
> [1] http://mail-archives.apache.org/mod_mbox/hive-dev/201805.
> mbox/%3C63023673-AEE5-41A9-BA52-5A5DFB2078B6%40apache.org%3E
>
> [2] https://cwiki.apache.org/confluence/display/Hive/
> HowToCommit#HowToCommit-PreCommitruns,andcommittingpatches
>
>
>
>
>


-- 
Best regards!
Rui Li


Review Request 36033: HIVE-11108: HashTableSinkOperator doesn't support vectorization [Spark Branch]

2015-06-29 Thread Rui Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36033/
---

Review request for hive and Xuefu Zhang.


Bugs: HIVE-11108
https://issues.apache.org/jira/browse/HIVE-11108


Repository: hive-git


Description
---

This prevents any BaseWork containing HTS from being vectorized. It's basically 
specific to spark, because Tez doesn't use HTS and MR runs HTS in local tasks.
We should verify if it makes sense to make HTS support vectorization.


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java c4554a7 
  ql/src/java/org/apache/hadoop/hive/ql/exec/SparkHashTableSinkOperator.java 
7c67fd2 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkHashTableSinkOperator.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorComparatorFactory.java 
c6a43d9 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/GenSparkSkewJoinProcessor.java
 7ebd18d 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 
e7b9c73 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkReduceSinkMapJoinProc.java
 fd42959 
  ql/src/java/org/apache/hadoop/hive/ql/plan/SparkHashTableSinkDesc.java 
ff32f5e 
  ql/src/test/results/clientpositive/spark/vector_decimal_mapjoin.q.out a80a20b 
  ql/src/test/results/clientpositive/spark/vector_left_outer_join.q.out a0e6c2a 
  ql/src/test/results/clientpositive/spark/vector_mapjoin_reduce.q.out 8cf1a81 
  ql/src/test/results/clientpositive/spark/vectorized_mapjoin.q.out b6c2b35 
  ql/src/test/results/clientpositive/spark/vectorized_nested_mapjoin.q.out 
a25d540 

Diff: https://reviews.apache.org/r/36033/diff/


Testing
---


Thanks,

Rui Li



Review Request 37128: Enable native vectorized map join for spark [Spark Branch]

2015-08-05 Thread Rui Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37128/
---

Review request for hive and Xuefu Zhang.


Bugs: HIVE-11180
https://issues.apache.org/jira/browse/HIVE-11180


Repository: hive-git


Description
---

The improvement was introduced in HIVE-9824. Let's use this task to track how 
we can enable that for spark.


Diffs
-

  itests/src/test/resources/testconfiguration.properties c710b0b 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainerSerDe.java
 e97a9f0 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java 10e3497 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinCommonOperator.java
 87ebcf2 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastTableContainer.java
 f2080f4 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 
82c3e50 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java
 39d1f18 
  ql/src/test/results/clientpositive/spark/vector_inner_join.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/vector_outer_join0.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/spark/vector_outer_join1.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/spark/vector_outer_join2.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/spark/vector_outer_join3.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/spark/vector_outer_join4.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/spark/vector_outer_join5.q.out 
PRE-CREATION 

Diff: https://reviews.apache.org/r/37128/diff/


Testing
---

Verified that the newly added test golden files have same results as the Tez 
version. Since hive.vectorized.execution.mapjoin.native.enabled is on by 
default, other cases (map join involving Orc tables and vectorization enabled) 
should be covered automatically.

On the other hand, some tests that may involve the optimization haven't been 
enabled for spark, e.g. vector_char_mapjoin1.q. We can consider enabling these 
tests in a follow-on task.


Thanks,

Rui Li



Re: Review Request 37128: Enable native vectorized map join for spark [Spark Branch]

2015-08-05 Thread Rui Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37128/
---

(Updated Aug. 6, 2015, 2:09 a.m.)


Review request for hive and Xuefu Zhang.


Bugs: HIVE-11180
https://issues.apache.org/jira/browse/HIVE-11180


Repository: hive-git


Description
---

The improvement was introduced in HIVE-9824. Let's use this task to track how 
we can enable that for Spark.


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java f593d7d 
  itests/src/test/resources/testconfiguration.properties c710b0b 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainerSerDe.java
 e97a9f0 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java 10e3497 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinCommonOperator.java
 87ebcf2 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastTableContainer.java
 f2080f4 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 
82c3e50 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java
 39d1f18 
  ql/src/test/results/clientpositive/spark/vector_inner_join.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/vector_outer_join0.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/spark/vector_outer_join1.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/spark/vector_outer_join2.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/spark/vector_outer_join3.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/spark/vector_outer_join4.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/spark/vector_outer_join5.q.out 
PRE-CREATION 

Diff: https://reviews.apache.org/r/37128/diff/


Testing
---

Verified that the newly added test golden files have the same results as the Tez 
version. Since hive.vectorized.execution.mapjoin.native.enabled is on by 
default, other cases (map joins involving ORC tables with vectorization enabled) 
should be covered automatically.

On the other hand, some tests that may involve the optimization haven't been 
enabled for Spark yet, e.g. vector_char_mapjoin1.q. We can consider enabling 
these tests in a follow-on task.


Thanks,

Rui Li



Re: Review Request 47185: HIVE-13716

2016-05-11 Thread Rui Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/47185/#review132610
---




ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java (line 4161)
<https://reviews.apache.org/r/47185/#comment196874>

Here we choose non-recursive because the location is an empty dir, right?



ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java (line 2679)
<https://reviews.apache.org/r/47185/#comment196875>

Are we sure destPath is always a file and not a dir?



ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java (line 2851)
<https://reviews.apache.org/r/47185/#comment196880>

I'm not sure if this is needed here. I set the session state in copyFiles 
because we need to call needToCopy in each thread, which requires the session 
state to be set.
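The per-thread requirement described above can be illustrated with a minimal ThreadLocal sketch (plain Java; `SESSION` and `needToCopy` here are stand-ins, not Hive's actual SessionState API): a value set in one thread is invisible to pool workers unless each worker sets it itself.

```java
import java.util.concurrent.*;

public class ThreadLocalDemo {
    // Stand-in for Hive's per-thread SessionState.
    static final ThreadLocal<String> SESSION = new ThreadLocal<>();

    // Stand-in for needToCopy(): requires the calling thread's session.
    static boolean needToCopy() {
        if (SESSION.get() == null) {
            throw new IllegalStateException("session not set in this thread");
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        SESSION.set("main-session");
        ExecutorService pool = Executors.newFixedThreadPool(2);

        // The main thread's session does not carry over to pool threads.
        try {
            pool.submit(ThreadLocalDemo::needToCopy).get();
        } catch (ExecutionException e) {
            System.out.println("worker failed: " + e.getCause().getMessage());
        }

        // Each task must set the session itself before calling needToCopy().
        boolean ok = pool.submit(() -> {
            SESSION.set("worker-session");
            return needToCopy();
        }).get();
        System.out.println("worker succeeded: " + ok);
        pool.shutdown();
    }
}
```

This is why a copyFiles-style method that fans work out to a thread pool has to set the session state inside each submitted task rather than once in the caller.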



ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java (line 2854)
<https://reviews.apache.org/r/47185/#comment196881>

Is it possible that the rename fails but no exception is thrown? If so, we 
will eventually return true, which is not correct.
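The concern above — a rename that fails by returning `false` rather than throwing — matches the contract of Hadoop's `FileSystem.rename(Path, Path)`, which returns a boolean. A minimal stand-in using `java.io.File.renameTo` (same boolean-result pattern, no Hadoop dependency) shows why the return value must be checked:

```java
import java.io.File;
import java.io.IOException;

public class RenameCheck {
    // renameTo (like Hadoop's FileSystem.rename) reports failure by returning
    // false instead of throwing, so the result must be turned into an error.
    static void renameOrFail(File src, File dst) throws IOException {
        if (!src.renameTo(dst)) {
            throw new IOException("rename failed: " + src + " -> " + dst);
        }
    }

    public static void main(String[] args) throws IOException {
        File src = File.createTempFile("rename-demo", ".tmp");
        File dst = new File(src.getParent(), src.getName() + ".moved");
        renameOrFail(src, dst);
        System.out.println("renamed: " + dst.exists());

        // The source is gone now, so a second rename silently returns false,
        // which renameOrFail converts into an exception.
        try {
            renameOrFail(src, dst);
        } catch (IOException e) {
            System.out.println("caught: " + e.getMessage());
        }
        dst.delete();
    }
}
```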



ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java (line 2871)
<https://reviews.apache.org/r/47185/#comment196882>

    I guess this is redundant?


- Rui Li


On May 10, 2016, 3:41 p.m., Ashutosh Chauhan wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/47185/
> ---
> 
> (Updated May 10, 2016, 3:41 p.m.)
> 
> 
> Review request for hive and Rui Li.
> 
> 
> Bugs: HIVE-13716
> https://issues.apache.org/jira/browse/HIVE-13716
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Parallelize setting file permissions for moveFile()
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/common/FileUtils.java 71c9188 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java c4d3bfb 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java c2c6c65 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java f4a9772 
>   shims/common/src/main/java/org/apache/hadoop/hive/io/HdfsUtils.java e931156 
> 
> Diff: https://reviews.apache.org/r/47185/diff/
> 
> 
> Testing
> ---
> 
> Existing regression tests.
> 
> 
> Thanks,
> 
> Ashutosh Chauhan
> 
>



Re: Review Request 47185: HIVE-13716

2016-05-11 Thread Rui Li


> On May 11, 2016, 4:14 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java, line 2881
> > <https://reviews.apache.org/r/47185/diff/1/?file=1378112#file1378112line2881>
> >
> > I think this is useful to force-shut currently running threads.

Oh, I just thought it was shutdown. Makes sense.


- Rui


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/47185/#review132668
---


On May 10, 2016, 3:41 p.m., Ashutosh Chauhan wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/47185/
> ---
> 
> (Updated May 10, 2016, 3:41 p.m.)
> 
> 
> Review request for hive and Rui Li.
> 
> 
> Bugs: HIVE-13716
> https://issues.apache.org/jira/browse/HIVE-13716
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Parallelize setting file permissions for moveFile()
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/common/FileUtils.java 71c9188 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java c4d3bfb 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java c2c6c65 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java f4a9772 
>   shims/common/src/main/java/org/apache/hadoop/hive/io/HdfsUtils.java e931156 
> 
> Diff: https://reviews.apache.org/r/47185/diff/
> 
> 
> Testing
> ---
> 
> Existing regression tests.
> 
> 
> Thanks,
> 
> Ashutosh Chauhan
> 
>



Re: Review Request 47185: HIVE-13716

2016-05-11 Thread Rui Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/47185/#review132684
---


Ship it!




Ship It!

- Rui Li


On May 11, 2016, 4:29 p.m., Ashutosh Chauhan wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/47185/
> ---
> 
> (Updated May 11, 2016, 4:29 p.m.)
> 
> 
> Review request for hive and Rui Li.
> 
> 
> Bugs: HIVE-13716
> https://issues.apache.org/jira/browse/HIVE-13716
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Parallelize setting file permissions for moveFile()
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/common/FileUtils.java 71c9188 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 0204fcd 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java bdda89a 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 981b961 
>   shims/common/src/main/java/org/apache/hadoop/hive/io/HdfsUtils.java e931156 
> 
> Diff: https://reviews.apache.org/r/47185/diff/
> 
> 
> Testing
> ---
> 
> Existing regression tests.
> 
> 
> Thanks,
> 
> Ashutosh Chauhan
> 
>



Re: Review Request 47242: HIVE-13726

2016-05-11 Thread Rui Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/47242/#review132821
---




ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java (line 3074)
<https://reviews.apache.org/r/47242/#comment197074>

Any reason why the static was removed? I think the newly added method can be 
static too.



ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java (line 3168)
<https://reviews.apache.org/r/47242/#comment197073>

I think FileNotFoundException is not needed here.


- Rui Li


On May 11, 2016, 4:36 p.m., Ashutosh Chauhan wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/47242/
> ---
> 
> (Updated May 11, 2016, 4:36 p.m.)
> 
> 
> Review request for hive and Rui Li.
> 
> 
> Bugs: HIVE-13726
> https://issues.apache.org/jira/browse/HIVE-13726
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> For insert overwrite, a significant amount of time might be spent deleting 
> existing files. This patch parallelizes this task.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/common/FileUtils.java 71c9188 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java dd14124 
> 
> Diff: https://reviews.apache.org/r/47242/diff/
> 
> 
> Testing
> ---
> 
> existing regression tests. No change in functionality.
> 
> 
> Thanks,
> 
> Ashutosh Chauhan
> 
>
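The parallel deletion described in the patch summary above can be sketched as follows. This is an illustrative Java sketch using `java.nio.file` and an `ExecutorService`, assuming one deletion task per existing file — it is not the actual Hive.java change:

```java
import java.nio.file.*;
import java.util.*;
import java.util.concurrent.*;

public class ParallelDelete {
    // Deletes every entry directly under dir using a fixed-size thread pool,
    // propagating the first failure if any single deletion fails.
    static void deleteAll(Path dir, int threads) throws Exception {
        List<Path> entries = new ArrayList<>();
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(dir)) {
            for (Path p : ds) {
                entries.add(p);
            }
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (Path p : entries) {
                futures.add(pool.submit(() -> {
                    Files.delete(p);   // runs concurrently across the pool
                    return null;
                }));
            }
            for (Future<?> f : futures) {
                f.get();               // re-throws if a deletion failed
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("overwrite-demo");
        for (int i = 0; i < 8; i++) {
            Files.createFile(dir.resolve("part-" + i));
        }
        deleteAll(dir, 4);
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(dir)) {
            System.out.println("dir empty: " + !ds.iterator().hasNext());
        }
        Files.delete(dir);
    }
}
```

Collecting the futures and calling `get()` on each ensures a failed delete surfaces as an exception instead of being lost in a worker thread.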



Re: Review Request 47242: HIVE-13726

2016-05-11 Thread Rui Li


> On May 12, 2016, 5:48 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java, line 3074
> > <https://reviews.apache.org/r/47242/diff/1/?file=1379752#file1379752line3074>
> >
> > It could have been. Feels cleaner to me as non-static. Just a matter of 
> > style. Is there any advantage of keeping it static?

I just thought that since the method is protected, it may be accessed in 
sub-classes. If that's not a concern, I'm OK with non-static.


- Rui


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/47242/#review132822
---


On May 11, 2016, 4:36 p.m., Ashutosh Chauhan wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/47242/
> ---
> 
> (Updated May 11, 2016, 4:36 p.m.)
> 
> 
> Review request for hive and Rui Li.
> 
> 
> Bugs: HIVE-13726
> https://issues.apache.org/jira/browse/HIVE-13726
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> For insert overwrite, a significant amount of time might be spent deleting 
> existing files. This patch parallelizes this task.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/common/FileUtils.java 71c9188 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java dd14124 
> 
> Diff: https://reviews.apache.org/r/47242/diff/
> 
> 
> Testing
> ---
> 
> existing regression tests. No change in functionality.
> 
> 
> Thanks,
> 
> Ashutosh Chauhan
> 
>



Review Request 50787: Add a timezone-aware timestamp

2016-08-04 Thread Rui Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50787/
---

Review request for hive.


Bugs: HIVE-14412
https://issues.apache.org/jira/browse/HIVE-14412


Repository: hive-git


Description
---

The 1st patch to add timezone-aware timestamp.


Diffs
-

  common/src/test/org/apache/hadoop/hive/common/type/TestHiveTimestamp.java 
PRE-CREATION 
  serde/src/java/org/apache/hadoop/hive/serde2/io/TimestampWritable.java 
b90e576 
  serde/src/test/org/apache/hadoop/hive/serde2/io/TestTimestampWritable.java 
7619efa 
  storage-api/src/java/org/apache/hadoop/hive/common/type/HiveTimestamp.java 
PRE-CREATION 

Diff: https://reviews.apache.org/r/50787/diff/


Testing
---


Thanks,

Rui Li



Re: YourKit open source license

2016-08-15 Thread Rui Li
If I remember correctly, I just contacted YourKit sales and they
sent me the license by email. You'd better send your email from your
Apache email account, to convince them you're a developer of Hive.

On Tue, Aug 16, 2016 at 2:51 AM, calvin hung 
wrote:

> Hi Rui and Alan,
>
> Could you or any nice guy share more detail steps of getting a Yourkit
> license for Hive?
> I've searched the full Hive dev mail archive but got no exact steps to get
> one.
> Thanks!
>
> Calvin
> From: "Li, Rui"
> Date: Tue, 31 Mar 2015 01:22:51 +
> To: "dev@hive.apache.org"
>
> - Contents -
>
> Thanks Alan! But I don’t see Hive in the sponsored open source project
> list. I’ll contact them anyway.
>
>
>
> Cheers,
>
> Rui Li
>
>
>
> From: Alan Gates [mailto:alanfga...@gmail.com]
> Sent: Tuesday, March 31, 2015 1:02 AM
> To: dev@hive.apache.org
> Subject: Re: YourKit open source license
>
>
>
> See https://www.yourkit.com/customers/.
>
> Alan.
>
>
>
>
>
> Li, Rui
>
> March 30, 2015 at 0:54
>
> Hi guys,
>
> I want to use YourKit to profile hive performance. According to the wiki
> <https://cwiki.apache.org/confluence/display/Hive/Performance> hive has
> been granted open source license. Could anybody tell me how I can get the
> license? Thanks!
>
> Cheers,
> Rui Li




-- 
Best regards!
Rui Li
Cell: (+86) 13564950210


Re: YourKit open source license

2016-08-16 Thread Rui Li
Our wiki doesn't mention it's only for committers. Anyway, I suggest you
contact YourKit sales to figure it out.

On Tue, Aug 16, 2016 at 8:38 PM, calvin hung 
wrote:

>
>
> Thanks for your response, Rui.
>
> I don't have an apache email account.
>
> It looks like only committers can get an email account according to this
> page http://www.apache.org/dev/committers.html
>
> Does it mean that only Hive committers can get YourKit free licenses for
> Hive performance profiling?
>
>
>
>
>
>  On Tue, 16 Aug 2016 13:33:34 +0800 Rui Li <lirui.fu...@gmail.com
> >wrote 
>
>
>
>
> If I remember correctly, I just contacted the sales of Yourkit and they
>
> sent me the license by email. You'd better send your email using your
>
> apache email account, in order to convince them you're a developer of Hive.
>
>
>
> On Tue, Aug 16, 2016 at 2:51 AM, calvin hung <calvinh...@wasaitech.com>
>
> wrote:
>
>
>
> > Hi Rui and Alan,
>
> >
>
> > Could you or any nice guy share more detail steps of getting a Yourkit
>
> > license for Hive?
>
> > I've searched the full Hive dev mail archive but got no exact steps
> to get
>
> > one.
>
> > Thanks!
>
> >
>
> > Calvin
>
> > From: "Li, Rui"<rui...@intel.com>
>
> > Date: Tue, 31 Mar 2015 01:22:51 +
>
> > To: "dev@hive.apache.org"<dev@hive.apache.org>
>
> >
>
> > - Contents -
>
> >
>
> > Thanks Alan! But I don’t see Hive in the sponsored open source project
>
> > list. I’ll contact them anyway.
>
> >
>
> >
>
> >
>
> > Cheers,
>
> >
>
> > Rui Li
>
> >
>
> >
>
> >
>
> > From: Alan Gates [mailto:alanfga...@gmail.com]
>
> > Sent: Tuesday, March 31, 2015 1:02 AM
>
> > To: dev@hive.apache.org
>
> > Subject: Re: YourKit open source license
>
> >
>
> >
>
> >
>
> > See https://www.yourkit.com/customers/.
>
> >
>
> > Alan.
>
> >
>
> >
>
> >
>
> >
>
> >
>
> > Li, Rui
>
> >
>
> > March 30, 2015 at 0:54
>
> >
>
> > Hi guys,
>
> >
>
> > I want to use YourKit to profile hive performance. According to the wiki
> > <https://cwiki.apache.org/confluence/display/Hive/Performance> hive has
> > been granted open source license. Could anybody tell me how I can get
> > the license? Thanks!
>
> >
>
> > Cheers,
>
> > Rui Li
>
>
>
>
>
>
>
>
>
> --
>
> Best regards!
>
> Rui Li
>
> Cell: (+86) 13564950210
>
>
>
>
>
>
>


-- 
Best regards!
Rui Li
Cell: (+86) 13564950210


Re: Review Request 50787: Add a timezone-aware timestamp

2016-09-11 Thread Rui Li
/hive/serde2/lazybinary/LazyBinaryUtils.java 
f8a110d 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorConverters.java
 24b3d4e 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java
 1ac72c6 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/PrimitiveObjectInspector.java
 70633f3 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/HiveTimestampObjectorInspector.java
 PRE-CREATION 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/JavaHiveTimestampObjectInspector.java
 PRE-CREATION 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/JavaTimestampObjectInspector.java
 509189e 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorConverter.java
 e08ad43 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorFactory.java
 2ed0843 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorUtils.java
 51b529e 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/SettableHiveTimestampObjectInspector.java
 PRE-CREATION 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/WritableConstantHiveTimestampObjectInspector.java
 PRE-CREATION 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/WritableHiveTimestampObjectInspector.java
 PRE-CREATION 
  serde/src/java/org/apache/hadoop/hive/serde2/thrift/Type.java 0ad8c02 
  serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/TypeInfoFactory.java 
43c4819 
  serde/src/test/org/apache/hadoop/hive/serde2/io/TestTimestampWritable.java 
3c483cc 
  service-rpc/if/TCLIService.thrift a4fa7b0 
  service-rpc/src/gen/thrift/gen-cpp/TCLIService_constants.cpp 991cb2e 
  service-rpc/src/gen/thrift/gen-cpp/TCLIService_types.h b249544 
  service-rpc/src/gen/thrift/gen-cpp/TCLIService_types.cpp 2f460e8 
  
service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TCLIServiceConstants.java
 930bed7 
  
service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TProtocolVersion.java
 bce2a0c 
  
service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TTypeId.java
 a3735eb 
  service-rpc/src/gen/thrift/gen-php/Types.php 786c773 
  service-rpc/src/gen/thrift/gen-py/TCLIService/constants.py c8d4f8f 
  service-rpc/src/gen/thrift/gen-py/TCLIService/ttypes.py fdf6b1f 
  service-rpc/src/gen/thrift/gen-rb/t_c_l_i_service_constants.rb 25adbb4 
  service-rpc/src/gen/thrift/gen-rb/t_c_l_i_service_types.rb 4b1854c 
  service/src/java/org/apache/hive/service/cli/ColumnValue.java 76e8c03 
  service/src/java/org/apache/hive/service/cli/TypeDescriptor.java d634bef 
  storage-api/src/java/org/apache/hadoop/hive/common/type/HiveTimestamp.java 
PRE-CREATION 
  storage-api/src/java/org/apache/hadoop/hive/ql/util/JavaDataModel.java 
4a745e4 
  storage-api/src/java/org/apache/hadoop/hive/ql/util/TimestampUtils.java 
41db9ca 

Diff: https://reviews.apache.org/r/50787/diff/


Testing
---


Thanks,

Rui Li



Re: Review Request 50787: Add a timezone-aware timestamp

2016-09-21 Thread Rui Li
/fast/BinarySortableDeserializeRead.java
 a7785b2 
  serde/src/java/org/apache/hadoop/hive/serde2/io/TimestampTZWritable.java 
PRE-CREATION 
  serde/src/java/org/apache/hadoop/hive/serde2/io/TimestampWritable.java 
bbccc7f 
  serde/src/java/org/apache/hadoop/hive/serde2/io/TimestampWritableBase.java 
PRE-CREATION 
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyFactory.java 23dbe6a 
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyTimestamp.java 56945d1 
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyTimestampTZ.java 
PRE-CREATION 
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyUtils.java 73c72e1 
  
serde/src/java/org/apache/hadoop/hive/serde2/lazy/objectinspector/primitive/LazyPrimitiveObjectInspectorFactory.java
 5601734 
  
serde/src/java/org/apache/hadoop/hive/serde2/lazy/objectinspector/primitive/LazyTimestampTZObjectInspector.java
 PRE-CREATION 
  
serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryFactory.java 
52f3527 
  serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinarySerDe.java 
54bfd2d 
  
serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryTimestampTZ.java
 PRE-CREATION 
  serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryUtils.java 
f8a110d 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorConverters.java
 24b3d4e 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java
 1ac72c6 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/PrimitiveObjectInspector.java
 70633f3 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/JavaTimestampObjectInspector.java
 509189e 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/JavaTimestampTZObjectInspector.java
 PRE-CREATION 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorConverter.java
 e08ad43 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorFactory.java
 2ed0843 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorUtils.java
 51b529e 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/SettableTimestampTZObjectInspector.java
 PRE-CREATION 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/TimestampTZObjectorInspector.java
 PRE-CREATION 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/WritableConstantTimestampTZObjectInspector.java
 PRE-CREATION 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/WritableTimestampTZObjectInspector.java
 PRE-CREATION 
  serde/src/java/org/apache/hadoop/hive/serde2/thrift/Type.java 0ad8c02 
  serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/TypeInfoFactory.java 
43c4819 
  serde/src/test/org/apache/hadoop/hive/serde2/io/TestTimestampWritable.java 
3c483cc 
  service-rpc/if/TCLIService.thrift a4fa7b0 
  service-rpc/src/gen/thrift/gen-cpp/TCLIService_constants.cpp 991cb2e 
  service-rpc/src/gen/thrift/gen-cpp/TCLIService_types.h b249544 
  service-rpc/src/gen/thrift/gen-cpp/TCLIService_types.cpp 2f460e8 
  
service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TCLIServiceConstants.java
 930bed7 
  
service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TProtocolVersion.java
 bce2a0c 
  
service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TTypeId.java
 a3735eb 
  service-rpc/src/gen/thrift/gen-php/Types.php 786c773 
  service-rpc/src/gen/thrift/gen-py/TCLIService/constants.py c8d4f8f 
  service-rpc/src/gen/thrift/gen-py/TCLIService/ttypes.py fdf6b1f 
  service-rpc/src/gen/thrift/gen-rb/t_c_l_i_service_constants.rb 25adbb4 
  service-rpc/src/gen/thrift/gen-rb/t_c_l_i_service_types.rb 4b1854c 
  service/src/java/org/apache/hive/service/cli/ColumnValue.java 76e8c03 
  service/src/java/org/apache/hive/service/cli/TypeDescriptor.java d634bef 
  storage-api/src/java/org/apache/hadoop/hive/common/type/TimestampTZ.java 
PRE-CREATION 
  storage-api/src/java/org/apache/hadoop/hive/ql/util/JavaDataModel.java 
4a745e4 
  storage-api/src/java/org/apache/hadoop/hive/ql/util/TimestampUtils.java 
41db9ca 

Diff: https://reviews.apache.org/r/50787/diff/


Testing
---


Thanks,

Rui Li



Re: Review Request 50787: Add a timezone-aware timestamp

2016-09-21 Thread Rui Li
/fast/BinarySortableDeserializeRead.java
 a7785b2 
  serde/src/java/org/apache/hadoop/hive/serde2/io/TimestampTZWritable.java 
PRE-CREATION 
  serde/src/java/org/apache/hadoop/hive/serde2/io/TimestampWritable.java 
bbccc7f 
  serde/src/java/org/apache/hadoop/hive/serde2/io/TimestampWritableBase.java 
PRE-CREATION 
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyFactory.java 23dbe6a 
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyTimestamp.java 56945d1 
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyTimestampTZ.java 
PRE-CREATION 
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyUtils.java 73c72e1 
  
serde/src/java/org/apache/hadoop/hive/serde2/lazy/objectinspector/primitive/LazyPrimitiveObjectInspectorFactory.java
 5601734 
  
serde/src/java/org/apache/hadoop/hive/serde2/lazy/objectinspector/primitive/LazyTimestampTZObjectInspector.java
 PRE-CREATION 
  
serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryFactory.java 
52f3527 
  serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinarySerDe.java 
54bfd2d 
  
serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryTimestampTZ.java
 PRE-CREATION 
  serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryUtils.java 
f8a110d 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorConverters.java
 24b3d4e 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java
 1ac72c6 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/PrimitiveObjectInspector.java
 70633f3 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/JavaTimestampObjectInspector.java
 509189e 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/JavaTimestampTZObjectInspector.java
 PRE-CREATION 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorConverter.java
 e08ad43 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorFactory.java
 2ed0843 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorUtils.java
 51b529e 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/SettableTimestampTZObjectInspector.java
 PRE-CREATION 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/TimestampTZObjectorInspector.java
 PRE-CREATION 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/WritableConstantTimestampTZObjectInspector.java
 PRE-CREATION 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/WritableTimestampTZObjectInspector.java
 PRE-CREATION 
  serde/src/java/org/apache/hadoop/hive/serde2/thrift/Type.java 0ad8c02 
  serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/TypeInfoFactory.java 
43c4819 
  serde/src/test/org/apache/hadoop/hive/serde2/io/TestTimestampWritable.java 
3c483cc 
  service-rpc/if/TCLIService.thrift a4fa7b0 
  service-rpc/src/gen/thrift/gen-cpp/TCLIService_constants.cpp 991cb2e 
  service-rpc/src/gen/thrift/gen-cpp/TCLIService_types.h b249544 
  service-rpc/src/gen/thrift/gen-cpp/TCLIService_types.cpp 2f460e8 
  
service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TCLIServiceConstants.java
 930bed7 
  
service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TProtocolVersion.java
 bce2a0c 
  
service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TTypeId.java
 a3735eb 
  service-rpc/src/gen/thrift/gen-php/Types.php 786c773 
  service-rpc/src/gen/thrift/gen-py/TCLIService/constants.py c8d4f8f 
  service-rpc/src/gen/thrift/gen-py/TCLIService/ttypes.py fdf6b1f 
  service-rpc/src/gen/thrift/gen-rb/t_c_l_i_service_constants.rb 25adbb4 
  service-rpc/src/gen/thrift/gen-rb/t_c_l_i_service_types.rb 4b1854c 
  service/src/java/org/apache/hive/service/cli/ColumnValue.java 76e8c03 
  service/src/java/org/apache/hive/service/cli/TypeDescriptor.java d634bef 
  storage-api/src/java/org/apache/hadoop/hive/common/type/TimestampTZ.java 
PRE-CREATION 
  storage-api/src/java/org/apache/hadoop/hive/ql/util/JavaDataModel.java 
4a745e4 
  storage-api/src/java/org/apache/hadoop/hive/ql/util/TimestampUtils.java 
41db9ca 

Diff: https://reviews.apache.org/r/50787/diff/


Testing
---


Thanks,

Rui Li



Re: Review Request 50787: Add a timezone-aware timestamp

2016-09-23 Thread Rui Li
/src/gen/thrift/gen-rb/serde_constants.rb 0ce9f27 
  serde/src/java/org/apache/hadoop/hive/serde2/SerDeUtils.java 7ffc964 
  
serde/src/java/org/apache/hadoop/hive/serde2/binarysortable/BinarySortableSerDe.java
 5e119d7 
  
serde/src/java/org/apache/hadoop/hive/serde2/binarysortable/fast/BinarySortableDeserializeRead.java
 a7785b2 
  serde/src/java/org/apache/hadoop/hive/serde2/io/TimestampTZWritable.java 
PRE-CREATION 
  serde/src/java/org/apache/hadoop/hive/serde2/io/TimestampWritable.java 
bbccc7f 
  serde/src/java/org/apache/hadoop/hive/serde2/io/TimestampWritableBase.java 
PRE-CREATION 
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyFactory.java 23dbe6a 
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyTimestamp.java 56945d1 
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyTimestampTZ.java 
PRE-CREATION 
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyUtils.java 73c72e1 
  
serde/src/java/org/apache/hadoop/hive/serde2/lazy/objectinspector/primitive/LazyPrimitiveObjectInspectorFactory.java
 5601734 
  
serde/src/java/org/apache/hadoop/hive/serde2/lazy/objectinspector/primitive/LazyTimestampTZObjectInspector.java
 PRE-CREATION 
  
serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryFactory.java 
52f3527 
  serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinarySerDe.java 
54bfd2d 
  
serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryTimestampTZ.java
 PRE-CREATION 
  serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryUtils.java 
f8a110d 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorConverters.java
 24b3d4e 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java
 1ac72c6 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/PrimitiveObjectInspector.java
 70633f3 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/JavaTimestampObjectInspector.java
 509189e 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/JavaTimestampTZObjectInspector.java
 PRE-CREATION 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorConverter.java
 e08ad43 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorFactory.java
 2ed0843 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorUtils.java
 51b529e 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/SettableTimestampTZObjectInspector.java
 PRE-CREATION 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/TimestampTZObjectorInspector.java
 PRE-CREATION 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/WritableConstantTimestampTZObjectInspector.java
 PRE-CREATION 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/WritableTimestampTZObjectInspector.java
 PRE-CREATION 
  serde/src/java/org/apache/hadoop/hive/serde2/thrift/Type.java 0ad8c02 
  serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/TypeInfoFactory.java 
43c4819 
  serde/src/test/org/apache/hadoop/hive/serde2/io/TestTimestampWritable.java 
3c483cc 
  service-rpc/if/TCLIService.thrift a4fa7b0 
  service-rpc/src/gen/thrift/gen-cpp/TCLIService_constants.cpp 991cb2e 
  service-rpc/src/gen/thrift/gen-cpp/TCLIService_types.h b249544 
  service-rpc/src/gen/thrift/gen-cpp/TCLIService_types.cpp 2f460e8 
  
service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TCLIServiceConstants.java
 930bed7 
  
service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TProtocolVersion.java
 bce2a0c 
  
service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TTypeId.java
 a3735eb 
  service-rpc/src/gen/thrift/gen-php/Types.php 786c773 
  service-rpc/src/gen/thrift/gen-py/TCLIService/constants.py c8d4f8f 
  service-rpc/src/gen/thrift/gen-py/TCLIService/ttypes.py fdf6b1f 
  service-rpc/src/gen/thrift/gen-rb/t_c_l_i_service_constants.rb 25adbb4 
  service-rpc/src/gen/thrift/gen-rb/t_c_l_i_service_types.rb 4b1854c 
  service/src/java/org/apache/hive/service/cli/ColumnValue.java 76e8c03 
  service/src/java/org/apache/hive/service/cli/TypeDescriptor.java d634bef 
  storage-api/src/java/org/apache/hadoop/hive/common/type/TimestampTZ.java 
PRE-CREATION 
  storage-api/src/java/org/apache/hadoop/hive/ql/util/JavaDataModel.java 
4a745e4 
  storage-api/src/java/org/apache/hadoop/hive/ql/util/TimestampUtils.java 
41db9ca 

Diff: https://reviews.apache.org/r/50787/diff/


Testing
---


Thanks,

Rui Li



Re: Review Request 50787: Add a timezone-aware timestamp

2016-09-23 Thread Rui Li


> On Sept. 22, 2016, 11:20 a.m., Jason Dere wrote:
> > - How about compatibility with the various date functions 
> > (year()/month()/day()/etc)?

For most of the functions, TIMESTAMPTZ is implicitly converted to text, so I 
think we can get correct results. I added some special handling in HOUR because 
some hours may be unavailable due to DST.
So far I've verified the following functions work:

to_date
year
quarter
month
day
dayofmonth
hour
minute
second
weekofyear

Is it OK if we leave the others to follow-on tasks? I'd like to keep the patch small.
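As background on why HOUR needs special handling, the DST gap can be seen with plain java.time; this is just an illustrative sketch, not the Hive patch itself, and the zone and date are arbitrary examples:

```java
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class DstHourGapDemo {
    public static void main(String[] args) {
        // In America/Los_Angeles, clocks jumped from 02:00 to 03:00 on 2017-03-12,
        // so the local time 02:30 on that day never existed.
        ZonedDateTime zdt = ZonedDateTime.of(2017, 3, 12, 2, 30, 0, 0,
                ZoneId.of("America/Los_Angeles"));
        // java.time resolves the gap by shifting the local time forward by its length
        System.out.println(zdt);           // 2017-03-12T03:30-07:00[America/Los_Angeles]
        System.out.println(zdt.getHour()); // 3, not 2
    }
}
```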


> On Sept. 22, 2016, 11:20 a.m., Jason Dere wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFToTimestampTZ.java,
> >  line 58
> > <https://reviews.apache.org/r/50787/diff/4/?file=1507773#file1507773line58>
> >
> > No conversions to/from DATE/TIMESTAMP?

Added conversion from date/timestamp to timestamptz. The default timezone is used 
for the converted timestamptz.
We can add conversion from numeric types in a follow-on task.


> On Sept. 22, 2016, 11:20 a.m., Jason Dere wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorUtils.java,
> >  line 1109
> > <https://reviews.apache.org/r/50787/diff/4/?file=1507829#file1507829line1109>
> >
> > If the local timezone is different from the timezone in the 
> > TimestampTZ, is it possible that the year/month/day of the DATE might be 
> > different from the year/month/day of the TimestampTZ?

Good catch! It makes more sense to convert from the text representation than 
the time/nanos. So I convert the timestamptz to string first, and use that 
string to create the date. Same applies when converting to timestamp.
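To see why converting via the text representation matters: the same instant falls on different calendar dates in different zones, so deriving year/month/day from the raw time/nanos in the local zone can disagree with the timestamptz's own rendering. A small sketch with java.time (illustrative values only, not Hive code):

```java
import java.time.Instant;
import java.time.ZoneId;

public class TzDateDemo {
    public static void main(String[] args) {
        // One instant on the timeline...
        Instant instant = Instant.parse("2016-12-31T23:30:00Z");
        // ...renders as Dec 31 in UTC but as Jan 1 in Asia/Tokyo (UTC+9)
        System.out.println(instant.atZone(ZoneId.of("UTC")).toLocalDate());        // 2016-12-31
        System.out.println(instant.atZone(ZoneId.of("Asia/Tokyo")).toLocalDate()); // 2017-01-01
    }
}
```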


- Rui


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50787/#review149983
---


On Sept. 22, 2016, 4:05 a.m., Rui Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50787/
> ---
> 
> (Updated Sept. 22, 2016, 4:05 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-14412
> https://issues.apache.org/jira/browse/HIVE-14412
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> The 1st patch to add timezone-aware timestamp.
> 
> 
> Diffs
> -
> 
>   common/src/test/org/apache/hadoop/hive/common/type/TestTimestampTZ.java 
> PRE-CREATION 
>   contrib/src/test/queries/clientnegative/serde_regex.q a676338 
>   contrib/src/test/queries/clientpositive/serde_regex.q d75d607 
>   contrib/src/test/results/clientnegative/serde_regex.q.out 0f9b036 
>   contrib/src/test/results/clientpositive/serde_regex.q.out 2984293 
>   hbase-handler/src/test/queries/positive/hbase_timestamp.q 0350afe 
>   hbase-handler/src/test/results/positive/hbase_timestamp.q.out 3918121 
>   jdbc/src/java/org/apache/hive/jdbc/HiveBaseResultSet.java 93f093f 
>   jdbc/src/java/org/apache/hive/jdbc/JdbcColumn.java 38918f0 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java de74c3e 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java f28d33e 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/SerializationUtilities.java 
> 7be628e 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/TypeConverter.java
>  ba41518 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 
> 8b0db4a 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g 7ceb005 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g 62bbcc6 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g 9ba1865 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java 
> 82080eb 
>   ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java a718264 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToBoolean.java 17b892c 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToByte.java efae82d 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToDouble.java 9cbc114 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToFloat.java 5808c90 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToInteger.java a7551cb 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToLong.java c961d14 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToShort.java 570408a 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToString.java 5cacd59 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java 259fde8 
>   
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFTo

Re: [ANNOUNCE] New Hive Committer - Rajesh Balamohan

2016-12-14 Thread Rui Li
Congratulations :)

On Thu, Dec 15, 2016 at 6:50 AM, Gunther Hagleitner <
ghagleit...@hortonworks.com> wrote:

> Congrats Rajesh!
> 
> From: Jimmy Xiang 
> Sent: Wednesday, December 14, 2016 11:38 AM
> To: u...@hive.apache.org
> Cc: dev@hive.apache.org; rbalamo...@apache.org
> Subject: Re: [ANNOUNCE] New Hive Committer - Rajesh Balamohan
>
> Congrats, Rajesh!!
>
> On Wed, Dec 14, 2016 at 11:32 AM, Sergey Shelukhin
>  wrote:
> > Congratulations!
> >
> > From: Chao Sun 
> > Reply-To: "u...@hive.apache.org" 
> > Date: Wednesday, December 14, 2016 at 10:52
> > To: "dev@hive.apache.org" 
> > Cc: "u...@hive.apache.org" , "
> rbalamo...@apache.org"
> > 
> > Subject: Re: [ANNOUNCE] New Hive Committer - Rajesh Balamohan
> >
> > Congrats Rajesh!
> >
> > On Wed, Dec 14, 2016 at 9:26 AM, Vihang Karajgaonkar <
> vih...@cloudera.com>
> > wrote:
> >>
> >> Congrats Rajesh!
> >>
> >> On Wed, Dec 14, 2016 at 1:54 AM, Jesus Camacho Rodriguez <
> >> jcamachorodrig...@hortonworks.com> wrote:
> >>
> >> > Congrats Rajesh, well deserved! :)
> >> >
> >> > --
> >> > Jesús
> >> >
> >> >
> >> >
> >> >
> >> > On 12/14/16, 8:41 AM, "Lefty Leverenz" 
> wrote:
> >> >
> >> > >Congratulations Rajesh!
> >> > >
> >> > >-- Lefty
> >> > >
> >> > >
> >> > >On Tue, Dec 13, 2016 at 11:58 PM, Rajesh Balamohan
> >> > >  >> > >
> >> > >wrote:
> >> > >
> >> > >> Thanks a lot for providing this opportunity and to all for their
> >> > messages.
> >> > >> :)
> >> > >>
> >> > >> ~Rajesh.B
> >> > >>
> >> > >> On Wed, Dec 14, 2016 at 11:33 AM, Dharmesh Kakadia
> >> > >>  >> > >
> >> > >> wrote:
> >> > >>
> >> > >> > Congrats Rajesh !
> >> > >> >
> >> > >> > Thanks,
> >> > >> > Dharmesh
> >> > >> >
> >> > >> > On Tue, Dec 13, 2016 at 7:37 PM, Vikram Dixit K <
> >> > vikram.di...@gmail.com>
> >> > >> > wrote:
> >> > >> >
> >> > >> >> Congrats Rajesh! :)
> >> > >> >>
> >> > >> >> On Tue, Dec 13, 2016 at 9:36 PM, Pengcheng Xiong
> >> > >> >> 
> >> > >> >> wrote:
> >> > >> >>
> >> > >> >>> Congrats Rajesh! :)
> >> > >> >>>
> >> > >> >>> On Tue, Dec 13, 2016 at 6:51 PM, Prasanth Jayachandran <
> >> > >> >>> prasan...@apache.org
> >> > >> >>> > wrote:
> >> > >> >>>
> >> > >> >>> > The Apache Hive PMC has voted to make Rajesh Balamohan a
> >> > committer on
> >> > >> >>> the
> >> > >> >>> > Apache Hive Project. Please join me in congratulating Rajesh.
> >> > >> >>> >
> >> > >> >>> > Congratulations Rajesh!
> >> > >> >>> >
> >> > >> >>> > Thanks
> >> > >> >>> > Prasanth
> >> > >> >>>
> >> > >> >>
> >> > >> >>
> >> > >> >>
> >> > >> >> --
> >> > >> >> Nothing better than when appreciated for hard work.
> >> > >> >> -Mark
> >> > >> >>
> >> > >> >
> >> > >> >
> >> > >>
> >> >
> >
> >
>
>


-- 
Best regards!
Rui Li
Cell: (+86) 13564950210


Re: Invitation for Hive committers to become ORC committers

2016-12-15 Thread Rui Li
I'm interested. Thanks!

On Fri, Dec 16, 2016 at 1:18 PM, Chinna Rao Lalam <
lalamchinnara...@gmail.com> wrote:

> I would be interested. Thanks.
>
> Chinna Rao Lalam
>
> On Fri, Dec 16, 2016 at 6:43 AM, Owen O'Malley  wrote:
>
> > Ok, I've added the people who have responded so far and updated the ORC
> > website.
> >
> > http://orc.apache.org/news/2016/12/15/new-committers/
> > http://orc.apache.org/develop/
> >
> > Please make sure that I didn't typo your names.
> >
> > .. Owen
> >
> > On Thu, Dec 15, 2016 at 4:44 PM, Chaoyu Tang  wrote:
> >
> > > I am interested in. Thanks
> > >
> > > Chaoyu
> > >
> > > On Thu, Dec 15, 2016 at 5:13 PM, Rajesh Balamohan <
> rbalamo...@apache.org
> > >
> > > wrote:
> > >
> > > > I would be interested. Thanks.
> > > >
> > > > ~Rajesh.B
> > > >
> > > > On Fri, Dec 16, 2016 at 3:31 AM, Mithun Radhakrishnan <
> > > > mithun.radhakrish...@yahoo.com.invalid> wrote:
> > > >
> > > > > I'd be keen.
> > > > > Thanks,Mithun
> > > > > On Thursday, December 15, 2016, 1:37:36 PM PST, Wei Zheng <
> > > > > wzh...@hortonworks.com> wrote: I'm interested. Thanks.
> > > > >
> > > > > Thanks,
> > > > > Wei
> > > > >
> > > > > On 12/15/16, 13:21, "Vaibhav Gumashta" 
> > > > wrote:
> > > > >
> > > > > I'd be interested.
> > > > >
> > > > > Thanks,
> > > > > -Vaibhav
> > > > >
> > > > > On 12/15/16, 1:12 PM, "Owen O'Malley" 
> > wrote:
> > > > >
> > > > > >All,
> > > > > >  As you are aware, we are in the last stages of removing the
> > > forked
> > > > > ORC
> > > > > >code out of Hive. The goal of moving ORC out of Hive was to
> > > increase
> > > > > its
> > > > > >community and we want to be very deliberately inclusive of the
> > > Hive
> > > > > >development community. Towards that end, the ORC PMC wants to
> > > > welcome
> > > > > >anyone who is already a Hive committer to become a committer
> on
> > > ORC.
> > > > > >
> > > > > >  Please respond on this thread to let us know if you are
> > > > interested.
> > > > > >
> > > > > >Thanks,
> > > > > >  Owen on behalf of the ORC PMC
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
>
>
>
> --
> Hope It Helps,
> Chinna
>



-- 
Best regards!
Rui Li
Cell: (+86) 13564950210


Re: [ANNOUNCE] New committer: Zoltan Haindrich

2017-02-21 Thread Rui Li
Congratulations!

On Wed, Feb 22, 2017 at 8:47 AM, Sergey Shelukhin 
wrote:

> Congratulations!
>
> On 17/2/21, 16:43, "Prasanth Jayachandran" 
> wrote:
>
> >Congratulations Zoltan!!
> >
> >Thanks
> >Prasanth
> >
> >
> >
> >
> >On Tue, Feb 21, 2017 at 4:35 PM -0800, "Eugene Koifman"
> >mailto:ekoif...@hortonworks.com>> wrote:
> >
> >
> >Congratulations!
> >
> >On 2/21/17, 4:17 PM, "Vihang Karajgaonkar"  wrote:
> >
> >Congrats Zoltan!
> >
> >On Tue, Feb 21, 2017 at 4:16 PM, Vaibhav Gumashta  wrote:
> >
> >> Congrats Zoltan!
> >>
> >> On 2/21/17, 4:16 PM, "Jimmy Xiang"  wrote:
> >>
> >> >Congrats, Zoltan!!
> >> >
> >> >On Tue, Feb 21, 2017 at 4:15 PM, Sushanth Sowmyan
> >> >wrote:
> >> >> Congrats, Zoltan!
> >> >>
> >> >> Welcome aboard. :)
> >> >>
> >> >> On Feb 21, 2017 15:42, "Rajesh Balamohan"
> >> wrote:
> >> >>
> >> >>> Congrats Zoltan. :)
> >> >>>
> >> >>> ~Rajesh.B
> >> >>>
> >> >>> On Wed, Feb 22, 2017 at 4:43 AM, Wei Zheng
> >> >>>wrote:
> >> >>>
> >> >>> > Congrats Zoltan!
> >> >>> >
> >> >>> > Thanks,
> >> >>> > Wei
> >> >>> >
> >> >>> > On 2/21/17, 13:09, "Alan Gates"  wrote:
> >> >>> >
> >> >>> > On behalf of the Hive PMC I am happy to announce Zoltan
> >> >>>Haindrich is
> >> >>> > our newest committer.  He has been contributing to Hive for
> >several
> >> >>> months
> >> >>> > across a number of areas, including the parser, HiveServer2,
> >and
> >> >>>cleaning
> >> >>> > up unit tests and documentation.  Please join me in welcoming
> >Zoltan
> >> >>>to
> >> >>> > Hive.
> >> >>> >
> >> >>> > Zoltan, feel free to say a few words introducing yourself
> >if you
> >> >>> would
> >> >>> > like to.
> >> >>> >
> >> >>> > Alan.
> >> >>> >
> >> >>> >
> >> >>> >
> >> >>>
> >> >
> >>
> >>
> >
> >
> >
>
>


-- 
Best regards!
Rui Li
Cell: (+86) 13564950210


Re: Review Request 56687: Intern strings in various critical places to reduce memory consumption.

2017-02-23 Thread Rui Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56687/#review166649
---




common/src/java/org/apache/hadoop/hive/common/StringInternUtils.java (line 65)
<https://reviews.apache.org/r/56687/#comment238671>

do we need to check whether uri is null?



common/src/java/org/apache/hadoop/hive/common/StringInternUtils.java (line 67)
<https://reviews.apache.org/r/56687/#comment238672>

Why doesn't stringField need a null check like the other fields?



ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java (line 3147)
<https://reviews.apache.org/r/56687/#comment238673>

How about interning the path in the createEmptyFile method?
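For background on the suggestion: String.intern() returns one canonical copy per distinct value, so repeated path strings stop occupying separate heap space. A minimal sketch (the path values are hypothetical, not Hive code):

```java
public class InternDemo {
    public static void main(String[] args) {
        // Two distinct String objects with equal contents
        String a = new String("hdfs://nn/warehouse/db/tbl/part=1");
        String b = new String("hdfs://nn/warehouse/db/tbl/part=1");
        System.out.println(a == b);                   // false: two heap copies
        System.out.println(a.intern() == b.intern()); // true: one canonical copy
    }
}
```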



ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveLockObject.java (line 183)
<https://reviews.apache.org/r/56687/#comment238676>

can we call the util method?



ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveLockObject.java (line 188)
<https://reviews.apache.org/r/56687/#comment238677>

guess we can also add a util method for this



ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java (line 253)
<https://reviews.apache.org/r/56687/#comment238681>

since we'll intern strings in the new path, do we have to intern taskTmpDir 
here?



ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverMergeFiles.java 
(line 322)
<https://reviews.apache.org/r/56687/#comment238682>

will this cause the hash map to resize since the default load factor is 
0.75? and several similar concerns below


- Rui Li


On Feb. 23, 2017, 9:01 p.m., Misha Dmitriev wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56687/
> ---
> 
> (Updated Feb. 23, 2017, 9:01 p.m.)
> 
> 
> Review request for hive, Chaoyu Tang, Mohit Sabharwal, and Sergio Pena.
> 
> 
> Bugs: HIVE-15882
> https://issues.apache.org/jira/browse/HIVE-15882
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> See the description of the problem in 
> https://issues.apache.org/jira/browse/HIVE-15882 Interning strings per this 
> review removes most of the overhead due to duplicate strings.
> 
> Also, where maps in several places are created from other maps, use the 
> original map's size for the new map. This is to avoid the situation when a 
> map with default capacity (typically 16) is created to hold just 2-3 entries, 
> and the rest of the internal 16-entry array is wasted.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/common/StringInternUtils.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 
> e81cbce3e333d44a4088c10491f399e92a505293 
>   ql/src/java/org/apache/hadoop/hive/ql/hooks/Entity.java 
> 08420664d59f28f75872c25c9f8ee42577b23451 
>   ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 
> e91064b9c75e8adb2b36f21ff19ec0c1539b03b9 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 
> 51530ac16c92cc75d501bfcb573557754ba0c964 
>   ql/src/java/org/apache/hadoop/hive/ql/io/SymbolicInputFormat.java 
> 55b3b551a1dac92583b6e03b10beb8172ca93d45 
>   ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveLockObject.java 
> 82dc89803be9cf9e0018720eeceb90ff450bfdc8 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Partition.java 
> c0edde9e92314d86482b5c46178987e79fae57fe 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 
> c6ae6f290857cfd10f1023058ede99bf4a10f057 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 
> 24d16812515bdfa90b4be7a295c0388fcdfe95ef 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/GenMRSkewJoinProcessor.java
>  ede4fcbe342052ad86dadebcc49da2c0f515ea98 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/NullScanTaskDispatcher.java
>  0882ae2c6205b1636cbc92e76ef66bb70faadc76 
>   
> ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverMergeFiles.java 
> 68b0ad9ea63f051f16fec3652d8525f7ab07eb3f 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java 
> d4bdd96eaf8d179bed43b8a8c3be0d338940154a 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/MsckDesc.java 
> b7a7e4b7a5f8941b080c7805d224d3885885f444 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java 
> 73981e826870139a42ad881103fdb0a2ef8433a2 
> 
> Diff: https://reviews.apache.org/r/56687/diff/
> 
> 
> Testing
> ---
> 
> I've measured how much memory this change plus another one (interning 
> Properties in PartitionDesc) save in my HS2 benchmark - the result is 37%. 
> See the details in HIVE-15882.
> 
> 
> Thanks,
> 
> Misha Dmitriev
> 
>



Re: Review Request 56687: Intern strings in various critical places to reduce memory consumption.

2017-02-27 Thread Rui Li


> On Feb. 24, 2017, 7:38 a.m., Rui Li wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverMergeFiles.java,
> >  line 322
> > <https://reviews.apache.org/r/56687/diff/2/?file=1643011#file1643011line322>
> >
> > will this cause the hash map to resize since the default load factor is 
> > 0.75? and several similar concerns below
> 
> Misha Dmitriev wrote:
> You are probably right, in that this constructor's parameter is the 
> initial capacity of this table (more or less the size of the internal array) 
> - not how many elements the table is expected to hold. However, if you check 
> the code of HashMap, the things are more interesting. The actual capacity of 
> the table is always a power of two, so unless this parameter is also a power 
> of two, the capacity will be chosen as the nearest higher power of two, i.e. 
> it will be higher than the parameter and closer to what we actually need. 
> Also, if we create a table with the default size (16) here and then will put 
> many more elements into it, it will be resized several times, whereas with 
> the current code it will be resized at most once. Trying to "factor in" the 
> load factor will likely add more confusion/complexity. All in all, given that 
> choosing capacity in HashMap internally is non-trivial, I think it's 
> easier/safer to just call 'new HashMap(oldMap.size())' as we do now.

Then could you explain why we need to change the current code? The JavaDoc of 
LinkedHashMap(Map<? extends K, ? extends V> m) indicates it will create an 
instance "with a default load factor (0.75) and an initial capacity sufficient 
to hold the mappings in the specified map". Looking at the code, it computes the 
initial capacity as "m.size()/loadFactor + 1", rounds that up to the next power 
of two, and thereby avoids re-hashing. Won't that be good enough for us?
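To make the sizing arithmetic concrete, here is a small sketch that re-implements HashMap's capacity rounding and threshold check (simplified; the real logic lives in HashMap.tableSizeFor and resize):

```java
public class HashMapSizingDemo {
    static final float LOAD_FACTOR = 0.75f;

    // Next power of two >= n, as HashMap rounds its initial capacity (assumes n >= 1)
    static int nextPowerOfTwo(int n) {
        int c = 1;
        while (c < n) {
            c <<= 1;
        }
        return c;
    }

    // Would a table created with this initial capacity have to resize to hold `entries`?
    static boolean mustResize(int initialCapacity, int entries) {
        int capacity = nextPowerOfTwo(initialCapacity);
        int threshold = (int) (capacity * LOAD_FACTOR);
        return entries > threshold;
    }

    public static void main(String[] args) {
        int entries = 16;
        // new HashMap<>(oldMap.size()): capacity 16, threshold 12 -> resizes while copying
        System.out.println(mustResize(entries, entries)); // true
        // Copy constructor sizes from size/loadFactor + 1: capacity 32, threshold 24 -> no resize
        System.out.println(mustResize((int) (entries / LOAD_FACTOR) + 1, entries)); // false
    }
}
```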


- Rui


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56687/#review166649
---


On Feb. 24, 2017, 9:27 p.m., Misha Dmitriev wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56687/
> ---
> 
> (Updated Feb. 24, 2017, 9:27 p.m.)
> 
> 
> Review request for hive, Chaoyu Tang, Mohit Sabharwal, and Sergio Pena.
> 
> 
> Bugs: HIVE-15882
> https://issues.apache.org/jira/browse/HIVE-15882
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> See the description of the problem in 
> https://issues.apache.org/jira/browse/HIVE-15882 Interning strings per this 
> review removes most of the overhead due to duplicate strings.
> 
> Also, where maps in several places are created from other maps, use the 
> original map's size for the new map. This is to avoid the situation when a 
> map with default capacity (typically 16) is created to hold just 2-3 entries, 
> and the rest of the internal 16-entry array is wasted.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/common/StringInternUtils.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 
> e81cbce3e333d44a4088c10491f399e92a505293 
>   ql/src/java/org/apache/hadoop/hive/ql/hooks/Entity.java 
> 08420664d59f28f75872c25c9f8ee42577b23451 
>   ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 
> e91064b9c75e8adb2b36f21ff19ec0c1539b03b9 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 
> 51530ac16c92cc75d501bfcb573557754ba0c964 
>   ql/src/java/org/apache/hadoop/hive/ql/io/SymbolicInputFormat.java 
> 55b3b551a1dac92583b6e03b10beb8172ca93d45 
>   ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveLockObject.java 
> 82dc89803be9cf9e0018720eeceb90ff450bfdc8 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Partition.java 
> c0edde9e92314d86482b5c46178987e79fae57fe 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 
> c6ae6f290857cfd10f1023058ede99bf4a10f057 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 
> 24d16812515bdfa90b4be7a295c0388fcdfe95ef 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/GenMRSkewJoinProcessor.java
>  ede4fcbe342052ad86dadebcc49da2c0f515ea98 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/NullScanTaskDispatcher.java
>  0882ae2c6205b1636cbc92e76ef66bb70faadc76 
>   
> ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalRes

Re: Review Request 56687: Intern strings in various critical places to reduce memory consumption.

2017-02-27 Thread Rui Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56687/#review166991
---




ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java (line 3178)
<https://reviews.apache.org/r/56687/#comment239095>

do we still need this? I think createEmptyFile will intern the strings for 
us?



ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java (line 173)
<https://reviews.apache.org/r/56687/#comment239097>

instead of creating a new map, can we use the pathToAliases map and intern 
the paths in-place?


- Rui Li


On Feb. 27, 2017, 7:42 p.m., Misha Dmitriev wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56687/
> ---
> 
> (Updated Feb. 27, 2017, 7:42 p.m.)
> 
> 
> Review request for hive, Chaoyu Tang, Mohit Sabharwal, and Sergio Pena.
> 
> 
> Bugs: HIVE-15882
> https://issues.apache.org/jira/browse/HIVE-15882
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> See the description of the problem in 
> https://issues.apache.org/jira/browse/HIVE-15882 Interning strings per this 
> review removes most of the overhead due to duplicate strings.
> 
> Also, where maps in several places are created from other maps, use the 
> original map's size for the new map. This is to avoid the situation when a 
> map with default capacity (typically 16) is created to hold just 2-3 entries, 
> and the rest of the internal 16-entry array is wasted.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/common/StringInternUtils.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 
> e81cbce3e333d44a4088c10491f399e92a505293 
>   ql/src/java/org/apache/hadoop/hive/ql/hooks/Entity.java 
> 08420664d59f28f75872c25c9f8ee42577b23451 
>   ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 
> e91064b9c75e8adb2b36f21ff19ec0c1539b03b9 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 
> 51530ac16c92cc75d501bfcb573557754ba0c964 
>   ql/src/java/org/apache/hadoop/hive/ql/io/SymbolicInputFormat.java 
> 55b3b551a1dac92583b6e03b10beb8172ca93d45 
>   ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveLockObject.java 
> 82dc89803be9cf9e0018720eeceb90ff450bfdc8 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Partition.java 
> c0edde9e92314d86482b5c46178987e79fae57fe 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 
> c6ae6f290857cfd10f1023058ede99bf4a10f057 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 
> 24d16812515bdfa90b4be7a295c0388fcdfe95ef 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/GenMRSkewJoinProcessor.java
>  ede4fcbe342052ad86dadebcc49da2c0f515ea98 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/NullScanTaskDispatcher.java
>  0882ae2c6205b1636cbc92e76ef66bb70faadc76 
>   
> ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverMergeFiles.java 
> 68b0ad9ea63f051f16fec3652d8525f7ab07eb3f 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java 
> d4bdd96eaf8d179bed43b8a8c3be0d338940154a 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/MsckDesc.java 
> b7a7e4b7a5f8941b080c7805d224d3885885f444 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java 
> 73981e826870139a42ad881103fdb0a2ef8433a2 
> 
> Diff: https://reviews.apache.org/r/56687/diff/
> 
> 
> Testing
> ---
> 
> I've measured how much memory this change plus another one (interning 
> Properties in PartitionDesc) save in my HS2 benchmark - the result is 37%. 
> See the details in HIVE-15882.
> 
> 
> Thanks,
> 
> Misha Dmitriev
> 
>



Re: Review Request 57586: HIVE-16183: Fix potential thread safety issues with static variables

2017-03-14 Thread Rui Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/57586/#review168965
---




metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreSchemaInfo.java
Line 58 (original), 56 (patched)
<https://reviews.apache.org/r/57586/#comment241305>

shall we remove the hiveConf parameter as it's not needed?



ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java
Line 502 (original), 480 (patched)
<https://reviews.apache.org/r/57586/#comment241306>

do we still need this method?



ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java
Line 545 (original), 520 (patched)
<https://reviews.apache.org/r/57586/#comment241307>

    same as above


- Rui Li


On March 14, 2017, 4:32 a.m., Xuefu Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/57586/
> ---
> 
> (Updated March 14, 2017, 4:32 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-16183
> https://issues.apache.org/jira/browse/HIVE-16183
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Please see JIRA description
> 
> 
> Diffs
> -
> 
>   beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java 7e6846d 
>   beeline/src/java/org/apache/hive/beeline/HiveSchemaHelper.java 181f0d2 
>   cli/src/java/org/apache/hadoop/hive/cli/RCFileCat.java f1806a0 
>   cli/src/test/org/apache/hadoop/hive/cli/TestRCFileCat.java 11ceb31 
>   common/src/java/org/apache/hadoop/hive/common/LogUtils.java c2a0d9a 
>   common/src/java/org/apache/hadoop/hive/common/StatsSetupConst.java 926b4a6 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreSchemaInfo.java 
> 9c30ee7 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/ArchiveUtils.java 6381a21 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 4ac25c2 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 6693134 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java 
> 5b0c2bf 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/CuckooSetBytes.java
>  6383e8a 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastHashTable.java
>  9030e5f 
>   ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistoryImpl.java 6582cdd 
>   ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndex.java a1408e9 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java 7727114 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 4995bdf 
>   ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java d391164 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 369584b 
>   ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/stats/PartialScanTask.java 
> 90b1dff 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/VirtualColumn.java 044d64c 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 0e67ea6 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/listbucketingpruner/ListBucketingPrunerUtils.java
>  4d3e74e 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/GenMRSkewJoinProcessor.java
>  93202c3 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 
> 50eda15 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/VectorizerReason.java
>  e0a6198 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 
> 36009bf 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 
> f175663 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/WindowingSpec.java 01b5559 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/AbstractVectorDesc.java e85a418 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/GroupByDesc.java 0b49294 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java ca69697 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java 9ae30ab 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/VectorAppMasterEventDesc.java 
> 2e11321 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/VectorFileSinkDesc.java 325ac91 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/VectorFilterDesc.java 6feed84 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/VectorGroupByDesc.java f8554e2 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/VectorLimitDesc.java c9bc45a 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/VectorMapJoinDesc.java 3aa65d3 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/VectorMapJoinInfo.java 9429785 
>   ql/src/ja

Re: Review Request 57586: HIVE-16183: Fix potential thread safety issues with static variables

2017-03-15 Thread Rui Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/57586/#review169094
---


Ship it!




Ship It!

- Rui Li


On March 15, 2017, 10:03 p.m., Xuefu Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/57586/
> ---
> 
> (Updated March 15, 2017, 10:03 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-16183
> https://issues.apache.org/jira/browse/HIVE-16183
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Please see JIRA description
> 
> 
> Diffs
> -
> 
>   beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java 7e6846d 
>   beeline/src/java/org/apache/hive/beeline/HiveSchemaHelper.java 181f0d2 
>   beeline/src/java/org/apache/hive/beeline/HiveSchemaTool.java 2c088c9 
>   cli/src/java/org/apache/hadoop/hive/cli/RCFileCat.java f1806a0 
>   cli/src/test/org/apache/hadoop/hive/cli/TestRCFileCat.java 11ceb31 
>   common/src/java/org/apache/hadoop/hive/common/LogUtils.java c2a0d9a 
>   common/src/java/org/apache/hadoop/hive/common/StatsSetupConst.java 926b4a6 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreSchemaInfo.java 
> 9c30ee7 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/ArchiveUtils.java 6381a21 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 4ac25c2 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 6693134 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java 5b0c2bf 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/CuckooSetBytes.java 6383e8a 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastHashTable.java 9030e5f 
>   ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistoryImpl.java 6582cdd 
>   ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndex.java a1408e9 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java 7727114 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 4995bdf 
>   ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java d391164 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 59682db 
>   ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/stats/PartialScanTask.java 90b1dff 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/VirtualColumn.java 044d64c 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 0e67ea6 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/listbucketingpruner/ListBucketingPrunerUtils.java 4d3e74e 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/GenMRSkewJoinProcessor.java 93202c3 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 50eda15 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/VectorizerReason.java e0a6198 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java f762fee 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java f175663 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/WindowingSpec.java 01b5559 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/AbstractVectorDesc.java e85a418 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/GroupByDesc.java 0b49294 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java ca69697 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java 9ae30ab 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/VectorAppMasterEventDesc.java 2e11321 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/VectorFileSinkDesc.java 325ac91 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/VectorFilterDesc.java 6feed84 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/VectorGroupByDesc.java f8554e2 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/VectorLimitDesc.java c9bc45a 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/VectorMapJoinDesc.java 3aa65d3 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/VectorMapJoinInfo.java 9429785 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/VectorPartitionDesc.java 4078c7d 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/VectorReduceSinkDesc.java 2eb44b8 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/VectorReduceSinkInfo.java 8c35415 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/VectorSMBJoinDesc.java 031f11e 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/VectorSelectDesc.java c2c9450 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/VectorSparkHashTableSinkDesc.java 7fb59db 
>   ql/src/java/org

Re: [ANNOUNCE] New PMC Member : Eugene Koifman

2017-03-16 Thread Rui Li
Congratulations :)

On Thu, Mar 16, 2017 at 4:26 PM, Lefty Leverenz 
wrote:

> More congratulations!
>
> -- Lefty
>
> On Wed, Mar 15, 2017 at 1:27 PM, Eugene Koifman 
> wrote:
>
> > Thank you everyone!
> >
> > On 3/15/17, 12:21 PM, "Gunther Hagleitner" 
> > wrote:
> >
> > Congratulations!
> > 
> > From: Sergey Shelukhin 
> > Sent: Wednesday, March 15, 2017 11:18 AM
> > To: dev@hive.apache.org
> > Subject: Re: [ANNOUNCE] New PMC Member : Eugene Koifman
> >
> > Congrats!
> >
> > On 17/3/15, 01:02, "Zoltan Haindrich" 
> > wrote:
> >
> > >Congrats Eugene!!
> > >
> > >On 15 Mar 2017 07:50, Peter Vary  wrote:
> > >Congratulations! :)
> > >
> > >2017. márc. 15. 7:05 ezt írta ("Vaibhav Gumashta"
> > > > >>):
> > >
> > >> Congrats Eugene!
> > >>
> > >>
> > >> On 3/14/17, 11:03 PM, "Rajesh Balamohan" 
> > wrote:
> > >>
> > >> >Congrats Eugene!! :)
> > >> >
> > >> >~Rajesh.B
> > >> >
> > >> >On Wed, Mar 15, 2017 at 11:21 AM, Pengcheng Xiong <
> > pxi...@apache.org>
> > >> >wrote:
> > >> >
> > >> >> Congrats! Well deserved!
> > >> >>
> > >> >> Thanks.
> > >> >> Pengcheng
> > >> >>
> > >> >> On Tue, Mar 14, 2017 at 10:39 PM, Ashutosh Chauhan
> >     >> >>
> > >> >> wrote:
> > >> >>
> > >> >> > On behalf of the Hive PMC I am delighted to announce Eugene
> > >>Koifman is
> > >> >> > joining Hive PMC.
> > >> >> > Eugene is a long time contributor in Hive and is focusing on
> > ACID
> > >> >>support
> > >> >> > areas these days.
> > >> >> >
> > >> >> > Welcome, Eugene!
> > >> >> >
> > >> >> > Thanks,
> > >> >> > Ashutosh
> > >> >> >
> > >> >>
> > >>
> > >>
> > >
> >
> >
> >
> >
> >
>



-- 
Best regards!
Rui Li
Cell: (+86) 13564950210


Is Github PR mandatory?

2019-03-14 Thread Rui Li
Hi,

I believe we still need to upload a patch to JIRA for precommit testing. So I
just want to make sure: is opening a GitHub PR mandatory, or is it just a
substitute for the review board?

-- 
Best regards!
Rui Li


Re: Is Github PR mandatory?

2019-03-17 Thread Rui Li
Got it. Thanks!

On Fri, Mar 15, 2019 at 1:47 PM Mani M  wrote:

> It's used as substitution for review board.
>
>
> With Regards
> M.Mani
> +61 432 461 087
>
> On Fri, 15 Mar 2019, 13:59 Rui Li,  wrote:
>
> > Hi,
> >
> > I believe we still need to upload patch to JIRA for precommit testing. So
> > just want to make sure whether opening a github PR is mandatory? Or is it
> > just a substitution for the review board?
> >
> > --
> > Best regards!
> > Rui Li
> >
>


-- 
Best regards!
Rui Li


Hit HIVE-13023 with 2.0.1 maven artifacts

2019-09-04 Thread Rui Li
Hello guys,

I hit HIVE-13023 <https://issues.apache.org/jira/browse/HIVE-13023> when I
programmatically executed some queries with Hive-2.0.1. I did some
investigation and there seemed to be some issues with the 2.0.1 artifacts
we published.
I compared the hive-exec artifact from maven central
<https://repo1.maven.org/maven2/org/apache/hive/hive-exec/2.0.1/hive-exec-2.0.1.jar>
with
the jar in our binary distribution
<https://archive.apache.org/dist/hive/hive-2.0.1/apache-hive-2.0.1-bin.tar.gz>,
and the two hive-exec jars are of different sizes.
I also decompiled these two jars to check the offending method
*StorageFormat::fillStorageFormat*. It turned out the jar from maven got
the ordinals of some tokens wrong (the ordinal of TOK_FILEFORMAT_GENERIC
should be 715):
[image: Screenshot 2019-09-04 5.08.59 PM.png]

And the jar in our distribution has the correct ordinals:
[image: Screenshot 2019-09-04 5.14.00 PM.png]

I wonder whether anybody could help verify the issue, and whether it's
possible to update the published jars if the issue is valid?

-- 
Best regards!
Rui Li


Re: [Hive Alter Table Add column at specified position]

2020-07-19 Thread Rui Li
Yeah, according to our DDL doc, we don't support this use case at the
moment. Perhaps you can use REPLACE COLUMNS as a workaround.

On Sat, Jun 27, 2020 at 5:32 PM 忝忝向仧 <153488...@qq.com> wrote:

> Hi,all:
>
>
> It seems that Hive can not alter table to add column at specified
> position.
> For instance,the Table A has c1,c2,c3 columns,and i want to add column c4
> after c1,therefore,the table would be like c1,c4,c2,c3 instead of
> c1,c2,c3,c4.
>
>
> Thanks.



-- 
Best regards!
Rui Li


[jira] [Commented] (HIVE-8017) Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]

2014-09-11 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131126#comment-14131126
 ] 

Rui Li commented on HIVE-8017:
--

[~xuefuz] OK, I'll do that.
BTW, do you think we need a JIRA to track this difference so we can find the 
cause when we have time?

> Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark 
> Branch]
> ---
>
> Key: HIVE-8017
> URL: https://issues.apache.org/jira/browse/HIVE-8017
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-8017-spark.patch, HIVE-8017.2-spark.patch, 
> HIVE-8017.3-spark.patch, HIVE-8017.4-spark.patch
>
>
> HiveKey should be used as the key type because it holds the hash code for 
> partitioning. While BytesWritable serves partitioning well for simple cases, 
> we have to use {{HiveKey.hashCode}} for more complicated ones, e.g. join, 
> bucketed table, etc.
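
As a hedged illustration of the point above — the target partition is computed from the key's {{hashCode}}, so a key type that carries the hash Hive computed (as HiveKey does) controls where each row lands — here is a minimal Java sketch. KeyWithHash and HashPartitionDemo are made-up names for illustration, not Hive or Spark classes; only the modulo logic mirrors hash partitioning.

```java
// Stand-in for a key that caches a precomputed hash (the role HiveKey
// plays), instead of hashing raw bytes like BytesWritable would.
class KeyWithHash {
    private final int hash;
    KeyWithHash(int hash) { this.hash = hash; }
    @Override public int hashCode() { return hash; }
}

public class HashPartitionDemo {
    // Hash-partitioning logic: hashCode modulo numPartitions,
    // adjusted so the result is never negative.
    public static int partition(Object key, int numPartitions) {
        int mod = key.hashCode() % numPartitions;
        return mod < 0 ? mod + numPartitions : mod;
    }

    public static void main(String[] args) {
        System.out.println(partition(new KeyWithHash(23), 10)); // 3
        System.out.println(partition(new KeyWithHash(-7), 10)); // 3
    }
}
```

If the stored hash were lost and every key hashed identically, all rows would land in a single partition — the bucketing symptom reported in HIVE-7956.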



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-8043) Support merging small files [Spark Branch]

2014-09-11 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li reassigned HIVE-8043:


Assignee: Rui Li

> Support merging small files [Spark Branch]
> --
>
> Key: HIVE-8043
> URL: https://issues.apache.org/jira/browse/HIVE-8043
> Project: Hive
>  Issue Type: Task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Rui Li
>  Labels: Spark-M1
>
> Hive currently supports merging small files with MR as the execution engine. 
> There are options available for this, such as 
> {code}
> hive.merge.mapfiles
> hive.merge.mapredfiles
> {code}
> Hive.merge.sparkfiles is already introduced in HIVE-7810. To make it work, we 
> might need a little more research and design on this.





[jira] [Updated] (HIVE-8017) Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]

2014-09-11 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-8017:
-
Attachment: HIVE-8017.5-spark.patch

Update the golden file for union_remove_25

> Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark 
> Branch]
> ---
>
> Key: HIVE-8017
> URL: https://issues.apache.org/jira/browse/HIVE-8017
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-8017-spark.patch, HIVE-8017.2-spark.patch, 
> HIVE-8017.3-spark.patch, HIVE-8017.4-spark.patch, HIVE-8017.5-spark.patch
>
>
> HiveKey should be used as the key type because it holds the hash code for 
> partitioning. While BytesWritable serves partitioning well for simple cases, 
> we have to use {{HiveKey.hashCode}} for more complicated ones, e.g. join, 
> bucketed table, etc.





[jira] [Created] (HIVE-8098) The spark golden file for union_remove_25 is different from MR version [Spark Branch]

2014-09-14 Thread Rui Li (JIRA)
Rui Li created HIVE-8098:


 Summary: The spark golden file for union_remove_25 is different 
from MR version [Spark Branch]
 Key: HIVE-8098
 URL: https://issues.apache.org/jira/browse/HIVE-8098
 Project: Hive
  Issue Type: Task
  Components: Spark
Reporter: Rui Li
Priority: Minor


After applying HIVE-8017, there's a difference in the golden file for 
{{union_remove_25.q}} between the Spark and MR versions.
Although the difference is only in the total size of {{outputTbl2}} (6812 for 
MR vs. 6826 for Spark), we may want to find the cause of the diff and verify 
whether it's an issue.





[jira] [Commented] (HIVE-8017) Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]

2014-09-14 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14133490#comment-14133490
 ] 

Rui Li commented on HIVE-8017:
--

Thanks [~xuefuz], I created HIVE-8098 for this.

> Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark 
> Branch]
> ---
>
> Key: HIVE-8017
> URL: https://issues.apache.org/jira/browse/HIVE-8017
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
>  Labels: Spark-M1
> Fix For: spark-branch
>
> Attachments: HIVE-8017-spark.patch, HIVE-8017.2-spark.patch, 
> HIVE-8017.3-spark.patch, HIVE-8017.4-spark.patch, HIVE-8017.5-spark.patch
>
>
> HiveKey should be used as the key type because it holds the hash code for 
> partitioning. While BytesWritable serves partitioning well for simple cases, 
> we have to use {{HiveKey.hashCode}} for more complicated ones, e.g. join, 
> bucketed table, etc.





[jira] [Resolved] (HIVE-7956) When inserting into a bucketed table, all data goes to a single bucket [Spark Branch]

2014-09-14 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li resolved HIVE-7956.
--
Resolution: Fixed

Fixed via HIVE-8017

> When inserting into a bucketed table, all data goes to a single bucket [Spark 
> Branch]
> -
>
> Key: HIVE-7956
> URL: https://issues.apache.org/jira/browse/HIVE-7956
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
>
> I created a bucketed table:
> {code}
> create table testBucket(x int,y string) clustered by(x) into 10 buckets;
> {code}
> Then I run a query like:
> {code}
> set hive.enforce.bucketing = true;
> insert overwrite table testBucket select intCol,stringCol from src;
> {code}
> Here {{src}} is a simple textfile-based table containing 4000 records 
> (not bucketed). The query launches 10 reduce tasks but all the data goes to 
> only one of them.





[jira] [Commented] (HIVE-8043) Support merging small files [Spark Branch]

2014-09-16 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14136618#comment-14136618
 ] 

Rui Li commented on HIVE-8043:
--

Thank you [~xuefuz]. I'll take a look.

> Support merging small files [Spark Branch]
> --
>
> Key: HIVE-8043
> URL: https://issues.apache.org/jira/browse/HIVE-8043
> Project: Hive
>  Issue Type: Task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Rui Li
>  Labels: Spark-M1
>
> Hive currently supports merging small files with MR as the execution engine. 
> There are options available for this, such as 
> {code}
> hive.merge.mapfiles
> hive.merge.mapredfiles
> {code}
> Hive.merge.sparkfiles is already introduced in HIVE-7810. To make it work, we 
> might need a little more research and design on this.





[jira] [Commented] (HIVE-8043) Support merging small files [Spark Branch]

2014-09-17 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137365#comment-14137365
 ] 

Rui Li commented on HIVE-8043:
--

Hi [~xuefuz],

I looked into the patch in HIVE-7704. My understanding is that the newly added 
operator, mapper, etc. are just for (fast) merging of RC and Orc files. Other 
file formats will still be merged by the {{TS -> FS}} work. For RC and Orc 
files, this work is a {{MergeFileWork}}; for others, it is a {{MapWork}}. 
Depending on the execution engine, this work is wrapped in a MapredWork, 
TezWork, or SparkWork.

For RC and Orc files, {{MergeFileMapper}} is used instead of {{ExecMapper}}. 
The main difference between the two mappers is that {{MergeFileMapper}} wraps 
and uses {{AbstractFileMergeOperator}} (with two implementations, for RC and 
Orc files respectively) as the top operator, while {{ExecMapper}} uses 
{{MapOperator}}.

I think the following needs to be considered on spark side:
* For non-RC files, I think it should work out of the box, at least for simple 
cases. We may need to take extra care of dynamically partitioned tables, 
multi-insert and union queries, etc. I tested some simple insert queries where 
I increased {{mapreduce.job.reduces}} to generate many small files. With 
{{hive.merge.sparkfiles=false}}, the destination table consists of all these 
small files; when it's turned on, all the small files get merged. I noticed the 
merging feature caused some issues in HIVE-7810. I'll verify whether it's still 
a problem now that we have union-remove disabled for Spark.
* For RC and Orc files, we need to be aware of the {{MergeFileWork}}. And since 
{{SparkMapRecordHandler}} is our counterpart for {{ExecMapper}}, we'll need 
another record handler as the counterpart for {{MergeFileMapper}}, and maybe 
another Hive function as well. I'm working to implement this to do some tests.
* MR distinguishes map-only and map-reduce jobs for merging. Not sure if we 
should do a similar thing for Spark.
* Besides, it seems there are two scenarios where merging is needed: at the end 
of a job (map-only or map-reduce), and in a DDL task. I'll investigate more into 
this.

Any idea or suggestion is appreciated. Thanks.
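
As a rough sketch of the dispatch described above — a merge task planned as a MergeFileWork would need its own record handler, while a plain MapWork keeps the existing one — the selection could look like the following. The class bodies are illustrative stand-ins, not Hive's actual implementation; only the idea of picking a handler from the work type comes from the discussion.

```java
// Minimal stand-ins for the plan work types (in Hive, MergeFileWork
// specializes the map-side work, so it is modeled as a subtype here).
abstract class BaseWork { }
class MapWork extends BaseWork { }
class MergeFileWork extends MapWork { }

public class HandlerDispatch {
    // Picks a record-handler name from the work type. The more specific
    // MergeFileWork must be checked before MapWork, since it is a subtype.
    public static String pickHandler(BaseWork work) {
        if (work instanceof MergeFileWork) {
            return "SparkMergeFileRecordHandler";
        } else if (work instanceof MapWork) {
            return "SparkMapRecordHandler";
        }
        return "SparkReduceRecordHandler";
    }

    public static void main(String[] args) {
        System.out.println(pickHandler(new MergeFileWork())); // SparkMergeFileRecordHandler
        System.out.println(pickHandler(new MapWork()));       // SparkMapRecordHandler
    }
}
```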

> Support merging small files [Spark Branch]
> --
>
> Key: HIVE-8043
> URL: https://issues.apache.org/jira/browse/HIVE-8043
> Project: Hive
>  Issue Type: Task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Rui Li
>  Labels: Spark-M1
>
> Hive currently supports merging small files with MR as the execution engine. 
> There are options available for this, such as 
> {code}
> hive.merge.mapfiles
> hive.merge.mapredfiles
> {code}
> Hive.merge.sparkfiles is already introduced in HIVE-7810. To make it work, we 
> might need a little more research and design on this.





[jira] [Updated] (HIVE-8043) Support merging small files [Spark Branch]

2014-09-18 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-8043:
-
Attachment: HIVE-8043.1-spark.patch

> Support merging small files [Spark Branch]
> --
>
> Key: HIVE-8043
> URL: https://issues.apache.org/jira/browse/HIVE-8043
> Project: Hive
>  Issue Type: Task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Rui Li
>  Labels: Spark-M1
> Attachments: HIVE-8043.1-spark.patch
>
>
> Hive currently supports merging small files with MR as the execution engine. 
> There are options available for this, such as 
> {code}
> hive.merge.mapfiles
> hive.merge.mapredfiles
> {code}
> Hive.merge.sparkfiles is already introduced in HIVE-7810. To make it work, we 
> might need a little more research and design on this.





[jira] [Updated] (HIVE-8043) Support merging small files [Spark Branch]

2014-09-18 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-8043:
-
Status: Patch Available  (was: Open)

> Support merging small files [Spark Branch]
> --
>
> Key: HIVE-8043
> URL: https://issues.apache.org/jira/browse/HIVE-8043
> Project: Hive
>  Issue Type: Task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Rui Li
>  Labels: Spark-M1
> Attachments: HIVE-8043.1-spark.patch
>
>
> Hive currently supports merging small files with MR as the execution engine. 
> There are options available for this, such as 
> {code}
> hive.merge.mapfiles
> hive.merge.mapredfiles
> {code}
> Hive.merge.sparkfiles is already introduced in HIVE-7810. To make it work, we 
> might need a little more research and design on this.





[jira] [Commented] (HIVE-8043) Support merging small files [Spark Branch]

2014-09-18 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138780#comment-14138780
 ] 

Rui Li commented on HIVE-8043:
--

Support merging files for spark.
For non-rc files, the merging task is simply a MapWork.
For RC/Orc files, the merging task is a MergeFileWork. And 
SparkMergeFileRecordHandler is added to handle it.

> Support merging small files [Spark Branch]
> --
>
> Key: HIVE-8043
> URL: https://issues.apache.org/jira/browse/HIVE-8043
> Project: Hive
>  Issue Type: Task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Rui Li
>  Labels: Spark-M1
> Attachments: HIVE-8043.1-spark.patch
>
>
> Hive currently supports merging small files with MR as the execution engine. 
> There are options available for this, such as 
> {code}
> hive.merge.mapfiles
> hive.merge.mapredfiles
> {code}
> Hive.merge.sparkfiles is already introduced in HIVE-7810. To make it work, we 
> might need a little more research and design on this.





[jira] [Commented] (HIVE-8043) Support merging small files [Spark Branch]

2014-09-18 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138824#comment-14138824
 ] 

Rui Li commented on HIVE-8043:
--

Hi [~xuefuz],

The DDL task that merges files is an alter table statement:
{code}
ALTER TABLE tbl CONCATENATE;
{code}
In this case, the DDL task creates a {{MergeFileTask}} and {{MergeFileTask}} 
launches an MR job to merge the files. This feature currently only supports 
RC/Orc tables.

The strange thing is that I didn't find anything about this in the wiki or 
other official docs. Maybe I'm missing something?

The main problem I see here is that, ideally, we should launch the job 
according to the execution engine. But the DDL task uses a different semantic 
analyzer, {{DDLSemanticAnalyzer}}, and always launches an MR job. I think Tez 
doesn't handle this either.

> Support merging small files [Spark Branch]
> --
>
> Key: HIVE-8043
> URL: https://issues.apache.org/jira/browse/HIVE-8043
> Project: Hive
>  Issue Type: Task
>  Components: Spark
>    Reporter: Xuefu Zhang
>Assignee: Rui Li
>  Labels: Spark-M1
> Attachments: HIVE-8043.1-spark.patch
>
>
> Hive currently supports merging small files with MR as the execution engine. 
> There are options available for this, such as 
> {code}
> hive.merge.mapfiles
> hive.merge.mapredfiles
> {code}
> Hive.merge.sparkfiles is already introduced in HIVE-7810. To make it work, we 
> might need a little more research and design on this.





[jira] [Updated] (HIVE-8043) Support merging small files [Spark Branch]

2014-09-18 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-8043:
-
Attachment: HIVE-8043.2-spark.patch

Update the golden files for failed tests, since the diff is only in the query 
plan.

> Support merging small files [Spark Branch]
> --
>
> Key: HIVE-8043
> URL: https://issues.apache.org/jira/browse/HIVE-8043
> Project: Hive
>  Issue Type: Task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Rui Li
>  Labels: Spark-M1
> Attachments: HIVE-8043.1-spark.patch, HIVE-8043.2-spark.patch
>
>
> Hive currently supports merging small files with MR as the execution engine. 
> There are options available for this, such as 
> {code}
> hive.merge.mapfiles
> hive.merge.mapredfiles
> {code}
> Hive.merge.sparkfiles is already introduced in HIVE-7810. To make it work, we 
> might need a little more research and design on this.





[jira] [Commented] (HIVE-8043) Support merging small files [Spark Branch]

2014-09-18 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139918#comment-14139918
 ] 

Rui Li commented on HIVE-8043:
--

I looked more and found that {{MergeFileTask}} uses {{HadoopJobExecHelper}}, 
which sets the execution engine to MR in its constructor. So I think we don't 
have to worry about the DDL task.

> Support merging small files [Spark Branch]
> --
>
> Key: HIVE-8043
> URL: https://issues.apache.org/jira/browse/HIVE-8043
> Project: Hive
>  Issue Type: Task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Rui Li
>  Labels: Spark-M1
> Attachments: HIVE-8043.1-spark.patch, HIVE-8043.2-spark.patch
>
>
> Hive currently supports merging small files with MR as the execution engine. 
> There are options available for this, such as 
> {code}
> hive.merge.mapfiles
> hive.merge.mapredfiles
> {code}
> Hive.merge.sparkfiles is already introduced in HIVE-7810. To make it work, we 
> might need a little more research and design on this.





[jira] [Updated] (HIVE-8043) Support merging small files [Spark Branch]

2014-09-19 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-8043:
-
Attachment: HIVE-8043.3-spark.patch

Fix a bug in getting file system for the output dir

> Support merging small files [Spark Branch]
> --
>
> Key: HIVE-8043
> URL: https://issues.apache.org/jira/browse/HIVE-8043
> Project: Hive
>  Issue Type: Task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Rui Li
>  Labels: Spark-M1
> Attachments: HIVE-8043.1-spark.patch, HIVE-8043.2-spark.patch, 
> HIVE-8043.3-spark.patch
>
>
> Hive currently supports merging small files with MR as the execution engine. 
> There are options available for this, such as 
> {code}
> hive.merge.mapfiles
> hive.merge.mapredfiles
> {code}
> Hive.merge.sparkfiles is already introduced in HIVE-7810. To make it work, we 
> might need a little more research and design on this.





[jira] [Commented] (HIVE-7382) Create a MiniSparkCluster and set up a testing framework [Spark Branch]

2014-09-24 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147328#comment-14147328
 ] 

Rui Li commented on HIVE-7382:
--

Hi [~xuefuz],

Just to clarify: we currently use Spark local mode to run the tests, and our 
goal here is to use local-cluster mode instead, right?
So far I've found that local mode runs all the Spark executors, backend, etc. 
in the same JVM, while in local-cluster mode, the master and workers run in the 
same JVM and the executors run in separate JVMs. Local-cluster mode resembles 
standalone mode, except that the master and workers run in the same JVM and 
everything runs on a single machine.
Intuitively, local-cluster mode can catch more of the errors we may have. But 
this mode seems to be intended only for Spark's own tests and is not exposed to 
users. We may need to make sure we really want to use it.

> Create a MiniSparkCluster and set up a testing framework [Spark Branch]
> ---
>
> Key: HIVE-7382
> URL: https://issues.apache.org/jira/browse/HIVE-7382
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Rui Li
>  Labels: Spark-M1
>
> To automatically test Hive functionality over Spark execution engine, we need 
> to create a test framework that can execute Hive queries with Spark as the 
> backend. For that, we should create a MiniSparkCluser for this, similar to 
> other execution engines.
> Spark has a way to create a local cluster with a few processes in the local 
> machine, each process is a work node. It's fairly close to a real Spark 
> cluster. Our mini cluster can be based on that.
> For more info, please refer to the design doc on wiki.





[jira] [Commented] (HIVE-7382) Create a MiniSparkCluster and set up a testing framework [Spark Branch]

2014-09-24 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147330#comment-14147330
 ] 

Rui Li commented on HIVE-7382:
--

there's some discussions here:
https://spark-project.atlassian.net/browse/SPARK-595

> Create a MiniSparkCluster and set up a testing framework [Spark Branch]
> ---
>
> Key: HIVE-7382
> URL: https://issues.apache.org/jira/browse/HIVE-7382
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Rui Li
>  Labels: Spark-M1
>
> To automatically test Hive functionality over Spark execution engine, we need 
> to create a test framework that can execute Hive queries with Spark as the 
> backend. For that, we should create a MiniSparkCluser for this, similar to 
> other execution engines.
> Spark has a way to create a local cluster with a few processes in the local 
> machine, each process is a work node. It's fairly close to a real Spark 
> cluster. Our mini cluster can be based on that.
> For more info, please refer to the design doc on wiki.





[jira] [Commented] (HIVE-7382) Create a MiniSparkCluster and set up a testing framework [Spark Branch]

2014-09-25 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147714#comment-14147714
 ] 

Rui Li commented on HIVE-7382:
--

Hi [~xuefuz],

I hit the same problem as Szehon mentioned.

After some digging, I think this is because in local-cluster mode Spark 
launches separate JVMs for the executor backends, so it needs to run some 
scripts to determine the proper class path (and probably something else); 
please refer to {{CommandUtils.buildCommandSeq}}, which is called when 
{{ExecutorRunner}} tries to launch the executor backend.
Therefore local-cluster mode requires an installation of Spark, with spark.home 
or spark.test.home properly set. I think this is all right if local-cluster is 
merely used for Spark unit tests. But it shouldn't be used for user 
applications, because it's not that "local" in the sense that it requires an 
installation of Spark.

To verify my guess, I ran some Hive queries (not tests) on Spark without 
setting spark.home. They ran well in standalone and local modes, but got the 
same error in local-cluster mode.
To make it work, I had to export SPARK_HOME properly. (Please note that setting 
spark.home, or spark.testing + spark.test.home, in SparkConf won't help.)

What's your opinion?

> Create a MiniSparkCluster and set up a testing framework [Spark Branch]
> ---
>
> Key: HIVE-7382
> URL: https://issues.apache.org/jira/browse/HIVE-7382
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>    Reporter: Xuefu Zhang
>Assignee: Rui Li
>  Labels: Spark-M1
>
> To automatically test Hive functionality over Spark execution engine, we need 
> to create a test framework that can execute Hive queries with Spark as the 
> backend. For that, we should create a MiniSparkCluser for this, similar to 
> other execution engines.
> Spark has a way to create a local cluster with a few processes in the local 
> machine, each process is a work node. It's fairly close to a real Spark 
> cluster. Our mini cluster can be based on that.
> For more info, please refer to the design doc on wiki.





[jira] [Commented] (HIVE-7382) Create a MiniSparkCluster and set up a testing framework [Spark Branch]

2014-09-25 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14148600#comment-14148600
 ] 

Rui Li commented on HIVE-7382:
--

[~xuefuz] - Sure, it'll be great that the spark community can help with this :)

> Create a MiniSparkCluster and set up a testing framework [Spark Branch]
> ---
>
> Key: HIVE-7382
> URL: https://issues.apache.org/jira/browse/HIVE-7382
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Rui Li
>  Labels: Spark-M1
>
> To automatically test Hive functionality over Spark execution engine, we need 
> to create a test framework that can execute Hive queries with Spark as the 
> backend. For that, we should create a MiniSparkCluser for this, similar to 
> other execution engines.
> Spark has a way to create a local cluster with a few processes in the local 
> machine, each process is a work node. It's fairly close to a real Spark 
> cluster. Our mini cluster can be based on that.
> For more info, please refer to the design doc on wiki.





[jira] [Created] (HIVE-8300) Missing guava lib causes IllegalStateException when deserializing a task [Spark Branch]

2014-09-29 Thread Rui Li (JIRA)
Rui Li created HIVE-8300:


 Summary: Missing guava lib causes IllegalStateException when 
deserializing a task [Spark Branch]
 Key: HIVE-8300
 URL: https://issues.apache.org/jira/browse/HIVE-8300
 Project: Hive
  Issue Type: Bug
  Components: Spark
 Environment: Spark-1.2.0-SNAPSHOT
Reporter: Rui Li


In Spark 1.2, Guava is shaded into spark-assembly, and we only ship hive-exec 
to the Spark cluster. So the Spark executors won't have the (original) Guava on 
their class path.
This can cause problems when TaskRunner deserializes a task, throwing 
something like this:
{code}
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 
3, node13-1): java.lang.IllegalStateException: unread block data

java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421)
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)

java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)

org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)

org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:164)

java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:744)
{code}
We may have to verify this issue and ship guava to the spark cluster.



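A cheap way to confirm what an executor's classloader actually sees is to probe for the class by name. The sketch below is a hypothetical diagnostic (not part of the fix, and `GuavaProbe` is not a Hive class): run something like it inside a task to report whether unshaded Guava is reachable.

```java
// Hypothetical diagnostic: probe whether a class (e.g. unshaded Guava) is
// visible from the current classloader. On an executor that is missing the
// jar, probing a Guava class would report "missing".
public class GuavaProbe {
    public static String probe(String className) {
        try {
            Class.forName(className);
            return "present";
        } catch (ClassNotFoundException e) {
            return "missing";
        }
    }

    public static void main(String[] args) {
        // On a cluster, this string would be a Guava class such as
        // com.google.common.collect.ImmutableList.
        System.out.println(probe("com.google.common.collect.ImmutableList"));
    }
}
```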


[jira] [Assigned] (HIVE-8406) Research on skewed join [Spark Branch]

2014-10-08 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li reassigned HIVE-8406:


Assignee: Rui Li

> Research on skewed join [Spark Branch]
> --
>
> Key: HIVE-8406
> URL: https://issues.apache.org/jira/browse/HIVE-8406
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Rui Li
>
> Research on how to handle skewed join for hive on spark. Here is original 
> hive's design doc for skewed join, 
> https://cwiki.apache.org/confluence/display/Hive/Skewed+Join+Optimization.





[jira] [Commented] (HIVE-7893) Find a way to get a job identifier when submitting a spark job [Spark Branch]

2014-10-13 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14170353#comment-14170353
 ] 

Rui Li commented on HIVE-7893:
--

Thanks [~joshrosen]. Yes I've seen your PR and really appreciate your help!
FYI, we're using the async API in HIVE-7439. It's the original foreachAsync I 
added, of course. But we can update to use your stabilized ones when your PR 
gets merged.
Thanks again!

> Find a way to get a job identifier when submitting a spark job [Spark Branch]
> -
>
> Key: HIVE-7893
> URL: https://issues.apache.org/jira/browse/HIVE-7893
> Project: Hive
>  Issue Type: Task
>  Components: Spark
>        Reporter: Rui Li
>Assignee: Rui Li
>Priority: Minor
>  Labels: Spark-M3
>
> Currently we use the {{foreach}} RDD action to submit a spark job. In order 
> to implement job monitoring functionality (HIVE-7438), we need to get a job 
> identifier when submitting the job, so that we can later register some 
> listener for that specific job.
> This task requires facilitation from spark side (SPARK-2636). I've tried to 
> use {{AsyncRDDActions}} instead of the traditional actions, and it proved to 
> be a possible way to get the job ID we need.



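The property the async API provides — a job identifier returned at submission time, so a listener can later be registered for that specific job — can be illustrated with a plain-Java analogy. The sketch below is hypothetical (it is not the Spark AsyncRDDActions API, and the class and method names are invented for illustration):

```java
import java.util.Map;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical analogy of async job submission: submit() returns a job
// identifier immediately, and addListener() attaches a callback to that
// specific job -- the capability needed for job monitoring.
public class AsyncSubmitSketch {
    private final ExecutorService pool = Executors.newFixedThreadPool(2);
    private final AtomicInteger nextJobId = new AtomicInteger();
    private final Map<Integer, CompletableFuture<Void>> jobs = new ConcurrentHashMap<>();

    // Submit a job asynchronously and return its identifier right away.
    public int submit(Runnable job) {
        int jobId = nextJobId.getAndIncrement();
        jobs.put(jobId, CompletableFuture.runAsync(job, pool));
        return jobId;
    }

    // Register a completion listener for one specific job; with
    // CompletableFuture the callback fires even if the job already finished.
    public void addListener(int jobId, Runnable onDone) {
        jobs.get(jobId).thenRun(onDone);
    }

    public void shutdown() {
        pool.shutdown();
    }
}
```

A blocking action (like a plain `foreach`) returns nothing until the job completes, which is why an identifier has to come from an async submission path.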


[jira] [Commented] (HIVE-7439) Spark job monitoring and error reporting [Spark Branch]

2014-10-13 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14170387#comment-14170387
 ] 

Rui Li commented on HIVE-7439:
--

The async APIs are stabilized in SPARK-3902.

> Spark job monitoring and error reporting [Spark Branch]
> ---
>
> Key: HIVE-7439
> URL: https://issues.apache.org/jira/browse/HIVE-7439
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Chengxiang Li
>  Labels: Spark-M3
> Attachments: HIVE-7439.1-spark.patch, HIVE-7439.2-spark.patch, 
> HIVE-7439.2-spark.patch, HIVE-7439.3-spark.patch, HIVE-7439.3-spark.patch, 
> hive on spark job status.PNG
>
>
> After Hive submits a job to Spark cluster, we need to report to user the job 
> progress, such as the percentage done, to the user. This is especially 
> important for long running queries. Moreover, if there is an error during job 
> submission or execution, it's also crucial for hive to fetch the error log 
> and/or stacktrace and feedback it to the user.
> Please refer design doc on wiki for more information.
> CLEAR LIBRARY CACHE





[jira] [Commented] (HIVE-7439) Spark job monitoring and error reporting [Spark Branch]

2014-10-13 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14170417#comment-14170417
 ] 

Rui Li commented on HIVE-7439:
--

+1 patch looks good to me.

Only a minor point: can we print the meaning of all these statistics at the top 
so that users can better understand what they see?

> Spark job monitoring and error reporting [Spark Branch]
> ---
>
> Key: HIVE-7439
> URL: https://issues.apache.org/jira/browse/HIVE-7439
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Chengxiang Li
>  Labels: Spark-M3
> Attachments: HIVE-7439.1-spark.patch, HIVE-7439.2-spark.patch, 
> HIVE-7439.2-spark.patch, HIVE-7439.3-spark.patch, HIVE-7439.3-spark.patch, 
> hive on spark job status.PNG
>
>
> After Hive submits a job to Spark cluster, we need to report to user the job 
> progress, such as the percentage done, to the user. This is especially 
> important for long running queries. Moreover, if there is an error during job 
> submission or execution, it's also crucial for hive to fetch the error log 
> and/or stacktrace and feedback it to the user.
> Please refer design doc on wiki for more information.
> CLEAR LIBRARY CACHE





[jira] [Commented] (HIVE-7439) Spark job monitoring and error reporting [Spark Branch]

2014-10-13 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14170426#comment-14170426
 ] 

Rui Li commented on HIVE-7439:
--

[~brocknoland] - Yep, that'll be fine.

> Spark job monitoring and error reporting [Spark Branch]
> ---
>
> Key: HIVE-7439
> URL: https://issues.apache.org/jira/browse/HIVE-7439
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Chengxiang Li
>  Labels: Spark-M3
> Fix For: spark-branch
>
> Attachments: HIVE-7439.1-spark.patch, HIVE-7439.2-spark.patch, 
> HIVE-7439.2-spark.patch, HIVE-7439.3-spark.patch, HIVE-7439.3-spark.patch, 
> hive on spark job status.PNG
>
>
> After Hive submits a job to Spark cluster, we need to report to user the job 
> progress, such as the percentage done, to the user. This is especially 
> important for long running queries. Moreover, if there is an error during job 
> submission or execution, it's also crucial for hive to fetch the error log 
> and/or stacktrace and feedback it to the user.
> Please refer design doc on wiki for more information.
> CLEAR LIBRARY CACHE





[jira] [Commented] (HIVE-7467) When querying HBase table, task fails with exception: java.lang.IllegalAccessError: com/google/protobuf/HBaseZeroCopyByteString

2014-10-14 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171844#comment-14171844
 ] 

Rui Li commented on HIVE-7467:
--

[~jxiang] - thanks for the update!
I know there are workarounds to this issue. This JIRA is mainly to make sure 
there's nothing else we should do. If you think we're good with the 
workarounds, please feel free to resolve this issue :-)

> When querying HBase table, task fails with exception: 
> java.lang.IllegalAccessError: com/google/protobuf/HBaseZeroCopyByteString
> ---
>
> Key: HIVE-7467
> URL: https://issues.apache.org/jira/browse/HIVE-7467
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
> Environment: Spark-1.0.0, HBase-0.98.2
>Reporter: Rui Li
>Assignee: Jimmy Xiang
>
> When I run select count( * ) on an HBase table, spark task fails with:
> {quote}
> java.lang.IllegalAccessError: com/google/protobuf/HBaseZeroCopyByteString
> at 
> org.apache.hadoop.hbase.protobuf.RequestConverter.buildRegionSpecifier(RequestConverter.java:910)
> at 
> org.apache.hadoop.hbase.protobuf.RequestConverter.buildGetRowOrBeforeRequest(RequestConverter.java:131)
> at 
> org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRowOrBefore(ProtobufUtil.java:1403)
> at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1181)
> at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1059)
> at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1016)
> at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:326)
> at org.apache.hadoop.hbase.client.HTable.(HTable.java:192)
> at org.apache.hadoop.hbase.client.HTable.(HTable.java:165)
> at 
> org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat.getRecordReader(HiveHBaseTableInputFormat.java:93)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:241)
> at org.apache.spark.rdd.HadoopRDD$$anon$1.(HadoopRDD.scala:193)
> at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:184)
> at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:93)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
> at org.apache.spark.scheduler.Task.run(Task.scala:51)
> {quote}
> NO PRECOMMIT TESTS. This is for spark branch only.





[jira] [Commented] (HIVE-8456) Support Hive Counter to collect spark job metric[Spark Branch]

2014-10-15 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173247#comment-14173247
 ] 

Rui Li commented on HIVE-8456:
--

I'm not familiar with how the counter/accumulator works. Just a few high-level 
questions:

1. Shall we think of better names for the new classes? The naming (e.g. 
SparkCounterGroup and SparkCounters) seems a little confusing to me.

2. Have we defined all the counters in 
{{SparkCounters.initializeSparkCounters}}? For example, it seems 
{{Operator.HIVECOUNTERFATAL}} isn't added there.

3. The Counter enum in operators doesn't seem to be used as a "Counter" in 
hive. Rather, it's just kept in {{statsMap : HashMap, LongWritable>}}. Maybe 
we shouldn't add them as SparkCounter? If we do want to wrap them as 
SparkCounter, there are other operators to handle besides MapOperator, e.g. 
FilterOperator and JoinOperator also have such an enum.

4. Maybe we should always use {{HiveConf.ConfVars.HIVECOUNTERGROUP}} as the 
group name, rather than the enum class name 
({{key.getDeclaringClass().getName()}})?

> Support Hive Counter to collect spark job metric[Spark Branch]
> --
>
> Key: HIVE-8456
> URL: https://issues.apache.org/jira/browse/HIVE-8456
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
>  Labels: Spark-M3
> Attachments: HIVE-8456.1-spark.patch, HIVE-8456.2-spark.patch
>
>
> Several Hive query metrics in Hive operators are collected by Hive Counter, 
> such as CREATEDFILES and DESERIALIZE_ERRORS. Besides, Hive uses Counter as an 
> option to collect table stats info. Spark supports Accumulator, which is 
> pretty similar to Hive Counter, so we could try to enable Hive Counter based 
> on it.



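Point 4 above argues for keying counters by a single configured group name plus a counter name, rather than by the enum's declaring class. A minimal sketch of that shape is below; it is an illustration only, not the actual Hive SparkCounters/SparkCounterGroup classes, and all names in it are hypothetical:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a counter registry keyed by (group, name), the
// scheme suggested in point 4: the group comes from configuration (e.g. a
// single Hive counter group) instead of the enum's declaring class name.
public class CounterRegistrySketch {
    private final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();

    private static String key(String group, String name) {
        return group + "::" + name;
    }

    public void increment(String group, String name, long delta) {
        counters.computeIfAbsent(key(group, name), k -> new AtomicLong())
                .addAndGet(delta);
    }

    public long value(String group, String name) {
        AtomicLong c = counters.get(key(group, name));
        return c == null ? 0L : c.get();
    }
}
```

In the real implementation each counter would be backed by a Spark Accumulator so that executor-side increments are aggregated on the driver; the registry here only shows the naming scheme.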


[jira] [Commented] (HIVE-8456) Support Hive Counter to collect spark job metric[Spark Branch]

2014-10-15 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173414#comment-14173414
 ] 

Rui Li commented on HIVE-8456:
--

[~chengxiang li] - thanks for the explanation!

I agree we don't have to identify all the needed counters for now.
For #3, I don't see hive creating counters for those enums. So do you mean it's 
an improvement to add counters for them on spark?

> Support Hive Counter to collect spark job metric[Spark Branch]
> --
>
> Key: HIVE-8456
> URL: https://issues.apache.org/jira/browse/HIVE-8456
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
>  Labels: Spark-M3
> Attachments: HIVE-8456.1-spark.patch, HIVE-8456.2-spark.patch, 
> HIVE-8456.3-spark.patch
>
>
> Several Hive query metrics in Hive operators are collected by Hive Counter, 
> such as CREATEDFILES and DESERIALIZE_ERRORS. Besides, Hive uses Counter as an 
> option to collect table stats info. Spark supports Accumulator, which is 
> pretty similar to Hive Counter, so we could try to enable Hive Counter based 
> on it.





[jira] [Commented] (HIVE-8456) Support Hive Counter to collect spark job metric[Spark Branch]

2014-10-15 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173443#comment-14173443
 ] 

Rui Li commented on HIVE-8456:
--

I see. That makes sense.

+1 The patch looks good to me.

> Support Hive Counter to collect spark job metric[Spark Branch]
> --
>
> Key: HIVE-8456
> URL: https://issues.apache.org/jira/browse/HIVE-8456
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
>  Labels: Spark-M3
> Attachments: HIVE-8456.1-spark.patch, HIVE-8456.2-spark.patch, 
> HIVE-8456.3-spark.patch
>
>
> Several Hive query metrics in Hive operators are collected by Hive Counter, 
> such as CREATEDFILES and DESERIALIZE_ERRORS. Besides, Hive uses Counter as an 
> option to collect table stats info. Spark supports Accumulator, which is 
> pretty similar to Hive Counter, so we could try to enable Hive Counter based 
> on it.





[jira] [Commented] (HIVE-8406) Research on skewed join [Spark Branch]

2014-10-15 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173449#comment-14173449
 ] 

Rui Li commented on HIVE-8406:
--

Skew join optimization depends on map join.

> Research on skewed join [Spark Branch]
> --
>
> Key: HIVE-8406
> URL: https://issues.apache.org/jira/browse/HIVE-8406
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Rui Li
>
> Research on how to handle skewed join for hive on spark. Here is original 
> hive's design doc for skewed join, 
> https://cwiki.apache.org/confluence/display/Hive/Skewed+Join+Optimization.





[jira] [Resolved] (HIVE-7893) Find a way to get a job identifier when submitting a spark job [Spark Branch]

2014-10-19 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li resolved HIVE-7893.
--
Resolution: Fixed

Fixed via HIVE-7439

> Find a way to get a job identifier when submitting a spark job [Spark Branch]
> -
>
> Key: HIVE-7893
> URL: https://issues.apache.org/jira/browse/HIVE-7893
> Project: Hive
>  Issue Type: Task
>  Components: Spark
>    Reporter: Rui Li
>Assignee: Rui Li
>Priority: Minor
>  Labels: Spark-M3
>
> Currently we use the {{foreach}} RDD action to submit a spark job. In order 
> to implement job monitoring functionality (HIVE-7438), we need to get a job 
> identifier when submitting the job, so that we can later register some 
> listener for that specific job.
> This task requires facilitation from spark side (SPARK-2636). I've tried to 
> use {{AsyncRDDActions}} instead of the traditional actions, and it proved to 
> be a possible way to get the job ID we need.





[jira] [Created] (HIVE-8518) Compile time skew join optimization returns duplicated results

2014-10-20 Thread Rui Li (JIRA)
Rui Li created HIVE-8518:


 Summary: Compile time skew join optimization returns duplicated 
results
 Key: HIVE-8518
 URL: https://issues.apache.org/jira/browse/HIVE-8518
 Project: Hive
  Issue Type: Bug
Reporter: Rui Li


Compile time skew join optimization clones the join operator tree and unions 
the results.
The problem here is that we don't properly insert the predicate for the cloned 
join (relying on an assert statement).

To reproduce the issue, run the simple query:
{code}select * from tbl1 join tbl2 on tbl1.key=tbl2.key;{code}
And suppose there's some skew in tbl1 (specify skew with CREATE or ALTER 
statement).
Duplicated results will be returned if you set 
hive.optimize.skewjoin.compiletime=true.



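The failure mode described above — a skewed branch and a non-skewed branch unioned without complementary predicates on the join key — can be demonstrated with a self-contained simulation. This is not Hive code; the class and its helpers are invented for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Illustration of the bug: compile-time skew join splits one join into two
// branches and UNION ALLs them. Each branch must carry a complementary
// predicate on the join key; if the predicate is dropped, every matching
// row shows up in both branches, i.e. duplicated results.
public class SkewJoinUnionSketch {
    static List<Integer> joinBranch(List<Integer> left, List<Integer> right,
                                    Predicate<Integer> keyFilter) {
        List<Integer> out = new ArrayList<>();
        for (int l : left) {
            if (!keyFilter.test(l)) continue;     // branch predicate on the key
            for (int r : right) {
                if (l == r) out.add(l);           // equi-join on the key
            }
        }
        return out;
    }

    static List<Integer> skewJoin(List<Integer> left, List<Integer> right,
                                  int skewedKey, boolean keepPredicates) {
        Predicate<Integer> skewed = k -> k == skewedKey;
        // With keepPredicates=false, both branches accept every key,
        // mimicking the missing predicate on the cloned join.
        Predicate<Integer> first = keepPredicates ? skewed : k -> true;
        Predicate<Integer> rest = keepPredicates ? skewed.negate() : k -> true;
        List<Integer> result = new ArrayList<>(joinBranch(left, right, first));
        result.addAll(joinBranch(left, right, rest));  // UNION ALL of branches
        return result;
    }
}
```

With the predicates kept, each matching row appears exactly once; with them dropped, every match is emitted by both branches, which is the duplication the patch fixes.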


[jira] [Assigned] (HIVE-8518) Compile time skew join optimization returns duplicated results

2014-10-20 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li reassigned HIVE-8518:


Assignee: Rui Li

> Compile time skew join optimization returns duplicated results
> --
>
> Key: HIVE-8518
> URL: https://issues.apache.org/jira/browse/HIVE-8518
> Project: Hive
>  Issue Type: Bug
>    Reporter: Rui Li
>    Assignee: Rui Li
>
> Compile time skew join optimization clones the join operator tree and unions 
> the results.
> The problem here is that we don't properly insert the predicate for the 
> cloned join (relying on an assert statement).
> To reproduce the issue, run the simple query:
> {code}select * from tbl1 join tbl2 on tbl1.key=tbl2.key;{code}
> And suppose there's some skew in tbl1 (specify skew with CREATE or ALTER 
> statement).
> Duplicated results will be returned if you set 
> hive.optimize.skewjoin.compiletime=true.





[jira] [Updated] (HIVE-8518) Compile time skew join optimization returns duplicated results

2014-10-20 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-8518:
-
Attachment: HIVE-8518.1.patch

> Compile time skew join optimization returns duplicated results
> --
>
> Key: HIVE-8518
> URL: https://issues.apache.org/jira/browse/HIVE-8518
> Project: Hive
>  Issue Type: Bug
>    Reporter: Rui Li
>    Assignee: Rui Li
> Attachments: HIVE-8518.1.patch
>
>
> Compile time skew join optimization clones the join operator tree and unions 
> the results.
> The problem here is that we don't properly insert the predicate for the 
> cloned join (relying on an assert statement).
> To reproduce the issue, run the simple query:
> {code}select * from tbl1 join tbl2 on tbl1.key=tbl2.key;{code}
> And suppose there's some skew in tbl1 (specify skew with CREATE or ALTER 
> statement).
> Duplicated results will be returned if you set 
> hive.optimize.skewjoin.compiletime=true.





[jira] [Updated] (HIVE-8518) Compile time skew join optimization returns duplicated results

2014-10-20 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-8518:
-
Status: Patch Available  (was: Open)

> Compile time skew join optimization returns duplicated results
> --
>
> Key: HIVE-8518
> URL: https://issues.apache.org/jira/browse/HIVE-8518
> Project: Hive
>  Issue Type: Bug
>    Reporter: Rui Li
>    Assignee: Rui Li
> Attachments: HIVE-8518.1.patch
>
>
> Compile time skew join optimization clones the join operator tree and unions 
> the results.
> The problem here is that we don't properly insert the predicate for the 
> cloned join (relying on an assert statement).
> To reproduce the issue, run the simple query:
> {code}select * from tbl1 join tbl2 on tbl1.key=tbl2.key;{code}
> And suppose there's some skew in tbl1 (specify skew with CREATE or ALTER 
> statement).
> Duplicated results will be returned if you set 
> hive.optimize.skewjoin.compiletime=true.





[jira] [Commented] (HIVE-8518) Compile time skew join optimization returns duplicated results

2014-10-20 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14176755#comment-14176755
 ] 

Rui Li commented on HIVE-8518:
--

cc [~xuefuz]

> Compile time skew join optimization returns duplicated results
> --
>
> Key: HIVE-8518
> URL: https://issues.apache.org/jira/browse/HIVE-8518
> Project: Hive
>  Issue Type: Bug
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-8518.1.patch
>
>
> Compile time skew join optimization clones the join operator tree and unions 
> the results.
> The problem here is that we don't properly insert the predicate for the 
> cloned join (relying on an assert statement).
> To reproduce the issue, run the simple query:
> {code}select * from tbl1 join tbl2 on tbl1.key=tbl2.key;{code}
> And suppose there's some skew in tbl1 (specify skew with CREATE or ALTER 
> statement).
> Duplicated results will be returned if you set 
> hive.optimize.skewjoin.compiletime=true.





[jira] [Commented] (HIVE-8518) Compile time skew join optimization returns duplicated results

2014-10-20 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177807#comment-14177807
 ] 

Rui Li commented on HIVE-8518:
--

Hi [~xuefuz], I find there are already several unit tests for compile time skew 
join, e.g. skewjoinopt1.q. They all passed the pre-commit test. That makes 
sense because the assert statement is triggered in UT, so the patch here won't 
change test results.

> Compile time skew join optimization returns duplicated results
> --
>
> Key: HIVE-8518
> URL: https://issues.apache.org/jira/browse/HIVE-8518
> Project: Hive
>  Issue Type: Bug
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-8518.1.patch
>
>
> Compile time skew join optimization clones the join operator tree and unions 
> the results.
> The problem here is that we don't properly insert the predicate for the 
> cloned join (relying on an assert statement).
> To reproduce the issue, run the simple query:
> {code}select * from tbl1 join tbl2 on tbl1.key=tbl2.key;{code}
> And suppose there's some skew in tbl1 (specify skew with CREATE or ALTER 
> statement).
> Duplicated results will be returned if you set 
> hive.optimize.skewjoin.compiletime=true.




