Re: Review Request 35950: HIVE-11131: Get row information on DataWritableWriter once for better writing performance

2015-06-29 Thread Dong Chen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35950/#review89860
---



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java 
(line 63)


shall we keep this as 'final'?


Nice refactor. The change looks good. Thanks

- Dong Chen


On June 28, 2015, 12:29 a.m., Sergio Pena wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/35950/
> ---
> 
> (Updated June 28, 2015, 12:29 a.m.)
> 
> 
> Review request for hive, Ryan Blue, cheng xu, and Dong Chen.
> 
> 
> Bugs: HIVE-11131
> https://issues.apache.org/jira/browse/HIVE-11131
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Implemented data type writers that are created before the first Hive row 
> is written to Parquet. These writers hold information about the object 
> inspectors and schema of a specific data type, and call the specific 
> add() method used by Parquet for each data type.
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java
>  c195c3ec3ddae19bf255fc2c9633f8bf4390f428 
> 
> Diff: https://reviews.apache.org/r/35950/diff/
> 
> 
> Testing
> ---
> 
> Tests from TestDataWritableWriter run OK.
> 
> I ran other tests with micro-benchmarks, and I got better results from 
> this new implementation:
> 
> Using repeated rows across the file, this is the throughput increase using 1 
> million records:
> 
>          bigint   boolean  double   float    int      string
> before   7.598    7.491    7.488    7.588    7.53     0.270
> after    10.137   11.511   10.155   10.297   10.242   0.286
> 
> Using random rows across the file, this is the throughput increase using 1 
> million records:
> 
>          bigint   boolean  double   float    int      string
> before   5.268    7.723    4.107    4.173    4.729    0.20
> after    6.236    10.466   5.944    4.749    5.234    0.22
> 
> 
> Thanks,
> 
> Sergio Pena
> 
>
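The change summarized in this thread follows a common pattern: build one writer per column from its declared type (object inspector) before the first row, so the per-row loop does no type dispatch. A self-contained sketch of the idea (all names hypothetical; this is not the Hive or Parquet API):

```java
import java.util.Arrays;
import java.util.List;

public class PerTypeWriterSketch {
  // One writer per column, created once from the column's declared type; the
  // per-row loop then just invokes them with no per-value type inspection.
  interface ValueWriter { void write(Object value, StringBuilder out); }

  static ValueWriter createWriter(String type) {
    switch (type) {
      case "int":    return (v, out) -> out.append(((Integer) v).intValue());
      case "double": return (v, out) -> out.append(((Double) v).doubleValue());
      case "string": return (v, out) -> out.append('"').append((String) v).append('"');
      default: throw new IllegalArgumentException("unsupported type: " + type);
    }
  }

  // Writers are built once, then reused for every row.
  static String writeRow(List<ValueWriter> writers, Object[] row) {
    StringBuilder out = new StringBuilder();
    for (int i = 0; i < row.length; i++) {
      if (i > 0) out.append(',');
      writers.get(i).write(row[i], out);
    }
    return out.toString();
  }

  public static void main(String[] args) {
    List<ValueWriter> writers = Arrays.asList(createWriter("int"), createWriter("string"));
    System.out.println(writeRow(writers, new Object[]{7, "x"}));  // 7,"x"
  }
}
```

Moving the type dispatch out of the row loop is what the benchmark numbers above reflect: the per-row cost drops to a sequence of direct writer calls.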



[jira] [Created] (HIVE-11148) LLAP: fix TestLlapTaskSchedulerService flakiness

2015-06-29 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-11148:
---

 Summary: LLAP: fix TestLlapTaskSchedulerService flakiness
 Key: HIVE-11148
 URL: https://issues.apache.org/jira/browse/HIVE-11148
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Siddharth Seth


See HIVE-11017 and comments in the class



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Review Request 36033: HIVE-11108: HashTableSinkOperator doesn't support vectorization [Spark Branch]

2015-06-29 Thread Rui Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36033/
---

Review request for hive and Xuefu Zhang.


Bugs: HIVE-11108
https://issues.apache.org/jira/browse/HIVE-11108


Repository: hive-git


Description
---

This prevents any BaseWork containing HTS from being vectorized. It's basically 
specific to Spark, because Tez doesn't use HTS and MR runs HTS in local tasks.
We should verify whether it makes sense to make HTS support vectorization.


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java c4554a7 
  ql/src/java/org/apache/hadoop/hive/ql/exec/SparkHashTableSinkOperator.java 
7c67fd2 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkHashTableSinkOperator.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorComparatorFactory.java 
c6a43d9 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/GenSparkSkewJoinProcessor.java
 7ebd18d 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 
e7b9c73 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkReduceSinkMapJoinProc.java
 fd42959 
  ql/src/java/org/apache/hadoop/hive/ql/plan/SparkHashTableSinkDesc.java 
ff32f5e 
  ql/src/test/results/clientpositive/spark/vector_decimal_mapjoin.q.out a80a20b 
  ql/src/test/results/clientpositive/spark/vector_left_outer_join.q.out a0e6c2a 
  ql/src/test/results/clientpositive/spark/vector_mapjoin_reduce.q.out 8cf1a81 
  ql/src/test/results/clientpositive/spark/vectorized_mapjoin.q.out b6c2b35 
  ql/src/test/results/clientpositive/spark/vectorized_nested_mapjoin.q.out 
a25d540 

Diff: https://reviews.apache.org/r/36033/diff/


Testing
---


Thanks,

Rui Li



[jira] [Created] (HIVE-11147) MetaTool doesn't update FS root location for partitions with space in name

2015-06-29 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-11147:


 Summary: MetaTool doesn't update FS root location for partitions 
with space in name
 Key: HIVE-11147
 URL: https://issues.apache.org/jira/browse/HIVE-11147
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Wei Zheng
Assignee: Wei Zheng


The problem happens when trying to update the FS root location:
{code}
# HIVE_CONF_DIR=/etc/hive/conf.server/ hive --service metatool -dryRun 
-updateLocation hdfs://mycluster hdfs://c6401.ambari.apache.org:8020
...
Looking for LOCATION_URI field in DBS table to update..
Dry Run of updateLocation on table DBS..
old location: hdfs://c6401.ambari.apache.org:8020/apps/hive/warehouse new 
location: hdfs://mycluster/apps/hive/warehouse
Found 1 records in DBS table to update
Looking for LOCATION field in SDS table to update..
Dry Run of updateLocation on table SDS..
old location: 
hdfs://c6401.ambari.apache.org:8020/apps/hive/warehouse/web_sales/ws_web_site_sk=12
 new location: hdfs://mycluster/apps/hive/warehouse/web_sales/ws_web_site_sk=12
old location: 
hdfs://c6401.ambari.apache.org:8020/apps/hive/warehouse/web_sales/ws_web_site_sk=13
 new location: hdfs://mycluster/apps/hive/warehouse/web_sales/ws_web_site_sk=13
...
Found 143 records in SDS table to update
Warning: Found records with bad LOCATION in SDS table..
bad location URI: 
hdfs://c6401.ambari.apache.org:8020/apps/hive/warehouse/customer_demographics/cd_education_status=Advanced
 Degree
bad location URI: 
hdfs://c6401.ambari.apache.org:8020/apps/hive/warehouse/customer_demographics/cd_education_status=Advanced
 Degree
bad location URI: 
hdfs://c6401.ambari.apache.org:8020/apps/hive/warehouse/customer_demographics/cd_education_status=4
 yr Degree
bad location URI: 
hdfs://c6401.ambari.apache.org:8020/apps/hive/warehouse/customer_demographics/cd_education_status=4
 yr Degree
bad location URI: 
hdfs://c6401.ambari.apache.org:8020/apps/hive/warehouse/customer_demographics/cd_education_status=2
 yr Degree
bad location URI: 
hdfs://c6401.ambari.apache.org:8020/apps/hive/warehouse/customer_demographics/cd_education_status=2
 yr Degree
{code}

The reason some entries are flagged as bad locations is that they contain a 
space character in the partition name.





[jira] [Created] (HIVE-11146) LLAP: merge master into branch

2015-06-29 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-11146:
---

 Summary: LLAP: merge master into branch
 Key: HIVE-11146
 URL: https://issues.apache.org/jira/browse/HIVE-11146
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin








[jira] [Created] (HIVE-11145) Remove OFFLINE and NO_DROP from tables and partitions

2015-06-29 Thread Alan Gates (JIRA)
Alan Gates created HIVE-11145:
-

 Summary: Remove OFFLINE and NO_DROP from tables and partitions
 Key: HIVE-11145
 URL: https://issues.apache.org/jira/browse/HIVE-11145
 Project: Hive
  Issue Type: Improvement
  Components: Metastore, SQL
Affects Versions: hbase-metastore-branch
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: hbase-metastore-branch


Currently a table or partition can be marked no_drop or offline. no_drop 
prevents users from dropping the table or partition, and offline prevents 
reading it. This was built in Hive 0.7, before SQL standard authorization was 
an option.

This is an expensive feature: when a table is dropped, every partition must be 
fetched and checked to make sure it can be dropped.

This feature is also redundant now that real authorization is available in Hive.

This feature should be removed.





Re: Review Request 35950: HIVE-11131: Get row information on DataWritableWriter once for better writing performance

2015-06-29 Thread Ryan Blue

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35950/#review89812
---

Ship it!


Ship It!

- Ryan Blue


On June 27, 2015, 5:29 p.m., Sergio Pena wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/35950/
> ---
> 
> (Updated June 27, 2015, 5:29 p.m.)
> 
> 
> Review request for hive, Ryan Blue, cheng xu, and Dong Chen.
> 
> 
> Bugs: HIVE-11131
> https://issues.apache.org/jira/browse/HIVE-11131
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Implemented data type writers that are created before the first Hive row 
> is written to Parquet. These writers hold information about the object 
> inspectors and schema of a specific data type, and call the specific 
> add() method used by Parquet for each data type.
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java
>  c195c3ec3ddae19bf255fc2c9633f8bf4390f428 
> 
> Diff: https://reviews.apache.org/r/35950/diff/
> 
> 
> Testing
> ---
> 
> Tests from TestDataWritableWriter run OK.
> 
> I ran other tests with micro-benchmarks, and I got better results from 
> this new implementation:
> 
> Using repeated rows across the file, this is the throughput increase using 1 
> million records:
> 
>          bigint   boolean  double   float    int      string
> before   7.598    7.491    7.488    7.588    7.53     0.270
> after    10.137   11.511   10.155   10.297   10.242   0.286
> 
> Using random rows across the file, this is the throughput increase using 1 
> million records:
> 
>          bigint   boolean  double   float    int      string
> before   5.268    7.723    4.107    4.173    4.729    0.20
> after    6.236    10.466   5.944    4.749    5.234    0.22
> 
> 
> Thanks,
> 
> Sergio Pena
> 
>



Review Request 36025: HIVE-11139 Emit more lineage information

2015-06-29 Thread Jimmy Xiang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36025/
---

Review request for hive and Xuefu Zhang.


Repository: hive-git


Description
---

Extended HIVE-1131 to log more lineage info. A post exec hook, LineageLogger, 
is added to log the lineage in JSON format.


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/Driver.java 669e6be 
  ql/src/java/org/apache/hadoop/hive/ql/QueryPlan.java a0d61f5 
  ql/src/java/org/apache/hadoop/hive/ql/hooks/HookContext.java 0c6a938 
  ql/src/java/org/apache/hadoop/hive/ql/hooks/LineageInfo.java f98b38b 
  ql/src/java/org/apache/hadoop/hive/ql/hooks/LineageLogger.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/log/NoDeleteRollingFileAppender.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/lineage/ExprProcFactory.java 
c930b80 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/lineage/Generator.java 
51bef04 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/lineage/LineageCtx.java 
cef24e3 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/lineage/OpProcFactory.java 
5957ac0 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java f41668b 
  ql/src/java/org/apache/hadoop/hive/ql/plan/FilterDesc.java 3a1a4af 
  ql/src/java/org/apache/hadoop/hive/ql/plan/JoinDesc.java 3a4ea2f 
  ql/src/java/org/apache/hadoop/hive/ql/session/LineageState.java e716ed2 
  
ql/src/test/org/apache/hadoop/hive/ql/parse/TestUpdateDeleteSemanticAnalyzer.java
 e1cab79 
  ql/src/test/queries/clientpositive/lineage2.q PRE-CREATION 
  ql/src/test/queries/clientpositive/lineage3.q PRE-CREATION 
  ql/src/test/results/clientpositive/alter_partition_change_col.q.out 0d97b7a 
  ql/src/test/results/clientpositive/alter_table_cascade.q.out 0139466 
  ql/src/test/results/clientpositive/combine2.q.out 2400c96 
  ql/src/test/results/clientpositive/groupby_sort_1_23.q.out 34cd1ff 
  ql/src/test/results/clientpositive/groupby_sort_skew_1_23.q.out 0d631ce 
  ql/src/test/results/clientpositive/index_auto_mult_tables.q.out c3c1fc8 
  ql/src/test/results/clientpositive/index_auto_mult_tables_compact.q.out 
e3dfcb7 
  ql/src/test/results/clientpositive/index_auto_partitioned.q.out f3ae876 
  ql/src/test/results/clientpositive/index_auto_update.q.out 52509be 
  ql/src/test/results/clientpositive/index_bitmap.q.out 596312d 
  ql/src/test/results/clientpositive/index_bitmap_auto_partitioned.q.out 
8d4774d 
  ql/src/test/results/clientpositive/index_bitmap_rc.q.out 45fe339 
  ql/src/test/results/clientpositive/index_compact.q.out a33f82a 
  ql/src/test/results/clientpositive/index_compact_2.q.out fd4cdf9 
  ql/src/test/results/clientpositive/join34.q.out 48c3b74 
  ql/src/test/results/clientpositive/join35.q.out c0372a7 
  ql/src/test/results/clientpositive/lineage1.q.out d9f1ce3 
  ql/src/test/results/clientpositive/lineage2.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/lineage3.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/load_dyn_part13.q.out 1776e12 
  ql/src/test/results/clientpositive/multiMapJoin1.q.out 08d2bc1 
  ql/src/test/results/clientpositive/multi_insert.q.out ea2e554 
  
ql/src/test/results/clientpositive/multi_insert_move_tasks_share_dependencies.q.out
 0fbad49 
  ql/src/test/results/clientpositive/orc_dictionary_threshold.q.out 10ed4ac 
  ql/src/test/results/clientpositive/ptf.q.out 6bd1747 
  ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out 6c0f4a5 
  ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out 8248a70 
  ql/src/test/results/clientpositive/spark/join34.q.out 09b8b6b 
  ql/src/test/results/clientpositive/spark/join35.q.out e84c860 
  ql/src/test/results/clientpositive/spark/load_dyn_part13.q.out 4480310 
  ql/src/test/results/clientpositive/spark/multi_insert.q.out c77eb05 
  
ql/src/test/results/clientpositive/spark/multi_insert_move_tasks_share_dependencies.q.out
 e3ef39e 
  ql/src/test/results/clientpositive/spark/ptf.q.out 1ca6951 
  ql/src/test/results/clientpositive/spark/union22.q.out fdb4d47 
  ql/src/test/results/clientpositive/spark/union28.q.out 98582df 
  ql/src/test/results/clientpositive/spark/union29.q.out 1776b4d 
  ql/src/test/results/clientpositive/spark/union30.q.out 3409623 
  ql/src/test/results/clientpositive/spark/union33.q.out 0e6b1aa 
  ql/src/test/results/clientpositive/spark/union_date_trim.q.out 86b96ac 
  ql/src/test/results/clientpositive/spark/union_remove_1.q.out ba0e293 
  ql/src/test/results/clientpositive/spark/union_remove_10.q.out 2718775 
  ql/src/test/results/clientpositive/spark/union_remove_11.q.out be65741 
  ql/src/test/results/clientpositive/spark/union_remove_15.q.out 26cfbab 
  ql/src/test/results/clientpositive/spark/union_remove_16.q.out 7a7aaf2 
  ql/src/test/results/clientpositive/spark/union_remove_17.q.out 74a5b23 
  ql/src/test/results/clientpositive/spark/union_remove_18.q.out a5e1

Re: Review Request 35968: 1. Added preliminary UDF code for cosine similarity. 2. Added unit tests and integration tests. 3. Registered the UDF in the FunctionRegistry class.

2015-06-29 Thread Nishant Kelkar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35968/
---

(Updated June 29, 2015, 9:24 p.m.)


Review request for hive and Alexander Pivovarov.


Changes
---

Removed dependency on commons-math3 FastMath class.


Repository: hive-git


Description
---

1. Added preliminary UDF code for cosine similarity. 2. Added unit tests and 
integration tests. 3. Registered the UDF in the FunctionRegistry class.


Diffs (updated)
-

  .reviewboardrc abc33f91a44b76573cbba334c33417307c63956f 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 
fabc21e2092561cbf98c35a406e4ee40e71fe1de 
  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFCosineSimilarity.java 
PRE-CREATION 
  ql/src/test/org/apache/hadoop/hive/ql/udf/TestUDFCosineSimilarity.java 
PRE-CREATION 
  ql/src/test/queries/clientnegative/udf_cosine_similarity_error_1.q 
PRE-CREATION 
  ql/src/test/queries/clientnegative/udf_cosine_similarity_wrongargs_1.q 
PRE-CREATION 
  ql/src/test/queries/clientnegative/udf_cosine_similarity_wrongargs_2.q 
PRE-CREATION 
  ql/src/test/queries/clientnegative/udf_cosine_similarity_wrongargs_3.q 
PRE-CREATION 
  ql/src/test/queries/clientpositive/udf_cosine_similarity.q PRE-CREATION 
  ql/src/test/results/clientnegative/udf_cosine_similarity_error_1.q.out 
PRE-CREATION 
  ql/src/test/results/clientnegative/udf_cosine_similarity_error_2.q.out 
PRE-CREATION 
  ql/src/test/results/clientnegative/udf_cosine_similarity_wrongargs_1.q.out 
PRE-CREATION 
  ql/src/test/results/clientnegative/udf_cosine_similarity_wrongargs_2.q.out 
PRE-CREATION 
  ql/src/test/results/clientnegative/udf_cosine_similarity_wrongargs_3.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/show_functions.q.out 
5de4ffcd1ace477af026b83fb7bfb8068fc192b3 
  ql/src/test/results/clientpositive/udf_cosine_similarity.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/35968/diff/


Testing
---

The function signature of the UDF is: cosine_similarity(Text, Text, Text)

Each "Text" can be one of {S=something,E=empty,N=null}

Unit tests written for the following cases:
1. cosine_similarity(S, S, S)
2. cosine_similarity(S, E, S)
3. cosine_similarity(N, E, S)
4. cosine_similarity(S, S, E)
5. cosine_similarity(N, N, N)
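The patch itself isn't reproduced in this thread, and the summary doesn't spell out what the third Text argument does. Purely as an illustration of the underlying computation, a minimal term-frequency cosine similarity over two whitespace-delimited strings might look like this (all names hypothetical, not the patch's code; returning 0.0 for null/empty inputs is an assumption of this sketch, not necessarily the UDF's behavior):

```java
import java.util.HashMap;
import java.util.Map;

public class CosineSketch {
  // Term-frequency cosine similarity of two whitespace-delimited strings:
  // dot product of the term-count vectors over the product of their norms.
  static double cosine(String a, String b) {
    if (a == null || b == null) return 0.0;
    Map<String, Integer> ta = counts(a);
    Map<String, Integer> tb = counts(b);
    if (ta.isEmpty() || tb.isEmpty()) return 0.0;
    double dot = 0.0;
    for (Map.Entry<String, Integer> e : ta.entrySet()) {
      Integer other = tb.get(e.getKey());
      if (other != null) {
        dot += e.getValue() * (double) other;
      }
    }
    return dot / (norm(ta) * norm(tb));
  }

  // Count occurrences of each whitespace-delimited token.
  private static Map<String, Integer> counts(String s) {
    Map<String, Integer> m = new HashMap<>();
    for (String tok : s.trim().split("\\s+")) {
      if (!tok.isEmpty()) {
        m.merge(tok, 1, Integer::sum);
      }
    }
    return m;
  }

  // Euclidean norm of a term-count vector.
  private static double norm(Map<String, Integer> m) {
    double sum = 0.0;
    for (int c : m.values()) {
      sum += (double) c * c;
    }
    return Math.sqrt(sum);
  }
}
```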


Thanks,

Nishant Kelkar



Re: Review Request 35968: 1. Added preliminary UDF code for cosine similarity. 2. Added unit tests and integration tests. 3. Registered the UDF in the FunctionRegistry class.

2015-06-29 Thread Nishant Kelkar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35968/#review89800
---



ql/src/java/org/apache/hadoop/hive/ql/udf/UDFCosineSimilarity.java (line 4)


Oh, sorry, I didn't read that well. You mean math3. Yes, you're right about this. 
Moving to use 

{code}
Math.pow(double, double)
{code}

instead.


- Nishant Kelkar


On June 28, 2015, 11:39 a.m., Nishant Kelkar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/35968/
> ---
> 
> (Updated June 28, 2015, 11:39 a.m.)
> 
> 
> Review request for hive and Alexander Pivovarov.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> 1. Added preliminary UDF code for cosine similarity. 2. Added unit tests and 
> integration tests. 3. Registered the UDF in the FunctionRegistry class.
> 
> 
> Diffs
> -
> 
>   .reviewboardrc abc33f91a44b76573cbba334c33417307c63956f 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 
> fabc21e2092561cbf98c35a406e4ee40e71fe1de 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/UDFCosineSimilarity.java 
> PRE-CREATION 
>   ql/src/test/org/apache/hadoop/hive/ql/udf/TestUDFCosineSimilarity.java 
> PRE-CREATION 
>   ql/src/test/queries/clientnegative/udf_cosine_similarity_error_1.q 
> PRE-CREATION 
>   ql/src/test/queries/clientnegative/udf_cosine_similarity_wrongargs_1.q 
> PRE-CREATION 
>   ql/src/test/queries/clientnegative/udf_cosine_similarity_wrongargs_2.q 
> PRE-CREATION 
>   ql/src/test/queries/clientnegative/udf_cosine_similarity_wrongargs_3.q 
> PRE-CREATION 
>   ql/src/test/queries/clientpositive/udf_cosine_similarity.q PRE-CREATION 
>   ql/src/test/results/clientnegative/udf_cosine_similarity_error_1.q.out 
> PRE-CREATION 
>   ql/src/test/results/clientnegative/udf_cosine_similarity_error_2.q.out 
> PRE-CREATION 
>   ql/src/test/results/clientnegative/udf_cosine_similarity_wrongargs_1.q.out 
> PRE-CREATION 
>   ql/src/test/results/clientnegative/udf_cosine_similarity_wrongargs_2.q.out 
> PRE-CREATION 
>   ql/src/test/results/clientnegative/udf_cosine_similarity_wrongargs_3.q.out 
> PRE-CREATION 
>   ql/src/test/results/clientpositive/show_functions.q.out 
> 5de4ffcd1ace477af026b83fb7bfb8068fc192b3 
>   ql/src/test/results/clientpositive/udf_cosine_similarity.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/35968/diff/
> 
> 
> Testing
> ---
> 
> Function signature of the UDF is: cosine_similarity(Text, Text, Text)
> 
> Each "Text" can be one of {S=something,E=empty,N=null}
> 
> Unit tests written for the following cases:
> 1. cosine_similarity(S, S, S)
> 2. cosine_similarity(S, E, S)
> 3. cosine_similarity(N, E, S)
> 4. cosine_similarity(S, S, E)
> 5. cosine_similarity(N, N, N)
> 
> 
> Thanks,
> 
> Nishant Kelkar
> 
>



[jira] [Created] (HIVE-11144) Replace row by row reader and writer with shims to vectorized path.

2015-06-29 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-11144:


 Summary: Replace row by row reader and writer with shims to 
vectorized path.
 Key: HIVE-11144
 URL: https://issues.apache.org/jira/browse/HIVE-11144
 Project: Hive
  Issue Type: Sub-task
Reporter: Owen O'Malley
Assignee: Owen O'Malley


The core ORC reader and writer will be better served if the vectorized read and 
write paths are the primary API and the row by row reader and writer and their 
corresponding object inspectors become Hive-specific shims.





Re: Review Request 35803: HIVE-10895 ObjectStore does not close Query objects in some calls, causing a potential leak in some metastore db resources

2015-06-29 Thread Aihua Xu


> On June 29, 2015, 4:55 p.m., Chaoyu Tang wrote:
> > metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java, line 
> > 3485
> > 
> >
> > Should listPrincipalGlobalGrants be passed in a queryWrapper as well?
> 
> Aihua Xu wrote:
> listPrincipalGlobalGrants handles releasing the resource by 
> itself, so we don't need to pass that in.
> 
> Chaoyu Tang wrote:
> In my local copy of the code, the query in this method is not self-closed (line 
> 4395):
> {code}
>   @SuppressWarnings("unchecked")
>   @Override
>   public List<MGlobalPrivilege> listPrincipalGlobalGrants(String 
> principalName, PrincipalType principalType) {
> boolean commited = false;
> List<MGlobalPrivilege> userNameDbPriv = null;
> try {
>   openTransaction();
>   if (principalName != null) {
> Query query = pm.newQuery(MGlobalPrivilege.class,
> "principalName == t1 && principalType == t2 ");
> query.declareParameters(
> "java.lang.String t1, java.lang.String t2");
> userNameDbPriv = (List<MGlobalPrivilege>) query
> .executeWithArray(principalName, principalType.toString());
> pm.retrieveAll(userNameDbPriv);
>   }
>   commited = commitTransaction();
> } finally {
>   if (!commited) {
> rollbackTransaction();
>   }
> }
> return userNameDbPriv;
>   }
> {code}

My patch will close the query. See line 4493.


> On June 29, 2015, 4:55 p.m., Chaoyu Tang wrote:
> > metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java, line 
> > 3477
> > 
> >
> > Do we have to use a list of queryWrappers with each wrapper for a 
> > different listXXX call? If so, we need to open eight queries for this 
> > removeRole method. Could we use only one and reuse it?
> 
> Aihua Xu wrote:
> Originally we were already using eight queries and I was not trying to 
> change that logic in this patch. And also we can't reuse the same query since 
> we need to release the resource for each query anyway.
> 
> Chaoyu Tang wrote:
> What I meant by reuse is to use only one queryWrapper and close it 
> after each call to listXXX, so there will be at most one open query (cursor) 
> at a time in this method, which helps scalability.

Yeah. We can do that.


- Aihua


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35803/#review89749
---


On June 26, 2015, 5:23 p.m., Aihua Xu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/35803/
> ---
> 
> (Updated June 26, 2015, 5:23 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-10895 ObjectStore does not close Query objects in some calls, causing a 
> potential leak in some metastore db resources
> 
> 
> Diffs
> -
> 
>   metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 
> 417ecc8 
>   metastore/src/java/org/apache/hadoop/hive/metastore/tools/HiveMetaTool.java 
> d0ff329 
>   metastore/src/test/org/apache/hadoop/hive/metastore/TestObjectStore.java 
> PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/35803/diff/
> 
> 
> Testing
> ---
> 
> Testing has been done.
> 
> 
> Thanks,
> 
> Aihua Xu
> 
>



[jira] [Created] (HIVE-11143) Tests udf_from_utc_timestamp.q/udf_to_utc_timestamp.q do not work with updated Java timezone information

2015-06-29 Thread Jason Dere (JIRA)
Jason Dere created HIVE-11143:
-

 Summary: Tests udf_from_utc_timestamp.q/udf_to_utc_timestamp.q do 
not work with updated Java timezone information
 Key: HIVE-11143
 URL: https://issues.apache.org/jira/browse/HIVE-11143
 Project: Hive
  Issue Type: Bug
  Components: Tests
Reporter: Jason Dere
Assignee: Jason Dere


It looks like there were recent changes to the Europe/Moscow time zone in 2014. 
When udf_from_utc_timestamp.q/udf_to_utc_timestamp.q are run with more recent 
versions of JDK or with an updated time zone database, the tests fail.





[jira] [Created] (HIVE-11142) Update JDBC Driver to support transaction related API

2015-06-29 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-11142:
-

 Summary: Update JDBC Driver to support transaction related API
 Key: HIVE-11142
 URL: https://issues.apache.org/jira/browse/HIVE-11142
 Project: Hive
  Issue Type: Sub-task
Affects Versions: 1.3.0
Reporter: Eugene Koifman








Re: Review Request 35803: HIVE-10895 ObjectStore does not close Query objects in some calls, causing a potential leak in some metastore db resources

2015-06-29 Thread Chaoyu Tang


> On June 29, 2015, 4:55 p.m., Chaoyu Tang wrote:
> > metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java, line 
> > 3477
> > 
> >
> > Do we have to use a list of queryWrappers with each wrapper for a 
> > different listXXX call? If so, we need to open eight queries for this 
> > removeRole method. Could we use only one and reuse it?
> 
> Aihua Xu wrote:
> Originally we were already using eight queries and I was not trying to 
> change that logic in this patch. And also we can't reuse the same query since 
> we need to release the resource for each query anyway.

What I meant by reuse is to use only one queryWrapper and close it after each 
call to listXXX, so there will be at most one open query (cursor) at a time in 
this method, which helps scalability.
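As a sketch of that shape (this QueryWrapper is a minimal stand-in, not Hive's metastore class): a single wrapper, closed after each listXXX call, keeps at most one cursor open at a time instead of eight:

```java
public class QueryReuseSketch {
  // Minimal stand-in for a metastore QueryWrapper: tracks whether its
  // underlying query (cursor) is currently open.
  static class QueryWrapper implements AutoCloseable {
    boolean open = false;
    void run(String listCall) { open = true; /* the real listXXX would execute here */ }
    @Override public void close() { open = false; }
  }

  // Reuse one wrapper across all listXXX calls, closing it between calls,
  // instead of collecting one wrapper per call and closing them at the end.
  // Returns the maximum number of cursors that were open at any point.
  static int maxOpenCursors(String[] listCalls) {
    int maxOpen = 0;
    QueryWrapper wrapper = new QueryWrapper();
    for (String call : listCalls) {
      try {
        wrapper.run(call);
        maxOpen = Math.max(maxOpen, wrapper.open ? 1 : 0);
      } finally {
        wrapper.close();  // cursor released before the next call
      }
    }
    return maxOpen;
  }
}
```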


> On June 29, 2015, 4:55 p.m., Chaoyu Tang wrote:
> > metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java, line 
> > 3485
> > 
> >
> > Should listPrincipalGlobalGrants be passed in a queryWrapper as well?
> 
> Aihua Xu wrote:
> listPrincipalGlobalGrants handles releasing the resource by 
> itself, so we don't need to pass that in.

In my local copy of the code, the query in this method is not self-closed (line 4395):
{code}
  @SuppressWarnings("unchecked")
  @Override
public List<MGlobalPrivilege> listPrincipalGlobalGrants(String principalName, 
PrincipalType principalType) {
boolean commited = false;
List<MGlobalPrivilege> userNameDbPriv = null;
try {
  openTransaction();
  if (principalName != null) {
Query query = pm.newQuery(MGlobalPrivilege.class,
"principalName == t1 && principalType == t2 ");
query.declareParameters(
"java.lang.String t1, java.lang.String t2");
userNameDbPriv = (List<MGlobalPrivilege>) query
.executeWithArray(principalName, principalType.toString());
pm.retrieveAll(userNameDbPriv);
  }
  commited = commitTransaction();
} finally {
  if (!commited) {
rollbackTransaction();
  }
}
return userNameDbPriv;
  }
{code}


- Chaoyu


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35803/#review89749
---


On June 26, 2015, 5:23 p.m., Aihua Xu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/35803/
> ---
> 
> (Updated June 26, 2015, 5:23 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-10895 ObjectStore does not close Query objects in some calls, causing a 
> potential leak in some metastore db resources
> 
> 
> Diffs
> -
> 
>   metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 
> 417ecc8 
>   metastore/src/java/org/apache/hadoop/hive/metastore/tools/HiveMetaTool.java 
> d0ff329 
>   metastore/src/test/org/apache/hadoop/hive/metastore/TestObjectStore.java 
> PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/35803/diff/
> 
> 
> Testing
> ---
> 
> Testing has been done.
> 
> 
> Thanks,
> 
> Aihua Xu
> 
>



[jira] [Created] (HIVE-11141) Improve RuleRegExp when the Expression node stack gets huge

2015-06-29 Thread Hari Sankar Sivarama Subramaniyan (JIRA)
Hari Sankar Sivarama Subramaniyan created HIVE-11141:


 Summary: Improve RuleRegExp when the Expression node stack gets 
huge
 Key: HIVE-11141
 URL: https://issues.apache.org/jira/browse/HIVE-11141
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan


More and more complex workloads are migrated to Hive from SQL Server, Teradata, 
etc. Occasionally Hive gets bottlenecked on generating plans for large queries; 
in the majority of cases, time is spent fetching metadata and partitions and 
applying other optimizer transformation rules.

I have attached the query for the test case, which needs to be run after we 
set up the database as shown below.
{code}
create database dataset_3;
use dataset_3;
{code}

It seems that the most problematic part of the code, as the stack gets 
arbitrarily long, is in RuleRegExp.java:
{code}
  @Override
  public int cost(Stack<Node> stack) throws SemanticException {
int numElems = (stack != null ? stack.size() : 0);
String name = "";
for (int pos = numElems - 1; pos >= 0; pos--) {
  name = stack.get(pos).getName() + "%" + name;
  Matcher m = pattern.matcher(name);
  if (m.matches()) {
return m.group().length();
  }
}
return -1;
  }
{code}
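To make the cost concrete: each iteration re-concatenates the accumulated name and re-runs the regex from scratch, so for a stack of depth n the matcher scans O(n^2) characters in the worst case. One possible mitigation, when a rule pattern contains no regex metacharacters, is to compare the pattern's tokens directly against the top of the stack. A self-contained sketch (simplified stand-ins using plain strings, not the Hive Node/Rule classes):

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RuleCostSketch {
  // Mirrors the quoted cost(): walk the stack top-down, prepending each
  // operator name and re-matching the full regex against the growing string.
  static int costWithRegex(Pattern pattern, List<String> stackNames) {
    String name = "";
    for (int pos = stackNames.size() - 1; pos >= 0; pos--) {
      name = stackNames.get(pos) + "%" + name;
      Matcher m = pattern.matcher(name);
      if (m.matches()) {
        return m.group().length();
      }
    }
    return -1;
  }

  // Cheaper equivalent for literal patterns such as "FIL%RS%": compare the
  // pattern's tokens against the top of the stack, no regex and no rebuilding.
  static int costLiteral(String literalPattern, List<String> stackNames) {
    String[] tokens = literalPattern.split("%");
    int n = stackNames.size();
    if (tokens.length > n) {
      return -1;
    }
    for (int i = 0; i < tokens.length; i++) {
      if (!stackNames.get(n - tokens.length + i).equals(tokens[i])) {
        return -1;
      }
    }
    return literalPattern.length();
  }

  public static void main(String[] args) {
    List<String> stack = Arrays.asList("TS", "FIL", "RS");
    System.out.println(costWithRegex(Pattern.compile("FIL%RS%"), stack)); // 7
    System.out.println(costLiteral("FIL%RS%", stack));                    // 7
  }
}
```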






[jira] [Created] (HIVE-11140) auto compute PROJ_HOME in hcatalog/src/test/e2e/templeton/deployers/env.sh

2015-06-29 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-11140:
-

 Summary: auto compute PROJ_HOME in 
hcatalog/src/test/e2e/templeton/deployers/env.sh
 Key: HIVE-11140
 URL: https://issues.apache.org/jira/browse/HIVE-11140
 Project: Hive
  Issue Type: Bug
  Components: Tests, WebHCat
Affects Versions: 1.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman


it's currently set as
{noformat}
if [ -z ${PROJ_HOME} ]; then
  export PROJ_HOME=/Users/${USER}/dev/hive
fi
{noformat}

but it always points to the project root, so it can be

{{export PROJ_HOME=../../../../../..}}
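One way to auto-compute it, assuming env.sh stays at its current depth in the tree (a sketch, untested against the actual deployers layout):

```shell
# Derive PROJ_HOME from this script's own location instead of hard-coding
# /Users/${USER}/dev/hive. The ../../../../../.. depth assumes env.sh stays at
# hcatalog/src/test/e2e/templeton/deployers/env.sh; $0 works when env.sh is
# executed (a bash-sourced file would need BASH_SOURCE instead).
if [ -z "${PROJ_HOME}" ]; then
  PROJ_HOME="$(cd "$(dirname "$0")/../../../../../.." && pwd)"
  export PROJ_HOME
fi
```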





Re: Review Request 35803: HIVE-10895 ObjectStore does not close Query objects in some calls, causing a potential leak in some metastore db resources

2015-06-29 Thread Aihua Xu


> On June 29, 2015, 4:55 p.m., Chaoyu Tang wrote:
> > metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java, line 
> > 3477
> > 
> >
> > Do we have to use a list of queryWrappers with each wrapper for a 
> > different listXXX call? If so, we need to open eight queries for this 
> > removeRole method. Could we use only one and reuse it?

Originally we were already using eight queries and I was not trying to change 
that logic in this patch. And also we can't reuse the same query since we need 
to release the resource for each query anyway.


> On June 29, 2015, 4:55 p.m., Chaoyu Tang wrote:
> > metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java, line 
> > 3485
> > 
> >
> > Should listPrincipalGlobalGrants be passed in a queryWrapper as well?

listPrincipalGlobalGrants handles releasing its resources by itself, so 
we don't need to pass one in.
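As an illustration of the wrapper idea being discussed (class and method names here are hypothetical, not ObjectStore's actual API), each query can own a small AutoCloseable wrapper so its backing resources are released deterministically:

```java
// Hypothetical sketch of a per-query wrapper; in the real patch the
// close() call would release the JDO Query's datastore resources
// (e.g. via query.closeAll()).
public class QueryWrapperDemo {
  static class QueryWrapper implements AutoCloseable {
    boolean closed = false;

    @Override
    public void close() {
      closed = true; // stands in for query.closeAll()
    }
  }

  public static void main(String[] args) {
    QueryWrapper w = new QueryWrapper();
    try (QueryWrapper q = w) {
      // execute the query and copy out its results here
    }
    System.out.println(w.closed); // prints "true": closed even on exceptions
  }
}
```

With one wrapper per query, releasing "each of the eight" is just eight try-with-resources blocks, which is why reusing a single query object would not save anything here.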


> On June 29, 2015, 4:55 p.m., Chaoyu Tang wrote:
> > metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java, line 
> > 4939
> > 
> >
> > Is there any reason you need an additional list to hold the retrieved 
> > result?

Yeah. That is the place where we get all the returned data, so that we can 
close the query. Since it's a public interface, I didn't pass a QueryWrapper 
in for the caller to close. If we think there is a memory issue for some 
calls, we can do that.


> On June 29, 2015, 4:55 p.m., Chaoyu Tang wrote:
> > metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java, line 
> > 1793
> > 
> >
> > Have you looked at MetaStoreDirectSql class to see if there is any 
> > potential query leaking?

I will take a look at that class.


- Aihua


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35803/#review89749
---


On June 26, 2015, 5:23 p.m., Aihua Xu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/35803/
> ---
> 
> (Updated June 26, 2015, 5:23 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-10895 ObjectStore does not close Query objects in some calls, causing a 
> potential leak in some metastore db resources
> 
> 
> Diffs
> -
> 
>   metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 
> 417ecc8 
>   metastore/src/java/org/apache/hadoop/hive/metastore/tools/HiveMetaTool.java 
> d0ff329 
>   metastore/src/test/org/apache/hadoop/hive/metastore/TestObjectStore.java 
> PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/35803/diff/
> 
> 
> Testing
> ---
> 
> Testing has been done.
> 
> 
> Thanks,
> 
> Aihua Xu
> 
>



Re: [ANNOUNCE] New Hive Committers - Jesus Camacho Rodriguez and Chinna Rao Lalam

2015-06-29 Thread Jesus Camachorodriguez
Thanks for this opportunity and to everybody for their messages!
And congratulations to Chinna too!

I really enjoy contributing to Hive, and I am looking forward to
continuing to contribute!

--
Jesús



On 6/29/15, 5:35 AM, "amareshwarisr ."  wrote:

>Congratulations to both of you!
>
>On Mon, Jun 29, 2015 at 6:08 AM, Xu, Cheng A  wrote:
>
>> Congratulations to Jesus and Chinna!
>>
>> Ferd
>>
>> -Original Message-
>> From: Chaoyu Tang [mailto:ctang...@gmail.com]
>> Sent: Sunday, June 28, 2015 9:13 PM
>> To: dev@hive.apache.org
>> Subject: Re: [ANNOUNCE] New Hive Committers - Jesus Camacho Rodriguez
>>and
>> Chinna Rao Lalam
>>
>> Congratulations to Jesus and Chinna!
>>
>> Choayu
>>
>>
>> On Sat, Jun 27, 2015 at 4:16 AM, Chinna Rao Lalam <
>> lalamchinnara...@gmail.com> wrote:
>>
>> > Thank you everyone. I'm excited to continue contributing to the Hive
>> > community.
>> >
>> > Congrats to Jesus.
>> >
>> > Regards,
>> > Chinna
>> >
>> > On Sat, Jun 27, 2015 at 11:18 AM, Lefty Leverenz
>> > 
>> > wrote:
>> >
>> > > Congratulations China and Jesus, and thanks for all your
>>contributions!
>> > >
>> > > -- Lefty
>> > >
>> > > On Fri, Jun 26, 2015 at 7:01 PM, Sergio Pena
>> > > 
>> > > wrote:
>> > >
>> > > > Congratulations China and Jesus !!!.
>> > > >
>> > > > - Sergio
>> > > >
>> > > > On Fri, Jun 26, 2015 at 1:57 PM, Carl Steinbach 
>> > wrote:
>> > > >
>> > > > > On behalf of the Apache Hive PMC I am pleased to announce that
>> > > > > Jesus Camacho Rodriguez and Chinna Rao Lalam have been voted in
>> > > > > as
>> > > committers.
>> > > > >
>> > > > > Please join me in congratulating Jesus and Chinna!
>> > > > >
>> > > > > Thanks.
>> > > > >
>> > > > > - Carl
>> > > > >
>> > > >
>> > >
>> >
>> >
>> >
>> > --
>> > Hope It Helps,
>> > Chinna
>> >
>>



Re: Review Request 35792: HIVE-10438 - Architecture for ResultSet Compression via external plugin

2015-06-29 Thread Rohit Dholakia

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35792/
---

(Updated June 29, 2015, 5:41 p.m.)


Review request for hive, Vaibhav Gumashta, Xiaojian Wang, Xiao Meng, and Xuefu 
Zhang.


Changes
---

We have addressed the review comments as mentioned below:
1. Moved to a new package (cli.compression)
2. Added tests for EncodedColumnBased (using snappy as default compression)
3. Fixed whitespace. 
4. Removed default values from hive-site.xml.


Repository: hive-git


Description
---

This patch enables ResultSet compression for Hive through a plugin 
architecture that allows external plugins to compress ResultSets on-the-fly.
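As a rough sketch of how such plugins are typically discovered (the diff's META-INF/services entry suggests java.util.ServiceLoader; the interface below is illustrative, not the patch's actual ColumnCompressor API):

```java
import java.util.ServiceLoader;

// Hypothetical plugin contract; the real patch defines its own
// ColumnCompressor interface.
interface CompressorPlugin {
  String vendor();
  byte[] compress(byte[] column);
}

public class PluginLookup {
  // Scan the classpath for registered implementations and pick one by
  // vendor name; returns null when nothing matches.
  static CompressorPlugin find(String vendor) {
    for (CompressorPlugin p : ServiceLoader.load(CompressorPlugin.class)) {
      if (p.vendor().equals(vendor)) {
        return p;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    // With no provider registered under META-INF/services, the lookup
    // simply finds nothing.
    System.out.println(find("snappy") == null); // prints "true"
  }
}
```

An external plugin jar then only needs a provider file named after the interface to be picked up at runtime, with no changes to the server code.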


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 27f68df 
  service/if/TCLIService.thrift baf583f 
  service/src/gen/thrift/gen-cpp/TCLIService_types.h 7bceabd 
  service/src/gen/thrift/gen-cpp/TCLIService_types.cpp 86eeea3 
  
service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TEnColumn.java
 PRE-CREATION 
  
service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TExecuteStatementReq.java
 4f157ad 
  
service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TGetTablesReq.java
 c973fcc 
  
service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TOpenSessionReq.java
 c048161 
  
service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TOpenSessionResp.java
 351f78b 
  
service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TProtocolVersion.java
 a4279d2 
  
service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TRowSet.java
 d16c8a4 
  
service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TStatus.java
 24a746e 
  service/src/gen/thrift/gen-py/TCLIService/ttypes.py 068727c 
  service/src/gen/thrift/gen-rb/t_c_l_i_service_types.rb b482533 
  service/src/java/org/apache/hive/service/cli/Column.java 2e21f18 
  service/src/java/org/apache/hive/service/cli/ColumnBasedSet.java 47a582e 
  service/src/java/org/apache/hive/service/cli/RowSetFactory.java e8f68ea 
  
service/src/java/org/apache/hive/service/cli/compression/ColumnCompressor.java 
PRE-CREATION 
  
service/src/java/org/apache/hive/service/cli/compression/ColumnCompressorService.java
 PRE-CREATION 
  
service/src/java/org/apache/hive/service/cli/compression/EncodedColumnBasedSet.java
 PRE-CREATION 
  service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 
dfb7faa 
  
service/src/main/resources/META-INF/services/org.apache.hive.service.cli.compression.ColumnCompressor
 PRE-CREATION 
  
service/src/test/org/apache/hive/service/cli/compression/SnappyIntCompressor.java
 PRE-CREATION 
  
service/src/test/org/apache/hive/service/cli/compression/TestEncodedColumnBasedSet.java
 PRE-CREATION 

Diff: https://reviews.apache.org/r/35792/diff/


Testing
---

Testing has been done using a docker container-based query submitter that has 
an integer decompressor as part of it. Using the integer compressor (also 
provided) and the decompressor, the end-to-end functionality can be observed.


File Attachments (updated)


Patch file
  
https://reviews.apache.org/media/uploaded/files/2015/06/23/16aa08f8-2393-460a-83ef-72464fc537db__HIVE-10438.patch


Thanks,

Rohit Dholakia



Hive-0.14 - Build # 999 - Still Failing

2015-06-29 Thread Apache Jenkins Server
Changes for Build #980

Changes for Build #981

Changes for Build #982

Changes for Build #983

Changes for Build #984

Changes for Build #985

Changes for Build #986

Changes for Build #987

Changes for Build #988

Changes for Build #989

Changes for Build #990

Changes for Build #991

Changes for Build #992

Changes for Build #993

Changes for Build #994

Changes for Build #995

Changes for Build #996

Changes for Build #997

Changes for Build #998

Changes for Build #999



No tests ran.

The Apache Jenkins build system has built Hive-0.14 (build #999)

Status: Still Failing

Check console output at https://builds.apache.org/job/Hive-0.14/999/ to view 
the results.

Re: Review Request 35803: HIVE-10895 ObjectStore does not close Query objects in some calls, causing a potential leak in some metastore db resources

2015-06-29 Thread Chaoyu Tang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35803/#review89749
---



metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java (line 1059)


Indentation ...



metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java (line 1089)


Indentation..



metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java (line 1788)


Have you looked at MetaStoreDirectSql class to see if there is any 
potential query leaking?



metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java (line 3409)


Do we have to use a list of queryWrappers, with each wrapper for a different 
listXXX call? If so, we need to open eight queries for this removeRole method. 
Could we use only one and reuse it?



metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java (line 3417)


Should listPrincipalGlobalGrants be passed in a queryWrapper as well?



metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java (line 4765)


Is there any reason you need an additional list to hold the retrieved 
result?


- Chaoyu Tang


On June 26, 2015, 5:23 p.m., Aihua Xu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/35803/
> ---
> 
> (Updated June 26, 2015, 5:23 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-10895 ObjectStore does not close Query objects in some calls, causing a 
> potential leak in some metastore db resources
> 
> 
> Diffs
> -
> 
>   metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 
> 417ecc8 
>   metastore/src/java/org/apache/hadoop/hive/metastore/tools/HiveMetaTool.java 
> d0ff329 
>   metastore/src/test/org/apache/hadoop/hive/metastore/TestObjectStore.java 
> PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/35803/diff/
> 
> 
> Testing
> ---
> 
> Testing has been done.
> 
> 
> Thanks,
> 
> Aihua Xu
> 
>



[jira] [Created] (HIVE-11139) Emit more lineage information

2015-06-29 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-11139:
--

 Summary: Emit more lineage information
 Key: HIVE-11139
 URL: https://issues.apache.org/jira/browse/HIVE-11139
 Project: Hive
  Issue Type: Improvement
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang


HIVE-1131 emits some column lineage info, but it doesn't support INSERT 
statements or CTAS statements. It doesn't emit the predicate information 
either.

We can enhance and use the dependency information created in HIVE-1131 to 
generate more complete lineage info.






Re: Review Request 35950: HIVE-11131: Get row information on DataWritableWriter once for better writing performance

2015-06-29 Thread cheng xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35950/#review89740
---


Thanks Sergio for this patch. Will this have any negative impact on the 
initialization part? Thank you.


ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java 
(line 416)


Is it possible to use a generic type to avoid creating a DataWriter for each 
type, since they are quite similar?
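For context, a minimal sketch of the writer-per-type pattern under review (names are illustrative, not the patch's actual DataWriter classes): the type dispatch happens once when the writers are built, so the per-row path is a plain method call.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

public class WriterDispatch {
  // Build one writer per column type, once per schema; a hypothetical
  // stand-in for creating IntDataWriter, StringDataWriter, etc.
  static Map<String, Consumer<Object>> buildWriters(StringBuilder sink) {
    Map<String, Consumer<Object>> writers = new HashMap<>();
    writers.put("int", v -> sink.append((Integer) v).append(';'));
    writers.put("string", v -> sink.append((String) v).append(';'));
    return writers;
  }

  public static void main(String[] args) {
    StringBuilder sink = new StringBuilder();
    Map<String, Consumer<Object>> writers = buildWriters(sink);
    // Per-row path: no instanceof checks or type switches, just the
    // lookup that was resolved up front.
    writers.get("int").accept(7);
    writers.get("string").accept("x");
    System.out.println(sink); // prints "7;x;"
  }
}
```

A fully generic writer would still need a type-specific add() call per value, which is why separate small writer classes per type are a common trade-off here.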


- cheng xu


On June 28, 2015, 8:29 a.m., Sergio Pena wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/35950/
> ---
> 
> (Updated June 28, 2015, 8:29 a.m.)
> 
> 
> Review request for hive, Ryan Blue, cheng xu, and Dong Chen.
> 
> 
> Bugs: HIVE-11131
> https://issues.apache.org/jira/browse/HIVE-11131
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Implemented data type writers that will be created before the first Hive row 
> is written to Parquet. These writers contain information about object 
> inspectors and schema of a specific data type, and calls the specific 
> add() method used by Parquet for each data type.
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java
>  c195c3ec3ddae19bf255fc2c9633f8bf4390f428 
> 
> Diff: https://reviews.apache.org/r/35950/diff/
> 
> 
> Testing
> ---
> 
> Tests from TestDataWritableWriter run OK.
> 
> I ran other tests with micro-benchmarks, and I got some better results from 
> this new implementation:
> 
> Using repeated rows across the file, this is the throughput increase using 1 
> million records:
> 
> bigint   boolean  double   float    int      string
> 7.598    7.491    7.488    7.588    7.53     0.270   (before)
> 10.137   11.511   10.155   10.297   10.242   0.286   (after)
> 
> Using random rows across the file, this is the throughput increase using 1 
> million records:
> 
> bigint   boolean  double   float    int      string
> 5.268    7.723    4.107    4.173    4.729    0.20    (before)
> 6.236    10.466   5.944    4.749    5.234    0.22    (after)
> 
> 
> Thanks,
> 
> Sergio Pena
> 
>



Re: Review Request 35950: HIVE-11131: Get row information on DataWritableWriter once for better writing performance

2015-06-29 Thread cheng xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35950/#review89721
---



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java 
(lines 71 - 72)


Add the comments before the declarations of messageWriter.



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java 
(line 73)


No need to initialize it with a null value.



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java 
(line 107)


I don't follow why this was renamed to schema here.



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java 
(line 182)


groupSchema -> groupType



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java 
(line 352)


Typo: ByteDataWrter -> ByteDataWriter?



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java 
(line 538)


DateDataWriter


- cheng xu


On June 28, 2015, 8:29 a.m., Sergio Pena wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/35950/
> ---
> 
> (Updated June 28, 2015, 8:29 a.m.)
> 
> 
> Review request for hive, Ryan Blue, cheng xu, and Dong Chen.
> 
> 
> Bugs: HIVE-11131
> https://issues.apache.org/jira/browse/HIVE-11131
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Implemented data type writers that will be created before the first Hive row 
> is written to Parquet. These writers contain information about object 
> inspectors and schema of a specific data type, and calls the specific 
> add() method used by Parquet for each data type.
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java
>  c195c3ec3ddae19bf255fc2c9633f8bf4390f428 
> 
> Diff: https://reviews.apache.org/r/35950/diff/
> 
> 
> Testing
> ---
> 
> Tests from TestDataWritableWriter run OK.
> 
> I ran other tests with micro-benchmarks, and I got some better results from 
> this new implementation:
> 
> Using repeated rows across the file, this is the throughput increase using 1 
> million records:
> 
> bigint   boolean  double   float    int      string
> 7.598    7.491    7.488    7.588    7.53     0.270   (before)
> 10.137   11.511   10.155   10.297   10.242   0.286   (after)
> 
> Using random rows across the file, this is the throughput increase using 1 
> million records:
> 
> bigint   boolean  double   float    int      string
> 5.268    7.723    4.107    4.173    4.729    0.20    (before)
> 6.236    10.466   5.944    4.749    5.234    0.22    (after)
> 
> 
> Thanks,
> 
> Sergio Pena
> 
>