[jira] [Created] (HIVE-19225) Class cast exception while running certain queries with UDAF like rank on internal struct columns

2018-04-16 Thread Amruth S (JIRA)
Amruth S created HIVE-19225:
---

 Summary: Class cast exception while running certain queries with 
UDAF like rank on internal struct columns
 Key: HIVE-19225
 URL: https://issues.apache.org/jira/browse/HIVE-19225
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 2.3.2
Reporter: Amruth S


Certain queries using the rank function cause a class cast exception.
{noformat}
Caused by: java.lang.ClassCastException: 
org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct cannot be cast to 
org.apache.hadoop.hive.serde2.io.TimestampWritable
at 
org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableTimestampObjectInspector.getPrimitiveJavaObject(WritableTimestampObjectInspector.java:39)
at 
org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableTimestampObjectInspector.getPrimitiveJavaObject(WritableTimestampObjectInspector.java:25)
at 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:412)
at 
org.apache.hadoop.hive.ql.udf.generic.GenericUDAFRank.copyToStandardObject(GenericUDAFRank.java:219)
at 
org.apache.hadoop.hive.ql.udf.generic.GenericUDAFRank$GenericUDAFAbstractRankEvaluator.iterate(GenericUDAFRank.java:153)
at 
org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:192)
at 
org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.processRow(WindowingTableFunction.java:407)
at 
org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.processRow(PTFOperator.java:325)
at 
org.apache.hadoop.hive.ql.exec.PTFOperator.process(PTFOperator.java:139)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
at 
org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:236)
... 7 more

2018-03-29 09:28:43,432 INFO [main] org.apache.hadoop.mapred.Task: Runnning 
cleanup for the task
{noformat}
The following change fixes this.

The evaluator seems to skip the case where the primary object emitted is a struct. 
I modified the code to look up the field inside the struct:
{code:java}
diff --git a/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/StandardStructObjectInspector.java b/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/StandardStructObjectInspector.java
index 36a500790a..e7731e99d7 100644
--- a/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/StandardStructObjectInspector.java
+++ b/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/StandardStructObjectInspector.java
@@ -22,6 +22,7 @@
import java.util.Arrays;
import java.util.List;

+import org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

@@ -171,6 +172,10 @@ public Object getStructFieldData(Object data, StructField fieldRef) {
// so we have to do differently.
boolean isArray = data.getClass().isArray();
if (!isArray && !(data instanceof List)) {
+ if (data instanceof LazyBinaryStruct
+ && fieldRef.getFieldObjectInspector().getCategory() == Category.PRIMITIVE) {
+ return ((LazyBinaryStruct) data).getField(((MyField) fieldRef).fieldID);
+ }
if (!warned) {
LOG.warn("Invalid type for struct " + data.getClass());
LOG.warn("ignoring similar errors.");
{code}
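For readers following along, here is a minimal, self-contained sketch (illustrative class names, not Hive's actual types) of the failure mode in the stack trace and the shape of the fix above: an inspector that casts unconditionally to its expected writable type throws when handed a struct wrapper, while the patched path unwraps the struct first.

```java
// Stand-ins for TimestampWritable and LazyBinaryStruct (hypothetical names).
class TimestampBox {
    final long millis;
    TimestampBox(long millis) { this.millis = millis; }
}

class StructBox {
    final Object[] fields;
    StructBox(Object... fields) { this.fields = fields; }
}

public class InspectorSketch {
    // Mirrors the failing path: an unconditional cast, like the one in
    // WritableTimestampObjectInspector.getPrimitiveJavaObject.
    static long getUnsafe(Object o) {
        return ((TimestampBox) o).millis; // ClassCastException if o is a StructBox
    }

    // Mirrors the proposed fix: unwrap the struct to reach the primitive field.
    static long getSafe(Object o) {
        if (o instanceof StructBox) {
            o = ((StructBox) o).fields[0];
        }
        return ((TimestampBox) o).millis;
    }
}
```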
Let me know your thoughts.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 66567: Migrate to Murmur hash for shuffle and bucketing

2018-04-16 Thread Deepak Jaiswal

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66567/
---

(Updated April 17, 2018, 4:57 a.m.)


Review request for hive, Eugene Koifman, Jason Dere, and Matt McCline.


Changes
---

Implemented changes recommended by Matt and Jason.
Using BiFunction to select hashing method at compile time.


Bugs: HIVE-18910
https://issues.apache.org/jira/browse/HIVE-18910


Repository: hive-git


Description
---

Hive uses the Java hash, which does not distribute values as well as Murmur hash 
and is less efficient for bucketing a table.
Migrate to Murmur hash while keeping backward compatibility for existing users 
so that they don't have to reload their existing tables.

To keep backward compatibility, bucket_version is added as a table property, 
resulting in a high number of test result updates.
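The "Changes" note above mentions using a BiFunction to select the hashing method at compile time. As a rough, hedged sketch of that idea (names and hash bodies here are illustrative, not Hive's actual Murmur3 implementation): the function is chosen once per table based on its bucket_version property, so per-row code never branches on the version.

```java
import java.util.function.BiFunction;

// Hypothetical sketch: pick the hashing function once, at plan/compile time,
// from the table's "bucket_version" property. Old tables keep a legacy
// Java-style hash; new tables get a Murmur-style hash.
public class BucketHashSelector {
    // Legacy-style hash: plain Java-hashCode-like mixing (seed unused).
    static int javaHash(byte[] key, Integer seed) {
        int h = 0;
        for (byte b : key) {
            h = 31 * h + b;
        }
        return h;
    }

    // Stand-in for a Murmur-style hash (illustrative mixing, not Murmur3).
    static int murmurLikeHash(byte[] key, Integer seed) {
        int h = seed;
        for (byte b : key) {
            h ^= b & 0xff;
            h *= 0x5bd1e995;
            h ^= h >>> 13;
        }
        return h;
    }

    // Selected once per operator, not per row.
    static BiFunction<byte[], Integer, Integer> forBucketVersion(int bucketVersion) {
        return bucketVersion >= 2
            ? BucketHashSelector::murmurLikeHash
            : BucketHashSelector::javaHash;
    }

    // Map a key to a bucket in [0, numBuckets) using the selected function.
    static int bucketFor(byte[] key, int numBuckets,
                         BiFunction<byte[], Integer, Integer> hash) {
        return (hash.apply(key, 104729) & Integer.MAX_VALUE) % numBuckets;
    }
}
```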


Diffs (updated)
-

  hbase-handler/src/test/results/positive/external_table_ppd.q.out cdc43ee560 
  hbase-handler/src/test/results/positive/hbase_binary_storage_queries.q.out 
153613e6d0 
  hbase-handler/src/test/results/positive/hbase_ddl.q.out ef3f5f704e 
  hbase-handler/src/test/results/positive/hbasestats.q.out 5d000d2f4f 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/AbstractRecordWriter.java
 924e233293 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/BucketIdResolver.java
 5dd0b8ea5b 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/MutatorCoordinator.java
 ad14c7265f 
  
hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java
 3733e3d02f 
  
hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/worker/TestBucketIdResolverImpl.java
 03c28a33c8 
  
hcatalog/webhcat/java-client/src/main/java/org/apache/hive/hcatalog/api/HCatTable.java
 996329195c 
  
hcatalog/webhcat/java-client/src/test/java/org/apache/hive/hcatalog/api/TestHCatClient.java
 f9ee9d9a03 
  
itests/hive-blobstore/src/test/results/clientpositive/insert_into_dynamic_partitions.q.out
 caa00292b8 
  itests/hive-blobstore/src/test/results/clientpositive/insert_into_table.q.out 
ab8ad77074 
  
itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_directory.q.out
 2b28a6677e 
  
itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_dynamic_partitions.q.out
 cdb67dd786 
  
itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_table.q.out
 2c23a7e94f 
  
itests/hive-blobstore/src/test/results/clientpositive/write_final_output_blobstore.q.out
 a1be085ea5 
  itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java 
353b890b7c 
  
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
 82ba775286 
  itests/src/test/resources/testconfiguration.properties d26f0ccb17 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java c084fa054c 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java d59bf1fb6e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java d4363fdf91 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 5fbe045df5 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/keyseries/VectorKeySeriesSerializedImpl.java
 86f466fc4e 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/reducesink/VectorReduceSinkCommonOperator.java
 4077552a56 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/reducesink/VectorReduceSinkObjectHashOperator.java
 1bc3fdabac 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java a51fdd322f 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java 
7121bceb22 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/FixedBucketPruningOptimizer.java
 5f65f638ca 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/PrunerOperatorFactory.java 
2be3c9b9a2 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java
 1c5656267d 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionTimeGranularityOptimizer.java
 0e995d79d2 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/metainfo/annotation/OpTraitsRulesProcFactory.java
 69d9f3125a 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 
e15c5b7b66 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java
 7b1fd5f206 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 88b5ed81f1 
  ql/src/java/org/apache/hadoop/hive/ql/plan/OpTraits.java 9621c3be53 
  ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java dde20ed56e 
  ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java aa3c72bc6d 
  ql/src/java/org/apache/hadoop/hive/ql/plan/TableDesc.java 25b91899de 
  ql/src/java/org/apache/hadoop/hive/ql/plan/VectorReduceSinkDesc.java 
adea3b53a9 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFHash.java 
7cd571815d 
  

Re: Review Request 66485: HIVE-19124 implement a basic major compactor for MM tables

2018-04-16 Thread Gopal V

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66485/#review201290
---




ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
Lines 352 (patched)


Table type?



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
Lines 1119 (patched)


Add a comment about this (as the "file that adds this commit").



storage-api/src/java/org/apache/hadoop/hive/common/ValidReaderWriteIdList.java
Lines 255 (patched)


Should probably return a new Object here (for sane debugging).


- Gopal V


On April 16, 2018, 10:35 p.m., Sergey Shelukhin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66485/
> ---
> 
> (Updated April 16, 2018, 10:35 p.m.)
> 
> 
> Review request for hive and Eugene Koifman.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> see jira
> 
> 
> Diffs
> -
> 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
>  82ba775286 
>   ql/src/java/org/apache/hadoop/hive/ql/Driver.java a88453c978 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 
> b1c2288d01 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java 
> 22765b8e63 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java fe0aaa4ff5 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnUtils.java
>  7b02865e18 
>   
> storage-api/src/java/org/apache/hadoop/hive/common/ValidReaderWriteIdList.java
>  107ea9028a 
> 
> 
> Diff: https://reviews.apache.org/r/66485/diff/3/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Sergey Shelukhin
> 
>



Re: Review Request 66414: HIVE-17970 MM LOAD DATA with OVERWRITE doesn't use base_n directory concept

2018-04-16 Thread Eugene Koifman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66414/#review201287
---




common/src/java/org/apache/hadoop/hive/common/JavaUtils.java
Line 200 (original), 200 (patched)


the trailing "_" is probably wrong.  Not all deltas have a suffix.  
delta_x_y is a valid name.



common/src/java/org/apache/hadoop/hive/common/JavaUtils.java
Line 219 (original), 218 (patched)


What is the purpose of this if block? It seems to do nothing useful.



ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
Line 2239 (original), 2211 (patched)


Obsolete TODO.



ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
Line 2241 (original), 2213 (patched)


Is this TODO still meaningful?


- Eugene Koifman


On April 5, 2018, 12:24 p.m., Sergey Shelukhin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66414/
> ---
> 
> (Updated April 5, 2018, 12:24 p.m.)
> 
> 
> Review request for hive and Eugene Koifman.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> see jira
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/common/JavaUtils.java 75c07b41b2 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/history/TestHiveHistory.java
>  0168472bdc 
>   itests/src/test/resources/testconfiguration.properties d2e077b509 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java 7eba5e88d8 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 5fbe045df5 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java eed37a1937 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java 
> e49089b91e 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java b0dfc48165 
>   ql/src/test/results/clientpositive/mm_loaddata.q.out  
> 
> 
> Diff: https://reviews.apache.org/r/66414/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Sergey Shelukhin
> 
>



[jira] [Created] (HIVE-19224) incorrect token handling for LLAP plugin endpoint - part 2

2018-04-16 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-19224:
---

 Summary: incorrect token handling for LLAP plugin endpoint - part 2
 Key: HIVE-19224
 URL: https://issues.apache.org/jira/browse/HIVE-19224
 Project: Hive
  Issue Type: Bug
Reporter: Aswathy Chellammal Sreekumar
Assignee: Sergey Shelukhin
 Fix For: 3.0.0


{noformat}
java.lang.IllegalArgumentException: Null user
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2207) 
~[guava-19.0.jar:?]
at com.google.common.cache.LocalCache.get(LocalCache.java:3953) 
~[guava-19.0.jar:?]
at 
com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4790) 
~[guava-19.0.jar:?]
at 
org.apache.hadoop.hive.llap.AsyncPbRpcProxy.getProxy(AsyncPbRpcProxy.java:425) 
~[hive-exec-3.0.0.3.0.0.0-1101.jar:3.0.0.3.0.0.0-1101]
at 
org.apache.hadoop.hive.ql.exec.tez.LlapPluginEndpointClientImpl.access$000(LlapPluginEndpointClientImpl.java:45)
 ~[hive-exec-3.0.0.3.0.0.0-1101.jar:3.0.0.3.0.0.0-1101]
at 
org.apache.hadoop.hive.ql.exec.tez.LlapPluginEndpointClientImpl$SendUpdateQueryCallable.call(LlapPluginEndpointClientImpl.java:116)
 ~[hive-exec-3.0.0.3.0.0.0-1101.jar:3.0.0.3.0.0.0-1101]
at 
org.apache.hadoop.hive.ql.exec.tez.LlapPluginEndpointClientImpl$SendUpdateQueryCallable.call(LlapPluginEndpointClientImpl.java:93)
 ~[hive-exec-3.0.0.3.0.0.0-1101.jar:3.0.0.3.0.0.0-1101]
at 
com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
 [guava-19.0.jar:?]
at 
com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
 [guava-19.0.jar:?]
at 
com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
 [guava-19.0.jar:?]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
[?:1.8.0_161]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[?:1.8.0_161]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_161]
{noformat}





Re: Review Request 66645: HIVE-19211: New streaming ingest API and support for dynamic partitioning

2018-04-16 Thread j . prasanth . j

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66645/
---

(Updated April 16, 2018, 10:45 p.m.)


Review request for hive, Ashutosh Chauhan and Eugene Koifman.


Changes
---

Removed HIVE-19214 changes.


Bugs: HIVE-19211
https://issues.apache.org/jira/browse/HIVE-19211


Repository: hive-git


Description
---

HIVE-19211: New streaming ingest API and support for dynamic partitioning


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java e533ee6 
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveClientCache.java 
PRE-CREATION 
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreUtils.java 
a66c135 
  serde/src/java/org/apache/hadoop/hive/serde2/JsonSerDe.java PRE-CREATION 
  streaming/pom.xml b58ec01 
  streaming/src/java/org/apache/hive/streaming/AbstractRecordWriter.java 
25998ae 
  streaming/src/java/org/apache/hive/streaming/ConnectionError.java 668bffb 
  streaming/src/java/org/apache/hive/streaming/ConnectionInfo.java PRE-CREATION 
  streaming/src/java/org/apache/hive/streaming/DelimitedInputWriter.java 
898b3f9 
  streaming/src/java/org/apache/hive/streaming/HeartBeatFailure.java b1f9520 
  streaming/src/java/org/apache/hive/streaming/HiveEndPoint.java b04e137 
  streaming/src/java/org/apache/hive/streaming/HiveStreamingConnection.java 
PRE-CREATION 
  streaming/src/java/org/apache/hive/streaming/ImpersonationFailed.java 23e17e7 
  streaming/src/java/org/apache/hive/streaming/InvalidColumn.java 0011b14 
  streaming/src/java/org/apache/hive/streaming/InvalidPartition.java f1f9804 
  streaming/src/java/org/apache/hive/streaming/InvalidTable.java ef1c91d 
  streaming/src/java/org/apache/hive/streaming/InvalidTransactionState.java 
PRE-CREATION 
  streaming/src/java/org/apache/hive/streaming/InvalidTrasactionState.java 
762f5f8 
  streaming/src/java/org/apache/hive/streaming/PartitionCreationFailed.java 
5f9aca6 
  streaming/src/java/org/apache/hive/streaming/PartitionHandler.java 
PRE-CREATION 
  streaming/src/java/org/apache/hive/streaming/QueryFailedException.java 
ccd3ae0 
  streaming/src/java/org/apache/hive/streaming/RecordWriter.java dc6d70e 
  streaming/src/java/org/apache/hive/streaming/SerializationError.java a57ba00 
  streaming/src/java/org/apache/hive/streaming/StreamingConnection.java 2f760ea 
  streaming/src/java/org/apache/hive/streaming/StreamingException.java a7f84c1 
  streaming/src/java/org/apache/hive/streaming/StreamingIOFailure.java 0dfbfa7 
  streaming/src/java/org/apache/hive/streaming/StrictDelimitedInputWriter.java 
PRE-CREATION 
  streaming/src/java/org/apache/hive/streaming/StrictJsonWriter.java 0077913 
  streaming/src/java/org/apache/hive/streaming/StrictRegexWriter.java c0b7324 
  streaming/src/java/org/apache/hive/streaming/TransactionBatch.java 2b05771 
  streaming/src/java/org/apache/hive/streaming/TransactionBatchUnAvailable.java 
a8c8cd4 
  streaming/src/java/org/apache/hive/streaming/TransactionError.java a331b20 
  streaming/src/test/org/apache/hive/streaming/TestDelimitedInputWriter.java 
f0843a1 
  streaming/src/test/org/apache/hive/streaming/TestStreaming.java 6f63bfb 
  
streaming/src/test/org/apache/hive/streaming/TestStreamingDynamicPartitioning.java
 PRE-CREATION 


Diff: https://reviews.apache.org/r/66645/diff/2/

Changes: https://reviews.apache.org/r/66645/diff/1-2/


Testing
---


Thanks,

Prasanth_J



Re: Review Request 66645: HIVE-19211: New streaming ingest API and support for dynamic partitioning

2018-04-16 Thread j . prasanth . j

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66645/#review201274
---




ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java
Lines 297 (patched)


This is covered by HIVE-19214. Will be removed after HIVE-19214 is committed.


- Prasanth_J


On April 16, 2018, 10:42 p.m., Prasanth_J wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66645/
> ---
> 
> (Updated April 16, 2018, 10:42 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Eugene Koifman.
> 
> 
> Bugs: HIVE-19211
> https://issues.apache.org/jira/browse/HIVE-19211
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-19211: New streaming ingest API and support for dynamic partitioning
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 
> e533ee627fd38e8ddcdcedf00f0c2d3ae150c530 
>   metastore/src/java/org/apache/hadoop/hive/metastore/HiveClientCache.java 
> PRE-CREATION 
>   metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreUtils.java 
> a66c13507abef42977dfdb315ff7d69404f67ac3 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java 
> d850062377d182e33a6191268d50d0008d7c77de 
>   serde/src/java/org/apache/hadoop/hive/serde2/JsonSerDe.java PRE-CREATION 
>   streaming/pom.xml b58ec015aa69e29aacdc0a165ead9439ea2e4b26 
>   streaming/src/java/org/apache/hive/streaming/AbstractRecordWriter.java 
> 25998ae31a3a829aab45f9e526aa03d94feff5e0 
>   streaming/src/java/org/apache/hive/streaming/ConnectionError.java 
> 668bffb1ab17558dec33d599bddd6e28a06b3c5a 
>   streaming/src/java/org/apache/hive/streaming/ConnectionInfo.java 
> PRE-CREATION 
>   streaming/src/java/org/apache/hive/streaming/DelimitedInputWriter.java 
> 898b3f9bb1d1c483cae8c1dd4f2338fc453d514b 
>   streaming/src/java/org/apache/hive/streaming/HeartBeatFailure.java 
> b1f9520814d260a3d2df23e6050e72d803874da9 
>   streaming/src/java/org/apache/hive/streaming/HiveEndPoint.java 
> b04e13784485ca097153bbec86f80d22e15e5cdc 
>   streaming/src/java/org/apache/hive/streaming/HiveStreamingConnection.java 
> PRE-CREATION 
>   streaming/src/java/org/apache/hive/streaming/ImpersonationFailed.java 
> 23e17e76237036d8f9419bef2255f4f82c5b18a1 
>   streaming/src/java/org/apache/hive/streaming/InvalidColumn.java 
> 0011b1454f8815816be931bf67cc13e7e78c9c0d 
>   streaming/src/java/org/apache/hive/streaming/InvalidPartition.java 
> f1f980430f3aceeb044bb549cc1a37a33c144750 
>   streaming/src/java/org/apache/hive/streaming/InvalidTable.java 
> ef1c91dbeb84b325b019318122fdd1f45b927414 
>   streaming/src/java/org/apache/hive/streaming/InvalidTrasactionState.java 
> 762f5f86fc0df4a59cb54812a5dc79c1e2bc9489 
>   streaming/src/java/org/apache/hive/streaming/PartitionCreationFailed.java 
> 5f9aca66ea0f2a7b2c3d2f6fb805fa1760b69e44 
>   streaming/src/java/org/apache/hive/streaming/QueryFailedException.java 
> ccd3ae0c98ea6ced0290f1ab027ad6337453fca2 
>   streaming/src/java/org/apache/hive/streaming/RecordWriter.java 
> dc6d70e92438e037d764099c82f5f654d5f5d801 
>   streaming/src/java/org/apache/hive/streaming/SerializationError.java 
> a57ba00ba401283aedd3f685171ef6bd810b11cd 
>   streaming/src/java/org/apache/hive/streaming/StreamingConnection.java 
> 2f760ea86eecbbc96db08509405a369abf7d89d5 
>   streaming/src/java/org/apache/hive/streaming/StreamingException.java 
> a7f84c14f30f2e4753bd99b3d2d1dcb236b0197b 
>   streaming/src/java/org/apache/hive/streaming/StreamingIOFailure.java 
> 0dfbfa71c50215d8f3e25298c8d11634a3cbedc4 
>   
> streaming/src/java/org/apache/hive/streaming/StrictDelimitedInputWriter.java 
> PRE-CREATION 
>   streaming/src/java/org/apache/hive/streaming/StrictJsonWriter.java 
> 0077913cd1f0afbafe4608c4378398f61e254424 
>   streaming/src/java/org/apache/hive/streaming/StrictRegexWriter.java 
> c0b732482d35305ceaba1adfff09659e193ab098 
>   streaming/src/java/org/apache/hive/streaming/TransactionBatch.java 
> 2b057718f58dec6de3e2b329a43bb5a06ce7c9ed 
>   
> streaming/src/java/org/apache/hive/streaming/TransactionBatchUnAvailable.java 
> a8c8cd48726421003df186fa1e0c2ecd18bdd5b4 
>   streaming/src/java/org/apache/hive/streaming/TransactionError.java 
> a331b20463e8328148fb08d85cf3ce77a7463062 
>   streaming/src/test/org/apache/hive/streaming/TestDelimitedInputWriter.java 
> f0843a1748d956ea99dd4807cf0b4ffbe0ef9cba 
>   streaming/src/test/org/apache/hive/streaming/TestStreaming.java 
> 6f63bfb43e5dbe4c9529dfc80787a95ba6524c01 
>   
> streaming/src/test/org/apache/hive/streaming/TestStreamingDynamicPartitioning.java
>  PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/66645/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,

Review Request 66645: HIVE-19211: New streaming ingest API and support for dynamic partitioning

2018-04-16 Thread j . prasanth . j

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66645/
---

Review request for hive, Ashutosh Chauhan and Eugene Koifman.


Bugs: HIVE-19211
https://issues.apache.org/jira/browse/HIVE-19211


Repository: hive-git


Description
---

HIVE-19211: New streaming ingest API and support for dynamic partitioning


Diffs
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 
e533ee627fd38e8ddcdcedf00f0c2d3ae150c530 
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveClientCache.java 
PRE-CREATION 
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreUtils.java 
a66c13507abef42977dfdb315ff7d69404f67ac3 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java 
d850062377d182e33a6191268d50d0008d7c77de 
  serde/src/java/org/apache/hadoop/hive/serde2/JsonSerDe.java PRE-CREATION 
  streaming/pom.xml b58ec015aa69e29aacdc0a165ead9439ea2e4b26 
  streaming/src/java/org/apache/hive/streaming/AbstractRecordWriter.java 
25998ae31a3a829aab45f9e526aa03d94feff5e0 
  streaming/src/java/org/apache/hive/streaming/ConnectionError.java 
668bffb1ab17558dec33d599bddd6e28a06b3c5a 
  streaming/src/java/org/apache/hive/streaming/ConnectionInfo.java PRE-CREATION 
  streaming/src/java/org/apache/hive/streaming/DelimitedInputWriter.java 
898b3f9bb1d1c483cae8c1dd4f2338fc453d514b 
  streaming/src/java/org/apache/hive/streaming/HeartBeatFailure.java 
b1f9520814d260a3d2df23e6050e72d803874da9 
  streaming/src/java/org/apache/hive/streaming/HiveEndPoint.java 
b04e13784485ca097153bbec86f80d22e15e5cdc 
  streaming/src/java/org/apache/hive/streaming/HiveStreamingConnection.java 
PRE-CREATION 
  streaming/src/java/org/apache/hive/streaming/ImpersonationFailed.java 
23e17e76237036d8f9419bef2255f4f82c5b18a1 
  streaming/src/java/org/apache/hive/streaming/InvalidColumn.java 
0011b1454f8815816be931bf67cc13e7e78c9c0d 
  streaming/src/java/org/apache/hive/streaming/InvalidPartition.java 
f1f980430f3aceeb044bb549cc1a37a33c144750 
  streaming/src/java/org/apache/hive/streaming/InvalidTable.java 
ef1c91dbeb84b325b019318122fdd1f45b927414 
  streaming/src/java/org/apache/hive/streaming/InvalidTrasactionState.java 
762f5f86fc0df4a59cb54812a5dc79c1e2bc9489 
  streaming/src/java/org/apache/hive/streaming/PartitionCreationFailed.java 
5f9aca66ea0f2a7b2c3d2f6fb805fa1760b69e44 
  streaming/src/java/org/apache/hive/streaming/QueryFailedException.java 
ccd3ae0c98ea6ced0290f1ab027ad6337453fca2 
  streaming/src/java/org/apache/hive/streaming/RecordWriter.java 
dc6d70e92438e037d764099c82f5f654d5f5d801 
  streaming/src/java/org/apache/hive/streaming/SerializationError.java 
a57ba00ba401283aedd3f685171ef6bd810b11cd 
  streaming/src/java/org/apache/hive/streaming/StreamingConnection.java 
2f760ea86eecbbc96db08509405a369abf7d89d5 
  streaming/src/java/org/apache/hive/streaming/StreamingException.java 
a7f84c14f30f2e4753bd99b3d2d1dcb236b0197b 
  streaming/src/java/org/apache/hive/streaming/StreamingIOFailure.java 
0dfbfa71c50215d8f3e25298c8d11634a3cbedc4 
  streaming/src/java/org/apache/hive/streaming/StrictDelimitedInputWriter.java 
PRE-CREATION 
  streaming/src/java/org/apache/hive/streaming/StrictJsonWriter.java 
0077913cd1f0afbafe4608c4378398f61e254424 
  streaming/src/java/org/apache/hive/streaming/StrictRegexWriter.java 
c0b732482d35305ceaba1adfff09659e193ab098 
  streaming/src/java/org/apache/hive/streaming/TransactionBatch.java 
2b057718f58dec6de3e2b329a43bb5a06ce7c9ed 
  streaming/src/java/org/apache/hive/streaming/TransactionBatchUnAvailable.java 
a8c8cd48726421003df186fa1e0c2ecd18bdd5b4 
  streaming/src/java/org/apache/hive/streaming/TransactionError.java 
a331b20463e8328148fb08d85cf3ce77a7463062 
  streaming/src/test/org/apache/hive/streaming/TestDelimitedInputWriter.java 
f0843a1748d956ea99dd4807cf0b4ffbe0ef9cba 
  streaming/src/test/org/apache/hive/streaming/TestStreaming.java 
6f63bfb43e5dbe4c9529dfc80787a95ba6524c01 
  
streaming/src/test/org/apache/hive/streaming/TestStreamingDynamicPartitioning.java
 PRE-CREATION 


Diff: https://reviews.apache.org/r/66645/diff/1/


Testing
---


Thanks,

Prasanth_J



Re: Review Request 66567: Migrate to Murmur hash for shuffle and bucketing

2018-04-16 Thread Deepak Jaiswal


> On April 14, 2018, 1:13 a.m., Jason Dere wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java
> > Lines 813 (patched)
> > 
> >
> > For these primitive types, might make sense to pre-allocate fixed size 
> > ByteBuffers of size 2/4/8 which can be used here rather than having to 
> > allocate new ones for every value.
> 
> Deepak Jaiswal wrote:
> That is how I did it before, but it would send a byte array of length 8 
> all the time. The murmur function would consider all 8 bytes to generate the 
> hash. When I noticed it was creating different hashes for the same key, I found 
> the bug, hence the specific size allocation. Also, it won't affect the 
> efficiency.
> 
> Jason Dere wrote:
> What I mean is this is performing an allocation for every call to 
> hashCode() here, which I think could affect the efficiency. This could be 
> avoided by passing in pre-allocated arrays of each size to this method. Also, 
> could you use the other version of hash32() where you can also pass in the 
> array length - that way you could just use the same array of size 8, but pass 
> in length 2/4/8 depending on which type you are hashing.

Aah, got it. I wrote the method that takes a buffer and its length for another 
purpose and forgot it could be used here.
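As a hedged sketch of the reuse idea Jason describes (hypothetical names, not Hive's actual ObjectInspectorUtils code): keep one 8-byte scratch array per operator and pass the effective length to the hash function, so stale bytes beyond the length can never change the hash, which was the bug with hashing the full 8-byte array.

```java
// Illustrative sketch of reusing one scratch buffer across hashCode() calls
// by passing the effective length, instead of allocating a right-sized
// array per value.
public class ScratchHash {
    private final byte[] scratch = new byte[8]; // reused across calls

    // Illustrative hash over the first `length` bytes only (not Hive's Murmur3).
    static int hash32(byte[] data, int length, int seed) {
        int h = seed;
        for (int i = 0; i < length; i++) {
            h ^= data[i] & 0xff;
            h *= 0x5bd1e995;
            h ^= h >>> 13;
        }
        return h;
    }

    int hashInt(int v) {
        scratch[0] = (byte) (v >>> 24);
        scratch[1] = (byte) (v >>> 16);
        scratch[2] = (byte) (v >>> 8);
        scratch[3] = (byte) v;
        // Only 4 bytes participate, so stale bytes 4..7 cannot affect the hash.
        return hash32(scratch, 4, 104729);
    }

    long hashLong(long v) {
        for (int i = 0; i < 8; i++) {
            scratch[i] = (byte) (v >>> (56 - 8 * i));
        }
        return hash32(scratch, 8, 104729);
    }
}
```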


> On April 14, 2018, 1:13 a.m., Jason Dere wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java
> > Lines 858 (patched)
> > 
> >
> > Old impl (based on DateWritable.hashCode()) did hashCode based on 
> > daysSinceEpoc value, will be faster than doing toString()
> 
> Deepak Jaiswal wrote:
> The new one converts it into string format to get the byte array. Are you 
> suggesting that what we get from getPrimitiveWritableObject is daysSinceEpoc? And 
> since it is an integer, it is faster to convert it into a byte array directly 
> rather than doing "toString"?
> 
> Jason Dere wrote:
> Yes, DateWritable.toString() converts to Date, which then has to call 
> toString() which means date conversion/formatting. Simpler to base it on the 
> int value.

Thanks for confirming.
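To illustrate the point being confirmed above, here is a hedged, self-contained sketch (names are illustrative, not Hive's DateWritable API): hash the date's days-since-epoch int directly as 4 bytes, skipping the Date conversion and formatting that a toString()-based approach would incur.

```java
import java.time.LocalDate;

public class DateHashSketch {
    // Convert the date's days-since-epoch int to 4 big-endian bytes; hashing
    // these avoids formatting the date to a string first.
    static byte[] daysToBytes(LocalDate d) {
        int days = (int) d.toEpochDay();
        return new byte[] {
            (byte) (days >>> 24), (byte) (days >>> 16),
            (byte) (days >>> 8), (byte) days
        };
    }

    // Illustrative hash (not Hive's Murmur3) over the 4-byte encoding.
    static int hash(byte[] data) {
        int h = 104729; // illustrative seed
        for (byte b : data) {
            h ^= b & 0xff;
            h *= 0x5bd1e995;
            h ^= h >>> 13;
        }
        return h;
    }
}
```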


- Deepak


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66567/#review201133
---


On April 12, 2018, 6:24 p.m., Deepak Jaiswal wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66567/
> ---
> 
> (Updated April 12, 2018, 6:24 p.m.)
> 
> 
> Review request for hive, Eugene Koifman, Jason Dere, and Matt McCline.
> 
> 
> Bugs: HIVE-18910
> https://issues.apache.org/jira/browse/HIVE-18910
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Hive uses the Java hash, which does not distribute values as well as Murmur 
> hash and is less efficient for bucketing a table.
> Migrate to Murmur hash while keeping backward compatibility for existing 
> users so that they don't have to reload their existing tables.
> 
> To keep backward compatibility, bucket_version is added as a table property, 
> resulting in a high number of test result updates.
> 
> 
> Diffs
> -
> 
>   hbase-handler/src/test/results/positive/external_table_ppd.q.out cdc43ee560 
>   hbase-handler/src/test/results/positive/hbase_binary_storage_queries.q.out 
> 153613e6d0 
>   hbase-handler/src/test/results/positive/hbase_ddl.q.out ef3f5f704e 
>   hbase-handler/src/test/results/positive/hbasestats.q.out 5d000d2f4f 
>   
> hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/AbstractRecordWriter.java
>  924e233293 
>   
> hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/BucketIdResolver.java
>  5dd0b8ea5b 
>   
> hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/BucketIdResolverImpl.java
>  7c2cadefa7 
>   
> hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/MutatorCoordinator.java
>  ad14c7265f 
>   
> hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java
>  3733e3d02f 
>   
> hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/worker/TestBucketIdResolverImpl.java
>  03c28a33c8 
>   
> hcatalog/webhcat/java-client/src/main/java/org/apache/hive/hcatalog/api/HCatTable.java
>  996329195c 
>   
> hcatalog/webhcat/java-client/src/test/java/org/apache/hive/hcatalog/api/TestHCatClient.java
>  f9ee9d9a03 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_into_dynamic_partitions.q.out
>  caa00292b8 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_into_table.q.out 
> ab8ad77074 
>   
> 

Re: Review Request 66485: HIVE-19124 implement a basic major compactor for MM tables

2018-04-16 Thread Sergey Shelukhin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66485/
---

(Updated April 16, 2018, 10:35 p.m.)


Review request for hive and Eugene Koifman.


Repository: hive-git


Description
---

see jira


Diffs (updated)
---

  
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
 82ba775286 
  ql/src/java/org/apache/hadoop/hive/ql/Driver.java a88453c978 
  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 
b1c2288d01 
  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java 22765b8e63 
  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java fe0aaa4ff5 
  
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnUtils.java
 7b02865e18 
  
storage-api/src/java/org/apache/hadoop/hive/common/ValidReaderWriteIdList.java 
107ea9028a 


Diff: https://reviews.apache.org/r/66485/diff/3/

Changes: https://reviews.apache.org/r/66485/diff/2-3/


Testing
---


Thanks,

Sergey Shelukhin



Re: Review Request 66503: HIVE-19126: CachedStore: Use memory estimation to limit cache size during prewarm

2018-04-16 Thread Vaibhav Gumashta

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66503/
---

(Updated April 16, 2018, 9:42 p.m.)


Review request for hive and Thejas Nair.


Bugs: HIVE-19126
https://issues.apache.org/jira/browse/HIVE-19126


Repository: hive-git


Description
---

https://issues.apache.org/jira/browse/HIVE-19126


Diffs (updated)
---

  
llap-server/src/java/org/apache/hadoop/hive/llap/IncrementalObjectSizeEstimator.java
 6f4ec6f1ea 
  
llap-server/src/java/org/apache/hadoop/hive/llap/io/metadata/OrcFileEstimateErrors.java
 2f7fa24558 
  
llap-server/src/test/org/apache/hadoop/hive/llap/cache/TestIncrementalObjectSizeEstimator.java
 0bbaf7e459 
  
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
 1ce86bbdba 
  
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/SharedCache.java
 89b400697b 
  
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java
 f007261daf 
  
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/conf/SizeValidator.java
 PRE-CREATION 
  
standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/cache/TestCachedStore.java
 d451f966b0 


Diff: https://reviews.apache.org/r/66503/diff/6/

Changes: https://reviews.apache.org/r/66503/diff/5-6/


Testing
---


Thanks,

Vaibhav Gumashta



[jira] [Created] (HIVE-19223) Migrate negative test cases to use hive.cli.errors.ignore

2018-04-16 Thread Aihua Xu (JIRA)
Aihua Xu created HIVE-19223:
---

 Summary: Migrate negative test cases to use hive.cli.errors.ignore 
 Key: HIVE-19223
 URL: https://issues.apache.org/jira/browse/HIVE-19223
 Project: Hive
  Issue Type: Improvement
  Components: Test
Affects Versions: 3.0.0
Reporter: Aihua Xu


Migrate the negative test cases to use the hive.cli.errors.ignore property so 
that multiple negative tests can be grouped together. This will save test 
resources and execution time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 66514: HIVE-17645 MM tables patch conflicts with HIVE-17482 (Spark/Acid integration)

2018-04-16 Thread Jason Dere


> On April 16, 2018, 7:45 p.m., Sergey Shelukhin wrote:
> > llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java
> > Lines 331 (patched)
> > 
> >
> > is this related?

I threw that in, since this patch (plus this fix) also fixed 
TestAcidOnTez#testGetSplitsLocks.


- Jason


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66514/#review201250
---


On April 11, 2018, 7:58 p.m., Jason Dere wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66514/
> ---
> 
> (Updated April 11, 2018, 7:58 p.m.)
> 
> 
> Review request for hive, Eugene Koifman and Sergey Shelukhin.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Replace the usage of SessionState.getTxnMgr() in several places by 
> refactoring so that the TxnManager is available in fields passed in during 
> construction/initialization:
> - SemanticAnalyzer.genFileSinkPlan()
> - ReplicationSemanticAnalyzer.analyzeReplLoad()
> - LoadSemanticAnalyzer.analyzeExternal()
> - ImportSemanticAnalyzer.prepareImport()
> - DDLSemanticAnalyzer.handleTransactionalTable()
> 
> 
> Diffs
> ---
> 
>   
> llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java 
> 3aec46be51 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java bda2af3a04 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Task.java a8d851fd81 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/ReplLoadTask.java 
> 6b333d7184 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/LoadConstraint.java
>  60c85f58e5 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/LoadFunction.java
>  bc7d0ad0b9 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/table/LoadPartitions.java
>  06adc64727 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/table/LoadTable.java
>  1395027159 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/util/Context.java
>  bb51f36a25 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java 7a7bdea89d 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 
> f38b0bc546 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java 
> 8b639f7922 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java 
> e49089b91e 
>   
> ql/src/java/org/apache/hadoop/hive/ql/parse/MaterializedViewRebuildSemanticAnalyzer.java
>  e5af95b121 
>   
> ql/src/java/org/apache/hadoop/hive/ql/parse/ReplicationSemanticAnalyzer.java 
> 79b2e48ee2 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
> 10982ddbd1 
>   
> ql/src/java/org/apache/hadoop/hive/ql/parse/repl/load/message/MessageHandler.java
>  3ccd639d62 
>   
> ql/src/java/org/apache/hadoop/hive/ql/parse/repl/load/message/TableHandler.java
>  4cd75d8128 
>   ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 6003ced27e 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTFGetSplits.java 
> fe570f0f8e 
> 
> 
> Diff: https://reviews.apache.org/r/66514/diff/3/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Jason Dere
> 
>



Re: Review Request 66514: HIVE-17645 MM tables patch conflicts with HIVE-17482 (Spark/Acid integration)

2018-04-16 Thread Sergey Shelukhin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66514/#review201250
---




llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java
Lines 331 (patched)


is this related?


- Sergey Shelukhin


On April 11, 2018, 7:58 p.m., Jason Dere wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66514/
> ---
> 
> (Updated April 11, 2018, 7:58 p.m.)
> 
> 
> Review request for hive, Eugene Koifman and Sergey Shelukhin.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Replace the usage of SessionState.getTxnMgr() in several places by 
> refactoring so that the TxnManager is available in fields passed in during 
> construction/initialization:
> - SemanticAnalyzer.genFileSinkPlan()
> - ReplicationSemanticAnalyzer.analyzeReplLoad()
> - LoadSemanticAnalyzer.analyzeExternal()
> - ImportSemanticAnalyzer.prepareImport()
> - DDLSemanticAnalyzer.handleTransactionalTable()
> 
> 
> Diffs
> ---
> 
>   
> llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java 
> 3aec46be51 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java bda2af3a04 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Task.java a8d851fd81 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/ReplLoadTask.java 
> 6b333d7184 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/LoadConstraint.java
>  60c85f58e5 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/LoadFunction.java
>  bc7d0ad0b9 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/table/LoadPartitions.java
>  06adc64727 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/table/LoadTable.java
>  1395027159 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/util/Context.java
>  bb51f36a25 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java 7a7bdea89d 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 
> f38b0bc546 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java 
> 8b639f7922 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java 
> e49089b91e 
>   
> ql/src/java/org/apache/hadoop/hive/ql/parse/MaterializedViewRebuildSemanticAnalyzer.java
>  e5af95b121 
>   
> ql/src/java/org/apache/hadoop/hive/ql/parse/ReplicationSemanticAnalyzer.java 
> 79b2e48ee2 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
> 10982ddbd1 
>   
> ql/src/java/org/apache/hadoop/hive/ql/parse/repl/load/message/MessageHandler.java
>  3ccd639d62 
>   
> ql/src/java/org/apache/hadoop/hive/ql/parse/repl/load/message/TableHandler.java
>  4cd75d8128 
>   ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 6003ced27e 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTFGetSplits.java 
> fe570f0f8e 
> 
> 
> Diff: https://reviews.apache.org/r/66514/diff/3/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Jason Dere
> 
>



Re: Review Request 66567: Migrate to Murmur hash for shuffle and bucketing

2018-04-16 Thread Jason Dere


> On April 14, 2018, 1:13 a.m., Jason Dere wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java
> > Lines 813 (patched)
> > 
> >
> > For these primitive types, it might make sense to pre-allocate fixed-size 
> > ByteBuffers of size 2/4/8 that can be used here rather than having to 
> > allocate new ones for every value.
> 
> Deepak Jaiswal wrote:
> That is how I did it before, but it would send a byte array of length 8 
> all the time, and the murmur function would consider all 8 bytes when 
> generating the hash. When I noticed it was creating different hashes for 
> the same key, I found the bug, hence the size-specific allocation. Also, it 
> won't affect the efficiency.

What I mean is that this performs an allocation on every call to hashCode(), 
which I think could affect efficiency. That could be avoided by passing 
pre-allocated arrays of each size into this method. Alternatively, could you 
use the other version of hash32() that also takes the array length? That way 
you could reuse a single array of size 8 and pass in length 2/4/8 depending 
on which type you are hashing.


> On April 14, 2018, 1:13 a.m., Jason Dere wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java
> > Lines 858 (patched)
> > 
> >
> > The old implementation (based on DateWritable.hashCode()) computed the 
> > hash from the daysSinceEpoch value, which will be faster than doing 
> > toString().
> 
> Deepak Jaiswal wrote:
> The new one converts the value into string format to get the byte array. 
> Are you suggesting that what we get from getPrimitiveWritableObject is 
> daysSinceEpoch? And since it is an integer, is it faster to convert it into 
> a byte array directly rather than calling toString()?

Yes. DateWritable.toString() converts to Date, whose toString() involves date 
conversion/formatting. It is simpler and faster to base the hash on the int 
value.


- Jason


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66567/#review201133
---


On April 12, 2018, 6:24 p.m., Deepak Jaiswal wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66567/
> ---
> 
> (Updated April 12, 2018, 6:24 p.m.)
> 
> 
> Review request for hive, Eugene Koifman, Jason Dere, and Matt McCline.
> 
> 
> Bugs: HIVE-18910
> https://issues.apache.org/jira/browse/HIVE-18910
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Hive uses the Java hash function, which does not distribute values as well 
> as Murmur hash and is less efficient for bucketing a table.
> Migrate to Murmur hash while keeping backward compatibility, so that 
> existing users don't have to reload their existing tables.
> 
> To keep backward compatibility, bucket_version is added as a table property, 
> resulting in a high number of test result updates.
> 
> 
> Diffs
> ---
> 
>   hbase-handler/src/test/results/positive/external_table_ppd.q.out cdc43ee560 
>   hbase-handler/src/test/results/positive/hbase_binary_storage_queries.q.out 
> 153613e6d0 
>   hbase-handler/src/test/results/positive/hbase_ddl.q.out ef3f5f704e 
>   hbase-handler/src/test/results/positive/hbasestats.q.out 5d000d2f4f 
>   
> hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/AbstractRecordWriter.java
>  924e233293 
>   
> hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/BucketIdResolver.java
>  5dd0b8ea5b 
>   
> hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/BucketIdResolverImpl.java
>  7c2cadefa7 
>   
> hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/MutatorCoordinator.java
>  ad14c7265f 
>   
> hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java
>  3733e3d02f 
>   
> hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/worker/TestBucketIdResolverImpl.java
>  03c28a33c8 
>   
> hcatalog/webhcat/java-client/src/main/java/org/apache/hive/hcatalog/api/HCatTable.java
>  996329195c 
>   
> hcatalog/webhcat/java-client/src/test/java/org/apache/hive/hcatalog/api/TestHCatClient.java
>  f9ee9d9a03 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_into_dynamic_partitions.q.out
>  caa00292b8 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_into_table.q.out 
> ab8ad77074 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_directory.q.out
>  2b28a6677e 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_dynamic_partitions.q.out
>  cdb67dd786 
>   
> 
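The backward-compatibility scheme this review describes — keeping old tables on the legacy Java hash while new tables use Murmur — amounts to dispatching on the table's bucket_version property. A minimal sketch follows; the hash functions are simplified stand-ins (not Hive's real implementations), and the version numbering is an assumption for illustration, not taken from the patch.

```java
// Sketch: pick the bucketing hash based on a per-table bucket_version
// property, so pre-existing tables keep their bucket layout and need no reload.
class BucketVersionSketch {

    static int bucketNumber(byte[] key, int numBuckets, int bucketVersion) {
        // Version 2 is assumed here to mean "murmur"; older tables fall back
        // to the legacy Java-style hash they were originally bucketed with.
        int hash = (bucketVersion >= 2) ? murmurLikeHash(key) : legacyJavaHash(key);
        return (hash & Integer.MAX_VALUE) % numBuckets;
    }

    // Legacy path: Java's classic 31-based polynomial hash over the bytes.
    static int legacyJavaHash(byte[] key) {
        int h = 0;
        for (byte b : key) h = 31 * h + b;
        return h;
    }

    // Stand-in for Murmur3: a small mixing hash (NOT the real algorithm).
    static int murmurLikeHash(byte[] key) {
        int h = 0x9747b28c;
        for (byte b : key) {
            h ^= b & 0xff;
            h *= 0x5bd1e995;
            h ^= h >>> 13;
        }
        return h;
    }
}
```

Dispatching per table is why the change avoids reloading existing data, and also why so many .q.out test results change: the plan now records which hash version each table uses.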

[jira] [Created] (HIVE-19222) TestNegativeCliDriver tests are failing due to "java.lang.OutOfMemoryError: GC overhead limit exceeded"

2018-04-16 Thread Aihua Xu (JIRA)
Aihua Xu created HIVE-19222:
---

 Summary: TestNegativeCliDriver tests are failing due to 
"java.lang.OutOfMemoryError: GC overhead limit exceeded"
 Key: HIVE-19222
 URL: https://issues.apache.org/jira/browse/HIVE-19222
 Project: Hive
  Issue Type: Sub-task
Reporter: Aihua Xu


TestNegativeCliDriver tests have been failing with OOM recently; the cause is 
not yet clear. I will try increasing the memory to test this out.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] hive pull request #334: HIVE-19219: Hive replicated database is out of sync ...

2018-04-16 Thread sankarh
GitHub user sankarh opened a pull request:

https://github.com/apache/hive/pull/334

HIVE-19219: Hive replicated database is out of sync if events are 
cleaned-up.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sankarh/hive HIVE-19219

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/334.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #334


commit b883f8b28c9f86e0cc731bc1b43c3706617bec81
Author: Sankar Hariappan 
Date:   2018-04-16T16:05:46Z

HIVE-19219: Hive replicated database is out of sync if events are 
cleaned-up.




---


[jira] [Created] (HIVE-19221) DESC FORMATTED shows incorrect results in table statistics for full ACID table after multi table inserts

2018-04-16 Thread Steve Yeom (JIRA)
Steve Yeom created HIVE-19221:
-

 Summary: DESC FORMATTED shows incorrect results in table 
statistics for full ACID table after multi table inserts
 Key: HIVE-19221
 URL: https://issues.apache.org/jira/browse/HIVE-19221
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 3.0.0
Reporter: Steve Yeom


"DESC FORMATTED acid_table" seems to show an incorrect table row count and 
incorrect COLUMN_STATS_ACCURATE values after multi-table inserts.

I will add the test case shortly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19220) perfLogger instances in Driver need to be combined

2018-04-16 Thread shengsiwei (JIRA)
shengsiwei created HIVE-19220:
-

 Summary: perfLogger instances in Driver need to be combined
 Key: HIVE-19220
 URL: https://issues.apache.org/jira/browse/HIVE-19220
 Project: Hive
  Issue Type: Improvement
Affects Versions: 3.1.0
Reporter: shengsiwei
Assignee: shengsiwei






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19219) Hive replicated database is out of sync if events are cleaned-up.

2018-04-16 Thread Sankar Hariappan (JIRA)
Sankar Hariappan created HIVE-19219:
---

 Summary: Hive replicated database is out of sync if events are 
cleaned-up.
 Key: HIVE-19219
 URL: https://issues.apache.org/jira/browse/HIVE-19219
 Project: Hive
  Issue Type: Sub-task
Affects Versions: 3.0.0
Reporter: Sankar Hariappan
Assignee: Sankar Hariappan
 Fix For: 3.1.0


This is the case where events were deleted on the source because of old-event 
purging, and hence min(source event id) > target event id (the last 
replicated event id).

Repl dump should fail in this case so that the user can drop the database and 
bootstrap again.

The next incremental repl dump could check whether the last completed event 
(passed as the fromEventId argument) is still present in the source 
notification_log table. If it is not, it should error out saying that events 
are missing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)