[jira] [Created] (HIVE-16988) if partition column type is boolean, Streaming api AbstractRecordWriter.getPathForEndPoint NoSuchObjectException: partition values=[Y, 2017-06-29 14:32:36.508]

2017-06-29 Thread brucewoo (JIRA)
brucewoo created HIVE-16988:
---

 Summary: if partition column type is boolean, Streaming api 
AbstractRecordWriter.getPathForEndPoint NoSuchObjectException: partition 
values=[Y, 2017-06-29 14:32:36.508]
 Key: HIVE-16988
 URL: https://issues.apache.org/jira/browse/HIVE-16988
 Project: Hive
  Issue Type: Bug
Reporter: brucewoo


org.apache.nifi.util.hive.HiveWriter$ConnectFailure: Failed connecting to 
EndPoint {metaStoreUri='thrift://localhost:9083', database='dw_subject', 
table='alls', partitionVals=[Y, 2017-06-29 14:32:36.508] }
at org.apache.nifi.util.hive.HiveWriter.<init>(HiveWriter.java:53) 
~[nifi-hive-processors-1.1.2.jar:1.1.2]
at 
org.apache.nifi.processors.hive.InsertTdfHive2.getOrCreateWriter(InsertTdfHive2.java:971)
 [nifi-hive-processors-1.1.2.jar:1.1.2]
at 
org.apache.nifi.processors.hive.InsertTdfHive2.putStreamingHive(InsertTdfHive2.java:872)
 [nifi-hive-processors-1.1.2.jar:1.1.2]
at 
org.apache.nifi.processors.hive.InsertTdfHive2.onTrigger(InsertTdfHive2.java:411)
 [nifi-hive-processors-1.1.2.jar:1.1.2]
at 
org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
 [nifi-api-1.1.2.jar:1.1.2]
at 
org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1099)
 [nifi-framework-core-1.1.2.jar:1.1.2]
at 
org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:136)
 [nifi-framework-core-1.1.2.jar:1.1.2]
at 
org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:1)
 [nifi-framework-core-1.1.2.jar:1.1.2]
at 
org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:132)
 [nifi-framework-core-1.1.2.jar:1.1.2]
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
[na:1.8.0_131]
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) 
[na:1.8.0_131]
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
 [na:1.8.0_131]
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
 [na:1.8.0_131]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
[na:1.8.0_131]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
[na:1.8.0_131]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]
Caused by: org.apache.hive.hcatalog.streaming.StreamingException: partition 
values=[Y, 2017-06-29 14:32:36.508]. Unable to get path for end point: [Y, 
2017-06-29 14:32:36.508]
at 
org.apache.hive.hcatalog.streaming.AbstractRecordWriter.getPathForEndPoint(AbstractRecordWriter.java:268)
 ~[hive-hcatalog-streaming-2.0.0.jar:2.0.0]
at 
org.apache.hive.hcatalog.streaming.AbstractRecordWriter.<init>(AbstractRecordWriter.java:79)
 ~[hive-hcatalog-streaming-2.0.0.jar:2.0.0]
at 
org.apache.hive.hcatalog.streaming.DelimitedInputWriter.<init>(DelimitedInputWriter.java:121)
 ~[hive-hcatalog-streaming-2.0.0.jar:2.0.0]
at 
org.apache.hive.hcatalog.streaming.DelimitedInputWriter.<init>(DelimitedInputWriter.java:98)
 ~[hive-hcatalog-streaming-2.0.0.jar:2.0.0]
at 
org.apache.hive.hcatalog.streaming.DelimitedInputWriter.<init>(DelimitedInputWriter.java:79)
 ~[hive-hcatalog-streaming-2.0.0.jar:2.0.0]
at 
org.apache.nifi.util.hive.HiveWriter.getDelimitedInputWriter(HiveWriter.java:60)
 ~[nifi-hive-processors-1.1.2.jar:1.1.2]
at org.apache.nifi.util.hive.HiveWriter.<init>(HiveWriter.java:46) 
~[nifi-hive-processors-1.1.2.jar:1.1.2]
... 15 common frames omitted
Caused by: org.apache.hadoop.hive.metastore.api.NoSuchObjectException: 
partition values=[Y, 2017-06-29 14:32:36.508]
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partition_result$get_partition_resultStandardScheme.read(ThriftHiveMetastore.java)
 ~[hive-exec-2.0.0.jar:2.0.0]
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partition_result$get_partition_resultStandardScheme.read(ThriftHiveMetastore.java)
 ~[hive-exec-2.0.0.jar:2.0.0]
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partition_result.read(ThriftHiveMetastore.java)
 ~[hive-exec-2.0.0.jar:2.0.0]
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86) 
~[libthrift-0.9.3.jar:0.9.3]
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_partition(ThriftHiveMetastore.java:1924)
 ~[hive-exec-2.0.0.jar:2.0.0]
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_partition(ThriftHiveMetastore.java:1909)
 ~[hive-exec-2.0.0.jar:2.0.0]
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getPartition(HiveMetaSt

[jira] [Created] (HIVE-16989) Fix some issues identified by lgtm.com

2017-06-29 Thread Malcolm Taylor (JIRA)
Malcolm Taylor created HIVE-16989:
-

 Summary: Fix some issues identified by lgtm.com
 Key: HIVE-16989
 URL: https://issues.apache.org/jira/browse/HIVE-16989
 Project: Hive
  Issue Type: Improvement
Reporter: Malcolm Taylor
Assignee: Malcolm Taylor


[lgtm.com|https://lgtm.com] has identified a number of issues where there may 
be scope for improvement. The plan is to address some of the alerts found at 
[https://lgtm.com/projects/g/apache/hive/].



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-16990) REPL LOAD should update last repl ID only after successful copy of data files.

2017-06-29 Thread Sankar Hariappan (JIRA)
Sankar Hariappan created HIVE-16990:
---

 Summary: REPL LOAD should update last repl ID only after 
successful copy of data files.
 Key: HIVE-16990
 URL: https://issues.apache.org/jira/browse/HIVE-16990
 Project: Hive
  Issue Type: Sub-task
  Components: Hive, repl
Affects Versions: 2.1.0
Reporter: Sankar Hariappan
Assignee: Sankar Hariappan
 Fix For: 3.0.0


REPL LOAD operations that include both metadata and data changes should 
follow the rule below.
1. Copy the metadata, excluding the last repl ID.
2. Copy the data files.
3. If steps 1 and 2 are successful, update the last repl ID of the object.
This rule allows failed events to be re-applied by REPL LOAD and 
ensures no data loss due to failures.
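
The ordering rule above can be sketched roughly as follows. This is only an illustration, not the Hive implementation: ReplLoadSketch, Step and applyEvent are hypothetical names, and the two copy steps are passed in as callbacks so the sketch stays self-contained.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: none of these names exist in Hive.
final class ReplLoadSketch {
    /** A load step that may fail, e.g. a metadata copy or a data file copy. */
    interface Step {
        void run() throws Exception;
    }

    /**
     * Applies one replication event and advances lastReplId only when both
     * the metadata copy and the data copy have succeeded.
     * Returns true if the event was fully applied.
     */
    static boolean applyEvent(long eventId, Step copyMetadata, Step copyData,
                              AtomicLong lastReplId) {
        try {
            copyMetadata.run(); // step 1: metadata, excluding the last repl ID
            copyData.run();     // step 2: data files
        } catch (Exception e) {
            // lastReplId is untouched, so a later REPL LOAD can re-apply the event.
            return false;
        }
        lastReplId.set(eventId); // step 3: only after steps 1 and 2 succeed
        return true;
    }

    public static void main(String[] args) {
        AtomicLong lastReplId = new AtomicLong(10);
        // First attempt: the data copy fails, so the ID must stay where it was.
        boolean ok = applyEvent(11, () -> { }, () -> { throw new Exception("copy failed"); }, lastReplId);
        System.out.println(ok + " " + lastReplId.get());
        // Retry: both steps succeed, so the ID advances.
        ok = applyEvent(11, () -> { }, () -> { }, lastReplId);
        System.out.println(ok + " " + lastReplId.get());
    }
}
```

Because the ID is written last, a failure in either copy step leaves the object looking "not yet replicated" for that event, which is what makes the retry safe.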





[GitHub] hive pull request #194: HIVE-16785: Ensure replication actions are idempoten...

2017-06-29 Thread sankarh
Github user sankarh closed the pull request at:

https://github.com/apache/hive/pull/194


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (HIVE-16991) HiveMetaStoreClient needs a 2-arg constructor for backwards compatibility

2017-06-29 Thread Andrew Sherman (JIRA)
Andrew Sherman created HIVE-16991:
-

 Summary: HiveMetaStoreClient needs a 2-arg constructor for 
backwards compatibility
 Key: HIVE-16991
 URL: https://issues.apache.org/jira/browse/HIVE-16991
 Project: Hive
  Issue Type: Bug
Reporter: Andrew Sherman
Assignee: Andrew Sherman


Some client code that is not easy to change uses a 2-arg constructor on 
HiveMetaStoreClient.
It is trivial and safe to add this constructor:

{code}
public HiveMetaStoreClient(HiveConf conf, HiveMetaHookLoader hookLoader)
    throws MetaException {
  this(conf, hookLoader, true);
}
{code}





[jira] [Created] (HIVE-16992) LLAP: better default lambda for LRFU policy

2017-06-29 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-16992:
---

 Summary: LLAP: better default lambda for LRFU policy
 Key: HIVE-16992
 URL: https://issues.apache.org/jira/browse/HIVE-16992
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin


LRFU is currently skewed heavily towards LRU; there are tens or hundreds of 
thousands of buffers tracked during a typical workload, but the heap size is 
around 700. We should see whether making it closer to LFU (by tweaking the 
lambda) improves the hit rate when small queries are infrequently interleaved 
with large scans, and whether it has negative effects due to perf overhead.
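
For readers unfamiliar with the lambda knob, here is a minimal sketch of the textbook LRFU bookkeeping, in which each past reference is weighted by (1/2)^(lambda * age). It is not the LLAP cache code; LrfuSketch and its method names are made up for illustration, and time is counted in abstract ticks.

```java
// Illustrative LRFU bookkeeping; not the LLAP cache implementation.
final class LrfuSketch {
    private final double lambda; // 0.0 behaves like LFU, 1.0 like LRU

    LrfuSketch(double lambda) {
        this.lambda = lambda;
    }

    /** Weight of a reference made `age` ticks ago: (1/2)^(lambda * age). */
    double weight(long age) {
        return Math.pow(0.5, lambda * age);
    }

    /**
     * Combined recency and frequency (CRF) value after a new access at time
     * `now`, given the value `previousCrf` recorded at time `lastAccess`.
     */
    double onAccess(double previousCrf, long lastAccess, long now) {
        return weight(0) + weight(now - lastAccess) * previousCrf;
    }

    public static void main(String[] args) {
        for (double lambda : new double[] {0.0, 0.5, 1.0}) {
            LrfuSketch policy = new LrfuSketch(lambda);
            double crf = 0.0;
            long last = 0;
            for (long t = 0; t < 3; t++) { // three accesses at t = 0, 1, 2
                crf = policy.onAccess(crf, last, t);
                last = t;
            }
            System.out.println("lambda=" + lambda + " crf=" + crf);
        }
    }
}
```

With lambda at 0 every access counts fully forever, so the value is a plain access count (LFU); as lambda grows, older accesses decay faster and the most recent access dominates (LRU).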





Re: Review Request 60289: HIVE-15665 LLAP: OrcFileMetadata objects in cache can impact heap usage

2017-06-29 Thread Sergey Shelukhin


> On June 29, 2017, 6:38 a.m., Prasanth_J wrote:
> > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
> > Line 2974 (original)
> > 
> >
> > Do we need this for offheap cache? For smaller cache, we don't want 
> > metadata objects taking up 100% of cache. As long as we are storing only 
> > the serialized footers, this shouldn't be a problem.

I think it's ok to have metadata taking up all of the cache. If anything it 
might be better to have all metadata and less of the data...


- Sergey


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60289/#review179152
---


On June 21, 2017, 8:31 p.m., Sergey Shelukhin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60289/
> ---
> 
> (Updated June 21, 2017, 8:31 p.m.)
> 
> 
> Review request for hive, Gopal V and Prasanth_J.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> see jira
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java be38f381e6
>   llap-server/src/java/org/apache/hadoop/hive/llap/cache/EvictionDispatcher.java c73f1a1a7d
>   llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java 53c9bae5c1
>   llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/OrcEncodedDataReader.java 2a76f5c4da
>   llap-server/src/java/org/apache/hadoop/hive/llap/io/metadata/OrcFileEstimateErrors.java dc053ee7cf
>   llap-server/src/java/org/apache/hadoop/hive/llap/io/metadata/OrcFileMetadata.java b9d7a77d5b
>   llap-server/src/java/org/apache/hadoop/hive/llap/io/metadata/OrcMetadataCache.java 601b622b49
>   llap-server/src/java/org/apache/hadoop/hive/llap/io/metadata/OrcStripeMetadata.java 4565d11988
>   llap-server/src/test/org/apache/hadoop/hive/llap/cache/TestIncrementalObjectSizeEstimator.java 13c7767a3b
>   llap-server/src/test/org/apache/hadoop/hive/llap/cache/TestOrcMetadataCache.java 03a955c6f7
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 0ef7c758d4
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedReader.java 7540e72b53
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedReaderImpl.java d5807b77e2
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/Reader.java 31b0609b83
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/ReaderImpl.java 4856fb3ceb
>   ql/src/test/results/clientpositive/llap/orc_llap_counters.q.out 8af84dce19
>   ql/src/test/results/clientpositive/llap/orc_llap_counters1.q.out 4536cbbfb9
>   ql/src/test/results/clientpositive/llap/orc_ppd_basic.q.out cd7a392e08
>   ql/src/test/results/clientpositive/llap/orc_ppd_schema_evol_3a.q.out b799527e30
> 
> 
> Diff: https://reviews.apache.org/r/60289/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Sergey Shelukhin
> 
>



[jira] [Created] (HIVE-16993) ThriftHiveMetastore.create_database can fail if the locationUri is not set

2017-06-29 Thread Dan Burkert (JIRA)
Dan Burkert created HIVE-16993:
--

 Summary: ThriftHiveMetastore.create_database can fail if the 
locationUri is not set
 Key: HIVE-16993
 URL: https://issues.apache.org/jira/browse/HIVE-16993
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Dan Burkert


Calling 
[{{ThriftHiveMetastore.create_database}}|https://github.com/apache/hive/blob/3fa48346d509813977cd3c7622d581c0ccd51e99/metastore/if/hive_metastore.thrift#L1078]
 with a database with an unset {{locationUri}} field through the C++ 
implementation fails with:

{code}
MetaException(message=java.lang.IllegalArgumentException: Can not create a Path 
from an empty string)
{code}

The 
[{{locationUri}}|https://github.com/apache/hive/blob/3fa48346d509813977cd3c7622d581c0ccd51e99/metastore/if/hive_metastore.thrift#L270]
 Thrift field is 'default requiredness (implicit)', and Thrift [does not 
specify|https://thrift.apache.org/docs/idl#default-requiredness-implicit] 
whether unset default requiredness fields are encoded.  Empirically, the Java 
generated code [does not write the 
{{locationUri}}|https://github.com/apache/hive/blob/3fa48346d509813977cd3c7622d581c0ccd51e99/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Database.java#L938-L942]
 when the field is unset, while the C++ generated code 
[does|https://github.com/apache/hive/blob/3fa48346d509813977cd3c7622d581c0ccd51e99/metastore/src/gen/thrift/gen-cpp/hive_metastore_types.cpp#L3888-L3890].

The MetaStore treats the field as optional, and [fills in a default 
value|https://github.com/apache/hive/blob/3fa48346d509813977cd3c7622d581c0ccd51e99/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L867-L871]
 if the field is unset.

The end result is that when the C++ implementation sends a {{Database}} without 
the field set, it actually writes an empty string, the MetaStore treats it 
as a set field (non-null), and then calls a {{Path}} API which rejects the 
empty string.  The fix is simple: make the {{locationUri}} field optional in 
hive_metastore.thrift.
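
The failure mode can be reproduced in miniature without Thrift or Hadoop. The sketch below is a stand-in under stated assumptions: LocationUriSketch, toPath and resolveLocation are hypothetical names, toPath only imitates a path constructor that rejects the empty string, and the default location is a made-up value.

```java
// Stand-in for the behaviour described above; not the metastore code.
final class LocationUriSketch {
    /** Imitates a Path-like constructor that rejects the empty string. */
    static String toPath(String uri) {
        if (uri.isEmpty()) {
            throw new IllegalArgumentException("Can not create a Path from an empty string");
        }
        return uri;
    }

    /** Fills in the default only when the field is unset (null). */
    static String resolveLocation(String locationUri, String defaultLocation) {
        if (locationUri == null) {
            return toPath(defaultLocation);
        }
        return toPath(locationUri); // an empty string reaches here and fails
    }

    public static void main(String[] args) {
        String fallback = "/warehouse/db1.db"; // made-up default location
        // Unset field not written on the wire: the server sees null.
        System.out.println(resolveLocation(null, fallback));
        // Unset field written as "": the server sees a set, empty field.
        try {
            resolveLocation("", fallback);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The null check is what distinguishes the two clients: only a sender that omits the field entirely ever reaches the default branch.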





Re: Review Request 60289: HIVE-15665 LLAP: OrcFileMetadata objects in cache can impact heap usage

2017-06-29 Thread Sergey Shelukhin


> On June 29, 2017, 6:38 a.m., Prasanth_J wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedReaderImpl.java
> > Lines 1685 (patched)
> > 
> >
> > This looks complicated. It will be better if ORC can provide API that 
> > returns stripe footer and indexes as ByteBuffer which can be directly 
> > cached. Stripe footers and Indexes could be stored with medium priority. 
> > 
> > Priorities could be:
> > Serialized file footer - HIGH (this is required to not choke NN, with 
> > config change this is already part of split)
> > Index + Stripe footer - MEDIUM (with locality re-reading these will not 
> > be a problem)
> > Data - LOW (same as reading index, stripe footer)
> > 
> > Since backward seeks no longer close connections for cloud storage, 
> > reading index and stripe could be done faster. 
> > 
> > I think it would be easier,
> > if ORC and parquet readers can provide 2 high level interfaces
> > - Interface to read footers, index as ByteBuffers which LLAP will cache
> > - Reader interface to accept ByteBuffer from which footers and index 
> > can be read which LLAP or file will provide

I filed some ORC JIRAs to improve APIs... for now a no-op


- Sergey


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60289/#review179152
---





Re: Review Request 60289: HIVE-15665 LLAP: OrcFileMetadata objects in cache can impact heap usage

2017-06-29 Thread Sergey Shelukhin


> On June 29, 2017, 6:38 a.m., Prasanth_J wrote:
> > llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/OrcEncodedDataReader.java
> > Lines 614 (patched)
> > 
> >
> > can this all be baked in OrcFileMetadata class? since stats, stripes 
> > are derived from tail, will be useful if we can just store OrcTail in 
> > OrcFileMetadata and lazily construct stats and stripes when getter is 
> > invoked.

why? I'm assuming the reader would always need those and they are already in 
the object


- Sergey


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60289/#review179152
---





Re: Review Request 60289: HIVE-15665 LLAP: OrcFileMetadata objects in cache can impact heap usage

2017-06-29 Thread Sergey Shelukhin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60289/
---

(Updated June 29, 2017, 9:21 p.m.)


Review request for hive, Gopal V and Prasanth_J.


Repository: hive-git


Description
---

see jira


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 34a663d45b
  llap-server/src/java/org/apache/hadoop/hive/llap/cache/EvictionDispatcher.java 0cbc8f6f4c
  llap-server/src/java/org/apache/hadoop/hive/llap/cache/LowLevelCacheImpl.java 82ea0c0e5e
  llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java 53c9bae5c1
  llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/OrcEncodedDataReader.java 2a76f5c4da
  llap-server/src/java/org/apache/hadoop/hive/llap/io/metadata/OrcFileEstimateErrors.java dc053ee7cf
  llap-server/src/java/org/apache/hadoop/hive/llap/io/metadata/OrcFileMetadata.java b9d7a77d5b
  llap-server/src/java/org/apache/hadoop/hive/llap/io/metadata/OrcMetadataCache.java 601b622b49
  llap-server/src/java/org/apache/hadoop/hive/llap/io/metadata/OrcStripeMetadata.java 4565d11988
  llap-server/src/test/org/apache/hadoop/hive/llap/cache/TestIncrementalObjectSizeEstimator.java 13c7767a3b
  llap-server/src/test/org/apache/hadoop/hive/llap/cache/TestOrcMetadataCache.java 03a955c6f7
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 0ef7c758d4
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedReader.java 7540e72b53
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedReaderImpl.java d5807b77e2
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/Reader.java 31b0609b83
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/ReaderImpl.java 4856fb3ceb
  ql/src/test/results/clientpositive/llap/orc_llap_counters.q.out 8af84dce19
  ql/src/test/results/clientpositive/llap/orc_llap_counters1.q.out 4536cbbfb9
  ql/src/test/results/clientpositive/llap/orc_ppd_basic.q.out cd7a392e08
  ql/src/test/results/clientpositive/llap/orc_ppd_schema_evol_3a.q.out b799527e30


Diff: https://reviews.apache.org/r/60289/diff/2/

Changes: https://reviews.apache.org/r/60289/diff/1-2/


Testing
---


Thanks,

Sergey Shelukhin



Re: Review Request 60445: HIVE-16935: Hive should strip comments from input before choosing which CommandProcessor to run.

2017-06-29 Thread Andrew Sherman


> On June 28, 2017, 10:39 a.m., Peter Vary wrote:
> > Hi,
> > 
> > Thanks for the change.
> > Overall looks good.
> > After running Yetus, it came up with the following warnings:
> > 
> > ./common/src/java/org/apache/hive/common/util/HiveStringUtils.java:1077:  /**: warning: First sentence should end with a period.
> > ./common/src/java/org/apache/hive/common/util/HiveStringUtils.java:1088:int[] startQuote = { -1 };:25: warning: '{' is followed by whitespace.
> > ./common/src/java/org/apache/hive/common/util/HiveStringUtils.java:1105:if (line == null || line.isEmpty()): warning: 'if' construct must use '{}'s.
> > ./common/src/java/org/apache/hive/common/util/HiveStringUtils.java:1107:if (startQuote[0] == -1 && isComment(line)): warning: 'if' construct must use '{}'s.
> > ./cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java:126:String cmd_trimmed = removeComments(cmd).trim();:12: warning: Name 'cmd_trimmed' must match pattern '^[a-z][a-zA-Z0-9]*$'.
> > 
> > I personally do not like to import methods:
> > https://stackoverflow.com/questions/420791/what-is-a-good-use-case-for-static-import-of-methods
> > 
> > Thanks for the patch,
> > Peter

Thanks for the review.
Thanks for letting me know about Yetus. I was able to follow the instructions 
at https://cwiki.apache.org/confluence/display/Hive/Running+Yetus
I have fixed the checkstyle problems except for the name 'cmd_trimmed'. 
Changing this causes cascading checkstyle changes which expand the scope of the 
fix too far.
I reduced the usage of static imports but kept it in the test where it makes 
the test more readable.


> On June 28, 2017, 10:39 a.m., Peter Vary wrote:
> > common/src/java/org/apache/hive/common/util/HiveStringUtils.java
> > Lines 1104 (patched)
> > 
> >
> > nit: do we need this as public?

Yes, it is called from org.apache.hive.beeline.Commands


- Andrew


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60445/#review179088
---


On June 26, 2017, 8:35 p.m., Andrew Sherman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60445/
> ---
> 
> (Updated June 26, 2017, 8:35 p.m.)
> 
> 
> Review request for hive and Sahil Takiar.
> 
> 
> Bugs: HIVE-16935
> https://issues.apache.org/jira/browse/HIVE-16935
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> We strip SQL comments from a command string. The stripped command is used to 
> determine which CommandProcessor will execute the command. If the 
> CommandProcessorFactory does not select a special CommandProcessor, then we 
> execute the original unstripped command so that the SQL parser can remove 
> comments.
> Move BeeLine's comment stripping code to HiveStringUtils and change BeeLine 
> to call it from there.
> Add a better test with separate tokens for "set role" in 
> TestCommandProcessorFactory.
> Add a test case for comment removal in set_processor_namespaces.q using an 
> indented comment, as unindented comments are removed by the test driver.
> 
> 
> Diffs
> -
> 
>   beeline/src/java/org/apache/hive/beeline/Commands.java 3b2d72ed79 
>   beeline/src/test/org/apache/hive/beeline/TestCommands.java 04c939a04c 
>   cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java 27fd66d35e 
>   common/src/java/org/apache/hive/common/util/HiveStringUtils.java 4a6413a7c3 
>   common/src/test/org/apache/hive/common/util/TestHiveStringUtils.java 6bd7037152
>   ql/src/test/org/apache/hadoop/hive/ql/processors/TestCommandProcessorFactory.java 21bdcf4443
>   ql/src/test/queries/clientpositive/set_processor_namespaces.q 612807f0c8
>   ql/src/test/results/clientpositive/set_processor_namespaces.q.out c05ce4d61d
>   service/src/java/org/apache/hive/service/cli/operation/ExecuteStatementOperation.java 2dd90b69b3
> 
> 
> Diff: https://reviews.apache.org/r/60445/diff/1/
> 
> 
> Testing
> ---
> 
> Added new test case.
> Hand tested with Hue and Jdbc.
> 
> 
> Thanks,
> 
> Andrew Sherman
> 
>
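
The approach in the quoted description (choose the CommandProcessor from a comment-stripped copy of the command, but run the original command when no special processor matches) might look roughly like this. It is a simplified sketch, not the patch itself: CommentStripSketch, stripLineComments and chooseProcessor are hypothetical names, the returned strings are just labels, and only whole-line "--" comments are handled, with no attention to quoting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch; the names here are not the ones used in the patch.
final class CommentStripSketch {
    /** Drops lines whose first non-blank characters are "--". */
    static String stripLineComments(String command) {
        List<String> kept = new ArrayList<>();
        for (String line : command.split("\n")) {
            if (!line.trim().startsWith("--")) {
                kept.add(line);
            }
        }
        return String.join("\n", kept).trim();
    }

    /** Picks a processor label from the first token of the stripped command. */
    static String chooseProcessor(String command) {
        String stripped = stripLineComments(command);
        String firstToken = stripped.isEmpty()
                ? ""
                : stripped.split("\\s+")[0].toLowerCase(Locale.ROOT);
        switch (firstToken) {
            case "set":
                return "SetProcessor";
            case "dfs":
                return "DfsProcessor";
            default:
                // No special processor: the original, unstripped command goes
                // down the SQL path, whose parser removes comments itself.
                return "Driver";
        }
    }

    public static void main(String[] args) {
        System.out.println(chooseProcessor("  -- switch role\nset role admin"));
        System.out.println(chooseProcessor("-- a query\nselect 1"));
    }
}
```

Without the stripping step, the first token of the first example would be "--", and the command would be routed as if it were SQL.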



Re: Review Request 60445: HIVE-16935: Hive should strip comments from input before choosing which CommandProcessor to run.

2017-06-29 Thread Andrew Sherman


> On June 28, 2017, 5:18 a.m., Sahil Takiar wrote:
> > common/src/java/org/apache/hive/common/util/HiveStringUtils.java
> > Lines 1083 (patched)
> > 
> >
> > is this copied from somewhere, or is this logic new?

This code is new.


> On June 28, 2017, 5:18 a.m., Sahil Takiar wrote:
> > common/src/java/org/apache/hive/common/util/HiveStringUtils.java
> > Lines 1104 (patched)
> > 
> >
> > are there any changes to this method, or was it just moved from 
> > `Commands.java`

This code was moved. Peter requested updated javadoc, so I will change that in 
the next revision.


> On June 28, 2017, 5:18 a.m., Sahil Takiar wrote:
> > common/src/java/org/apache/hive/common/util/HiveStringUtils.java
> > Lines 1136 (patched)
> > 
> >
> > new logic, or copied from beeline?

Moved from beeline


> On June 28, 2017, 5:18 a.m., Sahil Takiar wrote:
> > common/src/test/org/apache/hive/common/util/TestHiveStringUtils.java
> > Lines 113-115 (patched)
> > 
> >
> > nit: remove extra lines

Fixed in next patch


- Andrew


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60445/#review179073
---





Review Request 60552: hive.optimize.bucketingsorting should compare the schema before removing RS

2017-06-29 Thread pengcheng xiong

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60552/
---

Review request for hive and Ashutosh Chauhan.


Repository: hive-git


Description
---

HIVE-16981


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/optimizer/BucketingSortingReduceSinkOptimizer.java ac1c803b32
  ql/src/test/queries/clientpositive/smb_mapjoin_20.q aa1e9fa9d8 
  ql/src/test/results/clientpositive/beeline/smb_mapjoin_12.q.out 9928a60095 
  ql/src/test/results/clientpositive/bucketsortoptimize_insert_8.q.out f3d30068ad
  ql/src/test/results/clientpositive/smb_mapjoin_12.q.out 9928a60095 
  ql/src/test/results/clientpositive/smb_mapjoin_20.q.out 6c411716e7 
  ql/src/test/results/clientpositive/spark/smb_mapjoin_20.q.out f35a33d8dc 


Diff: https://reviews.apache.org/r/60552/diff/1/


Testing
---


Thanks,

pengcheng xiong



[jira] [Created] (HIVE-16994) Support connection pooling for HiveMetaStoreClient

2017-06-29 Thread Alexander Kolbasov (JIRA)
Alexander Kolbasov created HIVE-16994:
-

 Summary: Support connection pooling for HiveMetaStoreClient
 Key: HIVE-16994
 URL: https://issues.apache.org/jira/browse/HIVE-16994
 Project: Hive
  Issue Type: Improvement
  Components: Hive
Reporter: Alexander Kolbasov


The native {{HiveMetaStoreClient}} doesn't support connection pooling. I think 
it would be a very useful feature, especially in Kerberos environments where 
connection establishment may be especially expensive. 

A similar feature is now supported in Sentry - see SENTRY-1580.
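
A minimal sketch of the kind of pooling requested here, assuming a generic
bounded pool. SimplePool, borrow and giveBack are hypothetical names and are
not part of Hive or Sentry; the type parameter T stands in for a client that
is expensive to create. A real pool for HiveMetaStoreClient would also have
to validate connections, handle Kerberos ticket expiry, and close clients on
shutdown.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical bounded pool; T stands in for an expensive-to-create client. */
class SimplePool<T> {
  private final BlockingQueue<T> idle;
  private final Supplier<T> factory;
  private final int maxSize;
  private final AtomicInteger created = new AtomicInteger();

  SimplePool(int maxSize, Supplier<T> factory) {
    this.maxSize = maxSize;
    this.factory = factory;
    this.idle = new ArrayBlockingQueue<>(maxSize);
  }

  /** Reuses an idle client if one exists, creates one while under the cap, otherwise waits. */
  T borrow() {
    T client = idle.poll();
    if (client != null) {
      return client;
    }
    while (true) {
      int n = created.get();
      if (n >= maxSize) {
        try {
          return idle.take(); // block until another caller gives a client back
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          throw new IllegalStateException("interrupted while waiting for a client", e);
        }
      }
      // Reserve a creation slot atomically so the cap is never exceeded.
      if (created.compareAndSet(n, n + 1)) {
        return factory.get();
      }
    }
  }

  /** Returns a client to the pool so a later borrow() can reuse it. */
  void giveBack(T client) {
    idle.offer(client);
  }

  /** Number of clients created so far (never more than maxSize). */
  int createdCount() {
    return created.get();
  }
}
```

The point of the sketch is that the expensive step (factory.get()) runs at
most maxSize times, while every later borrow() reuses an existing client.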



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-16995) Merge NDV across partitions using bit vectors

2017-06-29 Thread Pengcheng Xiong (JIRA)
Pengcheng Xiong created HIVE-16995:
--

 Summary: Merge NDV across partitions using bit vectors
 Key: HIVE-16995
 URL: https://issues.apache.org/jira/browse/HIVE-16995
 Project: Hive
  Issue Type: Improvement
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong








[jira] [Created] (HIVE-16996) Add HLL as an alternative to FM sketch to compute stats

2017-06-29 Thread Pengcheng Xiong (JIRA)
Pengcheng Xiong created HIVE-16996:
--

 Summary: Add HLL as an alternative to FM sketch to compute stats
 Key: HIVE-16996
 URL: https://issues.apache.org/jira/browse/HIVE-16996
 Project: Hive
  Issue Type: Sub-task
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong








[jira] [Created] (HIVE-16997) Extend object store to store bit vectors

2017-06-29 Thread Pengcheng Xiong (JIRA)
Pengcheng Xiong created HIVE-16997:
--

 Summary: Extend object store to store bit vectors
 Key: HIVE-16997
 URL: https://issues.apache.org/jira/browse/HIVE-16997
 Project: Hive
  Issue Type: Sub-task
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong








Re: Review Request 60445: HIVE-16935: Hive should strip comments from input before choosing which CommandProcessor to run.

2017-06-29 Thread Andrew Sherman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60445/
---

(Updated June 30, 2017, 12:30 a.m.)


Review request for hive and Sahil Takiar.


Bugs: HIVE-16935
https://issues.apache.org/jira/browse/HIVE-16935


Repository: hive-git


Description (updated)
---

We strip SQL comments from a command string. The stripped command is used to 
determine which CommandProcessor will execute the command. If the 
CommandProcessorFactory does not select a special CommandProcessor, then we 
execute the original unstripped command so that the SQL parser can remove 
comments.
Move BeeLine's comment stripping code to HiveStringUtils and change BeeLine to 
call it from there.
Add a better test with separate tokens for "set role" in 
TestCommandProcessorFactory.
Add a test case for comment removal in set_processor_namespaces.q using an 
indented comment, as unindented comments are removed by the test driver.
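
The flow described above can be sketched roughly as follows. This is an
illustration only, not the code in the patch: CommentStrippingSketch,
stripComments, usesSpecialProcessor and the SPECIAL keyword list are all
hypothetical, and the quote handling ignores escaped quotes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class CommentStrippingSketch {
  // Illustrative keyword list (an assumption); the real set lives in Hive itself.
  private static final List<String> SPECIAL =
      Arrays.asList("set", "reset", "dfs", "add", "list", "delete");

  /** Removes "--" comments that sit outside quotes, line by line, and drops empty lines. */
  static String stripComments(String command) {
    StringBuilder out = new StringBuilder();
    for (String line : command.split("\n")) {
      String kept = stripLine(line).trim();
      if (!kept.isEmpty()) {
        if (out.length() > 0) {
          out.append('\n');
        }
        out.append(kept);
      }
    }
    return out.toString();
  }

  /** Cuts the line at the first "--" that is not inside a quoted string. */
  private static String stripLine(String line) {
    char quote = 0; // 0 means "not inside a quoted string"
    for (int i = 0; i < line.length(); i++) {
      char c = line.charAt(i);
      if (quote != 0) {
        if (c == quote) {
          quote = 0;
        }
      } else if (c == '\'' || c == '"') {
        quote = c;
      } else if (c == '-' && i + 1 < line.length() && line.charAt(i + 1) == '-') {
        return line.substring(0, i);
      }
    }
    return line;
  }

  /** Decides on the stripped text; the caller still executes the original text otherwise. */
  static boolean usesSpecialProcessor(String command) {
    String stripped = stripComments(command);
    if (stripped.isEmpty()) {
      return false;
    }
    String first = stripped.split("\\s+")[0].toLowerCase(Locale.ROOT);
    return SPECIAL.contains(first);
  }
}
```

With this shape, a command whose first line is an indented comment followed
by "set a=b" is still routed by its first real token, while an ordinary query
is passed through unstripped for the SQL parser to handle.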

Change-Id: I166dc1e7588ec9802ba373d88e69e716aecd33c2


Diffs (updated)
-

  beeline/src/java/org/apache/hive/beeline/Commands.java 3b2d72ed79771e6198e62c47060a7f80665dbcb2 
  beeline/src/test/org/apache/hive/beeline/TestCommands.java 04c939a04c7a56768286743c2bb9c9797507e3aa 
  cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java 27fd66d35ea89b0de0d17763625fbf564584fcca 
  common/src/java/org/apache/hive/common/util/HiveStringUtils.java 4a6413a7c376ffb4de6d20d24707ac5bf89ebc0c 
  common/src/test/org/apache/hive/common/util/TestHiveStringUtils.java 6bd7037152c6f809daec8af42708693c05fe00cf 
  
  ql/src/test/org/apache/hadoop/hive/ql/processors/TestCommandProcessorFactory.java 21bdcf44436a02b11f878fa439e916d4b55ac63d 
  ql/src/test/queries/clientpositive/set_processor_namespaces.q 612807f0c871b1881446d088e1c2c399d1afe970 
  ql/src/test/results/clientpositive/set_processor_namespaces.q.out c05ce4d61d00a9ee6671d97f2fd178f18d44cc8c 
  
  service/src/java/org/apache/hive/service/cli/operation/ExecuteStatementOperation.java 2dd90b69b3bf789b1a3928129cf801b17884033f 


Diff: https://reviews.apache.org/r/60445/diff/2/

Changes: https://reviews.apache.org/r/60445/diff/1-2/


Testing
---

Added new test case.
Hand tested with Hue and Jdbc.


Thanks,

Andrew Sherman



[jira] [Created] (HIVE-16998) Add config to enable HoS DPP only for map-joins

2017-06-29 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-16998:
---

 Summary: Add config to enable HoS DPP only for map-joins
 Key: HIVE-16998
 URL: https://issues.apache.org/jira/browse/HIVE-16998
 Project: Hive
  Issue Type: Sub-task
  Components: Logical Optimizer, Spark
Reporter: Sahil Takiar
Assignee: Sahil Takiar


HoS DPP will split a given operator tree in two under the following conditions: 
it has detected that the query can benefit from DPP, and the filter is not a 
map-join (see SplitOpTreeForDPP).

This can hurt performance if the non-partitioned side of the join involves 
a complex operator tree - e.g. the query {{select count(*) from srcpart where 
srcpart.ds in (select max(srcpart.ds) from srcpart union all select 
min(srcpart.ds) from srcpart)}} will require running the subquery twice, once 
in each Spark job.

Queries with map-joins don't get split into two operator trees and thus don't 
suffer from this drawback. Thus, it would be nice to have a config key that 
just enables DPP on HoS for map-joins.





[jira] [Created] (HIVE-16999) Performance bottleneck in the add_resource api

2017-06-29 Thread Sailee Jain (JIRA)
Sailee Jain created HIVE-16999:
--

 Summary: Performance bottleneck in the add_resource api
 Key: HIVE-16999
 URL: https://issues.apache.org/jira/browse/HIVE-16999
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Sailee Jain
Priority: Critical


There is a performance bottleneck when adding a resource that resides on HDFS 
to the distributed cache. 
The commands used are:
{{1. ADD ARCHIVE "hdfs://some_dir/archive.tar"
2. ADD FILE "hdfs://some_dir/file.txt"}}
Here is the log corresponding to the archive-adding operation:
=> converting to local hdfs://some_dir/archive.tar
=> Added resources: [hdfs://some_dir/archive.tar]

Hive downloads the resource to the local filesystem (shown in the log by 
"converting to local"). 
Ideally there is no need to bring the file to the local filesystem, since this 
operation only copies the file from one location on HDFS to another location 
on HDFS (the distributed cache).
This becomes a significant bottleneck when the resource is a big file and all 
commands need the same resource.
After debugging, the affected code was found to be:

{{public List<String> add_resources(ResourceType t, Collection<String> values,
    boolean convertToUnix) throws RuntimeException {
  Set<String> resourceSet = resourceMaps.getResourceSet(t);
  Map<String, Set<String>> resourcePathMap =
      resourceMaps.getResourcePathMap(t);
  Map<String, Set<String>> reverseResourcePathMap =
      resourceMaps.getReverseResourcePathMap(t);
  List<String> localized = new ArrayList<String>();
  try {
    for (String value : values) {
      String key;
      // get the local path of downloaded jars.
      List<URI> downloadedURLs = resolveAndDownload(t, value, convertToUnix);
      ...
...}}
{{List<URI> resolveAndDownload(ResourceType t, String value,
    boolean convertToUnix) throws URISyntaxException, IOException {
  URI uri = createURI(value);
  if (getURLType(value).equals("file")) {
    return Arrays.asList(uri);
  } else if (getURLType(value).equals("ivy")) {
    return dependencyResolver.downloadDependencies(uri);
  } else { // goes here for HDFS
    return Arrays.asList(createURI(downloadResource(value, convertToUnix)));
  }
}}}
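
One way to picture the suggested change is to decide per URI scheme whether a
download is needed at all. The sketch below only illustrates that decision
and is not a proposed patch: ResourceResolutionSketch, Action and plan are
hypothetical names, and treating every "hdfs" URI as safe to keep remote is
an assumption that would need checking against how each resource type is
consumed.

```java
import java.net.URI;
import java.util.Locale;

class ResourceResolutionSketch {
  /** What to do with a resource, keyed on its URI scheme. */
  enum Action { USE_LOCAL, RESOLVE_IVY, KEEP_REMOTE, DOWNLOAD }

  /** Returns the lower-cased scheme, treating a missing scheme as a local file. */
  static String schemeOf(String value) {
    String scheme = URI.create(value).getScheme();
    return scheme == null ? "file" : scheme.toLowerCase(Locale.ROOT);
  }

  /** Picks an action; only the KEEP_REMOTE branch differs from the behaviour reported above. */
  static Action plan(String value) {
    switch (schemeOf(value)) {
      case "file":
        return Action.USE_LOCAL;
      case "ivy":
        return Action.RESOLVE_IVY;
      case "hdfs":
        // Assumption: the file is already on HDFS, so skip the local copy.
        return Action.KEEP_REMOTE;
      default:
        return Action.DOWNLOAD;
    }
  }
}
```

For example, plan("hdfs://some_dir/archive.tar") yields KEEP_REMOTE, whereas
the code quoted above falls through to the final else branch and downloads
the file.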

Thanks,
Sailee





[jira] [Created] (HIVE-17000) Upgrade Hive to PARQUET 1.9.0

2017-06-29 Thread Dapeng Sun (JIRA)
Dapeng Sun created HIVE-17000:
-

 Summary: Upgrade Hive to PARQUET 1.9.0
 Key: HIVE-17000
 URL: https://issues.apache.org/jira/browse/HIVE-17000
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Affects Versions: 3.0.0
Reporter: Dapeng Sun
Assignee: Dapeng Sun


Parquet 1.9.0 has been released with many new features, such as PARQUET-601 
(Add support in Parquet to configure the encoding used by ValueWriters).

We should upgrade the Parquet dependency to 1.9.0 and bring these 
optimizations to Hive.


