date:20180523

[jira] [Created] (HIVE-19691) Start SessionState in materialized views registry

2018-05-23 Thread Jesus Camacho Rodriguez (JIRA)

Jesus Camacho Rodriguez created HIVE-19691:
--

 Summary: Start SessionState in materialized views registry
 Key: HIVE-19691
 URL: https://issues.apache.org/jira/browse/HIVE-19691
 Project: Hive
  Issue Type: Bug
  Components: Materialized views
Affects Versions: 3.0.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


SessionState is not initialized when we load the materialized views, which 
leads to a NullPointerException and other issues.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (HIVE-19690) multi-insert query with multiple GBY, and distinct in only some branches can produce incorrect results

2018-05-23 Thread Sergey Shelukhin (JIRA)

Sergey Shelukhin created HIVE-19690:
---

 Summary: multi-insert query with multiple GBY, and distinct in 
only some branches can produce incorrect results
 Key: HIVE-19690
 URL: https://issues.apache.org/jira/browse/HIVE-19690
 Project: Hive
  Issue Type: Bug
Reporter: Riju Trivedi
Assignee: Sergey Shelukhin







How to do HMS and HDFS authentication

2018-05-23 Thread 侯宗田

Hello, everyone:
I am using a Hive metastore Thrift client to access Hive data. First, I send a 
request to HMS to get the metadata; second, I read and write the data on HDFS. 
However, I am confused about authentication. The Hive database I want to access 
uses LDAP for authentication. I have looked at how it is configured, 
but everything I found is about HiveServer2. Can I access HMS and HDFS directly, 
and if so, how do I authenticate properly? Any advice would be appreciated.

[jira] [Created] (HIVE-19689) Not able to do insert into table belonging to a non default namespace - HDFS federated cluster

2018-05-23 Thread Supreeth Sharma (JIRA)

Supreeth Sharma created HIVE-19689:
--

 Summary: Not able to do insert into table belonging to a non 
default namespace - HDFS federated cluster
 Key: HIVE-19689
 URL: https://issues.apache.org/jira/browse/HIVE-19689
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Supreeth Sharma


Inserting into a table that belongs to a non-default namespace fails on an HDFS 
federated cluster.

Steps to reproduce:
1) Create an HDFS federated cluster with 2 namespaces.
2) Create an external table belonging to the non-default namespace.
{code:java}
CREATE EXTERNAL TABLE test_ext_tbl2 (id int, name string, dept string) 
PARTITIONED BY (year int) location 'hdfs://ns2/tmp/test_ext_tbl2'
{code}
3) Try to insert a row into the newly created table.
{code:java}
INSERT INTO test_ext_tbl2 PARTITION (year=2016) VALUES (8,'Henry','CSE');
{code}
The query hangs and after some time it fails with the error below:
{code:java}
ERROR : Vertex failed, vertexName=Map 1, 
vertexId=vertex_1527031638037_0017_1_00, diagnostics=[Task failed, 
taskId=task_1527031638037_0017_1_00_00, diagnostics=[TaskAttempt 0 failed, 
info=[Error: Error while running task ( failure ) : 
attempt_1527031638037_0017_1_00_00_0:java.lang.RuntimeException: 
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
Hive Runtime Error while processing writable (null)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at 
com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
at 
com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
at 
com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
processing writable (null)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:76)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:419)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
... 16 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing writable (null)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:563)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:92)
... 19 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
DestHost:destPort ctr-e138-1518143905142-326063-01-06.hwx.site:8020 , 
LocalHost:localPort 
ctr-e138-1518143905142-326063-01-06.hwx.site/172.27.67.65:0. Failed on 
local exception: java.io.IOException: 
org.apache.hadoop.security.AccessControlException: Client cannot authenticate 
via:[TOKEN, KERBEROS]
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:708)
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:863)
at 
org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:985)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:931)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:918)
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
at

[jira] [Created] (HIVE-19688) Make catalogs updatable

2018-05-23 Thread Alan Gates (JIRA)

Alan Gates created HIVE-19688:
-

 Summary: Make catalogs updatable
 Key: HIVE-19688
 URL: https://issues.apache.org/jira/browse/HIVE-19688
 Project: Hive
  Issue Type: Sub-task
  Components: Metastore
Affects Versions: 3.0.0
Reporter: Alan Gates
Assignee: Alan Gates


The initial changes for catalogs did not include an ability to alter catalogs.  
We need to add that.




[jira] [Created] (HIVE-19687) Export table on acid partitioned table is failing

2018-05-23 Thread Vineet Garg (JIRA)

Vineet Garg created HIVE-19687:
--

 Summary: Export table on acid partitioned table is failing
 Key: HIVE-19687
 URL: https://issues.apache.org/jira/browse/HIVE-19687
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 3.0.0
Reporter: Vineet Garg


*Reproducer*

{code:sql}
create table exportPartitionTable(id int, name string) partitioned by(country 
string) clustered by (id) into 2 buckets  stored as orc tblproperties 
("transactional"="true");
export table exportPartitionTable PARTITION (country='india') to 
'/tmp/exportDataStore';
{code}

*Error*
{noformat}
FAILED: SemanticException [Error 10004]: Line 1:165 Invalid table alias or 
column reference 'india': (possible column names are: id, name, country)
{noformat}




[jira] [Created] (HIVE-19686) schematool --createCatalog option fails when using Oracle as the RDBMS

2018-05-23 Thread Alan Gates (JIRA)

Alan Gates created HIVE-19686:
-

 Summary: schematool  --createCatalog option fails when using 
Oracle as the RDBMS
 Key: HIVE-19686
 URL: https://issues.apache.org/jira/browse/HIVE-19686
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 3.0.0
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: 3.0.1


Attempts to use the schematool --createCatalog option when the metastore is 
using Oracle result in
{code:java}
SQL Error code: 1786
org.apache.hadoop.hive.metastore.HiveMetaException: Failed to add catalog
at org.apache.hive.beeline.HiveSchemaTool.createCatalog(HiveSchemaTool.java:941)
at org.apache.hive.beeline.HiveSchemaTool.main(HiveSchemaTool.java:1459)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:308)
at org.apache.hadoop.util.RunJar.main(RunJar.java:222)
Caused by: java.sql.SQLSyntaxErrorException: ORA-01786: FOR UPDATE of this 
query expression is not allowed

at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:450)
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:399)
at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:1059)
at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:522)
at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:257)
at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:587)
at oracle.jdbc.driver.T4CStatement.doOall8(T4CStatement.java:210)
at oracle.jdbc.driver.T4CStatement.doOall8(T4CStatement.java:30)
at oracle.jdbc.driver.T4CStatement.executeForDescribe(T4CStatement.java:762)
at 
oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:925)
at 
oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:)
at oracle.jdbc.driver.OracleStatement.executeQuery(OracleStatement.java:1309)
at 
oracle.jdbc.driver.OracleStatementWrapper.executeQuery(OracleStatementWrapper.java:422)
at org.apache.hive.beeline.HiveSchemaTool.createCatalog(HiveSchemaTool.java:926)
... 7 more
*** schemaTool failed ***{code}




[jira] [Created] (HIVE-19685) OpenTracing support for HMS

2018-05-23 Thread Todd Lipcon (JIRA)

Todd Lipcon created HIVE-19685:
--

 Summary: OpenTracing support for HMS
 Key: HIVE-19685
 URL: https://issues.apache.org/jira/browse/HIVE-19685
 Project: Hive
  Issue Type: New Feature
  Components: Metastore
Reporter: Todd Lipcon


When diagnosing performance of metastore operations it isn't always obvious why 
something took a long time. Using a tracing framework can provide an end-to-end 
view of an operation including time spent in dependent systems (e.g. filesystem 
operations, RDBMS queries, etc.). This JIRA proposes integrating OpenTracing, 
a vendor-neutral tracing API, into the HMS server and client.




[jira] [Created] (HIVE-19684) Hive stats optimizer wrongly uses stats against non native tables

2018-05-23 Thread slim bouguerra (JIRA)

slim bouguerra created HIVE-19684:
-

 Summary: Hive stats optimizer wrongly uses stats against non 
native tables
 Key: HIVE-19684
 URL: https://issues.apache.org/jira/browse/HIVE-19684
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra
Assignee: slim bouguerra


Stats of non-native tables are inaccurate, so queries over non-native tables 
should not be optimized by the stats optimizer.
Take, for example, the query:
{code}
Explain select count(*) from (select `__time` from druid_test_table limit 1) as 
src ;
{code} 

the plan will be reduced to 
{code}
POSTHOOK: query: explain extended select count(*) from (select `__time` from 
druid_test_table limit 1) as src
POSTHOOK: type: QUERY
STAGE DEPENDENCIES:
  Stage-0 is a root stage
STAGE PLANS:
  Stage: Stage-0
    Fetch Operator
      limit: 1
      Processor Tree:
        ListSink
{code}




Re: Review Request 67125: HIVE-19418 add background stats updater similar to compactor

2018-05-23 Thread Ashutosh Chauhan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67125/#review203694
---




ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUpdaterThread.java
Lines 66 (patched)


Is this correct?



ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUpdaterThread.java
Lines 138 (patched)


Should add a comment along the lines of:
"Security is turned off. So, we can execute with anonymous user."



ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUpdaterThread.java
Lines 557 (patched)


This will compute basic and column stats. I assume this is what you want.



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
Lines 2424 (patched)


Returning the concatenated table name + DB name is error-prone. Let's make this 
return List 
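
To illustrate why a concatenated name is fragile, here is a minimal sketch, not code from the patch. The class `QualifiedTableName` is a hypothetical name introduced only for this example; the review does not say which element type the returned list should use.

```java
import java.util.Objects;

// Hypothetical value type (an assumption, not part of the Hive code base):
// keeps the database name and table name as separate fields so callers never
// have to split a concatenated string.
final class QualifiedTableName {
    final String dbName;
    final String tableName;

    QualifiedTableName(String dbName, String tableName) {
        this.dbName = Objects.requireNonNull(dbName, "dbName");
        this.tableName = Objects.requireNonNull(tableName, "tableName");
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof QualifiedTableName)) {
            return false;
        }
        QualifiedTableName other = (QualifiedTableName) o;
        return dbName.equals(other.dbName) && tableName.equals(other.tableName);
    }

    @Override
    public int hashCode() {
        return Objects.hash(dbName, tableName);
    }

    @Override
    public String toString() {
        // Only for display; the fields remain the source of truth.
        return dbName + "." + tableName;
    }
}
```

With plain concatenation, database "ab" plus table "c" and database "a" plus table "bc" both yield "abc", whereas the two `QualifiedTableName` instances stay distinct.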



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
Lines 2425 (patched)


Currently it will fetch all tables (acid or not). Is that intentional?



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/RawStore.java
Lines 1642 (patched)


Return value should be List>



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java
Lines 738 (patched)


I think this time unit should be minutes, since this task needs to run more 
frequently than that. That will make this config less error-prone.
Default value: 1 hour.


- Ashutosh Chauhan


On May 15, 2018, 4:55 a.m., Sergey Shelukhin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67125/
> ---
> 
> (Updated May 15, 2018, 4:55 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Seong (Steve) Yeom.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> see jira. This should eventually integrate with ACID stats to determine what 
> stats are out of date, when that is done. Probably in separate jira if this 
> goes in first.
> 
> 
> Diffs
> -
> 
>   
> itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/DummyRawStoreFailEvent.java
>  3d6fda6bd4 
>   ql/src/java/org/apache/hadoop/hive/ql/DriverUtils.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 89129f99fe 
>   ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUpdaterThread.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 
> b698c84080 
>   ql/src/test/org/apache/hadoop/hive/ql/stats/TestStatsUpdaterThread.java 
> PRE-CREATION 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
>  0be0aaa10c 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
>  92d2e3f368 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
>  48f77b9878 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
>  264fdb9db9 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/RawStore.java
>  ce7d2861dd 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
>  b223920e82 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/conf/EnumValidator.java
>  PRE-CREATION 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java
>  114d5da205 
>   
> standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java
>  f6899be750 
>   
> standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java
>  98a85cc758 
> 
> 
> Diff: https://reviews.apache.org/r/67125/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Sergey Shelukhin
> 
>

[jira] [Created] (HIVE-19683) Flaky tests output TestTezPerfCliDriver

2018-05-23 Thread Igor Kryvenko (JIRA)

Igor Kryvenko created HIVE-19683:


 Summary: Flaky tests output TestTezPerfCliDriver
 Key: HIVE-19683
 URL: https://issues.apache.org/jira/browse/HIVE-19683
 Project: Hive
  Issue Type: Bug
Reporter: Igor Kryvenko
Assignee: Igor Kryvenko


After setting {{hive.optimize.index.filter}} to true, {{TestTezPerfCliDriver}} 
produces different results on every run for more than 60 q files.




Re: May 2018 Hive User Group Meeting

2018-05-23 Thread Sahil Takiar

Wanted to thank everyone for attending the meetup a few weeks ago, and a
huge thanks to all of our speakers! Apologies for the delay, but we finally
have the recording uploaded to YouTube along with all the slides uploaded
to SlideShare. Below are the links:

Recording: https://youtu.be/gwX3KpHa2j0

   - Hive-on-Spark at Uber: Efficiency & Scale - Xuefu Zhang -
   https://www.slideshare.net/sahiltakiar/hive-on-spark-at-uber-scale
   - Hive-on-S3 Performance: Past, Present, and Future - Sahil Takiar -
   
https://www.slideshare.net/sahiltakiar/hive-ons3-performance-past-present-and-future
   - Dali: Data Access Layer at LinkedIn - Adwait Tumbde -
   https://www.slideshare.net/sahiltakiar/dali-data-access-layer
   - Parquet Vectorization in Hive - Vihang Karajgaonkar -
   https://www.slideshare.net/sahiltakiar/parquet-vectorization-in-hive
   - ORC Column Level Encryption - Owen O’Malley -
   https://www.slideshare.net/sahiltakiar/orc-column-encryption
   - Running Hive at Scale @ Lyft - Sharanya Santhanam, Rohit Menon -
   https://www.slideshare.net/sahiltakiar/running-hive-at-scale-lyft
   - Materialized Views in Hive - Jesus Camacho Rodriguez -
   
https://www.slideshare.net/sahiltakiar/accelerating-query-processing-with-materialized-views-in-apache-hive-98333641
   - Hive Metastore Caching - Daniel Dai -
   https://www.slideshare.net/sahiltakiar/hive-metastore-cache
   - Hive Metastore Separation - Alan Gates -
   https://www.slideshare.net/sahiltakiar/making-the-metastore-standalone
   - Customer Use Cases & Pain Points of (Big) Metadata - Rituparna Agrawal
   -
   
https://www.slideshare.net/sahiltakiar/customer-use-cases-pain-points-of-big-metadata

If you have any issues accessing the links, feel free to reach out to me.

Looking forward to our next Hive Meetup!

On Mon, May 14, 2018 at 8:45 AM, Sahil Takiar 
wrote:

> Hello,
>
> Yes, the meetup was recorded. We are in the process of getting it uploaded
> to YouTube. Once it's publicly available I will send out the link on this
> email thread.
>
> Thanks
>
> --Sahil
>
> On Mon, May 14, 2018 at 6:04 AM,  wrote:
>
>> Hi,
>>
>>
>>
>> If you have recorded the meeting share link please. I could not follow it
>> online for the schedule (I live in Spain).
>>
>>
>>
>> Kind Regards,
>>
>>
>>
>>
>>
>> *From:* Luis Figueroa [mailto:lef...@outlook.com]
>> *Sent:* miércoles, 9 de mayo de 2018 18:01
>> *To:* u...@hive.apache.org
>> *Cc:* dev@hive.apache.org
>> *Subject:* Re: May 2018 Hive User Group Meeting
>>
>>
>>
>> Hey everyone,
>>
>>
>>
>> Was the meeting recorded by any chance?
>>
>> Luis
>>
>>
>> On May 8, 2018, at 5:31 PM, Sahil Takiar  wrote:
>>
>> Hey Everyone,
>>
>>
>>
>> Almost time for the meetup! The live stream can be viewed on this link:
>> https://live.lifesizecloud.com/extension/2000992219?token=
>> 067078ac-a8df-45bc-b84c-4b371ecbc719==en
>> =Hive%20User%20Group%20Meetup
>>
>> The stream won't be live until the meetup starts.
>>
>> For those attending in person, there will be guest wifi:
>>
>> Login: HiveMeetup
>> Password: ClouderaHive
>>
>>
>>
>> On Mon, May 7, 2018 at 12:48 PM, Sahil Takiar 
>> wrote:
>>
>> Hey Everyone,
>>
>>
>>
>> The meetup is only a day away! Here
>> 
>> is a link to all the abstracts we have compiled thus far. Several of you
>> have asked about event streaming and recordings. The meetup will be both
>> streamed live and recorded. We will post the links on this thread and on
>> the meetup link tomorrow closer to the start of the meetup.
>>
>>
>>
>> The meetup will be at Cloudera HQ - 395 Page Mill Rd
>> . If
>> you have any trouble getting into the building, feel free to post on the
>> meetup link.
>>
>>
>>
>> Meetup Link: https://www.meetup.com/Hive-User-Group-Meeting/events/
>> 249641278/
>>
>>
>>
>> On Wed, May 2, 2018 at 7:48 AM, Sahil Takiar 
>> wrote:
>>
>> Hey Everyone,
>>
>>
>>
>> The agenda for the meetup has been set and I'm excited to say we have
>> lots of interesting talks scheduled! Below is final agenda, the full list
>> of abstracts will be sent out soon. If you are planning to attend, please
>> RSVP on the meetup link so we can get an accurate headcount of attendees (
>> https://www.meetup.com/Hive-User-Group-Meeting/events/249641278/).
>>
>>
>> 6:30 - 7:00 PM Networking and Refreshments
>>
>> 7:00PM - 8:20 PM Lightning Talks (10 min each) - 8 talks total
>>
>> · What's new in Hive 3.0.0 - Ashutosh Chauhan
>>
>> · Hive-on-Spark at Uber: Efficiency & Scale - Xuefu Zhang
>>
>> · Hive-on-S3 Performance: Past, Present, and Future - Sahil
>> Takiar
>>
>> · Dali: Data Access Layer at LinkedIn - Adwait Tumbde
>>
>> · Parquet Vectorization in Hive - Vihang Karajgaonkar
>>
>> · ORC Column Level

[jira] [Created] (HIVE-19682) Provide option for GenericUDTFGetSplits to return only schema metadata

2018-05-23 Thread Eric Wohlstadter (JIRA)

Eric Wohlstadter created HIVE-19682:
---

 Summary: Provide option for GenericUDTFGetSplits to return only 
schema metadata
 Key: HIVE-19682
 URL: https://issues.apache.org/jira/browse/HIVE-19682
 Project: Hive
  Issue Type: Improvement
Reporter: Eric Wohlstadter
Assignee: Eric Wohlstadter


For some use cases it is necessary to know the output schema of a HiveQL query 
before executing it, but no existing client API provides 
this information.

Hive JDBC doesn't provide the schema for parametric types in 
{{ResultSetMetaData}}.

GenericUDTFGetSplits bundles the proper schema metadata with the fragments for 
input splits. An option can be added to return only the schema metadata from 
compilation, and the generation of input splits can be skipped.






[jira] [Created] (HIVE-19681) Fix TestVectorIfStatement

2018-05-23 Thread Vihang Karajgaonkar (JIRA)

Vihang Karajgaonkar created HIVE-19681:
--

 Summary: Fix TestVectorIfStatement
 Key: HIVE-19681
 URL: https://issues.apache.org/jira/browse/HIVE-19681
 Project: Hive
  Issue Type: Test
  Components: Vectorization
Affects Versions: 3.1.0, 4.0.0
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


{{TestVectorIfStatement}} generates interesting batches (injection of random 
repeating null column values and repeating non-null values) when evaluating the 
vectorized expressions. But the modification of random rows is done after the 
row mode is evaluated. Hence it is likely that the result comparison will fail. I 
am not sure how it's working in the first place.




[jira] [Created] (HIVE-19680) Push down limit is not applied for Druid storage handler.

2018-05-23 Thread slim bouguerra (JIRA)

slim bouguerra created HIVE-19680:
-

 Summary: Push down limit is not applied for Druid storage handler.
 Key: HIVE-19680
 URL: https://issues.apache.org/jira/browse/HIVE-19680
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra
Assignee: slim bouguerra
 Fix For: 3.0.0


A query like 
{code}
select `__time` from druid_test_table limit 1;
{code}
returns more than one row.




[jira] [Created] (HIVE-19678) Ignore TestBeeLineWithArgs

2018-05-23 Thread Zoltan Haindrich (JIRA)

Zoltan Haindrich created HIVE-19678:
---

 Summary: Ignore TestBeeLineWithArgs
 Key: HIVE-19678
 URL: https://issues.apache.org/jira/browse/HIVE-19678
 Project: Hive
  Issue Type: Improvement
  Components: Tests
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich


Times out roughly every 5th build:

https://builds.apache.org/job/PreCommit-HIVE-Build/11155/testReport/org.apache.hive.beeline/TestBeeLineWithArgs/history/




[jira] [Created] (HIVE-19679) Enable TestBeeLineWithArgs

2018-05-23 Thread Zoltan Haindrich (JIRA)

Zoltan Haindrich created HIVE-19679:
---

 Summary: Enable TestBeeLineWithArgs
 Key: HIVE-19679
 URL: https://issues.apache.org/jira/browse/HIVE-19679
 Project: Hive
  Issue Type: Improvement
  Components: Tests
Reporter: Zoltan Haindrich







[jira] [Created] (HIVE-19677) Disable sample6.q

2018-05-23 Thread Jesus Camacho Rodriguez (JIRA)

Jesus Camacho Rodriguez created HIVE-19677:
--

 Summary: Disable sample6.q
 Key: HIVE-19677
 URL: https://issues.apache.org/jira/browse/HIVE-19677
 Project: Hive
  Issue Type: Bug
Affects Versions: 3.1.0
Reporter: Jesus Camacho Rodriguez


Flaky test, already found similar behavior with sample2.q and sample4.q 
(HIVE-19657). More info to reproduce and try to fix the issue in HIVE-19673.




Re: Review Request 66667: HIVE-19046: Refactor the common parts of the HiveMetastore add_partition_core and add_partitions_pspec_core methods

2018-05-23 Thread Marta Kuczora via Review Board



> On April 18, 2018, 9:52 p.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
> > Line 3323 (original), 3396 (patched)
> > 
> >
> > Should we set interrupted flag on the thread if we get 
> > InterruptedException?
> 
> Marta Kuczora wrote:
> Could you please give me some details about why you think it is needed? I 
> don't know actually if it is needed or not. My idea here was to go through on 
> all FutureTasks and if one of them didn't finish successfully (there was 
> either an error or the task was interrupted), throw an exception, cause it 
> would mean that not all partition folders were created successfully. For this 
> I don't think that I should set anything on the thread, but I might miss 
> something. So could you please explain me your thoughts on this?

I just uploaded a new patch with this change.
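
For readers following the question above: a common Java convention when catching InterruptedException is to restore the thread's interrupt status before reporting the failure, so that code further up the call stack can still see that an interrupt was requested. The following is a minimal sketch of that convention under stated assumptions, not the code from the patch; the names `FutureWaiter` and `waitForAll` are hypothetical.

```java
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Hypothetical helper (an assumption for illustration, not HiveMetaStore code).
final class FutureWaiter {

    // Waits for every task; fails if any task failed or the wait was interrupted.
    static void waitForAll(List<? extends Future<?>> futures) {
        try {
            for (Future<?> future : futures) {
                future.get();
            }
        } catch (InterruptedException e) {
            // Catching InterruptedException clears the interrupt status,
            // so set it again before propagating the failure.
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A task did not finish successfully", e.getCause());
        }
    }
}
```

In both failure cases the caller gets an exception, which matches the intent described in the thread (not all tasks finished successfully); the only addition is that the interrupt status survives the catch block.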


- Marta


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7/#review201465
---


On May 23, 2018, 4:24 p.m., Marta Kuczora wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7/
> ---
> 
> (Updated May 23, 2018, 4:24 p.m.)
> 
> 
> Review request for hive, Peter Vary, Sahil Takiar, and Adam Szita.
> 
> 
> Bugs: HIVE-19046
> https://issues.apache.org/jira/browse/HIVE-19046
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Most of the code in these methods is identical. Refactored the shared 
> parts into common methods.
> 
> 
> Diffs
> -
> 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
>  92d2e3f 
>   
> standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/client/TestAddPartitions.java
>  88064d9 
>   
> standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/client/TestAddPartitionsFromPartSpec.java
>  debcd0e 
> 
> 
> Diff: https://reviews.apache.org/r/7/diff/3/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Marta Kuczora
> 
>

Re: Review Request 66667: HIVE-19046: Refactor the common parts of the HiveMetastore add_partition_core and add_partitions_pspec_core methods

2018-05-23 Thread Marta Kuczora via Review Board


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7/
---

(Updated May 23, 2018, 4:24 p.m.)


Review request for hive, Peter Vary, Sahil Takiar, and Adam Szita.


Changes
---

Address review finding.


Bugs: HIVE-19046
https://issues.apache.org/jira/browse/HIVE-19046


Repository: hive-git


Description
---

Most of the code in these methods is identical. Refactored the shared 
parts into common methods.


Diffs (updated)
-

  
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
 92d2e3f 
  
standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/client/TestAddPartitions.java
 88064d9 
  
standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/client/TestAddPartitionsFromPartSpec.java
 debcd0e 


Diff: https://reviews.apache.org/r/7/diff/3/

Changes: https://reviews.apache.org/r/7/diff/2-3/


Testing
---


Thanks,

Marta Kuczora

[jira] [Created] (HIVE-19676) Ability to selectively run tests in TestBlobstoreCliDriver

2018-05-23 Thread Sahil Takiar (JIRA)

Sahil Takiar created HIVE-19676:
---

 Summary: Ability to selectively run tests in TestBlobstoreCliDriver
 Key: HIVE-19676
 URL: https://issues.apache.org/jira/browse/HIVE-19676
 Project: Hive
  Issue Type: Sub-task
Reporter: Sahil Takiar
Assignee: Sahil Takiar


The {{TestBlobstoreCliDriver}} contains a {{testconfiguration.properties}}, but 
it doesn't seem to be used anywhere. It would be nice if it could be used to 
define which tests to run or which tests to exclude.




[jira] [Created] (HIVE-19675) Cast to timestamps on Druid time column leads to an exception

2018-05-23 Thread slim bouguerra (JIRA)

slim bouguerra created HIVE-19675:
-

 Summary: Cast to timestamps on Druid time column leads to an 
exception
 Key: HIVE-19675
 URL: https://issues.apache.org/jira/browse/HIVE-19675
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Affects Versions: 3.0.0
Reporter: slim bouguerra
Assignee: Jesus Camacho Rodriguez


The following query fails due to a formatting issue.
{code}
SELECT CAST(`ssb_druid_100`.`__time` AS TIMESTAMP) AS `x_time`,
  SUM(`ssb_druid_100`.`lo_revenue`) AS `sum_lo_revenue_ok`
FROM `druid_ssb`.`ssb_druid_100` `ssb_druid_100`
GROUP BY CAST(`ssb_druid_100`.`__time` AS TIMESTAMP);
{code} 
Exception
{code} 
Error: java.io.IOException: java.lang.NumberFormatException: For input string: 
"1991-12-31 19:00:00" (state=,code=0)
{code}
[~jcamachorodriguez] maybe this is fixed by your upcoming patches.




[jira] [Created] (HIVE-19674) Group by Decimal Constants push down to Druid tables.

2018-05-23 Thread slim bouguerra (JIRA)

slim bouguerra created HIVE-19674:
-

 Summary: Group by Decimal Constants push down to Druid tables.
 Key: HIVE-19674
 URL: https://issues.apache.org/jira/browse/HIVE-19674
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra
Assignee: slim bouguerra


Queries like the following are generated by Tableau.
{code}
SELECT SUM(`ssb_druid_100`.`lo_revenue`) AS `sum_lo_revenue_ok`
 FROM `druid_ssb`.`ssb_druid_100` `ssb_druid_100`
GROUP BY 1.1001;
{code}

The group key is pushed down to Druid as a constant column. This leads to an 
exception while parsing the results, since the Druid input format does not 
allow decimals.





[jira] [Created] (HIVE-19673) qtest: parquet_ctas.q breaks sample6.q

2018-05-23 Thread Zoltan Haindrich (JIRA)

Zoltan Haindrich created HIVE-19673:
---

 Summary: qtest: parquet_ctas.q breaks sample6.q
 Key: HIVE-19673
 URL: https://issues.apache.org/jira/browse/HIVE-19673
 Project: Hive
  Issue Type: Bug
  Components: Tests
Reporter: Zoltan Haindrich


A strange diff currently shows up from time to time: sample6.q fails.

Running it on its own is fine. However, when parquet_ctas.q runs before it, the 
*result set* changes.

{code}
time mvn install -pl itests/qtest -DskipSparkTests -Pitests 
-Dtest=TestCliDriver -Dqfile=parquet_ctas.q,sample6.q  -Dtest.output.overwrite 
-Dmaven.surefire.debugX 
{code}

Note: sample6.q is also run via the Spark driver, and the interesting part is 
that the "fluctuating" new result set matches the Spark driver's output.

{code}
diff -Naur ./ql/src/test/results/clientpositive/sample6.q.out 
./ql/src/test/results/clientpositive/spark/sample6.q.out | grep val_
{code}




[jira] [Created] (HIVE-19672) Column Names mismatch between native Druid Tables and Hive External map

2018-05-23 Thread slim bouguerra (JIRA)

slim bouguerra created HIVE-19672:
-

 Summary: Column Names mismatch between native Druid Tables and 
Hive External map
 Key: HIVE-19672
 URL: https://issues.apache.org/jira/browse/HIVE-19672
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Affects Versions: 3.0.0
Reporter: slim bouguerra
 Fix For: 4.0.0


Druid column names are case sensitive, while Hive is case insensitive.
This implies that a query against any Druid datasource whose column names 
contain upper-case characters will not return the expected results.
One possible fix is to remap the column names before issuing the JSON query 
to Druid.
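
The remapping idea can be sketched as follows. This is only an illustration in Python, not Hive code: the function names `build_column_map` and `remap` and the sample column names are hypothetical, and the sketch assumes the exact Druid column spellings are available when the query is built.

```python
def build_column_map(druid_columns):
    """Map each lower-cased name to the exact Druid spelling.

    Raises ValueError when two Druid columns differ only by case, because
    a case-insensitive Hive name could not pick between them.
    """
    mapping = {}
    for name in druid_columns:
        key = name.lower()
        if key in mapping and mapping[key] != name:
            raise ValueError(f"ambiguous columns: {mapping[key]!r} vs {name!r}")
        mapping[key] = name
    return mapping


def remap(hive_columns, mapping):
    """Replace Hive-side names with the Druid spelling where one is known."""
    return [mapping.get(col.lower(), col) for col in hive_columns]


# Hypothetical datasource with mixed-case column names.
druid_cols = ["__time", "userId", "pageViews"]
mapping = build_column_map(druid_cols)

# Hive hands back lower-cased identifiers; restore the Druid spelling.
remapped = remap(["__time", "userid", "pageviews"], mapping)
```

Two Druid columns that differ only by case cannot be told apart from the Hive side, so the sketch rejects that case instead of guessing.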




[jira] [Created] (HIVE-19671) Distribute by rand() can lead to data inconsistency

2018-05-23 Thread Rui Li (JIRA)

Rui Li created HIVE-19671:
-

 Summary: Distribute by rand() can lead to data inconsistency
 Key: HIVE-19671
 URL: https://issues.apache.org/jira/browse/HIVE-19671
 Project: Hive
  Issue Type: Bug
Reporter: Rui Li


Noticed the following queries can give different results:
{code}
select count(*) from tbl;
select count(*) from (select * from tbl distribute by rand());
{code}
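
The report does not state the cause. One plausible mechanism, assumed here rather than taken from the report, is task re-execution: if a map task is rerun after some reducers have already fetched its output, rand() assigns rows to reducers differently on the second attempt. The Python simulation below is a toy model of that assumption, not Hive code; all names in it are hypothetical.

```python
import random

NUM_REDUCERS = 2
ROWS = list(range(1000))


def run_map_attempt(rows, partitioner):
    """Split rows into one bucket per reducer using the given partitioner."""
    partitions = [[] for _ in range(NUM_REDUCERS)]
    for row in rows:
        partitions[partitioner(row)].append(row)
    return partitions


def rand_partitioner(seed):
    """Models 'distribute by rand()': the target depends on the attempt."""
    rng = random.Random(seed)
    return lambda row: int(rng.random() * NUM_REDUCERS)


def hash_partitioner(row):
    """Models a deterministic key: same row, same reducer on every attempt."""
    return row % NUM_REDUCERS


def count_with_retry(make_partitioner):
    """Reducer 0 reads attempt 1; the map task is rerun; reducer 1 reads attempt 2."""
    first = run_map_attempt(ROWS, make_partitioner(1))
    second = run_map_attempt(ROWS, make_partitioner(2))
    return len(first[0]) + len(second[1])


rand_total = count_with_retry(rand_partitioner)
det_total = count_with_retry(lambda seed: hash_partitioner)
```

In this model, a row sent to reducer 0 on the first attempt and to reducer 1 on the second is counted twice, and a row routed the other way is lost, so `rand_total` can drift from the true row count while `det_total` cannot.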




Review Request 67266: HIVE-19587 HeartBeat thread uses cancelled delegation token while connecting to meta on KERBEROS cluster

2018-05-23 Thread Oleksiy Sayankin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67266/
---

Review request for hive.


Repository: hive-git


Description
---

Initial commit


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java 248632127a 


Diff: https://reviews.apache.org/r/67266/diff/1/


Testing
---


Thanks,

Oleksiy Sayankin

[GitHub] hive pull request #354: HIVE-19653: Incorrect predicate pushdown for groupby...

2018-05-23 Thread richox

GitHub user richox opened a pull request:

https://github.com/apache/hive/pull/354

HIVE-19653: Incorrect predicate pushdown for groupby with grouping sets



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/richox/hive master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/354.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #354


commit ae60a4802d0f9301d737a247ba85ed4cc56d34a7
Author: zhanglirich 
Date:   2018-05-22T13:41:35Z

HIVE-19653: Incorrect predicate pushdown for groupby with grouping sets




---
