[jira] [Commented] (HIVE-10165) Improve hive-hcatalog-streaming extensibility and support updates and deletes.

2015-07-29 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647277#comment-14647277
 ] 

Lefty Leverenz commented on HIVE-10165:
---

If the "fix version" bit looks familiar, that's because I borrowed it from your 
comment on HIVE-9583.

> Improve hive-hcatalog-streaming extensibility and support updates and deletes.
> --
>
> Key: HIVE-10165
> URL: https://issues.apache.org/jira/browse/HIVE-10165
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog
>Affects Versions: 1.2.0
>Reporter: Elliot West
>Assignee: Elliot West
>  Labels: TODOC2.0, streaming_api
> Fix For: 2.0.0
>
> Attachments: HIVE-10165.0.patch, HIVE-10165.10.patch, 
> HIVE-10165.4.patch, HIVE-10165.5.patch, HIVE-10165.6.patch, 
> HIVE-10165.7.patch, HIVE-10165.9.patch, mutate-system-overview.png
>
>
> h3. Overview
> I'd like to extend the 
> [hive-hcatalog-streaming|https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest]
>  API so that it also supports the writing of record updates and deletes in 
> addition to the already supported inserts.
> h3. Motivation
> We have many Hadoop processes outside of Hive that merge changed facts into 
> existing datasets. Traditionally we achieve this by: reading in a 
> ground-truth dataset and a modified dataset, grouping by a key, sorting by a 
> sequence and then applying a function to determine inserted, updated, and 
> deleted rows. However, in our current scheme we must rewrite all partitions 
> that may potentially contain changes. In practice the number of mutated 
> records is very small when compared with the records contained in a 
> partition. This approach results in a number of operational issues:
> * Excessive amount of write activity required for small data changes.
> * Downstream applications cannot robustly read these datasets while they are 
> being updated.
> * Due to the scale of the updates (hundreds of partitions) the scope for 
> contention is high.
> I believe we can address this problem by instead writing only the changed 
> records to a Hive transactional table. This should drastically reduce the 
> amount of data that we need to write and also provide a means for managing 
> concurrent access to the data. Our existing merge processes can read and 
> retain each record's {{ROW_ID}}/{{RecordIdentifier}} and pass this through to 
> an updated form of the hive-hcatalog-streaming API which will then have the 
> required data to perform an update or insert in a transactional manner. 
> h3. Benefits
> * Enables the creation of large-scale dataset merge processes  
> * Opens up Hive transactional functionality in an accessible manner to 
> processes that operate outside of Hive.
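
The merge step described in the Motivation section above (group by key, then decide whether each row was inserted, updated, or deleted) can be sketched roughly as follows. This is an illustration under assumptions only: the MergeSketch class, the Mutation enum, and the in-memory maps keyed by a string are all hypothetical, and nothing here is part of the hive-hcatalog-streaming API or the attached patches.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

// Hypothetical illustration only; not the hive-hcatalog-streaming API.
class MergeSketch {

  enum Mutation { INSERT, UPDATE, DELETE }

  // Compares a ground-truth dataset with a modified dataset, both keyed by
  // record key, and returns only the keys whose records changed.
  static Map<String, Mutation> classify(Map<String, String> groundTruth,
                                        Map<String, String> modified) {
    Map<String, Mutation> changes = new TreeMap<>();
    for (Map.Entry<String, String> e : modified.entrySet()) {
      String key = e.getKey();
      if (!groundTruth.containsKey(key)) {
        changes.put(key, Mutation.INSERT);
      } else if (!Objects.equals(groundTruth.get(key), e.getValue())) {
        changes.put(key, Mutation.UPDATE);
      }
      // Equal values: the record is unchanged, so nothing needs to be written.
    }
    for (String key : groundTruth.keySet()) {
      if (!modified.containsKey(key)) {
        changes.put(key, Mutation.DELETE);
      }
    }
    return changes;
  }

  public static void main(String[] args) {
    Map<String, String> truth = new LinkedHashMap<>();
    truth.put("a", "1");
    truth.put("b", "2");
    truth.put("c", "3");
    Map<String, String> modified = new LinkedHashMap<>();
    modified.put("a", "1");   // unchanged
    modified.put("b", "20");  // updated
    modified.put("d", "4");   // inserted; "c" is absent, so it counts as deleted
    System.out.println(classify(truth, modified));
  }
}
```

In this toy run only three of the four keys produce a mutation, which is the point of the proposal: the unchanged record "a" never has to be rewritten.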



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-4239) Remove lock on compilation stage

2015-07-29 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647266#comment-14647266
 ] 

Carl Steinbach commented on HIVE-4239:
--

It should probably go in both the hs2 and compiler sections.



> Remove lock on compilation stage
> 
>
> Key: HIVE-4239
> URL: https://issues.apache.org/jira/browse/HIVE-4239
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Query Processor
>Reporter: Carl Steinbach
>Assignee: Sergey Shelukhin
>  Labels: TODOC2.0
> Fix For: 2.0.0
>
> Attachments: HIVE-4239.01.patch, HIVE-4239.02.patch, 
> HIVE-4239.03.patch, HIVE-4239.04.patch, HIVE-4239.05.patch, 
> HIVE-4239.06.patch, HIVE-4239.07.patch, HIVE-4239.08.patch, HIVE-4239.patch
>
>






[jira] [Updated] (HIVE-10319) Hive CLI startup takes a long time with a large number of databases

2015-07-29 Thread Nezih Yigitbasi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nezih Yigitbasi updated HIVE-10319:
---
Attachment: HIVE-10319.5.patch

> Hive CLI startup takes a long time with a large number of databases
> ---
>
> Key: HIVE-10319
> URL: https://issues.apache.org/jira/browse/HIVE-10319
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 1.0.0
>Reporter: Nezih Yigitbasi
>Assignee: Nezih Yigitbasi
> Attachments: HIVE-10319.1.patch, HIVE-10319.2.patch, 
> HIVE-10319.3.patch, HIVE-10319.4.patch, HIVE-10319.5.patch, HIVE-10319.patch
>
>
> The Hive CLI takes a long time to start when there is a large number of 
> databases in the DW. I think the root cause is the way permanent UDFs are 
> loaded from the metastore. When I looked at the logs and the source code I 
> saw that at startup Hive first gets all the databases from the metastore and 
> then for each database it makes a metastore call to get the permanent 
> functions for that database [see Hive.java | 
> https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L162-185].
>  So the number of metastore calls made is on the order of the number of 
> databases. In production we have several hundred databases, so Hive makes 
> several hundred RPC calls during startup, taking 30+ seconds.
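
To make the cost described above concrete, the sketch below counts the calls made by a per-database lookup versus a single bulk lookup. FakeMetastore, getFunctions, and getAllFunctions are hypothetical stand-ins invented for this illustration; they are not the Hive metastore client API, and a bulk call is only one possible way to address the problem.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a remote metastore; every method call models one RPC.
class FakeMetastore {
  private final Map<String, List<String>> functionsByDb = new LinkedHashMap<>();
  int rpcCalls = 0;

  FakeMetastore(int databases) {
    for (int i = 0; i < databases; i++) {
      List<String> fns = new ArrayList<>();
      fns.add("udf_" + i);
      functionsByDb.put("db" + i, fns);
    }
  }

  List<String> getAllDatabases() {
    rpcCalls++;
    return new ArrayList<>(functionsByDb.keySet());
  }

  List<String> getFunctions(String db) {
    rpcCalls++;
    return functionsByDb.get(db);
  }

  List<String> getAllFunctions() {
    rpcCalls++;
    List<String> all = new ArrayList<>();
    for (List<String> fns : functionsByDb.values()) {
      all.addAll(fns);
    }
    return all;
  }
}

class StartupSketch {
  // Pattern described in the report: one call to list databases, then one call per database.
  static List<String> loadPerDatabase(FakeMetastore ms) {
    List<String> all = new ArrayList<>();
    for (String db : ms.getAllDatabases()) {
      all.addAll(ms.getFunctions(db));
    }
    return all;
  }

  // Alternative: a single bulk call, independent of the number of databases.
  static List<String> loadInBulk(FakeMetastore ms) {
    return ms.getAllFunctions();
  }

  public static void main(String[] args) {
    FakeMetastore perDb = new FakeMetastore(300);
    loadPerDatabase(perDb);
    FakeMetastore bulk = new FakeMetastore(300);
    loadInBulk(bulk);
    System.out.println("per-database calls: " + perDb.rpcCalls);
    System.out.println("bulk calls: " + bulk.rpcCalls);
  }
}
```

With 300 databases the per-database pattern makes 301 calls in this model, while the bulk variant makes one, so startup time no longer grows with the number of databases.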





[jira] [Commented] (HIVE-11316) Use datastructure that doesnt duplicate any part of string for ASTNode::toStringTree()

2015-07-29 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647243#comment-14647243
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-11316:
--

[~jcamachorodriguez] Can you please look at patch #7?

Thanks
Hari

> Use datastructure that doesnt duplicate any part of string for 
> ASTNode::toStringTree()
> --
>
> Key: HIVE-11316
> URL: https://issues.apache.org/jira/browse/HIVE-11316
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-11316-branch-1.0.patch, 
> HIVE-11316-branch-1.2.patch, HIVE-11316.1.patch, HIVE-11316.2.patch, 
> HIVE-11316.3.patch, HIVE-11316.4.patch, HIVE-11316.5.patch, 
> HIVE-11316.6.patch, HIVE-11316.7.patch
>
>
> HIVE-11281 uses an approach to memoize toStringTree() for ASTNode. This jira 
> is supposed to alter the string memoization to use a different data structure 
> that doesn't duplicate any part of the string so that we do not run into OOM.
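
One way to memoize a tree's string form without copying substrings is to render the whole tree once into a single shared buffer and have each node remember only the offsets of its own slice. The sketch below illustrates that idea. The SketchNode class and its fields are hypothetical and are not Hive's ASTNode, and the attached patches may take a different approach.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tree node; not Hive's ASTNode.
class SketchNode {
  final String text;
  final List<SketchNode> children = new ArrayList<>();
  // Offsets of this node's rendering inside the shared buffer; -1 means not rendered yet.
  int start = -1;
  int end = -1;

  SketchNode(String text) { this.text = text; }

  SketchNode add(SketchNode child) { children.add(child); return this; }

  // Renders this subtree into the shared buffer, recording offsets for every node visited.
  void render(StringBuilder shared) {
    start = shared.length();
    if (children.isEmpty()) {
      shared.append(text);
    } else {
      shared.append('(').append(text);
      for (SketchNode c : children) {
        shared.append(' ');
        c.render(shared);
      }
      shared.append(')');
    }
    end = shared.length();
  }

  // Returns the memoized slice; a String is materialized only for the node that is asked.
  // Call this on the root first so that the tree is rendered into the buffer only once.
  String toStringTree(StringBuilder shared) {
    if (start < 0) {
      render(shared);
    }
    return shared.substring(start, end);
  }
}

class MemoSketch {
  public static void main(String[] args) {
    SketchNode where = new SketchNode("WHERE").add(new SketchNode("x"));
    SketchNode root = new SketchNode("QUERY").add(new SketchNode("FROM")).add(where);
    StringBuilder shared = new StringBuilder();
    System.out.println(root.toStringTree(shared));
    System.out.println(where.toStringTree(shared)); // served from the same buffer
  }
}
```

The design choice here is that the buffer holds each character of the tree's string exactly once; per-node storage is two integers rather than a separate String per subtree.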





[jira] [Commented] (HIVE-10319) Hive CLI startup takes a long time with a large number of databases

2015-07-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647241#comment-14647241
 ] 

Hive QA commented on HIVE-10319:




{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12747893/HIVE-10319.4.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4755/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4755/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4755/

Messages:
{noformat}
 This message was trimmed, see log for full details 
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ hive-metastore 
---
[INFO] Compiling 244 source files to 
/data/hive-ptest/working/apache-github-source-source/metastore/target/classes
[INFO] -
[WARNING] COMPILATION WARNING : 
[INFO] -
[WARNING] 
/data/hive-ptest/working/apache-github-source-source/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java:
 Some input files use or override a deprecated API.
[WARNING] 
/data/hive-ptest/working/apache-github-source-source/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java:
 Recompile with -Xlint:deprecation for details.
[WARNING] 
/data/hive-ptest/working/apache-github-source-source/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetOpenTxnsResponse.java:
 Some input files use unchecked or unsafe operations.
[WARNING] 
/data/hive-ptest/working/apache-github-source-source/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetOpenTxnsResponse.java:
 Recompile with -Xlint:unchecked for details.
[INFO] 4 warnings 
[INFO] -
[INFO] -
[ERROR] COMPILATION ERROR : 
[INFO] -
[ERROR] 
/data/hive-ptest/working/apache-github-source-source/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java:[71,44]
 cannot find symbol
  symbol:   class GetAllFunctionsResponse
  location: package org.apache.hadoop.hive.metastore.api
[ERROR] 
/data/hive-ptest/working/apache-github-source-source/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java:[5544,12]
 cannot find symbol
  symbol:   class GetAllFunctionsResponse
  location: class org.apache.hadoop.hive.metastore.HiveMetaStore.HMSHandler
[ERROR] 
/data/hive-ptest/working/apache-github-source-source/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java:[217,12]
 cannot find symbol
  symbol:   class GetAllFunctionsResponse
  location: interface 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore.Iface
[ERROR] 
/data/hive-ptest/working/apache-github-source-source/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java:[79,44]
 cannot find symbol
  symbol:   class GetAllFunctionsResponse
  location: package org.apache.hadoop.hive.metastore.api
[ERROR] 
/data/hive-ptest/working/apache-github-source-source/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java:[40,44]
 cannot find symbol
  symbol:   class GetAllFunctionsResponse
  location: package org.apache.hadoop.hive.metastore.api
[ERROR] 
/data/hive-ptest/working/apache-github-source-source/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java:[2049,10]
 cannot find symbol
  symbol:   class GetAllFunctionsResponse
  location: class org.apache.hadoop.hive.metastore.HiveMetaStoreClient
[ERROR] 
/data/hive-ptest/working/apache-github-source-source/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java:[1134,3]
 cannot find symbol
  symbol:   class GetAllFunctionsResponse
  location: interface org.apache.hadoop.hive.metastore.IMetaStoreClient
[ERROR] 
/data/hive-ptest/working/apache-github-source-source/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java:[7491,14]
 cannot find symbol
  symbol:   class GetAllFunctionsResponse
  location: class 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore.AsyncClient.get_all_functions_call
[ERROR] 
/data/hive-ptest/working/apache-github-source-source/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java:[3300,12]
 cannot find symbol
  symbol:   class GetAllFunctionsResponse
  location: class 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore.Client
[ERROR] 
/data/hive-ptest/working/apache-github-source-source/metastore/src/gen/

[jira] [Commented] (HIVE-4239) Remove lock on compilation stage

2015-07-29 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647240#comment-14647240
 ] 

Lefty Leverenz commented on HIVE-4239:
--

Thanks [~cwsteinbach], I had missed that comment.  Makes sense.

So is it okay to document *hive.driver.parallel.compilation* in the HS2 section 
or should it go in the general section?

> Remove lock on compilation stage
> 
>
> Key: HIVE-4239
> URL: https://issues.apache.org/jira/browse/HIVE-4239
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Query Processor
>Reporter: Carl Steinbach
>Assignee: Sergey Shelukhin
>  Labels: TODOC2.0
> Fix For: 2.0.0
>
> Attachments: HIVE-4239.01.patch, HIVE-4239.02.patch, 
> HIVE-4239.03.patch, HIVE-4239.04.patch, HIVE-4239.05.patch, 
> HIVE-4239.06.patch, HIVE-4239.07.patch, HIVE-4239.08.patch, HIVE-4239.patch
>
>






[jira] [Commented] (HIVE-11387) CBO: Calcite Operator To Hive Operator (Calcite Return Path) : fix reduce_deduplicate optimization

2015-07-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647238#comment-14647238
 ] 

Hive QA commented on HIVE-11387:




{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12747871/HIVE-11387.04.patch

{color:green}SUCCESS:{color} +1 9276 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4754/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4754/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4754/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12747871 - PreCommit-HIVE-TRUNK-Build

> CBO: Calcite Operator To Hive Operator (Calcite Return Path) : fix 
> reduce_deduplicate optimization
> --
>
> Key: HIVE-11387
> URL: https://issues.apache.org/jira/browse/HIVE-11387
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11387.01.patch, HIVE-11387.02.patch, 
> HIVE-11387.03.patch, HIVE-11387.04.patch
>
>
> {noformat}
> The main problem is that, due to the return path, we may now have 
> (RS1-GBY2)-(RS3-GBY4) when map.aggr=false, i.e., no map aggr. However, in the 
> non-return path, it will be treated as (RS1)-(GBY2-RS3-GBY4). The main 
> problem is that it does not take the setting into account.
> {noformat}





[jira] [Commented] (HIVE-11409) CBO: Calcite Operator To Hive Operator (Calcite Return Path): add SEL before UNION

2015-07-29 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647231#comment-14647231
 ] 

Pengcheng Xiong commented on HIVE-11409:


[~jcamachorodriguez], could you review this small patch? Thanks.

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): add SEL before 
> UNION
> --
>
> Key: HIVE-11409
> URL: https://issues.apache.org/jira/browse/HIVE-11409
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11409.01.patch
>
>
> Purposes: (1) to ensure that the data type of each non-primary branch (the 1st 
> branch is the primary branch) of the union can be cast to that of the primary 
> branch; (2) to make the UnionProcessor optimizer work; (3) if the SEL is 
> redundant, it will be removed by the IdentityProjectRemover optimizer.





[jira] [Updated] (HIVE-11409) CBO: Calcite Operator To Hive Operator (Calcite Return Path): add SEL before UNION

2015-07-29 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-11409:
---
Attachment: HIVE-11409.01.patch

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): add SEL before 
> UNION
> --
>
> Key: HIVE-11409
> URL: https://issues.apache.org/jira/browse/HIVE-11409
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11409.01.patch
>
>
> Purposes: (1) to ensure that the data type of each non-primary branch (the 1st 
> branch is the primary branch) of the union can be cast to that of the primary 
> branch; (2) to make the UnionProcessor optimizer work; (3) if the SEL is 
> redundant, it will be removed by the IdentityProjectRemover optimizer.





[jira] [Commented] (HIVE-8950) Add support in ParquetHiveSerde to create table schema from a parquet file

2015-07-29 Thread Gaurav Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647185#comment-14647185
 ] 

Gaurav Kumar commented on HIVE-8950:


I was wondering how this will be handled in schema evolution.
In Avro, what we currently do is point avro.schema.url to the schema file.
When we want to change the schema, we only change the contents of the schema 
file to, let's say, add a new column, and the new data will automatically be 
deserialized using the new schema.
In this case, if we specify a Parquet file instead of a schema file, how will 
that be used in schema evolution? We'll have to change the table DDL definition 
every time to point to the file containing the newest schema. Avro tables can 
have different files pertaining to different schemas in different partitions.
I know there is no separate schema per se in Parquet files.


> Add support in ParquetHiveSerde to create table schema from a parquet file
> --
>
> Key: HIVE-8950
> URL: https://issues.apache.org/jira/browse/HIVE-8950
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ashish K Singh
>Assignee: Gaurav Kumar
> Attachments: HIVE-8950.1.patch, HIVE-8950.2.patch, HIVE-8950.3.patch, 
> HIVE-8950.4.patch, HIVE-8950.5.patch, HIVE-8950.6.patch, HIVE-8950.7.patch, 
> HIVE-8950.8.patch, HIVE-8950.patch
>
>
> PARQUET-76 and PARQUET-47 ask for creating parquet backed tables without 
> having to specify the column names and types. As parquet files store the schema 
> in their footer, it is possible to generate the hive schema from a parquet file's 
> metadata. This will improve the usability of parquet backed tables.





[jira] [Updated] (HIVE-11408) HiveServer2 is leaking ClassLoaders when add jar / temporary functions are used

2015-07-29 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-11408:

Description: 
I'm able to reproduce with 0.14. I'm yet to see if HIVE-10453 fixes the issue 
(since it's on top of a larger patch: HIVE-2573 that was added in 1.2). 

Basically, add jar creates a new classloader for loading the classes from the 
new jar and adds the new classloader to the SessionState object of user's 
session, making the older one its parent. Creating a temporary function uses 
the new classloader to load the class used for the function. On closing a 
session, although there is code to close the classloader for the session, I'm 
not seeing the new classloader getting GCed and from the heapdump I can see it 
holds on to the temporary function's class that should have gone away after the 
session close. 

Steps to reproduce:
1.
{code}
jdbc:hive2://localhost:1/> add jar hdfs:///tmp/audf.jar;
{code}


2. 
Use a profiler (I'm using yourkit) to verify that a new URLClassLoader was 
added.


3. 
{code}
jdbc:hive2://localhost:1/> CREATE TEMPORARY FUNCTION funcA AS 
'org.gumashta.udf.AUDF'; 
{code}


4. 
Close the jdbc session.

5. 
Take the memory snapshot and verify that the new URLClassLoader is indeed there 
and is holding onto the class it loaded (org.gumashta.udf.AUDF) for the session 
which we already closed.





  was:
I'm able to reproduce with 0.14. I'm yet to see if HIVE-10453 fixes the issue 
(since it's on top of a larger patch: HIVE-2573 that was added in 1.2). 

Basically, add jar creates a new classloader for loading the classes from the 
new jar and adds the new classloader to the SessionState object of user's 
session, making the older one its parent. Creating a temporary function uses 
the new classloader to load the class used for the function. On closing a 
session, although there is code to close the classloader for the session, I'm 
not seeing the new classloader getting GCed and from the heapdump I can see it 
holds on to the temporary function's class that should have gone away after the 
session close. 

Steps to reproduce:
1.
{code}
jdbc:hive2://localhost:1/> add jar hdfs:///tmp/audf.jar;
{code}


2. 
Use a profiler (I'm using yourkit) to verify that a new URLClassLoader was 
added.


3. 
{code}
jdbc:hive2://localhost:1/> CREATE TEMPORARY FUNCTION funcA AS 
'org.gumashta.udf.AUDF'; 
{code}


4. 
Take the memory snapshot and verify that the new URLClassLoader is indeed there 
and is holding onto the class it loaded (org.gumashta.udf.AUDF).





> HiveServer2 is leaking ClassLoaders when add jar / temporary functions are 
> used
> ---
>
> Key: HIVE-11408
> URL: https://issues.apache.org/jira/browse/HIVE-11408
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.14.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>
> I'm able to reproduce with 0.14. I'm yet to see if HIVE-10453 fixes the issue 
> (since it's on top of a larger patch: HIVE-2573 that was added in 1.2). 
> Basically, add jar creates a new classloader for loading the classes from the 
> new jar and adds the new classloader to the SessionState object of user's 
> session, making the older one its parent. Creating a temporary function uses 
> the new classloader to load the class used for the function. On closing a 
> session, although there is code to close the classloader for the session, I'm 
> not seeing the new classloader getting GCed and from the heapdump I can see 
> it holds on to the temporary function's class that should have gone away 
> after the session close. 
> Steps to reproduce:
> 1.
> {code}
> jdbc:hive2://localhost:1/> add jar hdfs:///tmp/audf.jar;
> {code}
> 2. 
> Use a profiler (I'm using yourkit) to verify that a new URLClassLoader was 
> added.
> 3. 
> {code}
> jdbc:hive2://localhost:1/> CREATE TEMPORARY FUNCTION funcA AS 
> 'org.gumashta.udf.AUDF'; 
> {code}
> 4. 
> Close the jdbc session.
> 5. 
> Take the memory snapshot and verify that the new URLClassLoader is indeed 
> there and is holding onto the class it loaded (org.gumashta.udf.AUDF) for the 
> session which we already closed.
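
The classloader chaining described above can be sketched with plain JDK classes. This is an illustration only: SessionSketch and its methods are hypothetical and do not reproduce Hive's SessionState. The sketch shows a loader being layered on top of its parent for an added jar, then closed and unlinked when the session ends. Whether the closed loader is actually garbage collected still depends on no other object holding a reference to it or to any class it loaded, which is the condition the report says is not being met.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Paths;

// Hypothetical session holder; not Hive's SessionState.
class SessionSketch {
  private final ClassLoader original;
  private ClassLoader current;

  SessionSketch(ClassLoader base) {
    this.original = base;
    this.current = base;
  }

  // Models "add jar": a new loader is created with the previous loader as its parent.
  void addJar(String jarPath) {
    try {
      URL url = Paths.get(jarPath).toUri().toURL();
      current = new URLClassLoader(new URL[] {url}, current);
    } catch (MalformedURLException e) {
      throw new IllegalArgumentException(e);
    }
  }

  ClassLoader loader() {
    return current;
  }

  // Models session close: close every loader this session created, then drop the references.
  void close() {
    ClassLoader cl = current;
    try {
      while (cl != original && cl instanceof URLClassLoader) {
        ClassLoader parent = cl.getParent();
        ((URLClassLoader) cl).close();
        cl = parent;
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    current = original;
  }

  public static void main(String[] args) {
    ClassLoader base = SessionSketch.class.getClassLoader();
    SessionSketch session = new SessionSketch(base);
    session.addJar("/tmp/audf.jar"); // the jar does not need to exist for this demo
    System.out.println("child of base: " + (session.loader().getParent() == base));
    session.close();
    System.out.println("restored: " + (session.loader() == base));
  }
}
```

A heap dump taken after close() in a real server would be the place to look for anything, such as a function registry, that still references the per-session loader.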





[jira] [Assigned] (HIVE-8950) Add support in ParquetHiveSerde to create table schema from a parquet file

2015-07-29 Thread Gaurav Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gaurav Kumar reassigned HIVE-8950:
--

Assignee: Gaurav Kumar  (was: Ashish K Singh)

> Add support in ParquetHiveSerde to create table schema from a parquet file
> --
>
> Key: HIVE-8950
> URL: https://issues.apache.org/jira/browse/HIVE-8950
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ashish K Singh
>Assignee: Gaurav Kumar
> Attachments: HIVE-8950.1.patch, HIVE-8950.2.patch, HIVE-8950.3.patch, 
> HIVE-8950.4.patch, HIVE-8950.5.patch, HIVE-8950.6.patch, HIVE-8950.7.patch, 
> HIVE-8950.8.patch, HIVE-8950.patch
>
>
> PARQUET-76 and PARQUET-47 ask for creating parquet backed tables without 
> having to specify the column names and types. As parquet files store the schema 
> in their footer, it is possible to generate the hive schema from a parquet file's 
> metadata. This will improve the usability of parquet backed tables.





[jira] [Commented] (HIVE-11185) Fix compustat_avro.q/load_dyn_part14_win.q for Windows

2015-07-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647168#comment-14647168
 ] 

Hive QA commented on HIVE-11185:




{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12747866/HIVE-11185.2.patch

{color:green}SUCCESS:{color} +1 9276 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4753/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4753/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4753/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12747866 - PreCommit-HIVE-TRUNK-Build

> Fix compustat_avro.q/load_dyn_part14_win.q for Windows
> --
>
> Key: HIVE-11185
> URL: https://issues.apache.org/jira/browse/HIVE-11185
> Project: Hive
>  Issue Type: Bug
>  Components: Tests, Windows
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-11185.1.patch, HIVE-11185.2.patch
>
>
> compustat_avro.q: The way the location file path was being specified using 
> system:hive.root wasn't working on Windows. create_like.q has an example of 
> how to get this to work, using system:test.tmp.dir
> load_dyn_part14_win.q: Looks like HIVE-9039 changed the Hive syntax, and the 
> original load_dyn_part14.q had changed but the Windows-specific version 
> load_dyn_part14_win.q had not been updated.





[jira] [Commented] (HIVE-10631) create_table_core method has invalid update for Fast Stats

2015-07-29 Thread Aaron Tokhy (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647156#comment-14647156
 ] 

Aaron Tokhy commented on HIVE-10631:


hive.stats.reliable requires that both numFiles and totalSize be set properly, 
regardless of the condition.  So if 'create table' or 'create external table' 
were to use a location already populated with partitions, it will traverse 
those partitions regardless.

As of writing, hive.stats.reliable appears to be set to false by default.  
Perhaps stats calculation on creation of partitioned tables can be skipped 
when hive.stats.reliable is false, as stats will be populated on MSCK REPAIR 
TABLE or by adding partitions using ALTER TABLE ADD PARTITION.


> create_table_core method has invalid update for Fast Stats
> --
>
> Key: HIVE-10631
> URL: https://issues.apache.org/jira/browse/HIVE-10631
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.13.0, 1.0.0
>Reporter: Dongwook Kwon
>Priority: Minor
>
> The HiveMetaStore.create_table_core method calls 
> MetaStoreUtils.updateUnpartitionedTableStatsFast when hive.stats.autogather 
> is on. However, for a partitioned table, this updateUnpartitionedTableStatsFast 
> call scans the warehouse dir and doesn't seem to use the result. 
> "Fast Stats" was implemented by HIVE-3959
> https://github.com/apache/hive/blob/branch-1.0/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L1363
> From create_table_core method
> {code}
> if (HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVESTATSAUTOGATHER) &&
>     !MetaStoreUtils.isView(tbl)) {
>   if (tbl.getPartitionKeysSize() == 0)  { // Unpartitioned table
>     MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, madeDir);
>   } else { // Partitioned table with no partitions.
>     MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true);
>   }
> }
> {code}
> Particularly Line 1363: // Partitioned table with no partitions.
> {code}
> MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true);
> {code}
> This call ends up calling Warehouse.getFileStatusesForUnpartitionedTable and 
> then does nothing in the MetaStoreUtils.updateUnpartitionedTableStatsFast 
> method because the newDir flag is always true.
> The impact of this bug is minor with an HDFS warehouse 
> location (hive.metastore.warehouse.dir), but it could be big with an S3 
> warehouse location, especially for large existing partitions.
> The impact is also heightened with HIVE-6727 when the warehouse location is S3: 
> basically it could scan the wrong S3 directory recursively and do nothing with 
> it. I will add more detail of the cases in comments.
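
The behaviour described above can be illustrated with a small stand-in. The sketch below does not use Hive's classes: StatsSketch, listFiles, and updateStatsFast are hypothetical names, and the logic is only a guess at the shape of the problem (a directory listing whose result is discarded when the newDir flag is true) plus one possible guard that skips the listing in that case.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in; not Hive's MetaStoreUtils or Warehouse.
class StatsSketch {
  int listings = 0; // number of (potentially expensive) directory listings performed

  // Models a recursive listing of the table location.
  List<String> listFiles(String location) {
    listings++;
    return new ArrayList<>();
  }

  // Shape of the reported behaviour: the listing happens first, then is ignored when newDir is true.
  // Returns true only if stats would have been updated.
  boolean updateStatsFast(String location, boolean newDir) {
    List<String> files = listFiles(location);
    if (newDir) {
      return false; // nothing is updated, so the listing above was wasted work
    }
    return !files.isEmpty();
  }

  // One possible guard: check the flag before paying for the listing.
  boolean updateStatsFastGuarded(String location, boolean newDir) {
    if (newDir) {
      return false;
    }
    return !listFiles(location).isEmpty();
  }

  public static void main(String[] args) {
    StatsSketch reported = new StatsSketch();
    reported.updateStatsFast("s3://example-bucket/table", true);
    StatsSketch guarded = new StatsSketch();
    guarded.updateStatsFastGuarded("s3://example-bucket/table", true);
    System.out.println("listings without guard: " + reported.listings);
    System.out.println("listings with guard: " + guarded.listings);
  }
}
```

On HDFS the wasted listing is cheap, but on an object store with many existing partitions each listing can translate into many remote requests, which is why the guard matters more there.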





[jira] [Commented] (HIVE-11386) Improve Vectorized GROUP BY Performance (Phase 1)

2015-07-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647113#comment-14647113
 ] 

Hive QA commented on HIVE-11386:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12747860/HIVE-11386.02.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9281 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_leftsemi_mapjoin
org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4752/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4752/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4752/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12747860 - PreCommit-HIVE-TRUNK-Build

> Improve Vectorized GROUP BY Performance (Phase 1)
> -
>
> Key: HIVE-11386
> URL: https://issues.apache.org/jira/browse/HIVE-11386
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-11386.01.patch, HIVE-11386.02.patch
>
>
> Improve vectorized GROUP BY performance, with an eye towards the new LLAP 
> memory management (dramatically reduce the number of Java objects, allocate 
> very large objects, etc).





[jira] [Commented] (HIVE-11383) Upgrade Hive to Calcite 1.4

2015-07-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647036#comment-14647036
 ] 

Hive QA commented on HIVE-11383:




{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12747861/HIVE-11383.6.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4751/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4751/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4751/

Messages:
{noformat}
 This message was trimmed, see log for full details 
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ 
spark-client ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 1 resource
[INFO] Copying 3 resources
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (setup-test-dirs) @ spark-client ---
[INFO] Executing tasks

main:
[mkdir] Created dir: 
/data/hive-ptest/working/apache-github-source-source/spark-client/target/tmp
[mkdir] Created dir: 
/data/hive-ptest/working/apache-github-source-source/spark-client/target/warehouse
[mkdir] Created dir: 
/data/hive-ptest/working/apache-github-source-source/spark-client/target/tmp/conf
 [copy] Copying 11 files to 
/data/hive-ptest/working/apache-github-source-source/spark-client/target/tmp/conf
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ 
spark-client ---
[INFO] Compiling 5 source files to 
/data/hive-ptest/working/apache-github-source-source/spark-client/target/test-classes
[INFO] 
[INFO] --- maven-dependency-plugin:2.8:copy (copy-guava-14) @ spark-client ---
[INFO] Configured Artifact: com.google.guava:guava:14.0.1:jar
[INFO] Copying guava-14.0.1.jar to 
/data/hive-ptest/working/apache-github-source-source/spark-client/target/dependency/guava-14.0.1.jar
[INFO] 
[INFO] --- maven-surefire-plugin:2.16:test (default-test) @ spark-client ---
[INFO] Tests are skipped.
[INFO] 
[INFO] --- maven-jar-plugin:2.2:jar (default-jar) @ spark-client ---
[INFO] Building jar: 
/data/hive-ptest/working/apache-github-source-source/spark-client/target/spark-client-2.0.0-SNAPSHOT.jar
[INFO] 
[INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ 
spark-client ---
[INFO] 
[INFO] --- maven-install-plugin:2.4:install (default-install) @ spark-client ---
[INFO] Installing 
/data/hive-ptest/working/apache-github-source-source/spark-client/target/spark-client-2.0.0-SNAPSHOT.jar
 to 
/home/hiveptest/.m2/repository/org/apache/hive/spark-client/2.0.0-SNAPSHOT/spark-client-2.0.0-SNAPSHOT.jar
[INFO] Installing 
/data/hive-ptest/working/apache-github-source-source/spark-client/pom.xml to 
/home/hiveptest/.m2/repository/org/apache/hive/spark-client/2.0.0-SNAPSHOT/spark-client-2.0.0-SNAPSHOT.pom
[INFO] 
[INFO] 
[INFO] Building Hive Query Language 2.0.0-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ hive-exec ---
[INFO] Deleting /data/hive-ptest/working/apache-github-source-source/ql/target
[INFO] Deleting /data/hive-ptest/working/apache-github-source-source/ql 
(includes = [datanucleus.log, derby.log], excludes = [])
[INFO] 
[INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-no-snapshots) @ 
hive-exec ---
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (generate-sources) @ hive-exec ---
[INFO] Executing tasks

main:
[mkdir] Created dir: 
/data/hive-ptest/working/apache-github-source-source/ql/target/generated-sources/java/org/apache/hadoop/hive/ql/exec/vector/expressions/gen
[mkdir] Created dir: 
/data/hive-ptest/working/apache-github-source-source/ql/target/generated-sources/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/gen
[mkdir] Created dir: 
/data/hive-ptest/working/apache-github-source-source/ql/target/generated-test-sources/java/org/apache/hadoop/hive/ql/exec/vector/expressions/gen
Generating vector expression code
Generating vector expression test code
[INFO] Executed tasks
[INFO] 
[INFO] --- build-helper-maven-plugin:1.8:add-source (add-source) @ hive-exec ---
[INFO] Source directory: 
/data/hive-ptest/working/apache-github-source-source/ql/src/gen/protobuf/gen-java
 added.
[INFO] Source directory: 
/data/hive-ptest/working/apache-github-source-source/ql/src/gen/thrift/gen-javabean
 added.
[INFO] Source directory: 
/data/hive-ptest/working/apache-github-source-source/ql/target/generated-sources/java
 added.
[INFO] 
[INFO] --- antlr3-maven-plugin:3.4:antlr (default) @ hive-exec ---
[INFO] AN

[jira] [Commented] (HIVE-11257) CBO: Calcite Operator To Hive Operator (Calcite Return Path): Method isCombinablePredicate in HiveJoinToMultiJoinRule should be extended to support MultiJoin operators

2015-07-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647025#comment-14647025
 ] 

Hive QA commented on HIVE-11257:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12747838/HIVE-11257.04.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9276 tests executed
*Failed tests:*
{noformat}
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4750/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4750/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4750/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12747838 - PreCommit-HIVE-TRUNK-Build

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): Method 
> isCombinablePredicate in HiveJoinToMultiJoinRule should be extended to 
> support MultiJoin operators merge
> -
>
> Key: HIVE-11257
> URL: https://issues.apache.org/jira/browse/HIVE-11257
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-11257.01.patch, HIVE-11257.02.patch, 
> HIVE-11257.03.patch, HIVE-11257.04.patch, HIVE-11257.patch
>
>






[jira] [Commented] (HIVE-10319) Hive CLI startup takes a long time with a large number of databases

2015-07-29 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647008#comment-14647008
 ] 

Jason Dere commented on HIVE-10319:
---

I think I see the reason for the change in the diff: THRIFT-2172 seems to have 
moved the "optionals" array from being an instance (per-object) field to a 
class-level field.  As a result, the object inspector created from the 
Thrift object is now missing the "optionals" field, because you re-generated 
the Java files for megastruct.thrift, which is used in this test. Simply 
re-generating with Thrift 0.9.0 is fine. Using Thrift 0.9.2 would also work, 
provided you also updated convert_enum_to_string.q.out.

The "optional" vs "required" inconsistency in the generated files looks like 
it's just a comment, so I think this difference is harmless.
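
To illustrate why a field can vanish from a reflection-derived struct when it moves from instance level to class level, here is a minimal sketch. It assumes the inspector enumerates only non-static fields; that is an assumption for this example, and GeneratedOld, GeneratedNew and FieldListingSketch are hypothetical names, not Hive or Thrift code.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for classes generated by two Thrift compiler versions.
class GeneratedOld {
    int myInt;
    int[] optionals = new int[0];               // per-object (instance) field
}

class GeneratedNew {
    int myInt;
    static final int[] optionals = new int[0];  // class-level (static) field
}

class FieldListingSketch {
    // Lists the declared non-static fields of a class, the way a
    // reflection-based inspector might (assumed behavior, not Hive's code).
    static List<String> instanceFieldNames(Class<?> c) {
        List<String> names = new ArrayList<>();
        for (Field f : c.getDeclaredFields()) {
            if (!Modifier.isStatic(f.getModifiers()) && !f.isSynthetic()) {
                names.add(f.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(instanceFieldNames(GeneratedOld.class));
        System.out.println(instanceFieldNames(GeneratedNew.class));
    }
}
```

Under that assumption, the old class exposes "optionals" as a struct field and the new one does not, which would change any golden output that prints the struct.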

> Hive CLI startup takes a long time with a large number of databases
> ---
>
> Key: HIVE-10319
> URL: https://issues.apache.org/jira/browse/HIVE-10319
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 1.0.0
>Reporter: Nezih Yigitbasi
>Assignee: Nezih Yigitbasi
> Attachments: HIVE-10319.1.patch, HIVE-10319.2.patch, 
> HIVE-10319.3.patch, HIVE-10319.4.patch, HIVE-10319.patch
>
>
> The Hive CLI takes a long time to start when there is a large number of 
> databases in the DW. I think the root cause is the way permanent UDFs are 
> loaded from the metastore. When I looked at the logs and the source code I 
> see that at startup Hive first gets all the databases from the metastore and 
> then for each database it makes a metastore call to get the permanent 
> functions for that database [see Hive.java | 
> https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L162-185].
>  So the number of metastore calls made is in the order of the number of 
> databases. In production we have several hundreds of databases so Hive makes 
> several hundreds of RPC calls during startup, taking 30+ seconds.
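
The call pattern in the description can be sketched as follows. This is only an illustration: CountingMetastoreSketch is a hypothetical class that counts round trips, not the Hive metastore client, and getAllFunctions is a hypothetical batched call used to show one possible alternative.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical client that only counts round trips; not the Hive metastore API.
class CountingMetastoreSketch {
    int rpcCalls = 0;
    private final List<String> databases;

    CountingMetastoreSketch(List<String> databases) {
        this.databases = databases;
    }

    List<String> getAllDatabases() {
        rpcCalls++;
        return databases;
    }

    List<String> getFunctions(String db) {
        rpcCalls++;
        List<String> fns = new ArrayList<>();
        fns.add(db + ".some_udf");
        return fns;
    }

    // Hypothetical batched call returning the functions of every database.
    List<String> getAllFunctions() {
        rpcCalls++;
        List<String> fns = new ArrayList<>();
        for (String db : databases) {
            fns.add(db + ".some_udf");
        }
        return fns;
    }

    // Pattern from the description: one call for the database list,
    // then one call per database.
    List<String> loadFunctionsPerDatabase() {
        List<String> all = new ArrayList<>();
        for (String db : getAllDatabases()) {
            all.addAll(getFunctions(db));
        }
        return all;
    }

    // Alternative under the batched-call assumption: a single round trip.
    List<String> loadFunctionsBatched() {
        return getAllFunctions();
    }
}
```

With N databases the first method makes N + 1 calls, which matches the "several hundreds of RPC calls" observation; the batched variant makes one.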





[jira] [Commented] (HIVE-10319) Hive CLI startup takes a long time with a large number of databases

2015-07-29 Thread Nezih Yigitbasi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646986#comment-14646986
 ] 

Nezih Yigitbasi commented on HIVE-10319:


[~jdere] I updated the patch. It seems Thrift 0.9.2 generated a lot of crap; 
with 0.9.0 the patch got way smaller. BTW, I observed an inconsistency between 
the metastore interface definition 
([here|https://github.com/apache/hive/blob/master/metastore/if/hive_metastore.thrift#L666-670])
 and the generated classes (see the corresponding generated class 
[here|https://github.com/apache/hive/blob/master/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/AddDynamicPartitions.java#L629-636]).
 The field partitionnames is required in the interface definition, but it's 
optional in the generated class -- probably the generated classes haven't been 
committed with that change. My latest patch fixes this inconsistency, but I'm 
not sure whether it has any side effects.

> Hive CLI startup takes a long time with a large number of databases
> ---
>
> Key: HIVE-10319
> URL: https://issues.apache.org/jira/browse/HIVE-10319
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 1.0.0
>Reporter: Nezih Yigitbasi
>Assignee: Nezih Yigitbasi
> Attachments: HIVE-10319.1.patch, HIVE-10319.2.patch, 
> HIVE-10319.3.patch, HIVE-10319.4.patch, HIVE-10319.patch
>
>
> The Hive CLI takes a long time to start when there is a large number of 
> databases in the DW. I think the root cause is the way permanent UDFs are 
> loaded from the metastore. When I looked at the logs and the source code I 
> see that at startup Hive first gets all the databases from the metastore and 
> then for each database it makes a metastore call to get the permanent 
> functions for that database [see Hive.java | 
> https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L162-185].
>  So the number of metastore calls made is in the order of the number of 
> databases. In production we have several hundreds of databases so Hive makes 
> several hundreds of RPC calls during startup, taking 30+ seconds.





[jira] [Updated] (HIVE-11407) JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM

2015-07-29 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-11407:
-
Attachment: HIVE-11407-branch-1.0.patch

HIVE-11407-branch-1.0.patch: a slightly modified version of the patch from 
[~sushanth] that applies on branch-1.0 (just for reference, not for commit).



> JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM 
> --
>
> Key: HIVE-11407
> URL: https://issues.apache.org/jira/browse/HIVE-11407
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Thejas M Nair
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-11407-branch-1.0.patch
>
>
> With around 7000 tables having around 1500 columns each, and 512MB of HS2 
> memory, I am able to reproduce this OOM.
> Most of the memory is consumed by the datanucleus objects. 





[jira] [Updated] (HIVE-10319) Hive CLI startup takes a long time with a large number of databases

2015-07-29 Thread Nezih Yigitbasi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nezih Yigitbasi updated HIVE-10319:
---
Attachment: HIVE-10319.4.patch

> Hive CLI startup takes a long time with a large number of databases
> ---
>
> Key: HIVE-10319
> URL: https://issues.apache.org/jira/browse/HIVE-10319
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 1.0.0
>Reporter: Nezih Yigitbasi
>Assignee: Nezih Yigitbasi
> Attachments: HIVE-10319.1.patch, HIVE-10319.2.patch, 
> HIVE-10319.3.patch, HIVE-10319.4.patch, HIVE-10319.patch
>
>
> The Hive CLI takes a long time to start when there is a large number of 
> databases in the DW. I think the root cause is the way permanent UDFs are 
> loaded from the metastore. When I looked at the logs and the source code I 
> see that at startup Hive first gets all the databases from the metastore and 
> then for each database it makes a metastore call to get the permanent 
> functions for that database [see Hive.java | 
> https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L162-185].
>  So the number of metastore calls made is in the order of the number of 
> databases. In production we have several hundreds of databases so Hive makes 
> several hundreds of RPC calls during startup, taking 30+ seconds.





[jira] [Commented] (HIVE-11407) JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM

2015-07-29 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646970#comment-14646970
 ] 

Thejas M Nair commented on HIVE-11407:
--

In the case in the description, the OOM happened in HS2 because an embedded 
metastore was being used. With a remote metastore server, the OOM would happen 
in the metastore server instead.


> JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM 
> --
>
> Key: HIVE-11407
> URL: https://issues.apache.org/jira/browse/HIVE-11407
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Thejas M Nair
>Assignee: Sushanth Sowmyan
>
> With around 7000 tables having around 1500 columns each, and 512MB of HS2 
> memory, I am able to reproduce this OOM.
> Most of the memory is consumed by the datanucleus objects. 





[jira] [Commented] (HIVE-11389) hbase import should allow partial imports and should work in parallel

2015-07-29 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646956#comment-14646956
 ] 

Alan Gates commented on HIVE-11389:
---

Linking to HIVE-9752 since the tool has many user-facing options that need 
documenting.

> hbase import should allow partial imports and should work in parallel
> -
>
> Key: HIVE-11389
> URL: https://issues.apache.org/jira/browse/HIVE-11389
> Project: Hive
>  Issue Type: Improvement
>  Components: HBase Metastore
>Affects Versions: hbase-metastore-branch
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-11389.patch
>
>
> Currently the hbaseimport tool always imports a whole metastore serially.  
> This has a couple of issues.  One, users may wish to import only certain 
> parts of their metastore.  Two, when there are tables with many partitions it 
> can take a long time.





[jira] [Updated] (HIVE-11389) hbase import should allow partial imports and should work in parallel

2015-07-29 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-11389:
--
Attachment: HIVE-11389.patch

I basically rewrote HBaseImport.

The user can now pick selected objects to import, including role(s), 
database(s), table(s), function(s), and Kerberos-related items (master key and 
tokens).  Importing an object imports all contained objects.  That is, if you 
import a database you will get all of the tables and functions in that 
database.  If the user wishes to import just some items in the database, they 
can create the database on the HBase side and then import the desired tables 
and functions.

There is no option to import just some partitions, as that seemed confusing.

I also completely changed the way tables and partitions are copied.  In the 
past these were done one by one.

Now, for tables the importer builds a list of all tables to import based on 
the databases it imported and any user-requested tables.  It then spawns 
threads to fetch the table definitions from the RDBMS and write them to HBase 
in parallel.

For partitions, it fetches all partition names in parallel and breaks them into 
batches of at most 1000 (configurable).  Separate threads then handle fetching 
the partitions as a batch and writing them as a batch to HBase.  This solves a 
couple of problems versus the previous code: 1) we no longer depend on 
being able to instantiate all partitions for a table in memory simultaneously; 
2) we no longer add partitions one by one; rather, a batch of 1000 is 
read and then written with one call each.

The parallelism can be set by the user and defaults to 1.
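
The batching and threading scheme described above can be sketched roughly as follows. This is not the code from the patch: BatchedImportSketch and its methods are hypothetical names, and the fetch-from-RDBMS / write-to-HBase step is reduced to a placeholder that just counts partitions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class BatchedImportSketch {
    // Splits names into consecutive batches of at most batchSize entries.
    static List<List<String>> toBatches(List<String> names, int batchSize) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < names.size(); i += batchSize) {
            int end = Math.min(i + batchSize, names.size());
            batches.add(new ArrayList<>(names.subList(i, end)));
        }
        return batches;
    }

    // Copies every batch using `parallelism` worker threads and returns the
    // number of partitions copied.
    static int copyAll(List<String> partNames, int batchSize, int parallelism) {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        AtomicInteger copied = new AtomicInteger();
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (List<String> batch : toBatches(partNames, batchSize)) {
                pending.add(pool.submit(() -> {
                    // Placeholder: fetch the whole batch in one call, then
                    // write the whole batch in one call.
                    copied.addAndGet(batch.size());
                }));
            }
            for (Future<?> f : pending) {
                f.get();  // surface any failure from a worker thread
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("import failed", e);
        } finally {
            pool.shutdown();
        }
        return copied.get();
    }
}
```

With a batch size of 1000, a table with 2500 partitions yields three batches (1000, 1000, 500), so at most one batch per worker is held in memory at a time rather than the whole table.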

> hbase import should allow partial imports and should work in parallel
> -
>
> Key: HIVE-11389
> URL: https://issues.apache.org/jira/browse/HIVE-11389
> Project: Hive
>  Issue Type: Improvement
>  Components: HBase Metastore
>Affects Versions: hbase-metastore-branch
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-11389.patch
>
>
> Currently the hbaseimport tool always imports a whole metastore serially.  
> This has a couple of issues.  One, users may wish to import only certain 
> parts of their metastore.  Two, when there are tables with many partitions it 
> can take a long time.





[jira] [Commented] (HIVE-11316) Use datastructure that doesnt duplicate any part of string for ASTNode::toStringTree()

2015-07-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646921#comment-14646921
 ] 

Hive QA commented on HIVE-11316:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12747837/HIVE-11316.7.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9274 tests executed
*Failed tests:*
{noformat}
org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4749/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4749/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4749/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12747837 - PreCommit-HIVE-TRUNK-Build

> Use datastructure that doesnt duplicate any part of string for 
> ASTNode::toStringTree()
> --
>
> Key: HIVE-11316
> URL: https://issues.apache.org/jira/browse/HIVE-11316
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-11316-branch-1.0.patch, 
> HIVE-11316-branch-1.2.patch, HIVE-11316.1.patch, HIVE-11316.2.patch, 
> HIVE-11316.3.patch, HIVE-11316.4.patch, HIVE-11316.5.patch, 
> HIVE-11316.6.patch, HIVE-11316.7.patch
>
>
> HIVE-11281 uses an approach to memoize toStringTree() for ASTNode. This jira 
> is supposed to alter the string memoization to use a different data structure 
> that doesn't duplicate any part of the string so that we do not run into OOM.
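
One way to memoize a tree's string form without storing overlapping copies is to render the whole tree once into a shared buffer and have each node remember only its start and end offsets. The sketch below illustrates that idea under that assumption; MemoNodeSketch is a hypothetical class, not Hive's ASTNode, and it does not handle trees that are modified after rendering.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tree node; not Hive's ASTNode.
class MemoNodeSketch {
    final String text;
    final List<MemoNodeSketch> children = new ArrayList<>();
    private StringBuilder shared;     // single buffer shared by the rendered subtree
    private int start = -1;
    private int end = -1;

    MemoNodeSketch(String text) {
        this.text = text;
    }

    MemoNodeSketch add(MemoNodeSketch child) {
        children.add(child);
        return this;
    }

    // Renders on first use; afterwards only slices the shared buffer.
    String toStringTree() {
        if (shared == null) {
            render(new StringBuilder());
        }
        return shared.substring(start, end);
    }

    // Appends this subtree to sb and records the offsets it occupies.
    private void render(StringBuilder sb) {
        shared = sb;
        start = sb.length();
        if (children.isEmpty()) {
            sb.append(text);
        } else {
            sb.append('(').append(text);
            for (MemoNodeSketch c : children) {
                sb.append(' ');
                c.render(sb);
            }
            sb.append(')');
        }
        end = sb.length();
    }
}
```

The memoized state is one buffer plus two integers per node, instead of a separate cached string per node whose contents repeat those of its descendants.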





[jira] [Commented] (HIVE-11402) HS2 - disallow parallel query execution within a single Session

2015-07-29 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646917#comment-14646917
 ] 

Carl Steinbach commented on HIVE-11402:
---

bq. Note that running queries in parallel for single session is not 
straightforward with jdbc, you need to spawn another thread as the 
Statement.execute calls are blocking. I believe ODBC has non blocking query 
execution API ...

I made a couple of mistakes when I designed the HS2 API (the use of Thrift Enums 
and Unions comes to mind), but by far the biggest mistake was allowing a 1:many 
relationship between Sessions and Operations. At the time I thought there was a 
chance that the ODBC spec required this, but I now think this is something best 
handled on the client side. Providing support for the 1:many Session:Operation 
mapping results in a lot of additional complexity on the server side, only to 
yield a feature with a very high potential for misuse.

Rather than temporarily serializing operations against a given session, I 
propose instead that we enforce a 1:1 mapping between HS2 sessions and active 
operations. This is a backward incompatible change, but one which I think will 
yield far better results in the long term.
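
As a rough illustration of what a 1:1 mapping between a session and its active operation could look like, here is a minimal sketch. SingleOperationSessionSketch and its methods are hypothetical and not part of the HiveServer2 API; the point is only that a second operation is rejected rather than queued.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical session type; not the HiveServer2 API.
class SingleOperationSessionSketch {
    private final AtomicBoolean busy = new AtomicBoolean(false);

    // Returns true if the session was idle and is now marked busy;
    // returns false if another operation is still active.
    boolean tryBeginOperation() {
        return busy.compareAndSet(false, true);
    }

    // Marks the active operation as finished so a new one can start.
    void endOperation() {
        busy.set(false);
    }
}
```

A caller that gets false back would report an error to the client instead of starting the operation, which is where this differs from serializing operations behind a lock.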

> HS2 - disallow parallel query execution within a single Session
> ---
>
> Key: HIVE-11402
> URL: https://issues.apache.org/jira/browse/HIVE-11402
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Thejas M Nair
>
> HiveServer2 currently allows concurrent queries to be run in a single 
> session. However, every HS2 session has an associated SessionState object, 
> and the use of SessionState in many places assumes that only one thread is 
> using it, i.e., it is not thread-safe.
> There are many places where SessionState thread safety needs to be 
> addressed, and until then we should serialize all query execution for a 
> single HS2 session. -This problem can become more visible with HIVE-4239 now 
> allowing parallel query compilation.-
> Note that running queries in parallel for a single session is not 
> straightforward with JDBC: you need to spawn another thread, as the 
> Statement.execute calls are blocking. I believe ODBC has a non-blocking query 
> execution API, and Hue is another well-known application that shares sessions 
> for all queries that a user runs.





[jira] [Commented] (HIVE-10319) Hive CLI startup takes a long time with a large number of databases

2015-07-29 Thread Nezih Yigitbasi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646914#comment-14646914
 ] 

Nezih Yigitbasi commented on HIVE-10319:


[~jdere] I believe the failure is due to my changes as I see that I use thrift 
0.9.2 which generates different code than the 0.9 version, which Hive requires 
according to [How To 
Contribute|https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-GeneratingThriftCode]

> Hive CLI startup takes a long time with a large number of databases
> ---
>
> Key: HIVE-10319
> URL: https://issues.apache.org/jira/browse/HIVE-10319
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 1.0.0
>Reporter: Nezih Yigitbasi
>Assignee: Nezih Yigitbasi
> Attachments: HIVE-10319.1.patch, HIVE-10319.2.patch, 
> HIVE-10319.3.patch, HIVE-10319.patch
>
>
> The Hive CLI takes a long time to start when there is a large number of 
> databases in the DW. I think the root cause is the way permanent UDFs are 
> loaded from the metastore. When I looked at the logs and the source code I 
> see that at startup Hive first gets all the databases from the metastore and 
> then for each database it makes a metastore call to get the permanent 
> functions for that database [see Hive.java | 
> https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L162-185].
>  So the number of metastore calls made is in the order of the number of 
> databases. In production we have several hundreds of databases so Hive makes 
> several hundreds of RPC calls during startup, taking 30+ seconds.





[jira] [Commented] (HIVE-11214) Insert into ACID table switches vectorization off

2015-07-29 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646910#comment-14646910
 ] 

Matt McCline commented on HIVE-11214:
-

Also committed to branch-1.

> Insert into ACID table switches vectorization off 
> --
>
> Key: HIVE-11214
> URL: https://issues.apache.org/jira/browse/HIVE-11214
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11214.01.patch, HIVE-11214.02.patch, 
> HIVE-11214.03.patch, HIVE-11214.04.patch
>
>
> PROBLEM:
> Vectorization is switched off automatically after running an insert into an ACID table.
> STEPS TO REPRODUCE:
> set hive.vectorized.execution.enabled=true;
> create table testv (id int, name string) clustered by (id) into 2 buckets 
> stored as orc tblproperties("transactional"="true");
> insert into testv values(1,'a');
> set hive.vectorized.execution.enabled;
> false





[jira] [Updated] (HIVE-11402) HS2 - disallow parallel query execution within a single Session

2015-07-29 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-11402:
--
Component/s: HiveServer2

> HS2 - disallow parallel query execution within a single Session
> ---
>
> Key: HIVE-11402
> URL: https://issues.apache.org/jira/browse/HIVE-11402
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Thejas M Nair
>
> HiveServer2 currently allows concurrent queries to be run in a single 
> session. However, every HS2 session has an associated SessionState object, 
> and the use of SessionState in many places assumes that only one thread is 
> using it, i.e., it is not thread-safe.
> There are many places where SessionState thread safety needs to be 
> addressed, and until then we should serialize all query execution for a 
> single HS2 session. -This problem can become more visible with HIVE-4239 now 
> allowing parallel query compilation.-
> Note that running queries in parallel for a single session is not 
> straightforward with JDBC: you need to spawn another thread, as the 
> Statement.execute calls are blocking. I believe ODBC has a non-blocking query 
> execution API, and Hue is another well-known application that shares sessions 
> for all queries that a user runs.





[jira] [Commented] (HIVE-4239) Remove lock on compilation stage

2015-07-29 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646892#comment-14646892
 ] 

Carl Steinbach commented on HIVE-4239:
--

bq. shouldn't it be named hive.server2.driver.parallel.compilation to match the 
other HS2 parameters?

[~leftylev], please see my comment here: 
https://issues.apache.org/jira/browse/HIVE-4239?focusedCommentId=14564517&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14564517


> Remove lock on compilation stage
> 
>
> Key: HIVE-4239
> URL: https://issues.apache.org/jira/browse/HIVE-4239
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Query Processor
>Reporter: Carl Steinbach
>Assignee: Sergey Shelukhin
>  Labels: TODOC2.0
> Fix For: 2.0.0
>
> Attachments: HIVE-4239.01.patch, HIVE-4239.02.patch, 
> HIVE-4239.03.patch, HIVE-4239.04.patch, HIVE-4239.05.patch, 
> HIVE-4239.06.patch, HIVE-4239.07.patch, HIVE-4239.08.patch, HIVE-4239.patch
>
>






[jira] [Comment Edited] (HIVE-4239) Remove lock on compilation stage

2015-07-29 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646892#comment-14646892
 ] 

Carl Steinbach edited comment on HIVE-4239 at 7/29/15 11:08 PM:


bq. shouldn't it be named hive.server2.driver.parallel.compilation to match the 
other HS2 parameters?

[~leftylev], please see my comment [up 
above|https://issues.apache.org/jira/browse/HIVE-4239?focusedCommentId=14564517&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14564517].



was (Author: cwsteinbach):
bq. shouldn't it be named hive.server2.driver.parallel.compilation to match the 
other HS2 parameters?

[~leftylev], please see my comment here: 
https://issues.apache.org/jira/browse/HIVE-4239?focusedCommentId=14564517&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14564517


> Remove lock on compilation stage
> 
>
> Key: HIVE-4239
> URL: https://issues.apache.org/jira/browse/HIVE-4239
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Query Processor
>Reporter: Carl Steinbach
>Assignee: Sergey Shelukhin
>  Labels: TODOC2.0
> Fix For: 2.0.0
>
> Attachments: HIVE-4239.01.patch, HIVE-4239.02.patch, 
> HIVE-4239.03.patch, HIVE-4239.04.patch, HIVE-4239.05.patch, 
> HIVE-4239.06.patch, HIVE-4239.07.patch, HIVE-4239.08.patch, HIVE-4239.patch
>
>






[jira] [Updated] (HIVE-11214) Insert into ACID table switches vectorization off

2015-07-29 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-11214:

Fix Version/s: 1.3.0

> Insert into ACID table switches vectorization off 
> --
>
> Key: HIVE-11214
> URL: https://issues.apache.org/jira/browse/HIVE-11214
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11214.01.patch, HIVE-11214.02.patch, 
> HIVE-11214.03.patch, HIVE-11214.04.patch
>
>
> PROBLEM:
> Vectorization is switched off automatically after running an insert into an ACID table.
> STEPS TO REPRODUCE:
> set hive.vectorized.execution.enabled=true;
> create table testv (id int, name string) clustered by (id) into 2 buckets 
> stored as orc tblproperties("transactional"="true");
> insert into testv values(1,'a');
> set hive.vectorized.execution.enabled;
> false





[jira] [Commented] (HIVE-11402) HS2 - disallow parallel query execution within a single Session

2015-07-29 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646875#comment-14646875
 ] 

Thejas M Nair commented on HIVE-11402:
--

[~sershe] Thanks for clarifying, I had forgotten about the 
SessionState.get().getCompileLock() that you added. I will update the 
description.
It looks like we can still have issues outside of compilation, during parallel 
query execution within a session.
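
Serializing query execution for a single session, as the description proposes, could be sketched with a per-session lock along these lines. This is an illustration only: SerializedSessionSketch, runSerialized and maxObservedConcurrency are hypothetical names, and the "query" is any supplied piece of work rather than real Hive execution.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical session wrapper; not the HiveServer2 API.
class SerializedSessionSketch {
    private final ReentrantLock executionLock = new ReentrantLock();

    // Runs the query while holding the session's lock, so at most one
    // query per session executes at a time; others wait their turn.
    <T> T runSerialized(Supplier<T> query) {
        executionLock.lock();
        try {
            return query.get();
        } finally {
            executionLock.unlock();
        }
    }

    // Demo helper: submits `threads` queries to one session and returns the
    // highest number that were ever executing at the same moment.
    static int maxObservedConcurrency(int threads) {
        SerializedSessionSketch session = new SerializedSessionSketch();
        AtomicInteger running = new AtomicInteger();
        AtomicInteger max = new AtomicInteger();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> session.runSerialized(() -> {
                int now = running.incrementAndGet();
                max.accumulateAndGet(now, Math::max);
                Thread.yield();
                return running.decrementAndGet();
            }));
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return max.get();
    }
}
```

Unlike rejecting a second operation outright, this keeps existing clients working: concurrent requests on the same session simply queue behind the lock.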

> HS2 - disallow parallel query execution within a single Session
> ---
>
> Key: HIVE-11402
> URL: https://issues.apache.org/jira/browse/HIVE-11402
> Project: Hive
>  Issue Type: Bug
>Reporter: Thejas M Nair
>
> HiveServer2 currently allows concurrent queries to be run in a single 
> session. However, every HS2 session has an associated SessionState object, 
> and the use of SessionState in many places assumes that only one thread is 
> using it, i.e., it is not thread-safe.
> There are many places where SessionState thread safety needs to be 
> addressed, and until then we should serialize all query execution for a 
> single HS2 session. This problem can become more visible with HIVE-4239 now 
> allowing parallel query compilation.
> Note that running queries in parallel for a single session is not 
> straightforward with JDBC; you need to spawn another thread, as the 
> Statement.execute calls are blocking. I believe ODBC has a non-blocking query 
> execution API, and Hue is another well-known application that shares sessions 
> for all queries that a user runs.





[jira] [Updated] (HIVE-11402) HS2 - disallow parallel query execution within a single Session

2015-07-29 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-11402:
-
Description: 
HiveServer2 currently allows concurrent queries to be run in a single session. 
However, every HS2 session has an associated SessionState object, and the use 
of SessionState in many places assumes that only one thread is using it, i.e., 
it is not thread-safe.
There are many places where SessionState thread safety needs to be addressed, 
and until then we should serialize all query execution for a single HS2 
session. -This problem can become more visible with HIVE-4239 now allowing 
parallel query compilation.-

Note that running queries in parallel for a single session is not 
straightforward with JDBC; you need to spawn another thread, as the 
Statement.execute calls are blocking. I believe ODBC has a non-blocking query 
execution API, and Hue is another well-known application that shares sessions 
for all queries that a user runs.


  was:
HiveServer2 currently allows concurrent queries to be run in a single session. 
However, every HS2 session has  an associated SessionState object, and the use 
of SessionState in many places assumes that only one thread is using it, ie it 
is not thread safe.
There are many places where SesssionState thread safety needs to be addressed, 
and until then we should serialize all query execution for a single HS2 
session. This problem can become more visible with HIVE-4239 now allowing 
parallel query compilation.

Note that running queries in parallel for single session is not straightforward 
 with jdbc, you need to spawn another thread as the Statement.execute calls are 
blocking. I believe ODBC has non blocking query execution API, and Hue is 
another well known application that shares sessions for all queries that a user 
runs.



> HS2 - disallow parallel query execution within a single Session
> ---
>
> Key: HIVE-11402
> URL: https://issues.apache.org/jira/browse/HIVE-11402
> Project: Hive
>  Issue Type: Bug
>Reporter: Thejas M Nair
>
> HiveServer2 currently allows concurrent queries to be run in a single 
> session. However, every HS2 session has an associated SessionState object, 
> and the use of SessionState in many places assumes that only one thread is 
> using it, i.e., it is not thread-safe.
> There are many places where SessionState thread safety needs to be 
> addressed, and until then we should serialize all query execution for a 
> single HS2 session. -This problem can become more visible with HIVE-4239 now 
> allowing parallel query compilation.-
> Note that running queries in parallel for a single session is not 
> straightforward with JDBC; you need to spawn another thread, as the 
> Statement.execute calls are blocking. I believe ODBC has a non-blocking query 
> execution API, and Hue is another well-known application that shares sessions 
> for all queries that a user runs.





[jira] [Updated] (HIVE-11405) Add early termination for recursion in StatsRulesProcFactory$FilterStatsRule.evaluateExpression for OR expression

2015-07-29 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-11405:
-
Assignee: Prasanth Jayachandran  (was: Hari Sankar Sivarama Subramaniyan)

> Add early termination for recursion in 
> StatsRulesProcFactory$FilterStatsRule.evaluateExpression  for OR expression
> --
>
> Key: HIVE-11405
> URL: https://issues.apache.org/jira/browse/HIVE-11405
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Prasanth Jayachandran
>
> Thanks to [~gopalv] for uncovering this issue as part of HIVE-11330.  Quoting 
> him,
> "The recursion protection works well with an AND expr, but it doesn't work 
> against
> (OR a=1 (OR a=2 (OR a=3 (OR ...)
> since the rows will never be reduced during recursion due to the 
> nature of the OR.
> We need to execute a short-circuit to satisfy the OR properly - no case which 
> matches a=1 qualifies for the rest of the filters.
> Recursion should pass in the numRows - branch1Rows for the branch-2."





[jira] [Assigned] (HIVE-11405) Add early termination for recursion in StatsRulesProcFactory$FilterStatsRule.evaluateExpression for OR expression

2015-07-29 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan reassigned HIVE-11405:


Assignee: Hari Sankar Sivarama Subramaniyan  (was: Mostafa Mokhtar)

> Add early termination for recursion in 
> StatsRulesProcFactory$FilterStatsRule.evaluateExpression  for OR expression
> --
>
> Key: HIVE-11405
> URL: https://issues.apache.org/jira/browse/HIVE-11405
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
>
> Thanks to [~gopalv] for uncovering this issue as part of HIVE-11330.  Quoting 
> him,
> "The recursion protection works well with an AND expr, but it doesn't work 
> against
> (OR a=1 (OR a=2 (OR a=3 (OR ...)
> since the rows will never be reduced during recursion due to the 
> nature of the OR.
> We need to execute a short-circuit to satisfy the OR properly - no case which 
> matches a=1 qualifies for the rest of the filters.
> Recursion should pass in the numRows - branch1Rows for the branch-2."





[jira] [Commented] (HIVE-11387) CBO: Calcite Operator To Hive Operator (Calcite Return Path) : fix reduce_deduplicate optimization

2015-07-29 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646870#comment-14646870
 ] 

Pengcheng Xiong commented on HIVE-11387:


[~jcamachorodriguez], could you please review this patch? It involves some plan 
changes for PTF operator.

> CBO: Calcite Operator To Hive Operator (Calcite Return Path) : fix 
> reduce_deduplicate optimization
> --
>
> Key: HIVE-11387
> URL: https://issues.apache.org/jira/browse/HIVE-11387
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11387.01.patch, HIVE-11387.02.patch, 
> HIVE-11387.03.patch, HIVE-11387.04.patch
>
>
> {noformat}
> The main problem is that, due to return path, now we may have 
> (RS1-GBY2)-(RS3-GBY4) when map.aggr=false, i.e., no map aggr. However, in the 
> non-return path, it will be treated as (RS1)-(GBY2-RS3-GBY4). The main 
> problem is that it does not take the setting into account.
> {noformat}





[jira] [Updated] (HIVE-11387) CBO: Calcite Operator To Hive Operator (Calcite Return Path) : fix reduce_deduplicate optimization

2015-07-29 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-11387:
---
Attachment: HIVE-11387.04.patch

> CBO: Calcite Operator To Hive Operator (Calcite Return Path) : fix 
> reduce_deduplicate optimization
> --
>
> Key: HIVE-11387
> URL: https://issues.apache.org/jira/browse/HIVE-11387
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11387.01.patch, HIVE-11387.02.patch, 
> HIVE-11387.03.patch, HIVE-11387.04.patch
>
>
> {noformat}
> The main problem is that, due to return path, now we may have 
> (RS1-GBY2)-(RS3-GBY4) when map.aggr=false, i.e., no map aggr. However, in the 
> non-return path, it will be treated as (RS1)-(GBY2-RS3-GBY4). The main 
> problem is that it does not take the setting into account.
> {noformat}





[jira] [Updated] (HIVE-11387) CBO: Calcite Operator To Hive Operator (Calcite Return Path) : fix reduce_deduplicate optimization

2015-07-29 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-11387:
---
Description: The main problem is that, due to return path, now we may have 
(RS1-GBY2)-(RS3-GBY4) when map.aggr=false, i.e., no map aggr. However, in the 
non-return path, it will be treated as (RS1)-(GBY2-RS3-GBY4). The main problem 
is that it does not take the setting into account.

> CBO: Calcite Operator To Hive Operator (Calcite Return Path) : fix 
> reduce_deduplicate optimization
> --
>
> Key: HIVE-11387
> URL: https://issues.apache.org/jira/browse/HIVE-11387
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11387.01.patch, HIVE-11387.02.patch, 
> HIVE-11387.03.patch
>
>
> The main problem is that, due to return path, now we may have 
> (RS1-GBY2)-(RS3-GBY4) when map.aggr=false, i.e., no map aggr. However, in the 
> non-return path, it will be treated as (RS1)-(GBY2-RS3-GBY4). The main 
> problem is that it does not take the setting into account.





[jira] [Updated] (HIVE-11387) CBO: Calcite Operator To Hive Operator (Calcite Return Path) : fix reduce_deduplicate optimization

2015-07-29 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-11387:
---
Description: 
{noformat}
The main problem is that, due to return path, now we may have 
(RS1-GBY2)-(RS3-GBY4) when map.aggr=false, i.e., no map aggr. However, in the 
non-return path, it will be treated as (RS1)-(GBY2-RS3-GBY4). The main problem 
is that it does not take the setting into account.
{noformat}

  was:The main problem is that, due to return path, now we may have 
(RS1-GBY2)-(RS3-GBY4) when map.aggr=false, i.e., no map aggr. However, in the 
non-return path, it will be treated as (RS1)-(GBY2-RS3-GBY4). The main problem 
is that it does not take into account of the setting.


> CBO: Calcite Operator To Hive Operator (Calcite Return Path) : fix 
> reduce_deduplicate optimization
> --
>
> Key: HIVE-11387
> URL: https://issues.apache.org/jira/browse/HIVE-11387
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11387.01.patch, HIVE-11387.02.patch, 
> HIVE-11387.03.patch
>
>
> {noformat}
> The main problem is that, due to return path, now we may have 
> (RS1-GBY2)-(RS3-GBY4) when map.aggr=false, i.e., no map aggr. However, in the 
> non-return path, it will be treated as (RS1)-(GBY2-RS3-GBY4). The main 
> problem is that it does not take the setting into account.
> {noformat}





[jira] [Updated] (HIVE-11405) Add early termination for recursion in StatsRulesProcFactory$FilterStatsRule.evaluateExpression for OR expression

2015-07-29 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-11405:
-
Assignee: Mostafa Mokhtar  (was: Prasanth Jayachandran)

> Add early termination for recursion in 
> StatsRulesProcFactory$FilterStatsRule.evaluateExpression  for OR expression
> --
>
> Key: HIVE-11405
> URL: https://issues.apache.org/jira/browse/HIVE-11405
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Mostafa Mokhtar
>
> Thanks to [~gopalv] for uncovering this issue as part of HIVE-11330.  Quoting 
> him,
> "The recursion protection works well with an AND expr, but it doesn't work 
> against
> (OR a=1 (OR a=2 (OR a=3 (OR ...)
> since the rows will never be reduced during recursion due to the 
> nature of the OR.
> We need to execute a short-circuit to satisfy the OR properly - no case which 
> matches a=1 qualifies for the rest of the filters.
> Recursion should pass in the numRows - branch1Rows for the branch-2."





[jira] [Updated] (HIVE-11330) Add early termination for recursion in StatsRulesProcFactory$FilterStatsRule.evaluateExpression

2015-07-29 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-11330:
-
Assignee: Mostafa Mokhtar  (was: Prasanth Jayachandran)

> Add early termination for recursion in 
> StatsRulesProcFactory$FilterStatsRule.evaluateExpression
> ---
>
> Key: HIVE-11330
> URL: https://issues.apache.org/jira/browse/HIVE-11330
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Physical Optimizer
>Affects Versions: 0.14.0
>Reporter: Mostafa Mokhtar
>Assignee: Mostafa Mokhtar
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11330.patch
>
>
> Queries with heavily nested filters can cause a StackOverflowError
> {code}
> Exception in thread "main" java.lang.StackOverflowError
> at 
> org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateExpression(StatsRulesProcFactory.java:301)
> at 
> org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateChildExpr(StatsRulesProcFactory.java:525)
> at 
> org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateExpression(StatsRulesProcFactory.java:326)
> at 
> org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateChildExpr(StatsRulesProcFactory.java:525)
> at 
> org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateExpression(StatsRulesProcFactory.java:326)
> at 
> org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateChildExpr(StatsRulesProcFactory.java:525)
> at 
> org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateExpression(StatsRulesProcFactory.java:326)
> at 
> org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateChildExpr(StatsRulesProcFactory.java:525)
> at 
> org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateExpression(StatsRulesProcFactory.java:326)
> at 
> org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateChildExpr(StatsRulesProcFactory.java:525)
> at 
> org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateExpression(StatsRulesProcFactory.java:326)
> at 
> org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateChildExpr(StatsRulesProcFactory.java:525)
> at 
> org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateExpression(StatsRulesProcFactory.java:326)
> at 
> org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateChildExpr(StatsRulesProcFactory.java:525)
>  
> {code}





[jira] [Commented] (HIVE-11402) HS2 - disallow parallel query execution within a single Session

2015-07-29 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646865#comment-14646865
 ] 

Sergey Shelukhin commented on HIVE-11402:
-

HIVE-4239 only allows parallel compilation between sessions.

> HS2 - disallow parallel query execution within a single Session
> ---
>
> Key: HIVE-11402
> URL: https://issues.apache.org/jira/browse/HIVE-11402
> Project: Hive
>  Issue Type: Bug
>Reporter: Thejas M Nair
>
> HiveServer2 currently allows concurrent queries to be run in a single 
> session. However, every HS2 session has an associated SessionState object, 
> and the use of SessionState in many places assumes that only one thread is 
> using it, i.e., it is not thread-safe.
> There are many places where SessionState thread safety needs to be 
> addressed, and until then we should serialize all query execution for a 
> single HS2 session. This problem can become more visible with HIVE-4239 now 
> allowing parallel query compilation.
> Note that running queries in parallel for a single session is not 
> straightforward with JDBC; you need to spawn another thread, as the 
> Statement.execute calls are blocking. I believe ODBC has a non-blocking query 
> execution API, and Hue is another well-known application that shares sessions 
> for all queries that a user runs.





[jira] [Updated] (HIVE-11405) Add early termination for recursion in StatsRulesProcFactory$FilterStatsRule.evaluateExpression for OR expression

2015-07-29 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-11405:
-
Description: 
Thanks to [~gopalv] for uncovering this issue as part of HIVE-11330.  Quoting 
him,
"The recursion protection works well with an AND expr, but it doesn't work 
against
(OR a=1 (OR a=2 (OR a=3 (OR ...)
since the rows will never be reduced during recursion due to the nature 
of the OR.
We need to execute a short-circuit to satisfy the OR properly - no case which 
matches a=1 qualifies for the rest of the filters.
Recursion should pass in the numRows - branch1Rows for the branch-2."

  was:
Thanks for [~gopalv] for uncovering this as part of HIVE-11330.  Quoting him,
"The recursion protection works well with an AND expr, but it doesn't work 
against
(OR a=1 (OR a=2 (OR a=3 (OR ...)
since the for the rows will never be reduced during recursion due to the nature 
of the OR.
We need to execute a short-circuit to satisfy the OR properly - no case which 
matches a=1 qualifies for the rest of the filters.
Recursion should pass in the numRows - branch1Rows for the branch-2."


> Add early termination for recursion in 
> StatsRulesProcFactory$FilterStatsRule.evaluateExpression  for OR expression
> --
>
> Key: HIVE-11405
> URL: https://issues.apache.org/jira/browse/HIVE-11405
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Prasanth Jayachandran
>
> Thanks to [~gopalv] for uncovering this issue as part of HIVE-11330.  Quoting 
> him,
> "The recursion protection works well with an AND expr, but it doesn't work 
> against
> (OR a=1 (OR a=2 (OR a=3 (OR ...)
> since the rows will never be reduced during recursion due to the 
> nature of the OR.
> We need to execute a short-circuit to satisfy the OR properly - no case which 
> matches a=1 qualifies for the rest of the filters.
> Recursion should pass in the numRows - branch1Rows for the branch-2."





[jira] [Commented] (HIVE-11405) Add early termination for recursion in StatsRulesProcFactory$FilterStatsRule.evaluateExpression for OR expression

2015-07-29 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646863#comment-14646863
 ] 

Gopal V commented on HIVE-11405:


Thanks [~hsubramaniyan]. I'm currently bypassing that with a temporary band-aid, 
which still needs to be checked for correctness:

{code}
--- a/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java
+++ b/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java
@@ -325,9 +325,16 @@ private long evaluateExpression(Statistics stats, ExprNodeDesc pred,
   }
 } else if (udf instanceof GenericUDFOPOr) {
   // for OR condition independently compute and update stats
-  for (ExprNodeDesc child : genFunc.getChildren()) {
-    newNumRows = StatsUtils.safeAdd(
-        evaluateChildExpr(stats, child, aspCtx, neededCols, fop), newNumRows);
+  newNumRows = stats.getNumRows();
+  Statistics orStats = stats.clone();
+  int k = 0;
+
+  for (ExprNodeDesc child : com.google.common.collect.Lists.reverse(genFunc.getChildren())) {
+    final long branchRows = evaluateChildExpr(orStats, child, aspCtx, neededCols, fop);
+    final long branch2Rows = (newNumRows <= branchRows) ? 0 : (newNumRows - branchRows);
+    updateStats(orStats, branch2Rows, true, fop);
+    newNumRows = StatsUtils.safeAdd(branchRows, newNumRows);
   }
 } else if (udf instanceof GenericUDFOPNot) {
   newNumRows = evaluateNotExpr(stats, pred, aspCtx, neededCols, fop);
{code}
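
The arithmetic behind "pass in the numRows - branch1Rows for the branch-2" can be sketched in isolation. This is a simplified model, not the patch above or Hive's estimator: the class OrRowEstimate, both method names, and the idea of giving each OR branch a fixed match percentage are assumptions made only for illustration.

```java
/** Hypothetical model of row estimation for a chain of OR branches. */
final class OrRowEstimate {

    /** Estimates every branch against the full input and sums; the total can exceed numRows. */
    static long independentSum(long numRows, int[] branchPercents) {
        long total = 0;
        for (int pct : branchPercents) {
            total += numRows * pct / 100;
        }
        return total;
    }

    /**
     * Estimates each branch only against rows not already matched by an earlier branch,
     * and stops as soon as no rows remain (the early termination).
     */
    static long shortCircuit(long numRows, int[] branchPercents) {
        long remaining = numRows;
        long matched = 0;
        for (int pct : branchPercents) {
            if (remaining == 0) {
                break; // nothing left for later branches to match
            }
            long branchRows = remaining * pct / 100;
            matched += branchRows;
            remaining -= branchRows; // the next branch sees numRows minus earlier matches
        }
        return matched;
    }
}
```

For 1000 input rows and three branches that each match 10%, the independent sum is 300, while the short-circuit estimate is 100 + 90 + 81 = 271. With many branches the independent sum grows past the input size, whereas the short-circuit estimate cannot exceed numRows.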

> Add early termination for recursion in 
> StatsRulesProcFactory$FilterStatsRule.evaluateExpression  for OR expression
> --
>
> Key: HIVE-11405
> URL: https://issues.apache.org/jira/browse/HIVE-11405
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Prasanth Jayachandran
>
> Thanks to [~gopalv] for uncovering this issue as part of HIVE-11330.  Quoting 
> him,
> "The recursion protection works well with an AND expr, but it doesn't work 
> against
> (OR a=1 (OR a=2 (OR a=3 (OR ...)
> since the rows will never be reduced during recursion due to the 
> nature of the OR.
> We need to execute a short-circuit to satisfy the OR properly - no case which 
> matches a=1 qualifies for the rest of the filters.
> Recursion should pass in the numRows - branch1Rows for the branch-2."





[jira] [Commented] (HIVE-11404) branch-1 does not compile

2015-07-29 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646842#comment-14646842
 ] 

Szehon Ho commented on HIVE-11404:
--

Sorry, that was my fault.  Thanks for fixing this.

> branch-1 does not compile
> -
>
> Key: HIVE-11404
> URL: https://issues.apache.org/jira/browse/HIVE-11404
> Project: Hive
>  Issue Type: Bug
>Reporter: Jason Dere
>Assignee: Jason Dere
> Fix For: 1.3.0
>
>
> {noformat}
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcCtx.java:[216,28]
>  cannot find symbol
> [ERROR] symbol:   class UnionOperator
> [ERROR] location: class 
> org.apache.hadoop.hive.ql.optimizer.ColumnPrunerProcCtx
> {noformat}
> Looks like HIVE-11271 broke this. It is missing an import of UnionOperator, 
> which was added to ColumnPrunerProcCtx.java as part of HIVE-11333; however, 
> that JIRA has not yet been added to branch-1.





[jira] [Commented] (HIVE-11214) Insert into ACID table switches vectorization off

2015-07-29 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646833#comment-14646833
 ] 

Eugene Koifman commented on HIVE-11214:
---

[~mmccline], why would this not be in branch-1?

> Insert into ACID table switches vectorization off 
> --
>
> Key: HIVE-11214
> URL: https://issues.apache.org/jira/browse/HIVE-11214
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: HIVE-11214.01.patch, HIVE-11214.02.patch, 
> HIVE-11214.03.patch, HIVE-11214.04.patch
>
>
> PROBLEM:
> Vectorization is switched off automatically after running an insert into an ACID table.
> STEPS TO REPRODUCE:
> set hive.vectorized.execution.enabled=true;
> create table testv (id int, name string) clustered by (id) into 2 buckets 
> stored as orc tblproperties("transactional"="true");
> insert into testv values(1,'a');
> set hive.vectorized.execution.enabled;
> false





[jira] [Updated] (HIVE-11185) Fix compustat_avro.q/load_dyn_part14_win.q for Windows

2015-07-29 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-11185:
--
Attachment: HIVE-11185.2.patch

Re-uploading the patch to kick off tests.

> Fix compustat_avro.q/load_dyn_part14_win.q for Windows
> --
>
> Key: HIVE-11185
> URL: https://issues.apache.org/jira/browse/HIVE-11185
> Project: Hive
>  Issue Type: Bug
>  Components: Tests, Windows
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-11185.1.patch, HIVE-11185.2.patch
>
>
> compustat_avro.q: The way the location file path was being specified using 
> system:hive.root wasn't working on Windows. create_like.q has an example of 
> how to get this to work, using system:test.tmp.dir
> load_dyn_part14_win.q: Looks like HIVE-9039 changed the Hive syntax, and the 
> original load_dyn_part14.q had changed but the Windows-specific version 
> load_dyn_part14_win.q had not been updated.





[jira] [Commented] (HIVE-11185) Fix compustat_avro.q/load_dyn_part14_win.q for Windows

2015-07-29 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646824#comment-14646824
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-11185:
--

+1, pending tests

> Fix compustat_avro.q/load_dyn_part14_win.q for Windows
> --
>
> Key: HIVE-11185
> URL: https://issues.apache.org/jira/browse/HIVE-11185
> Project: Hive
>  Issue Type: Bug
>  Components: Tests, Windows
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-11185.1.patch
>
>
> compustat_avro.q: The way the location file path was being specified using 
> system:hive.root wasn't working on Windows. create_like.q has an example of 
> how to get this to work, using system:test.tmp.dir
> load_dyn_part14_win.q: Looks like HIVE-9039 changed the Hive syntax, and the 
> original load_dyn_part14.q had changed but the Windows-specific version 
> load_dyn_part14_win.q had not been updated.





[jira] [Commented] (HIVE-11306) Add a bloom-1 filter for Hybrid MapJoin spills

2015-07-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646822#comment-14646822
 ] 

Hive QA commented on HIVE-11306:




{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12747642/HIVE-11306.3.patch

{color:green}SUCCESS:{color} +1 9274 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4748/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4748/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4748/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12747642 - PreCommit-HIVE-TRUNK-Build

> Add a bloom-1 filter for Hybrid MapJoin spills
> --
>
> Key: HIVE-11306
> URL: https://issues.apache.org/jira/browse/HIVE-11306
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-11306.1.patch, HIVE-11306.2.patch, 
> HIVE-11306.3.patch
>
>
> HIVE-9277 implemented Spillable joins for Tez, which suffers from a 
> corner-case performance issue when joining wide small tables against a narrow 
> big table (like a user info table join events stream).
> The fact that the wide table is spilled causes extra IO, even though the nDV 
> of the join key might be in the thousands.
> A cheap bloom-1 filter would yield a large performance gain for such queries 
> by cutting down the spill IO costs for the big-table spills.
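
A rough sketch of the idea, assuming "bloom-1" can be approximated by a filter that sets a single bit per key. The class SingleProbeFilter, its hash mixing, and its sizing are hypothetical; the attached patches may implement the filter quite differently.

```java
import java.util.BitSet;

/** Hypothetical single-probe membership filter; not the implementation from the patch. */
final class SingleProbeFilter {
    private final BitSet bits;
    private final int size;

    SingleProbeFilter(int size) {
        this.size = size;
        this.bits = new BitSet(size);
    }

    /** Maps a join key to one bit position. */
    private int slot(long key) {
        // Multiply by an odd constant to spread nearby keys, then reduce to [0, size).
        return Math.floorMod(Long.hashCode(key) * 0x9E3779B9, size);
    }

    /** Records a join key from the small (hashed) side. */
    void add(long key) {
        bits.set(slot(key));
    }

    /**
     * Returns false only if the key was definitely never added. A true result may be a
     * false positive, so the caller still has to do the real lookup in that case.
     */
    boolean mightContain(long key) {
        return bits.get(slot(key));
    }
}
```

In this model, a big-table row whose key fails mightContain can be dropped immediately instead of being written to a spill file, which is where the IO saving would come from. Rows that pass the check still go through the normal join path.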





[jira] [Resolved] (HIVE-11214) Insert into ACID table switches vectorization off

2015-07-29 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline resolved HIVE-11214.
-
Resolution: Fixed

> Insert into ACID table switches vectorization off 
> --
>
> Key: HIVE-11214
> URL: https://issues.apache.org/jira/browse/HIVE-11214
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: HIVE-11214.01.patch, HIVE-11214.02.patch, 
> HIVE-11214.03.patch, HIVE-11214.04.patch
>
>
> PROBLEM:
> Vectorization is switched off automatically after running an insert into an ACID table.
> STEPS TO REPRODUCE:
> set hive.vectorized.execution.enabled=true;
> create table testv (id int, name string) clustered by (id) into 2 buckets 
> stored as orc tblproperties("transactional"="true");
> insert into testv values(1,'a');
> set hive.vectorized.execution.enabled;
> false





[jira] [Commented] (HIVE-11214) Insert into ACID table switches vectorization off

2015-07-29 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646821#comment-14646821
 ] 

Matt McCline commented on HIVE-11214:
-

Committed to trunk.

> Insert into ACID table switches vectorization off 
> --
>
> Key: HIVE-11214
> URL: https://issues.apache.org/jira/browse/HIVE-11214
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: HIVE-11214.01.patch, HIVE-11214.02.patch, 
> HIVE-11214.03.patch, HIVE-11214.04.patch
>
>
> PROBLEM:
> Vectorization is switched off automatically after running an insert into an ACID table.
> STEPS TO REPRODUCE:
> set hive.vectorized.execution.enabled=true;
> create table testv (id int, name string) clustered by (id) into 2 buckets 
> stored as orc tblproperties("transactional"="true");
> insert into testv values(1,'a');
> set hive.vectorized.execution.enabled;
> false





[jira] [Resolved] (HIVE-11404) branch-1 does not compile

2015-07-29 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere resolved HIVE-11404.
---
   Resolution: Fixed
Fix Version/s: 1.3.0

I've fixed this by adding the missing import. I've committed this to branch-1.

{noformat}
--- a/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcCtx.java
+++ b/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcCtx.java
@@ -30,6 +30,7 @@
 import org.apache.hadoop.hive.ql.exec.OperatorFactory;
 import org.apache.hadoop.hive.ql.exec.RowSchema;
 import org.apache.hadoop.hive.ql.exec.SelectOperator;
+import org.apache.hadoop.hive.ql.exec.UnionOperator;
 import org.apache.hadoop.hive.ql.exec.Utilities;
 import org.apache.hadoop.hive.ql.lib.NodeProcessorCtx;
 import org.apache.hadoop.hive.ql.parse.ParseContext;
{noformat}

> branch-1 does not compile
> -
>
> Key: HIVE-11404
> URL: https://issues.apache.org/jira/browse/HIVE-11404
> Project: Hive
>  Issue Type: Bug
>Reporter: Jason Dere
>Assignee: Jason Dere
> Fix For: 1.3.0
>
>
> {noformat}
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcCtx.java:[216,28]
>  cannot find symbol
> [ERROR] symbol:   class UnionOperator
> [ERROR] location: class 
> org.apache.hadoop.hive.ql.optimizer.ColumnPrunerProcCtx
> {noformat}
> Looks like HIVE-11271 broke this. Missing an import of UnionOperator, which 
> was added to ColumnPrunerProcCtx.java as part of HIVE-11333; however, that 
> Jira has not yet been added to branch-1.





[jira] [Commented] (HIVE-11257) CBO: Calcite Operator To Hive Operator (Calcite Return Path): Method isCombinablePredicate in HiveJoinToMultiJoinRule should be extended to support MultiJoin operators

2015-07-29 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646806#comment-14646806
 ] 

Pengcheng Xiong commented on HIVE-11257:


+1

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): Method 
> isCombinablePredicate in HiveJoinToMultiJoinRule should be extended to 
> support MultiJoin operators merge
> -
>
> Key: HIVE-11257
> URL: https://issues.apache.org/jira/browse/HIVE-11257
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-11257.01.patch, HIVE-11257.02.patch, 
> HIVE-11257.03.patch, HIVE-11257.04.patch, HIVE-11257.patch
>
>






[jira] [Commented] (HIVE-11333) ColumnPruner prunes columns of UnionOperator that should be kept

2015-07-29 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646797#comment-14646797
 ] 

Jason Dere commented on HIVE-11333:
---

I noticed this was not added to branch-1. Should it also go there?

> ColumnPruner prunes columns of UnionOperator that should be kept
> 
>
> Key: HIVE-11333
> URL: https://issues.apache.org/jira/browse/HIVE-11333
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.0.0
>
> Attachments: HIVE-11333.01.patch, HIVE-11333.02.patch
>
>
> UnionOperator takes its schema from the operator in the first 
> branch. Because ColumnPruner prunes columns based on the internal name, a 
> column in another branch may be pruned when its internal name differs from 
> the one in the first branch. To repro, run rcfile_union.q with return path turned on.





[jira] [Updated] (HIVE-11383) Upgrade Hive to Calcite 1.4

2015-07-29 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-11383:
---
Attachment: HIVE-11383.6.patch

> Upgrade Hive to Calcite 1.4
> ---
>
> Key: HIVE-11383
> URL: https://issues.apache.org/jira/browse/HIVE-11383
> Project: Hive
>  Issue Type: Bug
>Reporter: Julian Hyde
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-11383.1.patch, HIVE-11383.2.patch, 
> HIVE-11383.3.patch, HIVE-11383.3.patch, HIVE-11383.3.patch, 
> HIVE-11383.4.patch, HIVE-11383.5.patch, HIVE-11383.6.patch
>
>
> CLEAR LIBRARY CACHE
> Upgrade Hive to Calcite 1.4.0-incubating.
> There is currently a snapshot release, which is close to what will be in 1.4. 
> I have checked that Hive compiles against the new snapshot, fixing one issue. 
> The patch is attached.
> Next step is to validate that Hive runs against the new Calcite, and post any 
> issues to the Calcite list or log Calcite Jira cases. [~jcamachorodriguez], 
> can you please do that.
> [~pxiong], I gather you are dependent on CALCITE-814, which will be fixed in 
> the new Calcite version.





[jira] [Updated] (HIVE-11214) Insert into ACID table switches vectorization off

2015-07-29 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-11214:

Fix Version/s: (was: 1.2.2)

> Insert into ACID table switches vectorization off 
> --
>
> Key: HIVE-11214
> URL: https://issues.apache.org/jira/browse/HIVE-11214
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: HIVE-11214.01.patch, HIVE-11214.02.patch, 
> HIVE-11214.03.patch, HIVE-11214.04.patch
>
>
> PROBLEM:
> Vectorization is switched off automatically after running an insert into an ACID table.
> STEPS TO REPRODUCE:
> set hive.vectorized.execution.enabled=true;
> create table testv (id int, name string) clustered by (id) into 2 buckets 
> stored as orc tblproperties("transactional"="true");
> insert into testv values(1,'a');
> set hive.vectorized.execution.enabled;
> false





[jira] [Updated] (HIVE-11386) Improve Vectorized GROUP BY Performance (Phase 1)

2015-07-29 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-11386:

Attachment: HIVE-11386.02.patch

> Improve Vectorized GROUP BY Performance (Phase 1)
> -
>
> Key: HIVE-11386
> URL: https://issues.apache.org/jira/browse/HIVE-11386
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-11386.01.patch, HIVE-11386.02.patch
>
>
> Improve vectorized GROUP BY performance, with an eye towards the new LLAP 
> memory management (dramatically reduce the number of Java objects, allocate 
> very large objects, etc).





[jira] [Commented] (HIVE-11386) Improve Vectorized GROUP BY Performance (Phase 1)

2015-07-29 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646792#comment-14646792
 ] 

Matt McCline commented on HIVE-11386:
-

Fixed vector_orderby_5.q problem.

Added some code to base size on memory.  Not finished.

> Improve Vectorized GROUP BY Performance (Phase 1)
> -
>
> Key: HIVE-11386
> URL: https://issues.apache.org/jira/browse/HIVE-11386
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-11386.01.patch, HIVE-11386.02.patch
>
>
> Improve vectorized GROUP BY performance, with an eye towards the new LLAP 
> memory management (dramatically reduce the number of Java objects, allocate 
> very large objects, etc).





[jira] [Updated] (HIVE-11398) Parse wide OR and wide AND trees as a flat ANY/ALL list

2015-07-29 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-11398:
--
Assignee: Jesus Camacho Rodriguez

> Parse wide OR and wide AND trees as a flat ANY/ALL list
> ---
>
> Key: HIVE-11398
> URL: https://issues.apache.org/jira/browse/HIVE-11398
> Project: Hive
>  Issue Type: New Feature
>  Components: Logical Optimizer, UDF
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Gopal V
>Assignee: Jesus Camacho Rodriguez
>
> Deep trees of AND/OR are hard to traverse particularly when they are merely 
> the same structure in nested form as a version of the operator that takes an 
> arbitrary number of args.
> One potential way to convert the DFS searches into a simpler BFS search is to 
> introduce a new Operator pair named ALL and ANY.
> ALL(A, B, C, D, E) represents AND(AND(AND(AND(E, D), C), B), A)
> ANY(A, B, C, D, E) represents OR(OR(OR(OR(E, D), C), B), A)
> The SemanticAnalyser would be responsible for generating these operators and 
> this would mean that the depth and complexity of traversals for the simplest 
> case of wide AND/OR trees would be trivial.
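The flattening described above can be sketched as follows. This is a minimal illustration, not Hive code: `BoolExpr` and `WideTreeFlattener` are hypothetical names, and operators are modeled as plain strings. The sketch collapses a chain of nested binary AND (or OR) nodes into a single ALL (or ANY) node, using an explicit stack so that the depth of the chain does not consume call stack. Operands that use a different operator are kept as-is rather than flattened in turn.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Hypothetical expression node for illustration; not a Hive class. */
final class BoolExpr {
    final String op;                // "AND", "OR", "ALL", "ANY", or a leaf name such as "A"
    final List<BoolExpr> children;  // empty for leaves

    BoolExpr(String op, BoolExpr... children) {
        this.op = op;
        this.children = Arrays.asList(children);
    }

    @Override
    public String toString() {
        if (children.isEmpty()) {
            return op;
        }
        StringBuilder sb = new StringBuilder(op).append('(');
        for (int i = 0; i < children.size(); i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(children.get(i));
        }
        return sb.append(')').toString();
    }
}

final class WideTreeFlattener {
    /** Rewrites a chain of nested binary AND (or OR) nodes as one ALL (or ANY) node. */
    static BoolExpr flatten(BoolExpr root) {
        boolean isAnd = root.op.equals("AND");
        boolean isOr = root.op.equals("OR");
        if (root.children.isEmpty() || !(isAnd || isOr)) {
            return root; // leaves and other operators are left untouched in this sketch
        }
        List<BoolExpr> operands = new ArrayList<>();
        Deque<BoolExpr> stack = new ArrayDeque<>();
        stack.push(root);
        // Explicit stack instead of recursion, so chain depth does not consume call stack.
        while (!stack.isEmpty()) {
            BoolExpr e = stack.pop();
            if (!e.children.isEmpty() && e.op.equals(root.op)) {
                for (int i = e.children.size() - 1; i >= 0; i--) {
                    stack.push(e.children.get(i)); // reversed push keeps left-to-right order
                }
            } else {
                operands.add(e);
            }
        }
        return new BoolExpr(isAnd ? "ALL" : "ANY", operands.toArray(new BoolExpr[0]));
    }
}
```

For the nested form quoted above, AND(AND(AND(AND(E, D), C), B), A), this sketch yields the operands in their written left-to-right order (E, D, C, B, A). Since AND and OR are commutative, that is equivalent to the ALL(A, B, C, D, E) form in the description.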





[jira] [Commented] (HIVE-11398) Parse wide OR and wide AND trees as a flat ANY/ALL list

2015-07-29 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646788#comment-14646788
 ] 

Gunther Hagleitner commented on HIVE-11398:
---

Sorry - meant AND and OR should take multiple children.

> Parse wide OR and wide AND trees as a flat ANY/ALL list
> ---
>
> Key: HIVE-11398
> URL: https://issues.apache.org/jira/browse/HIVE-11398
> Project: Hive
>  Issue Type: New Feature
>  Components: Logical Optimizer, UDF
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Gopal V
>
> Deep trees of AND/OR are hard to traverse particularly when they are merely 
> the same structure in nested form as a version of the operator that takes an 
> arbitrary number of args.
> One potential way to convert the DFS searches into a simpler BFS search is to 
> introduce a new Operator pair named ALL and ANY.
> ALL(A, B, C, D, E) represents AND(AND(AND(AND(E, D), C), B), A)
> ANY(A, B, C, D, E) represents OR(OR(OR(OR(E, D), C), B), A)
> The SemanticAnalyser would be responsible for generating these operators and 
> this would mean that the depth and complexity of traversals for the simplest 
> case of wide AND/OR trees would be trivial.





[jira] [Commented] (HIVE-11398) Parse wide OR and wide AND trees as a flat ANY/ALL list

2015-07-29 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646787#comment-14646787
 ] 

Gunther Hagleitner commented on HIVE-11398:
---

Aren't any/all already supported? Shouldn't AND and ALL take any number of 
children? Would it just help to flatten that as much as possible?

cc [~jcamachorodriguez]/[~hsubramaniyan]/[~mmokhtar]

> Parse wide OR and wide AND trees as a flat ANY/ALL list
> ---
>
> Key: HIVE-11398
> URL: https://issues.apache.org/jira/browse/HIVE-11398
> Project: Hive
>  Issue Type: New Feature
>  Components: Logical Optimizer, UDF
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Gopal V
>
> Deep trees of AND/OR are hard to traverse particularly when they are merely 
> the same structure in nested form as a version of the operator that takes an 
> arbitrary number of args.
> One potential way to convert the DFS searches into a simpler BFS search is to 
> introduce a new Operator pair named ALL and ANY.
> ALL(A, B, C, D, E) represents AND(AND(AND(AND(E, D), C), B), A)
> ANY(A, B, C, D, E) represents OR(OR(OR(OR(E, D), C), B), A)
> The SemanticAnalyser would be responsible for generating these operators and 
> this would mean that the depth and complexity of traversals for the simplest 
> case of wide AND/OR trees would be trivial.





[jira] [Commented] (HIVE-11397) Parse Hive OR clauses as they are written into the AST

2015-07-29 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646785#comment-14646785
 ] 

Gunther Hagleitner commented on HIVE-11397:
---

[~jcamachorodriguez]/[~hsubramaniyan]/[~mmokhtar] thoughts? This should be a 
legal transform and will keep the depth of the tree in check (to avoid running 
out of stack space in recursive algos).


> Parse Hive OR clauses as they are written into the AST
> --
>
> Key: HIVE-11397
> URL: https://issues.apache.org/jira/browse/HIVE-11397
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Gopal V
>Assignee: Jesus Camacho Rodriguez
>
> When parsing A OR B OR C, hive converts it into 
> (C OR B) OR A
> instead of turning it into
> A OR (B OR C)
> {code}
> GenericUDFOPOr or = new GenericUDFOPOr();
> List expressions = new ArrayList(2);
> expressions.add(previous);
> expressions.add(current);
> {code}
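To make the two groupings concrete, here is a small sketch that builds both nestings from an operand list. It is only an illustration under assumptions: `OrNesting` and its methods are hypothetical, expressions are modeled as strings, and nothing here reflects how Hive's parser is actually structured. A left fold pairs the accumulated expression with the next operand (as the `previous`/`current` snippet above does), while a right fold walks the list from the end.

```java
import java.util.List;

/** Hypothetical helper for illustration; expressions are modeled as plain strings. */
final class OrNesting {
    /** Left fold: [A, B, C] becomes ((A OR B) OR C). */
    static String leftNested(List<String> operands) {
        String acc = operands.get(0);
        for (int i = 1; i < operands.size(); i++) {
            acc = "(" + acc + " OR " + operands.get(i) + ")";
        }
        return acc;
    }

    /** Right fold: [A, B, C] becomes (A OR (B OR C)). */
    static String rightNested(List<String> operands) {
        int last = operands.size() - 1;
        String acc = operands.get(last);
        for (int i = last - 1; i >= 0; i--) {
            acc = "(" + operands.get(i) + " OR " + acc + ")";
        }
        return acc;
    }
}
```

Both folds produce a tree of the same depth for the same operand list; they differ only in which side the nesting grows on and in the order in which operands are encountered during traversal.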





[jira] [Commented] (HIVE-11296) Merge from master to spark branch [Spark Branch]

2015-07-29 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646781#comment-14646781
 ] 

Xuefu Zhang commented on HIVE-11296:


+1

> Merge from master to spark branch [Spark Branch]
> 
>
> Key: HIVE-11296
> URL: https://issues.apache.org/jira/browse/HIVE-11296
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-11296.1-spark.patch, HIVE-11296.2-spark.patch
>
>






[jira] [Updated] (HIVE-11397) Parse Hive OR clauses as they are written into the AST

2015-07-29 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-11397:
--
Assignee: Jesus Camacho Rodriguez

> Parse Hive OR clauses as they are written into the AST
> --
>
> Key: HIVE-11397
> URL: https://issues.apache.org/jira/browse/HIVE-11397
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Gopal V
>Assignee: Jesus Camacho Rodriguez
>
> When parsing A OR B OR C, hive converts it into 
> (C OR B) OR A
> instead of turning it into
> A OR (B OR C)
> {code}
> GenericUDFOPOr or = new GenericUDFOPOr();
> List expressions = new ArrayList(2);
> expressions.add(previous);
> expressions.add(current);
> {code}





[jira] [Updated] (HIVE-11214) Insert into ACID table switches vectorization off

2015-07-29 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-11214:

Fix Version/s: 1.2.2
   2.0.0

> Insert into ACID table switches vectorization off 
> --
>
> Key: HIVE-11214
> URL: https://issues.apache.org/jira/browse/HIVE-11214
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 2.0.0, 1.2.2
>
> Attachments: HIVE-11214.01.patch, HIVE-11214.02.patch, 
> HIVE-11214.03.patch, HIVE-11214.04.patch
>
>
> PROBLEM:
> Vectorization is switched off automatically after running an insert into an ACID table.
> STEPS TO REPRODUCE:
> set hive.vectorized.execution.enabled=true;
> create table testv (id int, name string) clustered by (id) into 2 buckets 
> stored as orc tblproperties("transactional"="true");
> insert into testv values(1,'a');
> set hive.vectorized.execution.enabled;
> false





[jira] [Updated] (HIVE-11214) Insert into ACID table switches vectorization off

2015-07-29 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-11214:

Attachment: HIVE-11214.04.patch

> Insert into ACID table switches vectorization off 
> --
>
> Key: HIVE-11214
> URL: https://issues.apache.org/jira/browse/HIVE-11214
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-11214.01.patch, HIVE-11214.02.patch, 
> HIVE-11214.03.patch, HIVE-11214.04.patch
>
>
> PROBLEM:
> Vectorization is switched off automatically after running an insert into an ACID table.
> STEPS TO REPRODUCE:
> set hive.vectorized.execution.enabled=true;
> create table testv (id int, name string) clustered by (id) into 2 buckets 
> stored as orc tblproperties("transactional"="true");
> insert into testv values(1,'a');
> set hive.vectorized.execution.enabled;
> false





[jira] [Commented] (HIVE-10319) Hive CLI startup takes a long time with a large number of databases

2015-07-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646712#comment-14646712
 ] 

Hive QA commented on HIVE-10319:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12746748/HIVE-10319.3.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9259 tests executed
*Failed tests:*
{noformat}
TestCliDriver-auto_join30.q-input10.q-join_reorder3.q-and-12-more - did not 
produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_convert_enum_to_string
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4747/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4747/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4747/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12746748 - PreCommit-HIVE-TRUNK-Build

> Hive CLI startup takes a long time with a large number of databases
> ---
>
> Key: HIVE-10319
> URL: https://issues.apache.org/jira/browse/HIVE-10319
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 1.0.0
>Reporter: Nezih Yigitbasi
>Assignee: Nezih Yigitbasi
> Attachments: HIVE-10319.1.patch, HIVE-10319.2.patch, 
> HIVE-10319.3.patch, HIVE-10319.patch
>
>
> The Hive CLI takes a long time to start when there is a large number of 
> databases in the DW. I think the root cause is the way permanent UDFs are 
> loaded from the metastore. When I looked at the logs and the source code I 
> see that at startup Hive first gets all the databases from the metastore and 
> then for each database it makes a metastore call to get the permanent 
> functions for that database [see Hive.java | 
> https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L162-185].
>  So the number of metastore calls made is in the order of the number of 
> databases. In production we have several hundred databases, so Hive makes 
> several hundred RPC calls during startup, taking 30+ seconds.
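The call-count argument above can be illustrated with a small in-memory stand-in for the metastore. Everything in this sketch is hypothetical: `CountingMetastore`, `UdfStartupLoader`, and in particular the bulk `getAllFunctions` call are assumptions made for illustration, not the real metastore client API. The sketch only shows why one call per database scales with the number of databases, while a single bulk call would not.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for a metastore that counts round trips. */
final class CountingMetastore {
    private final Map<String, List<String>> functionsByDb;
    int rpcCalls = 0;

    CountingMetastore(Map<String, List<String>> functionsByDb) {
        this.functionsByDb = new LinkedHashMap<>(functionsByDb);
    }

    List<String> getAllDatabases() {
        rpcCalls++;
        return new ArrayList<>(functionsByDb.keySet());
    }

    List<String> getFunctions(String db) {
        rpcCalls++;
        return functionsByDb.get(db);
    }

    /** Assumed bulk call returning the functions of every database in one round trip. */
    Map<String, List<String>> getAllFunctions() {
        rpcCalls++;
        return functionsByDb;
    }
}

final class UdfStartupLoader {
    /** One call to list databases, then one call per database: 1 + N round trips. */
    static int loadPerDatabase(CountingMetastore ms) {
        int loaded = 0;
        for (String db : ms.getAllDatabases()) {
            loaded += ms.getFunctions(db).size();
        }
        return loaded;
    }

    /** A single round trip, independent of the number of databases. */
    static int loadInBulk(CountingMetastore ms) {
        int loaded = 0;
        for (List<String> functions : ms.getAllFunctions().values()) {
            loaded += functions.size();
        }
        return loaded;
    }
}
```

With several hundred databases, the per-database variant makes several hundred round trips before the CLI is usable, which matches the startup delay reported above.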





[jira] [Commented] (HIVE-11008) webhcat GET /jobs retries on getting job details from history server is too agressive

2015-07-29 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646696#comment-14646696
 ] 

Eugene Koifman commented on HIVE-11008:
---

[~thejas] is this still relevant?

> webhcat GET /jobs retries on getting job details from history server is too 
> agressive
> -
>
> Key: HIVE-11008
> URL: https://issues.apache.org/jira/browse/HIVE-11008
> Project: Hive
>  Issue Type: Bug
>  Components: WebHCat
>Affects Versions: 1.2.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: HIVE-11008.1.patch
>
>
> The WebHCat "jobs" API gets the list of jobs from the RM and then gets details from 
> the history server.
> The RM has a policy of retaining a fixed number of jobs to fit the 
> memory it has, while HistoryServer retains jobs based on their age. As a 
> result, jobs that the RM returns might not be present in HistoryServer, which can 
> result in a failure. HistoryServer also ends up retrying on failures even if 
> they happen because the job actually does not exist. 
> The retries to get details from HistoryServer in such cases are too aggressive.
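One way to make such retries less aggressive is to distinguish a permanent "job does not exist" failure from a transient one and to retry only the latter. The sketch below shows that idea under assumptions: `JobNotFoundException`, `TransientLookupException`, and `HistoryLookup` are hypothetical names, and this is not the change made in the attached patch. Backoff between attempts is omitted to keep the example short.

```java
import java.util.function.Supplier;

/** Hypothetical permanent failure: the job is not known to the history server. */
final class JobNotFoundException extends RuntimeException {
    JobNotFoundException(String message) {
        super(message);
    }
}

/** Hypothetical transient failure, e.g. a timeout, that is worth retrying. */
final class TransientLookupException extends RuntimeException {
    TransientLookupException(String message) {
        super(message);
    }
}

final class HistoryLookup {
    /** Retries transient failures up to maxAttempts; gives up at once if the job is missing. */
    static <T> T fetchWithRetry(Supplier<T> lookup, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        TransientLookupException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return lookup.get();
            } catch (JobNotFoundException e) {
                throw e; // retrying cannot help: the job has aged out or never existed
            } catch (TransientLookupException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last;
    }
}
```

With this split, a job that the RM still lists but the history server has already dropped costs one lookup instead of a full retry cycle.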





[jira] [Commented] (HIVE-11402) HS2 - disallow parallel query execution within a single Session

2015-07-29 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646695#comment-14646695
 ] 

Thejas M Nair commented on HIVE-11402:
--

From [~ekoifman] - Each SessionState has 1 instance of HiveTxnManager, which 
can only handle 1 txn at a time. 
While the JDBC spec doesn't make concurrency guarantees explicit, most info on the 
web says it's a bad idea to allow this.
For example, suppose you setAutoCommit(false) and then run two statements 
concurrently on this Connection: would you expect 1 transaction for each 
statement or 1 txn for both?


> HS2 - disallow parallel query execution within a single Session
> ---
>
> Key: HIVE-11402
> URL: https://issues.apache.org/jira/browse/HIVE-11402
> Project: Hive
>  Issue Type: Bug
>Reporter: Thejas M Nair
>
> HiveServer2 currently allows concurrent queries to be run in a single 
> session. However, every HS2 session has an associated SessionState object, 
> and the use of SessionState in many places assumes that only one thread is 
> using it, i.e., it is not thread safe.
> There are many places where SessionState thread safety needs to be 
> addressed, and until then we should serialize all query execution for a 
> single HS2 session. This problem can become more visible with HIVE-4239 now 
> allowing parallel query compilation.
> Note that running queries in parallel for a single session is not 
> straightforward with JDBC: you need to spawn another thread, as the 
> Statement.execute calls are blocking. I believe ODBC has a non-blocking query 
> execution API, and Hue is another well-known application that shares sessions 
> for all queries that a user runs.
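Serializing query execution per session, as proposed above, could look roughly like the following sketch. It assumes one lock per session ID and is not taken from HiveServer2: `SessionQueryGate`, `runSerialized`, and `isBusy` are hypothetical names. Queries from different sessions still run in parallel, while a second query on the same session blocks until the first one finishes.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

/** Hypothetical gate that lets only one query run at a time per session. */
final class SessionQueryGate {
    private final ConcurrentMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    /** Runs the query while holding the lock that belongs to the given session. */
    <T> T runSerialized(String sessionId, Supplier<T> query) {
        ReentrantLock lock = locks.computeIfAbsent(sessionId, id -> new ReentrantLock());
        lock.lock();
        try {
            return query.get();
        } finally {
            lock.unlock(); // always release, even if the query throws
        }
    }

    /** True while some thread is executing a query for this session. */
    boolean isBusy(String sessionId) {
        ReentrantLock lock = locks.get(sessionId);
        return lock != null && lock.isLocked();
    }
}
```

This sketch never removes locks for closed sessions; a real implementation would more likely keep the lock on the session object itself so that it goes away with the session.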





[jira] [Commented] (HIVE-11402) HS2 - disallow parallel query execution within a single Session

2015-07-29 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646693#comment-14646693
 ] 

Thejas M Nair commented on HIVE-11402:
--

Comment from earlier discussion with [~vgumashta] and [~jdere] - Hive's 
SessionState has several fields that are associated with the 
currently executing query, such as commandType and lastCommand. SessionState also 
has a Conf object, which also gets set with query-specific information.

> HS2 - disallow parallel query execution within a single Session
> ---
>
> Key: HIVE-11402
> URL: https://issues.apache.org/jira/browse/HIVE-11402
> Project: Hive
>  Issue Type: Bug
>Reporter: Thejas M Nair
>
> HiveServer2 currently allows concurrent queries to be run in a single 
> session. However, every HS2 session has an associated SessionState object, 
> and the use of SessionState in many places assumes that only one thread is 
> using it, i.e., it is not thread safe.
> There are many places where SessionState thread safety needs to be 
> addressed, and until then we should serialize all query execution for a 
> single HS2 session. This problem can become more visible with HIVE-4239 now 
> allowing parallel query compilation.
> Note that running queries in parallel for a single session is not 
> straightforward with JDBC: you need to spawn another thread, as the 
> Statement.execute calls are blocking. I believe ODBC has a non-blocking query 
> execution API, and Hue is another well-known application that shares sessions 
> for all queries that a user runs.





[jira] [Assigned] (HIVE-11286) insert values clause should support functions

2015-07-29 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-11286:
-

Assignee: Eugene Koifman

> insert values clause should support functions
> -
>
> Key: HIVE-11286
> URL: https://issues.apache.org/jira/browse/HIVE-11286
> Project: Hive
>  Issue Type: Bug
>  Components: SQL, Transactions
>Affects Versions: 1.2.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> insert into T values(1,2) is supported
> but 
> insert into T values(1,current_date()) is not - this would be useful





[jira] [Assigned] (HIVE-11345) Fix formatting of Show Compations/Transactions/Locks

2015-07-29 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-11345:
-

Assignee: Eugene Koifman

> Fix formatting of Show Compations/Transactions/Locks
> 
>
> Key: HIVE-11345
> URL: https://issues.apache.org/jira/browse/HIVE-11345
> Project: Hive
>  Issue Type: Bug
>  Components: CLI, Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> All the columns of the output are variable length (in each row, based on the 
> data), which makes it really difficult to read.





[jira] [Commented] (HIVE-4239) Remove lock on compilation stage

2015-07-29 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646690#comment-14646690
 ] 

Thejas M Nair commented on HIVE-4239:
-

Created HIVE-11402 - HS2 - disallow parallel query execution within a single 
Session


> Remove lock on compilation stage
> 
>
> Key: HIVE-4239
> URL: https://issues.apache.org/jira/browse/HIVE-4239
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Query Processor
>Reporter: Carl Steinbach
>Assignee: Sergey Shelukhin
>  Labels: TODOC2.0
> Fix For: 2.0.0
>
> Attachments: HIVE-4239.01.patch, HIVE-4239.02.patch, 
> HIVE-4239.03.patch, HIVE-4239.04.patch, HIVE-4239.05.patch, 
> HIVE-4239.06.patch, HIVE-4239.07.patch, HIVE-4239.08.patch, HIVE-4239.patch
>
>






[jira] [Updated] (HIVE-11402) HS2 - disallow parallel query execution within a single Session

2015-07-29 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-11402:
-
Description: 
HiveServer2 currently allows concurrent queries to be run in a single session. 
However, every HS2 session has an associated SessionState object, and the use 
of SessionState in many places assumes that only one thread is using it, i.e., it 
is not thread safe.
There are many places where SessionState thread safety needs to be addressed, 
and until then we should serialize all query execution for a single HS2 
session. This problem can become more visible with HIVE-4239 now allowing 
parallel query compilation.

Note that running queries in parallel for a single session is not straightforward 
with JDBC: you need to spawn another thread, as the Statement.execute calls are 
blocking. I believe ODBC has a non-blocking query execution API, and Hue is 
another well-known application that shares sessions for all queries that a user 
runs.


  was:
HiveServer2 currently allows concurrent queries to be run in a single session. 
However, every HS2 session has an associated SessionState object, and the use 
of SessionState in many places assumes that only one thread is using it, i.e., it 
is not thread safe.
There are many places where SessionState thread safety needs to be addressed, 
and until then we should serialize all query execution for a single HS2 
session. 

Note that running queries in parallel for a single session is not straightforward 
with JDBC: you need to spawn another thread, as the Statement.execute calls are 
blocking. I believe ODBC has a non-blocking query execution API, and Hue is 
another well-known application that shares sessions for all queries that a user 
runs.



> HS2 - disallow parallel query execution within a single Session
> ---
>
> Key: HIVE-11402
> URL: https://issues.apache.org/jira/browse/HIVE-11402
> Project: Hive
>  Issue Type: Bug
>Reporter: Thejas M Nair
>
> HiveServer2 currently allows concurrent queries to be run in a single 
> session. However, every HS2 session has an associated SessionState object, 
> and the use of SessionState in many places assumes that only one thread is 
> using it, i.e., it is not thread safe.
> There are many places where SessionState thread safety needs to be 
> addressed, and until then we should serialize all query execution for a 
> single HS2 session. This problem can become more visible with HIVE-4239 now 
> allowing parallel query compilation.
> Note that running queries in parallel for a single session is not 
> straightforward with JDBC: you need to spawn another thread, as the 
> Statement.execute calls are blocking. I believe ODBC has a non-blocking query 
> execution API, and Hue is another well-known application that shares sessions 
> for all queries that a user runs.





[jira] [Commented] (HIVE-11363) Prewarm Hive on Spark containers [Spark Branch]

2015-07-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1464#comment-1464
 ] 

Hive QA commented on HIVE-11363:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12747832/HIVE-11363.5-spark.patch

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 6298 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.initializationError
org.apache.hadoop.hive.cli.TestHBaseCliDriver.initializationError
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.initializationError
org.apache.hadoop.hive.cli.TestMinimrCliDriver.initializationError
org.apache.hadoop.hive.cli.TestNegativeCliDriver.initializationError
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.initializationError
org.apache.hadoop.hive.cli.TestSparkCliDriver.initializationError
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_dynamic_rdd_cache
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_load_dyn_part14
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vector_count_distinct
org.apache.hive.jdbc.TestSSL.testSSLConnectionWithProperty
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/944/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/944/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-944/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12747832 - PreCommit-HIVE-SPARK-Build

> Prewarm Hive on Spark containers [Spark Branch]
> ---
>
> Key: HIVE-11363
> URL: https://issues.apache.org/jira/browse/HIVE-11363
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: 1.1.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-11363.1-spark.patch, HIVE-11363.2-spark.patch, 
> HIVE-11363.3-spark.patch, HIVE-11363.4-spark.patch, HIVE-11363.5-spark.patch
>
>
> When a Hive job is launched by Oozie, a Hive session is created and the job 
> script is executed. The session is closed when the Hive job completes. Thus, a 
> Hive session is not shared among Hive jobs, either within an Oozie workflow or 
> across workflows. Since the parallelism of a Hive job executed on Spark is 
> affected by the available executors, such Hive jobs will suffer the executor 
> ramp-up overhead. The idea here is to wait a bit so that enough executors can 
> come up before a job is executed.
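
The "wait a bit so that enough executors can come up" idea could look roughly like the sketch below. It is an illustration only, not the attached patch: wait_for_executors, get_executor_count, target and timeout_s are hypothetical names, and the real change may obtain the executor count and the limits quite differently.

```python
import time


def wait_for_executors(get_executor_count, target, timeout_s, poll_s=0.01,
                       clock=time.monotonic, sleep=time.sleep):
    """Poll until `target` executors are up or `timeout_s` seconds have passed.

    Returns the last observed executor count, so the caller can decide whether
    to proceed with fewer executors than it asked for.
    """
    deadline = clock() + timeout_s
    while True:
        count = get_executor_count()
        if count >= target or clock() >= deadline:
            return count
        sleep(poll_s)
```

Bounding the wait with a timeout matters here: on a busy cluster the requested executors may never all arrive, and the job should still start with whatever is available.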





[jira] [Updated] (HIVE-11257) CBO: Calcite Operator To Hive Operator (Calcite Return Path): Method isCombinablePredicate in HiveJoinToMultiJoinRule should be extended to support MultiJoin operators me

2015-07-29 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-11257:
---
Attachment: HIVE-11257.04.patch

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): Method 
> isCombinablePredicate in HiveJoinToMultiJoinRule should be extended to 
> support MultiJoin operators merge
> -
>
> Key: HIVE-11257
> URL: https://issues.apache.org/jira/browse/HIVE-11257
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-11257.01.patch, HIVE-11257.02.patch, 
> HIVE-11257.03.patch, HIVE-11257.04.patch, HIVE-11257.patch
>
>






[jira] [Updated] (HIVE-11316) Use datastructure that doesnt duplicate any part of string for ASTNode::toStringTree()

2015-07-29 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-11316:
-
Attachment: HIVE-11316.7.patch

> Use datastructure that doesnt duplicate any part of string for 
> ASTNode::toStringTree()
> --
>
> Key: HIVE-11316
> URL: https://issues.apache.org/jira/browse/HIVE-11316
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-11316-branch-1.0.patch, 
> HIVE-11316-branch-1.2.patch, HIVE-11316.1.patch, HIVE-11316.2.patch, 
> HIVE-11316.3.patch, HIVE-11316.4.patch, HIVE-11316.5.patch, 
> HIVE-11316.6.patch, HIVE-11316.7.patch
>
>
> HIVE-11281 uses an approach to memoize toStringTree() for ASTNode. This JIRA 
> is supposed to alter the string memoization to use a different data structure 
> that doesn't duplicate any part of the string, so that we do not run into OOM.





[jira] [Commented] (HIVE-11296) Merge from master to spark branch [Spark Branch]

2015-07-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646607#comment-14646607
 ] 

Hive QA commented on HIVE-11296:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12747818/HIVE-11296.2-spark.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 7689 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.initializationError
org.apache.hive.jdbc.TestSSL.testSSLConnectionWithProperty
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/943/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/943/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-943/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12747818 - PreCommit-HIVE-SPARK-Build

> Merge from master to spark branch [Spark Branch]
> 
>
> Key: HIVE-11296
> URL: https://issues.apache.org/jira/browse/HIVE-11296
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-11296.1-spark.patch, HIVE-11296.2-spark.patch
>
>






[jira] [Commented] (HIVE-11363) Prewarm Hive on Spark containers [Spark Branch]

2015-07-29 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646591#comment-14646591
 ] 

Xuefu Zhang commented on HIVE-11363:


Patch #5 reuses existing configurations instead.

> Prewarm Hive on Spark containers [Spark Branch]
> ---
>
> Key: HIVE-11363
> URL: https://issues.apache.org/jira/browse/HIVE-11363
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: 1.1.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-11363.1-spark.patch, HIVE-11363.2-spark.patch, 
> HIVE-11363.3-spark.patch, HIVE-11363.4-spark.patch, HIVE-11363.5-spark.patch
>
>
> When a Hive job is launched by Oozie, a Hive session is created and the job 
> script is executed. The session is closed when the Hive job completes. Thus, a 
> Hive session is not shared among Hive jobs, either within an Oozie workflow or 
> across workflows. Since the parallelism of a Hive job executed on Spark is 
> affected by the available executors, such Hive jobs will suffer the executor 
> ramp-up overhead. The idea here is to wait a bit so that enough executors can 
> come up before a job is executed.





[jira] [Updated] (HIVE-11363) Prewarm Hive on Spark containers [Spark Branch]

2015-07-29 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-11363:
---
Attachment: HIVE-11363.5-spark.patch

> Prewarm Hive on Spark containers [Spark Branch]
> ---
>
> Key: HIVE-11363
> URL: https://issues.apache.org/jira/browse/HIVE-11363
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: 1.1.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-11363.1-spark.patch, HIVE-11363.2-spark.patch, 
> HIVE-11363.3-spark.patch, HIVE-11363.4-spark.patch, HIVE-11363.5-spark.patch
>
>
> When a Hive job is launched by Oozie, a Hive session is created and the job 
> script is executed. The session is closed when the Hive job completes. Thus, a 
> Hive session is not shared among Hive jobs, either within an Oozie workflow or 
> across workflows. Since the parallelism of a Hive job executed on Spark is 
> affected by the available executors, such Hive jobs will suffer the executor 
> ramp-up overhead. The idea here is to wait a bit so that enough executors can 
> come up before a job is executed.





[jira] [Updated] (HIVE-11306) Add a bloom-1 filter for Hybrid MapJoin spills

2015-07-29 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-11306:
-
Attachment: (was: HIVE-11306.4.patch)

> Add a bloom-1 filter for Hybrid MapJoin spills
> --
>
> Key: HIVE-11306
> URL: https://issues.apache.org/jira/browse/HIVE-11306
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-11306.1.patch, HIVE-11306.2.patch, 
> HIVE-11306.3.patch
>
>
> HIVE-9277 implemented spillable joins for Tez, which suffer from a 
> corner-case performance issue when joining wide small tables against a narrow 
> big table (like a user info table joined with an events stream).
> The fact that the wide table is spilled causes extra IO, even though the nDV 
> of the join key might be in the thousands.
> A cheap bloom-1 filter would give a large performance gain for such queries 
> by substantially cutting down the spill IO costs for the big-table spills.
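
A rough sketch of how such a filter could cut spill IO is given below. It assumes that "bloom-1" means a Bloom filter with a single hash function, which is a reading of the description rather than something it states; Bloom1, probe and key_of are hypothetical names, and CRC32 is used only because it is a convenient deterministic hash.

```python
import zlib


class Bloom1:
    """Bit array with a single hash function (assumed meaning of 'bloom-1')."""

    def __init__(self, num_bits):
        self.num_bits = num_bits
        self.bits = bytearray((num_bits + 7) // 8)

    def _index(self, key):
        # Deterministic hash of the key's text form, mapped onto the bit array.
        return zlib.crc32(repr(key).encode("utf-8")) % self.num_bits

    def add(self, key):
        i = self._index(key)
        self.bits[i // 8] |= 1 << (i % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        i = self._index(key)
        return bool(self.bits[i // 8] & (1 << (i % 8)))


def probe(big_rows, bloom, key_of):
    """Keep only big-table rows whose join key may exist on the small side."""
    return [row for row in big_rows if bloom.might_contain(key_of(row))]
```

The filter is built from the small-table join keys; big-table rows that fail the check cannot match and need not be written to a spill file. False positives only cost some extra IO, never correctness, because the join itself still compares keys.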





[jira] [Commented] (HIVE-11363) Prewarm Hive on Spark containers [Spark Branch]

2015-07-29 Thread Chao Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646536#comment-14646536
 ] 

Chao Sun commented on HIVE-11363:
-

LGTM +1.
One suggestion: maybe we should consider reusing existing config: 
"hive.prewarm.enabled" and "hive.prewarm.numcontainers" for Spark.

> Prewarm Hive on Spark containers [Spark Branch]
> ---
>
> Key: HIVE-11363
> URL: https://issues.apache.org/jira/browse/HIVE-11363
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: 1.1.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-11363.1-spark.patch, HIVE-11363.2-spark.patch, 
> HIVE-11363.3-spark.patch, HIVE-11363.4-spark.patch
>
>
> When a Hive job is launched by Oozie, a Hive session is created and the job 
> script is executed. The session is closed when the Hive job completes. Thus, a 
> Hive session is not shared among Hive jobs, either within an Oozie workflow or 
> across workflows. Since the parallelism of a Hive job executed on Spark is 
> affected by the available executors, such Hive jobs will suffer the executor 
> ramp-up overhead. The idea here is to wait a bit so that enough executors can 
> come up before a job is executed.





[jira] [Commented] (HIVE-10319) Hive CLI startup takes a long time with a large number of databases

2015-07-29 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646526#comment-14646526
 ] 

Jason Dere commented on HIVE-10319:
---

I think this looks good - can you mark this Jira as patch available so we can 
see the precommit test results?

> Hive CLI startup takes a long time with a large number of databases
> ---
>
> Key: HIVE-10319
> URL: https://issues.apache.org/jira/browse/HIVE-10319
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 1.0.0
>Reporter: Nezih Yigitbasi
>Assignee: Nezih Yigitbasi
> Attachments: HIVE-10319.1.patch, HIVE-10319.2.patch, 
> HIVE-10319.3.patch, HIVE-10319.patch
>
>
> The Hive CLI takes a long time to start when there is a large number of 
> databases in the DW. I think the root cause is the way permanent UDFs are 
> loaded from the metastore. When I looked at the logs and the source code I 
> see that at startup Hive first gets all the databases from the metastore and 
> then for each database it makes a metastore call to get the permanent 
> functions for that database [see Hive.java | 
> https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L162-185].
>  So the number of metastore calls made is on the order of the number of 
> databases. In production we have several hundred databases, so Hive makes 
> several hundred RPC calls during startup, taking 30+ seconds.
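
The cost pattern described above (one metastore call per database) can be illustrated with a small model. Everything here is hypothetical: FakeMetastore only counts calls, and get_all_functions is an assumed bulk call used to show one possible remedy, not necessarily what the attached patches do.

```python
class FakeMetastore:
    """Toy metastore that counts how many calls (RPCs) it receives."""

    def __init__(self, functions_by_db):
        self.functions_by_db = functions_by_db
        self.calls = 0

    def get_all_databases(self):
        self.calls += 1
        return list(self.functions_by_db)

    def get_functions(self, db):
        self.calls += 1
        return list(self.functions_by_db[db])

    def get_all_functions(self):
        # Hypothetical bulk call returning every (database, function) pair.
        self.calls += 1
        return [(db, f) for db, fs in self.functions_by_db.items() for f in fs]


def load_per_database(ms):
    """Startup pattern from the description: 1 + N calls for N databases."""
    return [(db, f) for db in ms.get_all_databases() for f in ms.get_functions(db)]


def load_in_bulk(ms):
    """Alternative: a single call regardless of the number of databases."""
    return ms.get_all_functions()
```

With 300 databases the first loader issues 301 calls and the second issues one, while both return the same set of functions.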





[jira] [Updated] (HIVE-11296) Merge from master to spark branch [Spark Branch]

2015-07-29 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-11296:

Attachment: HIVE-11296.2-spark.patch

> Merge from master to spark branch [Spark Branch]
> 
>
> Key: HIVE-11296
> URL: https://issues.apache.org/jira/browse/HIVE-11296
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-11296.1-spark.patch, HIVE-11296.2-spark.patch
>
>






[jira] [Commented] (HIVE-10165) Improve hive-hcatalog-streaming extensibility and support updates and deletes.

2015-07-29 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646429#comment-14646429
 ] 

Sushanth Sowmyan commented on HIVE-10165:
-

I think the examples are good and on-point as a guideline to new users - thank 
you for finding them. :) I don't think any further emphasis is needed. Also, 
thank you for the bit on the "fix version" setting clarification there. That's 
something that pops up often.

> Improve hive-hcatalog-streaming extensibility and support updates and deletes.
> --
>
> Key: HIVE-10165
> URL: https://issues.apache.org/jira/browse/HIVE-10165
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog
>Affects Versions: 1.2.0
>Reporter: Elliot West
>Assignee: Elliot West
>  Labels: TODOC2.0, streaming_api
> Fix For: 2.0.0
>
> Attachments: HIVE-10165.0.patch, HIVE-10165.10.patch, 
> HIVE-10165.4.patch, HIVE-10165.5.patch, HIVE-10165.6.patch, 
> HIVE-10165.7.patch, HIVE-10165.9.patch, mutate-system-overview.png
>
>
> h3. Overview
> I'd like to extend the 
> [hive-hcatalog-streaming|https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest]
>  API so that it also supports the writing of record updates and deletes in 
> addition to the already supported inserts.
> h3. Motivation
> We have many Hadoop processes outside of Hive that merge changed facts into 
> existing datasets. Traditionally we achieve this by: reading in a 
> ground-truth dataset and a modified dataset, grouping by a key, sorting by a 
> sequence and then applying a function to determine inserted, updated, and 
> deleted rows. However, in our current scheme we must rewrite all partitions 
> that may potentially contain changes. In practice the number of mutated 
> records is very small when compared with the records contained in a 
> partition. This approach results in a number of operational issues:
> * Excessive amount of write activity required for small data changes.
> * Downstream applications cannot robustly read these datasets while they are 
> being updated.
> * Due to the scale of the updates (hundreds of partitions) the scope for 
> contention is high. 
> I believe we can address this problem by instead writing only the changed 
> records to a Hive transactional table. This should drastically reduce the 
> amount of data that we need to write and also provide a means for managing 
> concurrent access to the data. Our existing merge processes can read and 
> retain each record's {{ROW_ID}}/{{RecordIdentifier}} and pass this through to 
> an updated form of the hive-hcatalog-streaming API which will then have the 
> required data to perform an update or insert in a transactional manner. 
> h3. Benefits
> * Enables the creation of large-scale dataset merge processes  
> * Opens up Hive transactional functionality in an accessible manner to 
> processes that operate outside of Hive.





[jira] [Updated] (HIVE-11391) CBO (Calcite Return Path): Add CBO tests with return path on

2015-07-29 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-11391:
---
Attachment: HIVE-11391.patch

Triggering QA run when HIVE-11257 goes in

> CBO (Calcite Return Path): Add CBO tests with return path on
> 
>
> Key: HIVE-11391
> URL: https://issues.apache.org/jira/browse/HIVE-11391
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-11391.patch
>
>






[jira] [Commented] (HIVE-11329) Column prefix in key of hbase column prefix map

2015-07-29 Thread Swarnim Kulkarni (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646264#comment-14646264
 ] 

Swarnim Kulkarni commented on HIVE-11329:
-

Thanks [~woj_in]. Would you mind opening a review board request for me?

> Column prefix in key of hbase column prefix map
> ---
>
> Key: HIVE-11329
> URL: https://issues.apache.org/jira/browse/HIVE-11329
> Project: Hive
>  Issue Type: Bug
>  Components: HBase Handler
>Affects Versions: 0.14.0
>Reporter: Wojciech Indyk
>Assignee: Wojciech Indyk
>Priority: Minor
> Attachments: HIVE-11329.1.patch
>
>
> When I create a table with an HBase column prefix 
> (https://issues.apache.org/jira/browse/HIVE-3725), the prefix is kept in the 
> keys of the resulting map in Hive. 
> E.g. record in HBase
> rowkey: 123
> column: tag_one, value: 0.5
> column: tag_two, value: 0.5
> representation in Hive via column prefix mapping "tag_.*":
> column: tag map
> key: tag_one, value: 0.5
> key: tag_two, value: 0.5
> should be:
> key: one, value: 0.5
> key: two, value: 0.5
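
The requested behaviour amounts to dropping the matched prefix from each map key. A minimal sketch of that transformation follows, in Python rather than the handler's own code; strip_prefix and its arguments are hypothetical names, and the prefix is treated as a literal string ("tag_") rather than as the "tag_.*" pattern.

```python
def strip_prefix(cells, prefix):
    """Map HBase column qualifiers to Hive map keys with the prefix removed.

    Qualifiers that do not start with `prefix` are not part of the mapping
    and are left out of the result.
    """
    return {
        qualifier[len(prefix):]: value
        for qualifier, value in cells.items()
        if qualifier.startswith(prefix)
    }
```

For the record in the description, strip_prefix({"tag_one": 0.5, "tag_two": 0.5}, "tag_") gives a map with the keys "one" and "two".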





[jira] [Commented] (HIVE-11257) CBO: Calcite Operator To Hive Operator (Calcite Return Path): Method isCombinablePredicate in HiveJoinToMultiJoinRule should be extended to support MultiJoin operators

2015-07-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646231#comment-14646231
 ] 

Hive QA commented on HIVE-11257:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12747775/HIVE-11257.03.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9273 tests executed
*Failed tests:*
{noformat}
TestCustomAuthentication - did not produce a TEST-*.xml file
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4746/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4746/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4746/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12747775 - PreCommit-HIVE-TRUNK-Build

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): Method 
> isCombinablePredicate in HiveJoinToMultiJoinRule should be extended to 
> support MultiJoin operators merge
> -
>
> Key: HIVE-11257
> URL: https://issues.apache.org/jira/browse/HIVE-11257
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-11257.01.patch, HIVE-11257.02.patch, 
> HIVE-11257.03.patch, HIVE-11257.patch
>
>






[jira] [Commented] (HIVE-11354) HPL/SQL extending compatibility with Transact-SQL

2015-07-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646027#comment-14646027
 ] 

Hive QA commented on HIVE-11354:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12747772/HIVE-11354.1.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9277 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4745/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4745/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4745/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12747772 - PreCommit-HIVE-TRUNK-Build

> HPL/SQL extending compatibility with Transact-SQL
> -
>
> Key: HIVE-11354
> URL: https://issues.apache.org/jira/browse/HIVE-11354
> Project: Hive
>  Issue Type: Improvement
>  Components: hpl/sql
>Reporter: Dmitry Tolpeko
>Assignee: Dmitry Tolpeko
> Attachments: HIVE-11354.1.patch
>
>
> Although HPL/SQL already supports some Transact-SQL language elements 
> (declarations, flow-of-control stmts, assignments and so on) some other 
> widely used constructs are not supported yet.





[jira] [Commented] (HIVE-11392) Trailing spaces in char comparisons

2015-07-29 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646012#comment-14646012
 ] 

Aihua Xu commented on HIVE-11392:
-

Got it. Thanks Jason.

> Trailing spaces in char comparisons
> ---
>
> Key: HIVE-11392
> URL: https://issues.apache.org/jira/browse/HIVE-11392
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>
> Following on HIVE-3745, for the char type, Hive should ignore trailing spaces 
> in comparisons, but this does not seem to be the case. 
> {noformat}
> create table chtest (a char(4));
> insert into chtest values ('1');
> select * from chtest where a='1'; # no whitespace, produces result 
> select * from chtest where a='1  '; # 2 spaces, no result
> select * from chtest where a='1   '; # 3 spaces, no result 
> select * from chtest where a=cast('1 ' as char(4)); # any amount of spaces, 
> cast to char of same length, produces result
> {noformat}
> It's not consistent.
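
The expected semantics can be stated compactly: two char values compare equal when they differ only in trailing spaces. The sketch below illustrates that rule in Python; it is not Hive's implementation, char_equals is a hypothetical name, and it assumes that only trailing spaces (not leading ones) are insignificant.

```python
def char_equals(left, right):
    """Compare two char values, ignoring trailing spaces on both sides."""
    return left.rstrip(" ") == right.rstrip(" ")
```

Under this rule every query in the example above would return the row, regardless of how many trailing spaces the literal carries.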





[jira] [Updated] (HIVE-11257) CBO: Calcite Operator To Hive Operator (Calcite Return Path): Method isCombinablePredicate in HiveJoinToMultiJoinRule should be extended to support MultiJoin operators me

2015-07-29 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-11257:
---
Attachment: HIVE-11257.03.patch

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): Method 
> isCombinablePredicate in HiveJoinToMultiJoinRule should be extended to 
> support MultiJoin operators merge
> -
>
> Key: HIVE-11257
> URL: https://issues.apache.org/jira/browse/HIVE-11257
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-11257.01.patch, HIVE-11257.02.patch, 
> HIVE-11257.03.patch, HIVE-11257.patch
>
>






[jira] [Updated] (HIVE-11257) CBO: Calcite Operator To Hive Operator (Calcite Return Path): Method isCombinablePredicate in HiveJoinToMultiJoinRule should be extended to support MultiJoin operators me

2015-07-29 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-11257:
---
Attachment: HIVE-11257.02.patch

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): Method 
> isCombinablePredicate in HiveJoinToMultiJoinRule should be extended to 
> support MultiJoin operators merge
> -
>
> Key: HIVE-11257
> URL: https://issues.apache.org/jira/browse/HIVE-11257
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-11257.01.patch, HIVE-11257.02.patch, 
> HIVE-11257.patch
>
>






[jira] [Commented] (HIVE-11055) HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution)

2015-07-29 Thread Dmitry Tolpeko (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645897#comment-14645897
 ] 

Dmitry Tolpeko commented on HIVE-11055:
---

The problem is fixed by the HIVE-11354 patch.

> HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution)
> ---
>
> Key: HIVE-11055
> URL: https://issues.apache.org/jira/browse/HIVE-11055
> Project: Hive
>  Issue Type: Improvement
>Reporter: Dmitry Tolpeko
>Assignee: Dmitry Tolpeko
> Fix For: 2.0.0
>
> Attachments: HIVE-11055.1.patch, HIVE-11055.2.patch, 
> HIVE-11055.3.patch, HIVE-11055.4.patch, hplsql-site.xml
>
>
> There is a PL/HQL tool (www.plhql.org) that implements procedural SQL for Hive 
> (actually any SQL-on-Hadoop implementation and any JDBC source).
> Alan Gates offered to contribute it to Hive under the HPL/SQL name 
> (org.apache.hive.hplsql package). This JIRA is to create a patch to 
> contribute the PL/HQL code. 





[jira] [Updated] (HIVE-11354) HPL/SQL extending compatibility with Transact-SQL

2015-07-29 Thread Dmitry Tolpeko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Tolpeko updated HIVE-11354:
--
Attachment: HIVE-11354.1.patch

Patch with tests created.

> HPL/SQL extending compatibility with Transact-SQL
> -
>
> Key: HIVE-11354
> URL: https://issues.apache.org/jira/browse/HIVE-11354
> Project: Hive
>  Issue Type: Improvement
>  Components: hpl/sql
>Reporter: Dmitry Tolpeko
>Assignee: Dmitry Tolpeko
> Attachments: HIVE-11354.1.patch
>
>
> Although HPL/SQL already supports some Transact-SQL language elements 
> (declarations, flow-of-control stmts, assignments and so on) some other 
> widely used constructs are not supported yet.





[jira] [Commented] (HIVE-11384) Add Test case which cover both HIVE-11271 and HIVE-11333

2015-07-29 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645814#comment-14645814
 ] 

Yongzhi Chen commented on HIVE-11384:
-

[~szehon], sorry, I missed your comments. I added two more tests. They are in 
the last part of the patch. You will find that the filter of the query includes 
both filter and f1 (all the columns in the subquery). These two test cases need 
both JIRAs to be fixed in order to pass. 
I added the explain for the simple case just for comparison with the later, 
more complicated one. Thanks

> Add Test case which cover both HIVE-11271 and HIVE-11333
> 
>
> Key: HIVE-11384
> URL: https://issues.apache.org/jira/browse/HIVE-11384
> Project: Hive
>  Issue Type: Test
>  Components: Logical Optimizer, Parser
>Affects Versions: 0.14.0, 1.0.0, 1.2.0
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
> Attachments: HIVE-11384.1.patch
>
>
> Add some test queries that need both HIVE-11271 and HIVE-11333 to be fixed in 
> order to pass. 





[jira] [Commented] (HIVE-11304) Migrate to Log4j2 from Log4j 1.x

2015-07-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645798#comment-14645798
 ] 

Hive QA commented on HIVE-11304:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12747721/HIVE-11304.3.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9277 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestContribNegativeCliDriver.testNegativeCliDriver_case_with_row_sequence
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4744/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4744/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4744/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12747721 - PreCommit-HIVE-TRUNK-Build

> Migrate to Log4j2 from Log4j 1.x
> 
>
> Key: HIVE-11304
> URL: https://issues.apache.org/jira/browse/HIVE-11304
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-11304.2.patch, HIVE-11304.3.patch, HIVE-11304.patch
>
>
> Log4J2 has some great benefits and can benefit hive significantly. Some 
> notable features include
> 1) Performance (parametrized logging, performance when logging is disabled 
> etc.) More details can be found here 
> https://logging.apache.org/log4j/2.x/performance.html
> 2) RoutingAppender - Route logs to different log files based on MDC context 
> (useful for HS2, LLAP etc.)
> 3) Asynchronous logging
> This is an umbrella jira to track changes related to Log4j2 migration.





[jira] [Updated] (HIVE-11400) insert overwrite task awasy stuck at latest job

2015-07-29 Thread Feng Yuan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feng Yuan updated HIVE-11400:
-
Description: 
When I run a task like "insert overwrite table a (select * from b join select 
* from c on b.id=c.id) tmp;" it gets stuck on the last job (e.g. the parser's 
explain output shows that the task has 3 jobs, but the third job (or stage) is 
never executed).
There are two files attached:
1. The HQL explain file.
2. The running logs.
You will see that stage-0 in the explain file is a Move Operation, but you will 
not see it in the running logs. In fact, 16 of the 17 jobs have completed (was 
the 13th job actually lost? I don't see it anywhere in the logs), but the 17th 
job hangs forever, and it has not even been assigned a job ID or launched!

Can someone help with this?
Thank you very much!

  was:
When I run a task like "insert overwrite table a (select * from b join select 
* from c on b.id=c.id) tmp;", it gets stuck on the last job (e.g. the parser's 
explain output shows that the task has 3 jobs, but the third job (or stage) is 
never executed).
There are two files:
1. HQL explain file.
2. Running logs.
You will see that stage-0 in the explain file is a Move Operation, but you will 
not see it in the running logs. In fact, 16 of 17 jobs have completed (did the 
13th job actually get lost? I don't see it anywhere in the logs), but the 17th 
job hangs forever.

Can someone help with this?
Thank you very much!


> insert overwrite task awasy stuck at latest job
> ---
>
> Key: HIVE-11400
> URL: https://issues.apache.org/jira/browse/HIVE-11400
> Project: Hive
> Issue Type: Bug
> Components: Hive, Query Processor
> Affects Versions: 0.14.0
> Environment: Hadoop 2.6.0, CentOS 6.5
> Reporter: Feng Yuan
> Attachments: failed_logs, success_logs, task_explain
>
>
> When I run a task like "insert overwrite table a (select * from b join 
> select * from c on b.id=c.id) tmp;", it gets stuck on the last job (e.g. the 
> parser's explain output shows that the task has 3 jobs, but the third job (or 
> stage) is never executed).
> There are two files:
> 1. HQL explain file.
> 2. Running logs.
> You will see that stage-0 in the explain file is a Move Operation, but you 
> will not see it in the running logs. In fact, 16 of 17 jobs have completed 
> (did the 13th job actually get lost? I don't see it anywhere in the logs), 
> but the 17th job hangs forever, and it has not even been assigned a job ID or 
> launched!
> Can someone help with this?
> Thank you very much!





[jira] [Updated] (HIVE-11400) insert overwrite task awasy stuck at latest job

2015-07-29 Thread Feng Yuan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feng Yuan updated HIVE-11400:
-
Description: 
When I run a task like "insert overwrite table a (select * from b join select 
* from c on b.id=c.id) tmp;", it gets stuck on the last job (e.g. the parser's 
explain output shows that the task has 3 jobs, but the third job (or stage) is 
never executed).
There are two files:
1. HQL explain file.
2. Running logs.
You will see that stage-0 in the explain file is a Move Operation, but you will 
not see it in the running logs. In fact, 16 of 17 jobs have completed (did the 
13th job actually get lost? I don't see it anywhere in the logs), but the 17th 
job hangs forever.

Can someone help with this?
Thank you very much!

  was:
When I run a task like "insert overwrite table a (select * from b join select 
* from c on b.id=c.id) tmp;", it gets stuck on the last stage (e.g. the parser's 
explain output shows that the task has 3 jobs, but the third job (or stage) is 
never executed).
There are two files:
1. HQL explain file.
2. Running logs.
You will see that stage-0 in the explain file is a Move Operation, but you will 
not see it in the running logs. In fact, 16 of 17 jobs have completed (did the 
13th job actually get lost? I don't see it anywhere in the logs), but the 17th 
job hangs forever, and I don't see any message like "Loading data to table" in 
the logs.

Can someone help with this?
Thank you very much!

Summary: insert overwrite task awasy stuck at latest job  (was: insert 
overwrite task awasy stuck at latest Move Operator.)

> insert overwrite task awasy stuck at latest job
> ---
>
> Key: HIVE-11400
> URL: https://issues.apache.org/jira/browse/HIVE-11400
> Project: Hive
> Issue Type: Bug
> Components: Hive, Query Processor
> Affects Versions: 0.14.0
> Environment: Hadoop 2.6.0, CentOS 6.5
> Reporter: Feng Yuan
> Attachments: failed_logs, success_logs, task_explain
>
>
> When I run a task like "insert overwrite table a (select * from b join 
> select * from c on b.id=c.id) tmp;", it gets stuck on the last job (e.g. the 
> parser's explain output shows that the task has 3 jobs, but the third job (or 
> stage) is never executed).
> There are two files:
> 1. HQL explain file.
> 2. Running logs.
> You will see that stage-0 in the explain file is a Move Operation, but you 
> will not see it in the running logs. In fact, 16 of 17 jobs have completed 
> (did the 13th job actually get lost? I don't see it anywhere in the logs), 
> but the 17th job hangs forever.
> Can someone help with this?
> Thank you very much!





[jira] [Updated] (HIVE-11400) insert overwrite task awasy stuck at latest Move Operator.

2015-07-29 Thread Feng Yuan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feng Yuan updated HIVE-11400:
-
Attachment: failed_logs

> insert overwrite task awasy stuck at latest Move Operator.
> --
>
> Key: HIVE-11400
> URL: https://issues.apache.org/jira/browse/HIVE-11400
> Project: Hive
> Issue Type: Bug
> Components: Hive, Query Processor
> Affects Versions: 0.14.0
> Environment: Hadoop 2.6.0, CentOS 6.5
> Reporter: Feng Yuan
> Attachments: failed_logs, success_logs, task_explain
>
>
> When I run a task like "insert overwrite table a (select * from b join 
> select * from c on b.id=c.id) tmp;", it gets stuck on the last stage (e.g. 
> the parser's explain output shows that the task has 3 jobs, but the third job 
> (or stage) is never executed).
> There are two files:
> 1. HQL explain file.
> 2. Running logs.
> You will see that stage-0 in the explain file is a Move Operation, but you 
> will not see it in the running logs. In fact, 16 of 17 jobs have completed 
> (did the 13th job actually get lost? I don't see it anywhere in the logs), 
> but the 17th job hangs forever, and I don't see any message like "Loading 
> data to table" in the logs.
> Can someone help with this?
> Thank you very much!




