[jira] [Updated] (HIVE-14045) (Vectorization) Add missing case for BINARY in VectorizationContext.getNormalizedName method

2016-06-20 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-14045:

Status: Patch Available  (was: In Progress)

> (Vectorization) Add missing case for BINARY in 
> VectorizationContext.getNormalizedName method
> 
>
> Key: HIVE-14045
> URL: https://issues.apache.org/jira/browse/HIVE-14045
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
> Fix For: 2.2.0
>
> Attachments: HIVE-14045.01.patch, HIVE-14045.02.patch
>
>
> Missing case for BINARY data type.
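For context, the fix amounts to adding a BINARY branch to the switch that maps 
a Hive type name to the normalized name used when looking up vectorized 
expression classes. A minimal sketch of the pattern (hypothetical class and 
mappings; not the actual source of VectorizationContext.getNormalizedName):

{code}
// Hypothetical sketch: normalize a Hive type name to the family name used
// for vectorized expression-class lookup; without a BINARY branch, binary
// columns fall through to the "cannot vectorize" default.
public class TypeNameNormalizer {
  public static String getNormalizedName(String typeName) {
    switch (typeName.toLowerCase()) {
      case "tinyint":
      case "smallint":
      case "int":
      case "bigint":
        return "Long";     // integer types share the long column vector
      case "float":
      case "double":
        return "Double";
      case "string":
      case "char":
      case "varchar":
        return "String";
      case "binary":       // the previously missing case
        return "Binary";
      default:
        return null;       // callers treat null as "not vectorizable"
    }
  }
}
{code}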



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14045) (Vectorization) Add missing case for BINARY in VectorizationContext.getNormalizedName method

2016-06-20 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-14045:

Attachment: HIVE-14045.02.patch

> (Vectorization) Add missing case for BINARY in 
> VectorizationContext.getNormalizedName method
> 
>
> Key: HIVE-14045
> URL: https://issues.apache.org/jira/browse/HIVE-14045
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
> Fix For: 2.2.0
>
> Attachments: HIVE-14045.01.patch, HIVE-14045.02.patch
>
>
> Missing case for BINARY data type.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14060) Hive: Remove bogus "localhost" from Hive splits

2016-06-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341114#comment-15341114
 ] 

Hive QA commented on HIVE-14060:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12812001/HIVE-14060.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10235 tests 
executed
*Failed tests:*
{noformat}
TestMiniTezCliDriver-vectorization_16.q-vector_decimal_round.q-orc_merge6.q-and-12-more
 - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_constantPropagateForSubQuery
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hive.jdbc.TestJdbcWithMiniLlap.testLlapInputFormatEndToEnd
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/198/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/198/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-198/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12812001 - PreCommit-HIVE-MASTER-Build

> Hive: Remove bogus "localhost" from Hive splits
> ---
>
> Key: HIVE-14060
> URL: https://issues.apache.org/jira/browse/HIVE-14060
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-14060.1.patch
>
>
> On remote filesystems like Azure, GCP and S3, the splits contain a filler 
> location of "localhost".
> This is worse than having no location information at all - on large clusters 
> YARN waits up to 200 seconds [1] for a heartbeat from "localhost" before 
> allocating a container.
> To speed up this process, the split affinity provider should scrub the bogus 
> "localhost" from the locations and allow for the allocation of "*" containers 
> instead on each heartbeat.
> [1] - yarn.scheduler.capacity.node-locality-delay=40 x heartbeat of 5s
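The scrubbing the description calls for is straightforward; a minimal sketch 
of the idea (hypothetical helper, not the actual patch):

{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: drop the filler "localhost" entry from a split's
// location list so YARN does not wait for locality on a host that does
// not exist in the cluster.
public class SplitLocationScrubber {
  public static String[] scrub(String[] locations) {
    if (locations == null) {
      return new String[0];
    }
    List<String> real = new ArrayList<String>();
    for (String host : locations) {
      if (!"localhost".equalsIgnoreCase(host)) {
        real.add(host);
      }
    }
    // An empty array means "no locality preference", i.e. "*" containers.
    return real.toArray(new String[0]);
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(scrub(new String[] {"localhost"})));
    System.out.println(Arrays.toString(scrub(new String[] {"node1", "localhost"})));
  }
}
{code}

With an empty location array the scheduler has nothing to wait for and can 
allocate "*" (off-switch) containers on the first heartbeat.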



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14045) (Vectorization) Add missing case for BINARY in VectorizationContext.getNormalizedName method

2016-06-20 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-14045:

Status: In Progress  (was: Patch Available)

> (Vectorization) Add missing case for BINARY in 
> VectorizationContext.getNormalizedName method
> 
>
> Key: HIVE-14045
> URL: https://issues.apache.org/jira/browse/HIVE-14045
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
> Fix For: 2.2.0
>
> Attachments: HIVE-14045.01.patch
>
>
> Missing case for BINARY data type.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11527) bypass HiveServer2 thrift interface for query results

2016-06-20 Thread Takanobu Asanuma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341082#comment-15341082
 ] 

Takanobu Asanuma commented on HIVE-11527:
-

BTW, somehow Jenkins did not run for HIVE-11527.10.patch. This time Jenkins 
will likely run for the new patch.

> bypass HiveServer2 thrift interface for query results
> -
>
> Key: HIVE-11527
> URL: https://issues.apache.org/jira/browse/HIVE-11527
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Sergey Shelukhin
>Assignee: Takanobu Asanuma
> Attachments: HIVE-11527.10.patch, HIVE-11527.11.patch, 
> HIVE-11527.WIP.patch
>
>
> Right now, HS2 reads query results and returns them to the caller via its 
> thrift API.
> There should be an option for HS2 to return some pointer to results (an HDFS 
> link?) and for the user to read the results directly off HDFS inside the 
> cluster, or via something like WebHDFS outside the cluster.
> Review board link: https://reviews.apache.org/r/40867
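As a rough illustration of the proposed flow (the API names here are 
hypothetical; the actual interface is the one under review on RB): the client 
would receive a result location from HS2 and read the files directly.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical sketch of the proposed flow: HS2 hands back a pointer to
// the query results on HDFS, and the client reads the result files
// directly instead of paging rows through the Thrift API.
public class DirectResultFetch {
  public static void readResults(String resultDirFromHS2) throws Exception {
    Configuration conf = new Configuration();
    // e.g. a path returned by a new Thrift call instead of row batches
    Path resultDir = new Path(resultDirFromHS2);
    FileSystem fs = resultDir.getFileSystem(conf);
    for (FileStatus status : fs.listStatus(resultDir)) {
      // each file holds serialized result rows in the advertised format
      System.out.println("result file: " + status.getPath());
    }
  }
}
{code}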



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11527) bypass HiveServer2 thrift interface for query results

2016-06-20 Thread Takanobu Asanuma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HIVE-11527:

Status: Open  (was: Patch Available)

> bypass HiveServer2 thrift interface for query results
> -
>
> Key: HIVE-11527
> URL: https://issues.apache.org/jira/browse/HIVE-11527
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Sergey Shelukhin
>Assignee: Takanobu Asanuma
> Attachments: HIVE-11527.10.patch, HIVE-11527.11.patch, 
> HIVE-11527.WIP.patch
>
>
> Right now, HS2 reads query results and returns them to the caller via its 
> thrift API.
> There should be an option for HS2 to return some pointer to results (an HDFS 
> link?) and for the user to read the results directly off HDFS inside the 
> cluster, or via something like WebHDFS outside the cluster.
> Review board link: https://reviews.apache.org/r/40867



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11527) bypass HiveServer2 thrift interface for query results

2016-06-20 Thread Takanobu Asanuma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HIVE-11527:

Status: Patch Available  (was: Open)

> bypass HiveServer2 thrift interface for query results
> -
>
> Key: HIVE-11527
> URL: https://issues.apache.org/jira/browse/HIVE-11527
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Sergey Shelukhin
>Assignee: Takanobu Asanuma
> Attachments: HIVE-11527.10.patch, HIVE-11527.11.patch, 
> HIVE-11527.WIP.patch
>
>
> Right now, HS2 reads query results and returns them to the caller via its 
> thrift API.
> There should be an option for HS2 to return some pointer to results (an HDFS 
> link?) and for the user to read the results directly off HDFS inside the 
> cluster, or via something like WebHDFS outside the cluster.
> Review board link: https://reviews.apache.org/r/40867



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11527) bypass HiveServer2 thrift interface for query results

2016-06-20 Thread Takanobu Asanuma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HIVE-11527:

Attachment: HIVE-11527.11.patch

[~thejas]
I uploaded a new patch to this JIRA and to RB, and I left some comments in RB. 
Could you please check it?

> bypass HiveServer2 thrift interface for query results
> -
>
> Key: HIVE-11527
> URL: https://issues.apache.org/jira/browse/HIVE-11527
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Sergey Shelukhin
>Assignee: Takanobu Asanuma
> Attachments: HIVE-11527.10.patch, HIVE-11527.11.patch, 
> HIVE-11527.WIP.patch
>
>
> Right now, HS2 reads query results and returns them to the caller via its 
> thrift API.
> There should be an option for HS2 to return some pointer to results (an HDFS 
> link?) and for the user to read the results directly off HDFS inside the 
> cluster, or via something like WebHDFS outside the cluster.
> Review board link: https://reviews.apache.org/r/40867



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13872) Vectorization: Fix cross-product reduce sink serialization

2016-06-20 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-13872:

Attachment: HIVE-13872.03.patch

> Vectorization: Fix cross-product reduce sink serialization
> --
>
> Key: HIVE-13872
> URL: https://issues.apache.org/jira/browse/HIVE-13872
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 2.1.0
>Reporter: Gopal V
>Assignee: Matt McCline
> Attachments: HIVE-13872.01.patch, HIVE-13872.02.patch, 
> HIVE-13872.03.patch, HIVE-13872.WIP.patch, customer_demographics.txt, 
> vector_include_no_sel.q, vector_include_no_sel.q.out
>
>
> TPC-DS Q13 produces a cross-product without CBO simplifying the query
> {code}
> Caused by: java.lang.RuntimeException: null STRING entry: batchIndex 0 
> projection column num 1
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorExtractRow.nullBytesReadError(VectorExtractRow.java:349)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorExtractRow.extractRowColumn(VectorExtractRow.java:267)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorExtractRow.extractRow(VectorExtractRow.java:343)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorReduceSinkOperator.process(VectorReduceSinkOperator.java:103)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
> at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:762)
> ... 18 more
> {code}
> Simplified query
> {code}
> set hive.cbo.enable=false;
> -- explain
> select count(1)  
>  from store_sales
>  ,customer_demographics
>  where (
> ( 
>   customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk
>   and customer_demographics.cd_marital_status = 'M'
>  )or
>  (
>customer_demographics.cd_demo_sk = ss_cdemo_sk
>   and customer_demographics.cd_marital_status = 'U'
>  ))
> ;
> {code}
> {code}
> Map 3 
> Map Operator Tree:
> TableScan
>   alias: customer_demographics
>   Statistics: Num rows: 1920800 Data size: 717255532 Basic 
> stats: COMPLETE Column stats: NONE
>   Reduce Output Operator
> sort order: 
> Statistics: Num rows: 1920800 Data size: 717255532 Basic 
> stats: COMPLETE Column stats: NONE
> value expressions: cd_demo_sk (type: int), 
> cd_marital_status (type: string)
> Execution mode: vectorized, llap
> LLAP IO: all inputs
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13982) Extensions to RS dedup: execute with different column order and sorting direction if possible

2016-06-20 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-13982:
---
Attachment: HIVE-13982.4.patch

[~ashutoshc], the latest version of the patch deals with the PTF operator, 
which could indeed cause problems if we ignore the sort direction. I have 
updated the RB link accordingly.

> Extensions to RS dedup: execute with different column order and sorting 
> direction if possible
> -
>
> Key: HIVE-13982
> URL: https://issues.apache.org/jira/browse/HIVE-13982
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-13982.2.patch, HIVE-13982.3.patch, 
> HIVE-13982.4.patch, HIVE-13982.patch
>
>
> Pointed out by [~gopalv].
> RS dedup should kick in for these cases, avoiding an additional shuffle stage.
> {code}
> select state, city, sum(sales) from table
> group by state, city
> order by state, city
> limit 10;
> {code}
> {code}
> select state, city, sum(sales) from table
> group by city, state
> order by state, city
> limit 10;
> {code}
> {code}
> select state, city, sum(sales) from table
> group by city, state
> order by state desc, city
> limit 10;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13982) Extensions to RS dedup: execute with different column order and sorting direction if possible

2016-06-20 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-13982:
---
Attachment: (was: HIVE-13982.4.patch)

> Extensions to RS dedup: execute with different column order and sorting 
> direction if possible
> -
>
> Key: HIVE-13982
> URL: https://issues.apache.org/jira/browse/HIVE-13982
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-13982.2.patch, HIVE-13982.3.patch, HIVE-13982.patch
>
>
> Pointed out by [~gopalv].
> RS dedup should kick in for these cases, avoiding an additional shuffle stage.
> {code}
> select state, city, sum(sales) from table
> group by state, city
> order by state, city
> limit 10;
> {code}
> {code}
> select state, city, sum(sales) from table
> group by city, state
> order by state, city
> limit 10;
> {code}
> {code}
> select state, city, sum(sales) from table
> group by city, state
> order by state desc, city
> limit 10;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14069) update curator version to 2.10.0

2016-06-20 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-14069:
-
Target Version/s: 2.2.0
  Status: Patch Available  (was: Open)

> update curator version to 2.10.0 
> -
>
> Key: HIVE-14069
> URL: https://issues.apache.org/jira/browse/HIVE-14069
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, Metastore
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: HIVE-14069.1.patch
>
>
> curator-2.10.0 has several bug fixes over the current version (2.6.0); 
> updating would help improve stability.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14069) update curator version to 2.10.0

2016-06-20 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-14069:
-
Attachment: HIVE-14069.1.patch

> update curator version to 2.10.0 
> -
>
> Key: HIVE-14069
> URL: https://issues.apache.org/jira/browse/HIVE-14069
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, Metastore
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: HIVE-14069.1.patch
>
>
> curator-2.10.0 has several bug fixes over the current version (2.6.0); 
> updating would help improve stability.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14069) update curator version to 2.10.0

2016-06-20 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-14069:
-
Issue Type: Improvement  (was: Bug)

> update curator version to 2.10.0 
> -
>
> Key: HIVE-14069
> URL: https://issues.apache.org/jira/browse/HIVE-14069
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, Metastore
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
>
> curator-2.10.0 has several bug fixes over the current version (2.6.0); 
> updating would help improve stability.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14060) Hive: Remove bogus "localhost" from Hive splits

2016-06-20 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15340988#comment-15340988
 ] 

Gopal V commented on HIVE-14060:


This happens to any FS which calls FileSystem.listLocatedStatus via super().

https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java#L697
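For reference, the default implementation behaves roughly as follows 
(paraphrased, not verbatim Hadoop source; the filler datanode port varies by 
version):

{code}
import java.io.IOException;

import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;

// Paraphrase of FileSystem#getFileBlockLocations' default behavior: any
// FileSystem that does not override it reports a single fake block hosted
// on "localhost", which is the filler location Hive splits end up carrying.
public class DefaultBlockLocations {
  static BlockLocation[] fillerLocations(FileStatus file) throws IOException {
    String[] names = {"localhost:50010"}; // filler datanode address
    String[] hosts = {"localhost"};       // filler host
    return new BlockLocation[] {
        new BlockLocation(names, hosts, 0, file.getLen())};
  }
}
{code}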

> Hive: Remove bogus "localhost" from Hive splits
> ---
>
> Key: HIVE-14060
> URL: https://issues.apache.org/jira/browse/HIVE-14060
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-14060.1.patch
>
>
> On remote filesystems like Azure, GCP and S3, the splits contain a filler 
> location of "localhost".
> This is worse than having no location information at all - on large clusters 
> YARN waits up to 200 seconds [1] for a heartbeat from "localhost" before 
> allocating a container.
> To speed up this process, the split affinity provider should scrub the bogus 
> "localhost" from the locations and allow for the allocation of "*" containers 
> instead on each heartbeat.
> [1] - yarn.scheduler.capacity.node-locality-delay=40 x heartbeat of 5s



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13725) ACID: Streaming API should synchronize calls when multiple threads use the same endpoint

2016-06-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15340986#comment-15340986
 ] 

Hive QA commented on HIVE-13725:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12812006/HIVE-13725.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10250 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_constantPropagateForSubQuery
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hive.jdbc.TestJdbcWithMiniLlap.testLlapInputFormatEndToEnd
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/197/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/197/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-197/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12812006 - PreCommit-HIVE-MASTER-Build

> ACID: Streaming API should synchronize calls when multiple threads use the 
> same endpoint
> 
>
> Key: HIVE-13725
> URL: https://issues.apache.org/jira/browse/HIVE-13725
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Metastore, Transactions
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>Priority: Critical
>  Labels: ACID, Streaming
> Attachments: HIVE-13725.1.patch, HIVE-13725.2.patch
>
>
> Currently, the streaming endpoint creates a metastore client which gets used 
> for RPC. The client itself is not internally thread-safe. Therefore, the API 
> methods should provide the relevant synchronization so that the methods can 
> be called from different threads. A sample use case is as follows:
> 1. Thread 1 creates a streaming endpoint and opens a txn batch.
> 2. Thread 2 heartbeats the txn batch.
> With the current impl, this can result in an "out of sequence response", 
> since the response of the calls in thread 1 might end up going to thread 2 
> and vice versa.
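A minimal sketch of the kind of synchronization being requested (a 
hypothetical wrapper; the real change belongs inside the streaming API):

{code}
import org.apache.hadoop.hive.metastore.IMetaStoreClient;

// Hypothetical sketch: funnel every metastore RPC issued through a shared
// endpoint into one lock, so a heartbeat from thread 2 cannot interleave
// with a call from thread 1 on the same (non-thread-safe) client.
public class SynchronizedEndpoint {
  private final IMetaStoreClient msClient;
  private final Object rpcLock = new Object();

  public SynchronizedEndpoint(IMetaStoreClient msClient) {
    this.msClient = msClient;
  }

  public void heartbeat(long txnId, long lockId) throws Exception {
    synchronized (rpcLock) { // one RPC in flight at a time
      msClient.heartbeat(txnId, lockId);
    }
  }
}
{code}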



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13985) ORC improvements for reducing the file system calls in task side

2016-06-20 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-13985:
-
Fix Version/s: (was: 2.1.0)
   2.1.1

> ORC improvements for reducing the file system calls in task side
> 
>
> Key: HIVE-13985
> URL: https://issues.apache.org/jira/browse/HIVE-13985
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 1.3.0, 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Fix For: 1.3.0, 2.2.0, 2.1.1
>
> Attachments: HIVE-13985-branch-1.patch, HIVE-13985-branch-1.patch, 
> HIVE-13985-branch-1.patch, HIVE-13985-branch-1.patch, 
> HIVE-13985-branch-2.1.patch, HIVE-13985.1.patch, HIVE-13985.2.patch, 
> HIVE-13985.3.patch, HIVE-13985.4.patch, HIVE-13985.5.patch, HIVE-13985.6.patch
>
>
> HIVE-13840 fixed some issues with additional file system invocations during 
> split generation. Similarly, this jira will fix issues with additional file 
> system invocations on the task side. To avoid reading footers on the task 
> side, users can set hive.orc.splits.include.file.footer to true, which will 
> serialize the orc footers on the splits. But this has issues with serializing 
> unwanted information like column statistics and other metadata which are not 
> really required for reading an orc split on the task side. We can reduce the 
> payload on the orc splits by serializing only the minimum required 
> information (stripe information, types, compression details). This will 
> decrease the payload on the orc splits and can potentially avoid OOMs in 
> the application master (AM) during split generation. This jira also 
> addresses other issues concerning the AM cache. The local cache used by the 
> AM is a soft-reference cache. This can introduce unpredictability across 
> multiple runs of the same query. We can cache the serialized footer in the 
> local cache and also use a strong-reference cache, which should avoid memory 
> pressure and will have better predictability.
> One other improvement that we can do: when 
> hive.orc.splits.include.file.footer is set to false, on the task side we 
> make one additional file system call to know the size of the file. If we can 
> serialize the file length in the orc split, this can be avoided.
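A minimal sketch of the trimmed split payload described above (hypothetical 
class; the real work is in the ORC split generator and OrcSplit):

{code}
import java.io.Serializable;
import java.util.List;

// Hypothetical sketch of the reduced footer payload carried on an ORC
// split: stripe layout, schema, and compression details only; column
// statistics and other metadata are deliberately left out.  The file
// length is included so the task side can skip one getFileStatus() call.
public class OrcSplitFooterInfo implements Serializable {
  public static class StripeInfo implements Serializable {
    public final long offset, indexLength, dataLength, footerLength, rowCount;
    public StripeInfo(long offset, long indexLength, long dataLength,
                      long footerLength, long rowCount) {
      this.offset = offset;
      this.indexLength = indexLength;
      this.dataLength = dataLength;
      this.footerLength = footerLength;
      this.rowCount = rowCount;
    }
  }

  public final List<StripeInfo> stripes;  // where each stripe lives
  public final List<String> typeNames;    // flattened schema description
  public final String compressionKind;    // e.g. "ZLIB"
  public final int compressionBufferSize;
  public final long fileLength;           // avoids an extra FS call

  public OrcSplitFooterInfo(List<StripeInfo> stripes, List<String> typeNames,
                            String compressionKind, int compressionBufferSize,
                            long fileLength) {
    this.stripes = stripes;
    this.typeNames = typeNames;
    this.compressionKind = compressionKind;
    this.compressionBufferSize = compressionBufferSize;
    this.fileLength = fileLength;
  }
}
{code}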



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13985) ORC improvements for reducing the file system calls in task side

2016-06-20 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15340893#comment-15340893
 ] 

Prasanth Jayachandran commented on HIVE-13985:
--

The last test run hit a test initialization failure. I ran the tests locally 
to make sure this patch did not break anything, and they ran successfully.

> ORC improvements for reducing the file system calls in task side
> 
>
> Key: HIVE-13985
> URL: https://issues.apache.org/jira/browse/HIVE-13985
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 1.3.0, 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Fix For: 1.3.0, 2.1.0, 2.2.0
>
> Attachments: HIVE-13985-branch-1.patch, HIVE-13985-branch-1.patch, 
> HIVE-13985-branch-1.patch, HIVE-13985-branch-1.patch, 
> HIVE-13985-branch-2.1.patch, HIVE-13985.1.patch, HIVE-13985.2.patch, 
> HIVE-13985.3.patch, HIVE-13985.4.patch, HIVE-13985.5.patch, HIVE-13985.6.patch
>
>
> HIVE-13840 fixed some issues with additional file system invocations during 
> split generation. Similarly, this jira will fix issues with additional file 
> system invocations on the task side. To avoid reading footers on the task 
> side, users can set hive.orc.splits.include.file.footer to true, which will 
> serialize the orc footers on the splits. But this has issues with serializing 
> unwanted information like column statistics and other metadata which are not 
> really required for reading an orc split on the task side. We can reduce the 
> payload on the orc splits by serializing only the minimum required 
> information (stripe information, types, compression details). This will 
> decrease the payload on the orc splits and can potentially avoid OOMs in 
> the application master (AM) during split generation. This jira also 
> addresses other issues concerning the AM cache. The local cache used by the 
> AM is a soft-reference cache. This can introduce unpredictability across 
> multiple runs of the same query. We can cache the serialized footer in the 
> local cache and also use a strong-reference cache, which should avoid memory 
> pressure and will have better predictability.
> One other improvement that we can do: when 
> hive.orc.splits.include.file.footer is set to false, on the task side we 
> make one additional file system call to know the size of the file. If we can 
> serialize the file length in the orc split, this can be avoided.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13744) LLAP IO - add complex types support

2016-06-20 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-13744:
-
Attachment: HIVE-13744.2.patch

Fixed the comment

> LLAP IO - add complex types support
> ---
>
> Key: HIVE-13744
> URL: https://issues.apache.org/jira/browse/HIVE-13744
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Sergey Shelukhin
>Assignee: Prasanth Jayachandran
>  Labels: llap, orc
> Attachments: HIVE-13744.1.patch, HIVE-13744.2.patch
>
>
> Recently, complex type column vectors were added to Hive. We should use them 
> in the IO elevator.
> Vectorization itself doesn't support complex types (yet), but this would be 
> useful when it does; it will also enable the LLAP IO elevator to be used in 
> a non-vectorized context with complex types after HIVE-13617.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14062) Changes from HIVE-13502 overwritten by HIVE-13566

2016-06-20 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15340867#comment-15340867
 ] 

Aihua Xu commented on HIVE-14062:
-

The patch looks good. +1 pending test.

> Changes from HIVE-13502 overwritten by HIVE-13566
> -
>
> Key: HIVE-14062
> URL: https://issues.apache.org/jira/browse/HIVE-14062
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 2.1.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
> Attachments: HIVE-14062.1.patch
>
>
> It appears that the changes from HIVE-13566 overwrote the changes from 
> HIVE-13502. I will confirm with the author that it was inadvertent before I 
> re-add them. Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13744) LLAP IO - add complex types support

2016-06-20 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15340865#comment-15340865
 ] 

Sergey Shelukhin commented on HIVE-13744:
-

+1, small nit on rb

> LLAP IO - add complex types support
> ---
>
> Key: HIVE-13744
> URL: https://issues.apache.org/jira/browse/HIVE-13744
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Sergey Shelukhin
>Assignee: Prasanth Jayachandran
>  Labels: llap, orc
> Attachments: HIVE-13744.1.patch
>
>
> Recently, complex type column vectors were added to Hive. We should use them 
> in the IO elevator.
> Vectorization itself doesn't support complex types (yet), but this would be 
> useful when it does; it will also enable the LLAP IO elevator to be used in 
> a non-vectorized context with complex types after HIVE-13617.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13913) LLAP: introduce backpressure to recordreader

2016-06-20 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15340861#comment-15340861
 ] 

Sergey Shelukhin commented on HIVE-13913:
-

[~gopalv] do you want to review?

> LLAP: introduce backpressure to recordreader
> 
>
> Key: HIVE-13913
> URL: https://issues.apache.org/jira/browse/HIVE-13913
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13913.01.patch, HIVE-13913.02.patch, 
> HIVE-13913.03.patch, HIVE-13913.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14065) Provide an API for making Hive read-only for a short period

2016-06-20 Thread Mohit Sabharwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15340856#comment-15340856
 ] 

Mohit Sabharwal commented on HIVE-14065:


[~alangates], I think Colin is requesting an API that effectively takes a 
(shared?) lock at the metastore level, disallowing all writes, each of which 
currently needs an exclusive ZK lock.

> Provide an API for making Hive read-only for a short period
> ---
>
> Key: HIVE-14065
> URL: https://issues.apache.org/jira/browse/HIVE-14065
> Project: Hive
>  Issue Type: Improvement
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>
> HIVE-7973 added a notification log which allows clients to do incremental 
> replication of the Hive metastore.  However, it is a challenge to get the 
> initial state of the Hive database.  Using existing APIs may give us an 
> inconsistent state.  For example, if a Hive table is renamed while we're 
> loading all tables, we may miss that information.
> The easiest way to fix this would be to provide an API for making Hive 
> read-only for a short period.  This locking API would come with a timeout so 
> that if the locker failed, the system would not stay down.  It would return 
> an ID which uniquely identified the lock instance.  The read-only lock itself 
> could be implemented by taking all the ZooKeeper locks.  The RPC for removing 
> the lock would return back a status indicating whether the lock had timed out 
> before being removed or not.  If it had timed out, we could retry our 
> snapshot loading process with a longer timeout period.
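A rough sketch of the API shape being proposed (all names hypothetical; 
nothing here exists in Hive yet):

{code}
// Hypothetical sketch of the proposed metastore API surface; names and
// signatures are illustrative only.
public interface ReadOnlyLockApi {
  // Make the metastore read-only for at most timeoutMillis.  If the caller
  // dies, the lock expires on its own, so the system cannot stay down.
  // Returns an id that uniquely identifies this lock instance.
  long acquireReadOnlyLock(long timeoutMillis);

  // Release the lock.  Returns true if the lock was still held (a snapshot
  // taken under it is consistent), false if it had already timed out, in
  // which case the caller should retry with a longer timeout.
  boolean releaseReadOnlyLock(long lockId);
}
{code}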



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14055) directSql - getting the number of partitions is broken

2016-06-20 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14055:

Attachment: HIVE-14055.01.patch

Nm, this actually uses a filter, not an expr... I just c/p'ed the better way 
to handle it without the exception from the other filter call. If 
GetHelper::getSqlResult either throws or disables directSql for itself, the 
caller falls back to JDO. Added a comment to this effect... it looks like a 
JDO path also exists for this call.
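For readers following along, the fallback pattern being referred to looks 
roughly like this (a hypothetical simplification of the metastore's 
ObjectStore.GetHelper):

{code}
// Hypothetical simplification of the directSQL-with-JDO-fallback pattern:
// if the direct-SQL path throws (e.g. because the filter cannot be pushed
// down), fall back to the JDO/ORM path rather than silently returning a
// wrong result such as 0.
public abstract class GetHelperSketch<T> {
  protected abstract T getSqlResult() throws Exception; // direct SQL path
  protected abstract T getJdoResult() throws Exception; // ORM fallback

  public T run(boolean allowSql, boolean allowJdo) throws Exception {
    if (allowSql) {
      try {
        return getSqlResult();
      } catch (Exception ex) {
        if (!allowJdo) {
          throw ex; // no fallback available
        }
        // fall through to the JDO path on any direct-SQL failure
      }
    }
    return getJdoResult();
  }
}
{code}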

> directSql - getting the number of partitions is broken
> --
>
> Key: HIVE-14055
> URL: https://issues.apache.org/jira/browse/HIVE-14055
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14055.01.patch, HIVE-14055.patch
>
>
> Noticed while looking at something else. If the filter cannot be pushed down, 
> it just returns 0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14038) miscellaneous acid improvements

2016-06-20 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-14038:
--
Status: Patch Available  (was: Open)

> miscellaneous acid improvements
> ---
>
> Key: HIVE-14038
> URL: https://issues.apache.org/jira/browse/HIVE-14038
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-14038.2.patch, HIVE-14038.3.patch, 
> HIVE-14038.4.patch, HIVE-14038.5.patch, HIVE-14038.6.patch, 
> HIVE-14038.7.patch, HIVE-14038.patch
>
>
> 1. fix thread names in HouseKeeperServiceBase (currently they are all 
> "org.apache.hadoop.hive.ql.txn.compactor.HouseKeeperServiceBase$1-0")
> 2. dump metastore configs from HiveConf on startup to help record the values 
> of properties
> 3. add some tests
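Item 1 is the usual thread-naming fix; a minimal sketch of the pattern 
(hypothetical helper, not the actual patch):

{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the thread-naming fix: give each housekeeper
// thread a descriptive name instead of the anonymous-class default
// ("...HouseKeeperServiceBase$1-0"), so thread dumps are readable.
public class NamedHousekeeperThreads {
  public static ScheduledExecutorService newPool(final String serviceName) {
    final AtomicInteger counter = new AtomicInteger();
    ThreadFactory factory = new ThreadFactory() {
      @Override
      public Thread newThread(Runnable r) {
        Thread t = new Thread(r, serviceName + "-" + counter.getAndIncrement());
        t.setDaemon(true);
        return t;
      }
    };
    return Executors.newScheduledThreadPool(1, factory);
  }
}
{code}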



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14038) miscellaneous acid improvements

2016-06-20 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-14038:
--
Attachment: HIVE-14038.7.patch

> miscellaneous acid improvements
> ---
>
> Key: HIVE-14038
> URL: https://issues.apache.org/jira/browse/HIVE-14038
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-14038.2.patch, HIVE-14038.3.patch, 
> HIVE-14038.4.patch, HIVE-14038.5.patch, HIVE-14038.6.patch, 
> HIVE-14038.7.patch, HIVE-14038.patch
>
>
> 1. fix thread names in HouseKeeperServiceBase (currently they are all 
> "org.apache.hadoop.hive.ql.txn.compactor.HouseKeeperServiceBase$1-0")
> 2. dump metastore configs from HiveConf on startup to help record the values 
> of properties
> 3. add some tests



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14038) miscellaneous acid improvements

2016-06-20 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-14038:
--
Status: Open  (was: Patch Available)

> miscellaneous acid improvements
> ---
>
> Key: HIVE-14038
> URL: https://issues.apache.org/jira/browse/HIVE-14038
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-14038.2.patch, HIVE-14038.3.patch, 
> HIVE-14038.4.patch, HIVE-14038.5.patch, HIVE-14038.6.patch, 
> HIVE-14038.7.patch, HIVE-14038.patch
>
>
> 1. fix thread names in HouseKeeperServiceBase (currently they are all 
> "org.apache.hadoop.hive.ql.txn.compactor.HouseKeeperServiceBase$1-0")
> 2. dump metastore configs from HiveConf on startup to help record the values 
> of properties
> 3. add some tests



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13985) ORC improvements for reducing the file system calls in task side

2016-06-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15340842#comment-15340842
 ] 

Hive QA commented on HIVE-13985:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12811988/HIVE-13985.6.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 22 failed/errored test(s), 10234 tests 
executed
*Failed tests:*
{noformat}
TestMiniTezCliDriver-vector_interval_2.q-dynamic_partition_pruning.q-vectorization_10.q-and-12-more
 - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_constantPropagateForSubQuery
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.org.apache.hadoop.hive.cli.TestMiniTezCliDriver
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_subq_in
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cte_4
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_all_non_partitioned
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_disable_merge_for_bucketing
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_empty_join
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_groupby1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_groupby3
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_insert_into2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge7
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_partition_column_names_with_leading_and_trailing_spaces
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_schema_evol_orc_vec_mapwork_part_all_primitive
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_groupby_reduce
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_inner_join
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_struct_in
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_case
org.apache.hive.jdbc.TestJdbcWithMiniLlap.testLlapInputFormatEndToEnd
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/196/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/196/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-196/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 22 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12811988 - PreCommit-HIVE-MASTER-Build

> ORC improvements for reducing the file system calls in task side
> 
>
> Key: HIVE-13985
> URL: https://issues.apache.org/jira/browse/HIVE-13985
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 1.3.0, 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Fix For: 1.3.0, 2.1.0, 2.2.0
>
> Attachments: HIVE-13985-branch-1.patch, HIVE-13985-branch-1.patch, 
> HIVE-13985-branch-1.patch, HIVE-13985-branch-1.patch, 
> HIVE-13985-branch-2.1.patch, HIVE-13985.1.patch, HIVE-13985.2.patch, 
> HIVE-13985.3.patch, HIVE-13985.4.patch, HIVE-13985.5.patch, HIVE-13985.6.patch
>
>
> HIVE-13840 fixed some issues with additional file system invocations during 
> split generation. Similarly, this jira will fix issues with additional file 
> system invocations on the task side. To avoid reading footers on the task 
> side, users can set hive.orc.splits.include.file.footer to true, which will 
> serialize the orc footers on the splits. But this has issues with serializing 
> unwanted information like column statistics and other metadata which are not 
> really required for reading an orc split on the task side. We can reduce the 
> payload on the orc splits by serializing only the minimum required 
> information (stripe information, types, compression details). This will 
> decrease the payload on the orc splits and can potentially avoid OOMs in 
> the application master (AM) during split generation. This jira also 
> addresses other issues concerning the AM cache. The local cache used by the 
> AM is a soft-reference cache. This can introduce unpredictability across 
> multiple runs of the same query. We can cache the serialized footer in the 
> local cache and also use a strong-reference cache, which should avoid memory 
> pressure and will have better predictability.
> One other improvement that we can do: when 
> hive.orc.splits.include.file.footer is set to false, on the task side we 
> make one additional file system call to know the size of the file. If we can 
> serialize the file length in the orc split, this can be avoided.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-13985) ORC improvements for reducing the file system calls in task side

2016-06-20 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15340834#comment-15340834
 ] 

Prasanth Jayachandran commented on HIVE-13985:
--

The test failures are not related btw. 

> ORC improvements for reducing the file system calls in task side
> 
>
> Key: HIVE-13985
> URL: https://issues.apache.org/jira/browse/HIVE-13985
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 1.3.0, 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Fix For: 1.3.0, 2.1.0, 2.2.0
>
> Attachments: HIVE-13985-branch-1.patch, HIVE-13985-branch-1.patch, 
> HIVE-13985-branch-1.patch, HIVE-13985-branch-1.patch, 
> HIVE-13985-branch-2.1.patch, HIVE-13985.1.patch, HIVE-13985.2.patch, 
> HIVE-13985.3.patch, HIVE-13985.4.patch, HIVE-13985.5.patch, HIVE-13985.6.patch
>
>
> HIVE-13840 fixed some issues with additional file system invocations during 
> split generation. Similarly, this jira will fix issues with additional file 
> system invocations on the task side. To avoid reading footers on the task 
> side, users can set hive.orc.splits.include.file.footer to true, which will 
> serialize the orc footers on the splits. But this has issues with serializing 
> unwanted information like column statistics and other metadata which are not 
> really required for reading an orc split on the task side. We can reduce the 
> payload on the orc splits by serializing only the minimum required 
> information (stripe information, types, compression details). This will 
> decrease the payload on the orc splits and can potentially avoid OOMs in 
> the application master (AM) during split generation. This jira also 
> addresses other issues concerning the AM cache. The local cache used by the 
> AM is a soft-reference cache. This can introduce unpredictability across 
> multiple runs of the same query. We can cache the serialized footer in the 
> local cache and also use a strong-reference cache, which should avoid memory 
> pressure and will have better predictability.
> One other improvement that we can do: when 
> hive.orc.splits.include.file.footer is set to false, on the task side we 
> make one additional file system call to know the size of the file. If we can 
> serialize the file length in the orc split, this can be avoided.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13985) ORC improvements for reducing the file system calls in task side

2016-06-20 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-13985:
-
   Resolution: Fixed
Fix Version/s: 2.2.0
   2.1.0
   1.3.0
   Status: Resolved  (was: Patch Available)

Thanks [~sershe] for the reviews! Committed to branch-2.1 and master as well. 

> ORC improvements for reducing the file system calls in task side
> 
>
> Key: HIVE-13985
> URL: https://issues.apache.org/jira/browse/HIVE-13985
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 1.3.0, 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Fix For: 1.3.0, 2.1.0, 2.2.0
>
> Attachments: HIVE-13985-branch-1.patch, HIVE-13985-branch-1.patch, 
> HIVE-13985-branch-1.patch, HIVE-13985-branch-1.patch, 
> HIVE-13985-branch-2.1.patch, HIVE-13985.1.patch, HIVE-13985.2.patch, 
> HIVE-13985.3.patch, HIVE-13985.4.patch, HIVE-13985.5.patch, HIVE-13985.6.patch
>
>
> HIVE-13840 fixed some issues with additional file system invocations during 
> split generation. Similarly, this jira will fix issues with additional file 
> system invocations on the task side. To avoid reading footers on the task 
> side, users can set hive.orc.splits.include.file.footer to true, which will 
> serialize the orc footers on the splits. But this has issues with serializing 
> unwanted information like column statistics and other metadata which are not 
> really required for reading an orc split on the task side. We can reduce the 
> payload on the orc splits by serializing only the minimum required 
> information (stripe information, types, compression details). This will 
> decrease the payload on the orc splits and can potentially avoid OOMs in 
> the application master (AM) during split generation. This jira also 
> addresses other issues concerning the AM cache. The local cache used by the 
> AM is a soft-reference cache. This can introduce unpredictability across 
> multiple runs of the same query. We can cache the serialized footer in the 
> local cache and also use a strong-reference cache, which should avoid memory 
> pressure and will have better predictability.
> One other improvement that we can do: when 
> hive.orc.splits.include.file.footer is set to false, on the task side we 
> make one additional file system call to know the size of the file. If we can 
> serialize the file length in the orc split, this can be avoided.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-14066) LLAP: Orc encoded data reader should support complex types

2016-06-20 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15340826#comment-15340826
 ] 

Sergey Shelukhin edited comment on HIVE-14066 at 6/21/16 1:07 AM:
--

heh... it's better to close the old one, since this one already had the 
patch... I was just being nitpicky


was (Author: sershe):
heh... it was better to close the old one, since this one already had the 
patch... I was just being nitpicky

> LLAP: Orc encoded data reader should support complex types
> --
>
> Key: HIVE-14066
> URL: https://issues.apache.org/jira/browse/HIVE-14066
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Sergey Shelukhin
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14066.1.patch
>
>
> Currently the LLAP encoded data reader does not support complex types. Now 
> that ORC supports reading complex vectors, we should support them in LLAP as 
> well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14055) directSql - getting the number of partitions is broken

2016-06-20 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15340824#comment-15340824
 ] 

Sergey Shelukhin commented on HIVE-14055:
-

null means "the filter cannot be pushed down", which is a normal condition; 
many filters cannot be pushed down. Some of the methods (e.g. get partitions) 
evaluate the filter in the metastore instead, using the partition name list; 
some give up and fall back to the ORM path. Perhaps I can c/p the evaluation 
path.

> directSql - getting the number of partitions is broken
> --
>
> Key: HIVE-14055
> URL: https://issues.apache.org/jira/browse/HIVE-14055
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14055.patch
>
>
> Noticed while looking at something else. If the filter cannot be pushed down, 
> it just returns 0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14066) LLAP: Orc encoded data reader should support complex types

2016-06-20 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15340826#comment-15340826
 ] 

Sergey Shelukhin commented on HIVE-14066:
-

heh... it was better to close the old one, since this one already had the 
patch... I was just being nitpicky

> LLAP: Orc encoded data reader should support complex types
> --
>
> Key: HIVE-14066
> URL: https://issues.apache.org/jira/browse/HIVE-14066
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Sergey Shelukhin
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14066.1.patch
>
>
> Currently the LLAP encoded data reader does not support complex types. Now 
> that ORC supports reading complex vectors, we should support them in LLAP as 
> well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13982) Extensions to RS dedup: execute with different column order and sorting direction if possible

2016-06-20 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15340825#comment-15340825
 ] 

Jesus Camacho Rodriguez commented on HIVE-13982:


New patch uploaded; I want to get a QA run. I still need to check whether PTF 
would cause trouble with the new dedup extension. I will update the JIRA case 
shortly.

> Extensions to RS dedup: execute with different column order and sorting 
> direction if possible
> -
>
> Key: HIVE-13982
> URL: https://issues.apache.org/jira/browse/HIVE-13982
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-13982.2.patch, HIVE-13982.3.patch, 
> HIVE-13982.4.patch, HIVE-13982.patch
>
>
> Pointed out by [~gopalv].
> RS dedup should kick in for these cases, avoiding an additional shuffle stage.
> {code}
> select state, city, sum(sales) from table
> group by state, city
> order by state, city
> limit 10;
> {code}
> {code}
> select state, city, sum(sales) from table
> group by city, state
> order by state, city
> limit 10;
> {code}
> {code}
> select state, city, sum(sales) from table
> group by city, state
> order by state desc, city
> limit 10;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13982) Extensions to RS dedup: execute with different column order and sorting direction if possible

2016-06-20 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-13982:
---
Status: Patch Available  (was: In Progress)

> Extensions to RS dedup: execute with different column order and sorting 
> direction if possible
> -
>
> Key: HIVE-13982
> URL: https://issues.apache.org/jira/browse/HIVE-13982
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-13982.2.patch, HIVE-13982.3.patch, 
> HIVE-13982.4.patch, HIVE-13982.patch
>
>
> Pointed out by [~gopalv].
> RS dedup should kick in for these cases, avoiding an additional shuffle stage.
> {code}
> select state, city, sum(sales) from table
> group by state, city
> order by state, city
> limit 10;
> {code}
> {code}
> select state, city, sum(sales) from table
> group by city, state
> order by state, city
> limit 10;
> {code}
> {code}
> select state, city, sum(sales) from table
> group by city, state
> order by state desc, city
> limit 10;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (HIVE-13982) Extensions to RS dedup: execute with different column order and sorting direction if possible

2016-06-20 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-13982 started by Jesus Camacho Rodriguez.
--
> Extensions to RS dedup: execute with different column order and sorting 
> direction if possible
> -
>
> Key: HIVE-13982
> URL: https://issues.apache.org/jira/browse/HIVE-13982
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-13982.2.patch, HIVE-13982.3.patch, 
> HIVE-13982.4.patch, HIVE-13982.patch
>
>
> Pointed out by [~gopalv].
> RS dedup should kick in for these cases, avoiding an additional shuffle stage.
> {code}
> select state, city, sum(sales) from table
> group by state, city
> order by state, city
> limit 10;
> {code}
> {code}
> select state, city, sum(sales) from table
> group by city, state
> order by state, city
> limit 10;
> {code}
> {code}
> select state, city, sum(sales) from table
> group by city, state
> order by state desc, city
> limit 10;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13982) Extensions to RS dedup: execute with different column order and sorting direction if possible

2016-06-20 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-13982:
---
Status: Open  (was: Patch Available)

> Extensions to RS dedup: execute with different column order and sorting 
> direction if possible
> -
>
> Key: HIVE-13982
> URL: https://issues.apache.org/jira/browse/HIVE-13982
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-13982.2.patch, HIVE-13982.3.patch, 
> HIVE-13982.4.patch, HIVE-13982.patch
>
>
> Pointed out by [~gopalv].
> RS dedup should kick in for these cases, avoiding an additional shuffle stage.
> {code}
> select state, city, sum(sales) from table
> group by state, city
> order by state, city
> limit 10;
> {code}
> {code}
> select state, city, sum(sales) from table
> group by city, state
> order by state, city
> limit 10;
> {code}
> {code}
> select state, city, sum(sales) from table
> group by city, state
> order by state desc, city
> limit 10;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13982) Extensions to RS dedup: execute with different column order and sorting direction if possible

2016-06-20 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-13982:
---
Attachment: HIVE-13982.4.patch

> Extensions to RS dedup: execute with different column order and sorting 
> direction if possible
> -
>
> Key: HIVE-13982
> URL: https://issues.apache.org/jira/browse/HIVE-13982
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-13982.2.patch, HIVE-13982.3.patch, 
> HIVE-13982.4.patch, HIVE-13982.patch
>
>
> Pointed out by [~gopalv].
> RS dedup should kick in for these cases, avoiding an additional shuffle stage.
> {code}
> select state, city, sum(sales) from table
> group by state, city
> order by state, city
> limit 10;
> {code}
> {code}
> select state, city, sum(sales) from table
> group by city, state
> order by state, city
> limit 10;
> {code}
> {code}
> select state, city, sum(sales) from table
> group by city, state
> order by state desc, city
> limit 10;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14060) Hive: Remove bogus "localhost" from Hive splits

2016-06-20 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340815#comment-15340815
 ] 

Sergey Shelukhin commented on HIVE-14060:
-

Does it happen on FSes other than Azure? The culprit there seems to be 
AZURE_BLOCK_LOCATION_HOST_DEFAULT in the FS. It may be azure-specific...

> Hive: Remove bogus "localhost" from Hive splits
> ---
>
> Key: HIVE-14060
> URL: https://issues.apache.org/jira/browse/HIVE-14060
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-14060.1.patch
>
>
> On remote filesystems like Azure, GCP and S3, the splits contain a filler 
> location of "localhost".
> This is worse than having no location information at all - on large clusters 
> YARN waits up to 200 seconds [1] for a heartbeat from "localhost" before 
> allocating a container.
> To speed up this process, the split affinity provider should scrub the bogus 
> "localhost" from the locations and allow for the allocation of "*" containers 
> instead on each heartbeat.
> [1] - yarn.scheduler.capacity.node-locality-delay=40 x heartbeat of 5s
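
A minimal sketch of the scrubbing idea above (class and method names are 
illustrative assumptions, not the attached patch):

{code}
import java.util.ArrayList;
import java.util.List;

// Drop the filler "localhost" entries from a split's host list so YARN can
// fall back to a "*" (any-host) request instead of waiting for a heartbeat
// from a node that will never report in.
public class SplitLocationScrubber {
  public static String[] scrub(String[] locations) {
    if (locations == null) {
      return new String[0];
    }
    List<String> real = new ArrayList<>(locations.length);
    for (String host : locations) {
      if (host != null && !host.isEmpty() && !"localhost".equalsIgnoreCase(host)) {
        real.add(host);
      }
    }
    // An empty location array means "no locality preference", i.e. "*".
    return real.toArray(new String[0]);
  }
}
{code}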



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-14060) Hive: Remove bogus "localhost" from Hive splits

2016-06-20 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340815#comment-15340815
 ] 

Sergey Shelukhin edited comment on HIVE-14060 at 6/21/16 1:02 AM:
--

Does it happen on FSes other than Azure? The culprit there seems to be 
AZURE_BLOCK_LOCATION_HOST_DEFAULT in the FS. It may be azure-specific... maybe 
it should be fixed in HDFS too


was (Author: sershe):
Does it happen on FSes other than Azure? The culprit there seems to be 
AZURE_BLOCK_LOCATION_HOST_DEFAULT in the FS. It may be azure-specific...

> Hive: Remove bogus "localhost" from Hive splits
> ---
>
> Key: HIVE-14060
> URL: https://issues.apache.org/jira/browse/HIVE-14060
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-14060.1.patch
>
>
> On remote filesystems like Azure, GCP and S3, the splits contain a filler 
> location of "localhost".
> This is worse than having no location information at all - on large clusters 
> YARN waits up to 200 seconds [1] for a heartbeat from "localhost" before 
> allocating a container.
> To speed up this process, the split affinity provider should scrub the bogus 
> "localhost" from the locations and allow for the allocation of "*" containers 
> instead on each heartbeat.
> [1] - yarn.scheduler.capacity.node-locality-delay=40 x heartbeat of 5s



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14038) miscellaneous acid improvements

2016-06-20 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-14038:
--
Attachment: HIVE-14038.6.patch

> miscellaneous acid improvements
> ---
>
> Key: HIVE-14038
> URL: https://issues.apache.org/jira/browse/HIVE-14038
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-14038.2.patch, HIVE-14038.3.patch, 
> HIVE-14038.4.patch, HIVE-14038.5.patch, HIVE-14038.6.patch, HIVE-14038.patch
>
>
> 1. fix thread name in HouseKeeperServiceBase (currently they are all 
> "org.apache.hadoop.hive.ql.txn.compactor.HouseKeeperServiceBase$1-0")
> 2. dump metastore configs from HiveConf on startup to help record the values 
> of properties
> 3. add some tests
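
For item 1, a minimal sketch of named housekeeping threads (an assumed 
helper, not the attached patch):

{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Give each housekeeping service a descriptive thread name instead of the
// anonymous-class default "...HouseKeeperServiceBase$1-0".
public class NamedHouseKeeperThreads {
  public static ScheduledExecutorService newPool(final String serviceName) {
    ThreadFactory factory = new ThreadFactory() {
      private final AtomicInteger count = new AtomicInteger(0);
      @Override
      public Thread newThread(Runnable r) {
        Thread t = new Thread(r, serviceName + "-" + count.getAndIncrement());
        t.setDaemon(true);
        return t;
      }
    };
    return Executors.newScheduledThreadPool(1, factory);
  }
}
{code}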



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14038) miscellaneous acid improvements

2016-06-20 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-14038:
--
Attachment: (was: HIVE-14038.branch-1.2.patch)

> miscellaneous acid improvements
> ---
>
> Key: HIVE-14038
> URL: https://issues.apache.org/jira/browse/HIVE-14038
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-14038.2.patch, HIVE-14038.3.patch, 
> HIVE-14038.4.patch, HIVE-14038.5.patch, HIVE-14038.patch
>
>
> 1. fix thread name in HouseKeeperServiceBase (currently they are all 
> "org.apache.hadoop.hive.ql.txn.compactor.HouseKeeperServiceBase$1-0")
> 2. dump metastore configs from HiveConf on startup to help record the values 
> of properties
> 3. add some tests



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14066) LLAP: Orc encoded data reader should support complex types

2016-06-20 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14066:
-
Resolution: Duplicate
Status: Resolved  (was: Patch Available)

Duplicate of HIVE-13744.

> LLAP: Orc encoded data reader should support complex types
> --
>
> Key: HIVE-14066
> URL: https://issues.apache.org/jira/browse/HIVE-14066
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Sergey Shelukhin
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14066.1.patch
>
>
> Currently the LLAP encoded data reader does not support complex types. Now 
> that ORC supports reading complex vectors, we should support them in LLAP as 
> well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13744) LLAP IO - add complex types support

2016-06-20 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-13744:
-
Affects Version/s: 2.2.0

> LLAP IO - add complex types support
> ---
>
> Key: HIVE-13744
> URL: https://issues.apache.org/jira/browse/HIVE-13744
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Sergey Shelukhin
>Assignee: Prasanth Jayachandran
>  Labels: llap, orc
> Attachments: HIVE-13744.1.patch
>
>
> Recently, complex type column vectors were added to Hive. We should use them 
> in the IO elevator.
> Vectorization itself doesn't support complex types (yet), but this will be 
> useful when it does; it will also enable the LLAP IO elevator to be used in a 
> non-vectorized context with complex types after HIVE-13617.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13744) LLAP IO - add complex types support

2016-06-20 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-13744:
-
Target Version/s: 2.2.0
  Status: Patch Available  (was: Open)

> LLAP IO - add complex types support
> ---
>
> Key: HIVE-13744
> URL: https://issues.apache.org/jira/browse/HIVE-13744
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Sergey Shelukhin
>Assignee: Prasanth Jayachandran
>  Labels: llap, orc
> Attachments: HIVE-13744.1.patch
>
>
> Recently, complex type column vectors were added to Hive. We should use them 
> in the IO elevator.
> Vectorization itself doesn't support complex types (yet), but this will be 
> useful when it does; it will also enable the LLAP IO elevator to be used in a 
> non-vectorized context with complex types after HIVE-13617.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13744) LLAP IO - add complex types support

2016-06-20 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-13744:
-
Attachment: HIVE-13744.1.patch

> LLAP IO - add complex types support
> ---
>
> Key: HIVE-13744
> URL: https://issues.apache.org/jira/browse/HIVE-13744
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Sergey Shelukhin
>Assignee: Prasanth Jayachandran
>  Labels: llap, orc
> Attachments: HIVE-13744.1.patch
>
>
> Recently, complex type column vectors were added to Hive. We should use them 
> in the IO elevator.
> Vectorization itself doesn't support complex types (yet), but this will be 
> useful when it does; it will also enable the LLAP IO elevator to be used in a 
> non-vectorized context with complex types after HIVE-13617.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13744) LLAP IO - add complex types support

2016-06-20 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-13744:
-
Labels: llap orc  (was: )

> LLAP IO - add complex types support
> ---
>
> Key: HIVE-13744
> URL: https://issues.apache.org/jira/browse/HIVE-13744
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Sergey Shelukhin
>Assignee: Prasanth Jayachandran
>  Labels: llap, orc
> Attachments: HIVE-13744.1.patch
>
>
> Recently, complex type column vectors were added to Hive. We should use them 
> in the IO elevator.
> Vectorization itself doesn't support complex types (yet), but this will be 
> useful when it does; it will also enable the LLAP IO elevator to be used in a 
> non-vectorized context with complex types after HIVE-13617.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14066) LLAP: Orc encoded data reader should support complex types

2016-06-20 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340799#comment-15340799
 ] 

Prasanth Jayachandran commented on HIVE-14066:
--

Sorry for the oversight :) I will close this one and use the old one instead.

> LLAP: Orc encoded data reader should support complex types
> --
>
> Key: HIVE-14066
> URL: https://issues.apache.org/jira/browse/HIVE-14066
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Sergey Shelukhin
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14066.1.patch
>
>
> Currently the LLAP encoded data reader does not support complex types. Now 
> that ORC supports reading complex vectors, we should support them in LLAP as 
> well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14066) LLAP: Orc encoded data reader should support complex types

2016-06-20 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14066:

Reporter: Sergey Shelukhin  (was: Prasanth Jayachandran)

> LLAP: Orc encoded data reader should support complex types
> --
>
> Key: HIVE-14066
> URL: https://issues.apache.org/jira/browse/HIVE-14066
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Sergey Shelukhin
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14066.1.patch
>
>
> Currently the LLAP encoded data reader does not support complex types. Now 
> that ORC supports reading complex vectors, we should support them in LLAP as 
> well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14066) LLAP: Orc encoded data reader should support complex types

2016-06-20 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340795#comment-15340795
 ] 

Sergey Shelukhin commented on HIVE-14066:
-

Dup of HIVE-13744 :P I'll take a look eventually

> LLAP: Orc encoded data reader should support complex types
> --
>
> Key: HIVE-14066
> URL: https://issues.apache.org/jira/browse/HIVE-14066
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Sergey Shelukhin
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14066.1.patch
>
>
> Currently the LLAP encoded data reader does not support complex types. Now 
> that ORC supports reading complex vectors, we should support them in LLAP as 
> well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13985) ORC improvements for reducing the file system calls in task side

2016-06-20 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340793#comment-15340793
 ] 

Sergey Shelukhin commented on HIVE-13985:
-

+1

> ORC improvements for reducing the file system calls in task side
> 
>
> Key: HIVE-13985
> URL: https://issues.apache.org/jira/browse/HIVE-13985
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 1.3.0, 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-13985-branch-1.patch, HIVE-13985-branch-1.patch, 
> HIVE-13985-branch-1.patch, HIVE-13985-branch-1.patch, 
> HIVE-13985-branch-2.1.patch, HIVE-13985.1.patch, HIVE-13985.2.patch, 
> HIVE-13985.3.patch, HIVE-13985.4.patch, HIVE-13985.5.patch, HIVE-13985.6.patch
>
>
> HIVE-13840 fixed some issues with additional file system invocations during 
> split generation. Similarly, this jira will fix issues with additional file 
> system invocations on the task side. To avoid reading footers on the task 
> side, users can set hive.orc.splits.include.file.footer to true, which will 
> serialize the orc footers on the splits. But this has issues with serializing 
> unwanted information like column statistics and other metadata which are not 
> really required for reading an orc split on the task side. We can reduce the 
> payload on the orc splits by serializing only the minimum required 
> information (stripe information, types, compression details). This will 
> decrease the payload on the orc splits and can potentially avoid OOMs in the 
> application master (AM) during split generation. This jira also addresses 
> other issues concerning the AM cache. The local cache used by the AM is a 
> soft-reference cache. This can introduce unpredictability across multiple 
> runs of the same query. We can cache the serialized footer in the local cache 
> and also use a strong-reference cache, which should avoid memory pressure and 
> will have better predictability.
> One other improvement we can make: when hive.orc.splits.include.file.footer 
> is set to false, the task side makes one additional file system call to learn 
> the size of the file. If we serialize the file length in the orc split, this 
> can be avoided.
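
A minimal sketch of that last idea, carrying the file length in the split 
(field and method names are assumptions, not the actual OrcSplit):

{code}
import java.util.function.LongSupplier;

// If the split already carries the file length from split generation, the
// task side can skip the extra getFileStatus() filesystem call.
public class SplitFileInfo {
  private final String path;
  private final long fileLength;  // -1 when not serialized into the split

  public SplitFileInfo(String path, long fileLength) {
    this.path = path;
    this.fileLength = fileLength;
  }

  public String getPath() {
    return path;
  }

  public long fileLengthOrLookup(LongSupplier fsLookup) {
    // Fall back to the (expensive) filesystem call only when the split was
    // generated without the length.
    return fileLength >= 0 ? fileLength : fsLookup.getAsLong();
  }
}
{code}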



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13590) Kerberized HS2 with LDAP auth enabled fails in multi-domain LDAP case

2016-06-20 Thread Chaoyu Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoyu Tang updated HIVE-13590:
---
   Resolution: Fixed
Fix Version/s: 2.1.1
   2.2.0
   Status: Resolved  (was: Patch Available)

Patch has been committed to master (for 2.2.0) and branch-2.1 (for 2.1.1). 
Thanks to [~szehon] and [~spena] for the review.

> Kerberized HS2 with LDAP auth enabled fails in multi-domain LDAP case
> -
>
> Key: HIVE-13590
> URL: https://issues.apache.org/jira/browse/HIVE-13590
> Project: Hive
>  Issue Type: Bug
>  Components: Authentication, Security
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Fix For: 2.2.0, 2.1.1
>
> Attachments: HIVE-13590.1.patch, HIVE-13590.1.patch, 
> HIVE-13590.patch, HIVE-13590.patch
>
>
> In a kerberized HS2 with LDAP authentication enabled, an LDAP user usually 
> logs in with a username of the form username@domain in the multi-domain case. 
> But login fails if the domain is not covered by a Hadoop auth_to_local 
> mapping rule; the error is as follows:
> {code}
> Caused by: 
> org.apache.hadoop.security.authentication.util.KerberosName$NoMatchingRule: 
> No rules applied to ct...@mydomain.com
> at 
> org.apache.hadoop.security.authentication.util.KerberosName.getShortName(KerberosName.java:389)
> at org.apache.hadoop.security.User.<init>(User.java:48)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14057) Add an option in llapstatus to generate output to a file

2016-06-20 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340791#comment-15340791
 ] 

Sergey Shelukhin commented on HIVE-14057:
-

If it's invoked remotely, then how would one use the file (which is remote)? 
Redirecting the remote command on the remote side will create the file there 
too, without the motd etc. Normally such tools output the results to stdout 
and everything else to stderr.

Also some comments on RB
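
A tiny illustration of the stdout/stderr convention suggested here (a 
hypothetical class, not part of the patch):

{code}
// Machine-readable output goes to stdout so it can be redirected or piped
// cleanly; progress messages and diagnostics go to stderr.
public class LlapStatusOutput {
  public static void emit(String statusJson) {
    System.err.println("Fetching LLAP app status...");  // diagnostics
    System.out.println(statusJson);                     // the actual result
  }
}
{code}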

> Add an option in llapstatus to generate output to a file
> 
>
> Key: HIVE-14057
> URL: https://issues.apache.org/jira/browse/HIVE-14057
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14057.01.patch, HIVE-14057.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14023) LLAP: Make the Hive query id available in ContainerRunner

2016-06-20 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340778#comment-15340778
 ] 

Sergey Shelukhin commented on HIVE-14023:
-

+1

> LLAP: Make the Hive query id available in ContainerRunner
> -
>
> Key: HIVE-14023
> URL: https://issues.apache.org/jira/browse/HIVE-14023
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14023.01.patch, HIVE-14023.02.patch
>
>
> Needed to generate logs per query.
> We can use the dag identifier for now, but that isn't very useful. (The 
> queryId may not be too useful either if users cannot find it - but that's 
> better than a dagIdentifier)
> The queryId is available right now after the Processor starts, which is too 
> late for log changes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14068) make more effort to find hive-site.xml

2016-06-20 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14068:

Status: Patch Available  (was: Open)

> make more effort to find hive-site.xml
> --
>
> Key: HIVE-14068
> URL: https://issues.apache.org/jira/browse/HIVE-14068
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14068.patch
>
>
> It pretty much doesn't make sense to run Hive w/o the config, so we should 
> make more effort to find one if it's missing from the classpath, or if the 
> classloader does not return it for some reason (e.g. the classloader ignores 
> some permission issues; explicitly looking for the file may expose them 
> better).
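
A minimal sketch of that fallback idea (paths and names are assumptions, not 
the attached patch): try the classloader first, then probe well-known 
locations on disk.

{code}
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;

public class HiveSiteLocator {
  public static URL findHiveSite() {
    URL fromClasspath = Thread.currentThread()
        .getContextClassLoader().getResource("hive-site.xml");
    if (fromClasspath != null) {
      return fromClasspath;
    }
    // Explicit filesystem probing can surface permission problems that the
    // classloader silently swallows.
    String hiveConfDir = System.getenv("HIVE_CONF_DIR");
    String hiveHome = System.getenv("HIVE_HOME");
    String[] candidates = {
        hiveConfDir,
        hiveHome == null ? null : hiveHome + "/conf"
    };
    for (String dir : candidates) {
      if (dir == null) {
        continue;
      }
      File f = new File(dir, "hive-site.xml");
      if (f.isFile()) {
        try {
          return f.toURI().toURL();
        } catch (MalformedURLException e) {
          // keep probing other candidates
        }
      }
    }
    return null;  // still missing; the caller decides whether to fail
  }
}
{code}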



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14068) make more effort to find hive-site.xml

2016-06-20 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14068:

Attachment: HIVE-14068.patch

Small patch; no behavior changes as long as the config is in the classpath.
[~ashutoshc] can you take a look?

> make more effort to find hive-site.xml
> --
>
> Key: HIVE-14068
> URL: https://issues.apache.org/jira/browse/HIVE-14068
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14068.patch
>
>
> It pretty much doesn't make sense to run Hive w/o the config, so we should 
> make more effort to find one if it's missing from the classpath, or if the 
> classloader does not return it for some reason (e.g. the classloader ignores 
> some permission issues; explicitly looking for the file may expose them 
> better).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13985) ORC improvements for reducing the file system calls in task side

2016-06-20 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340768#comment-15340768
 ] 

Prasanth Jayachandran commented on HIVE-13985:
--

[~sershe] Can you take another look, please?

> ORC improvements for reducing the file system calls in task side
> 
>
> Key: HIVE-13985
> URL: https://issues.apache.org/jira/browse/HIVE-13985
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 1.3.0, 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-13985-branch-1.patch, HIVE-13985-branch-1.patch, 
> HIVE-13985-branch-1.patch, HIVE-13985-branch-1.patch, 
> HIVE-13985-branch-2.1.patch, HIVE-13985.1.patch, HIVE-13985.2.patch, 
> HIVE-13985.3.patch, HIVE-13985.4.patch, HIVE-13985.5.patch, HIVE-13985.6.patch
>
>
> HIVE-13840 fixed some issues with additional file system invocations during 
> split generation. Similarly, this jira will fix issues with additional file 
> system invocations on the task side. To avoid reading footers on the task 
> side, users can set hive.orc.splits.include.file.footer to true, which will 
> serialize the orc footers on the splits. But this has issues with serializing 
> unwanted information like column statistics and other metadata which are not 
> really required for reading an orc split on the task side. We can reduce the 
> payload on the orc splits by serializing only the minimum required 
> information (stripe information, types, compression details). This will 
> decrease the payload on the orc splits and can potentially avoid OOMs in the 
> application master (AM) during split generation. This jira also addresses 
> other issues concerning the AM cache. The local cache used by the AM is a 
> soft-reference cache. This can introduce unpredictability across multiple 
> runs of the same query. We can cache the serialized footer in the local cache 
> and also use a strong-reference cache, which should avoid memory pressure and 
> will have better predictability.
> One other improvement we can make: when hive.orc.splits.include.file.footer 
> is set to false, the task side makes one additional file system call to learn 
> the size of the file. If we serialize the file length in the orc split, this 
> can be avoided.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13901) Hivemetastore add partitions can be slow depending on filesystems

2016-06-20 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340743#comment-15340743
 ] 

Ashutosh Chauhan commented on HIVE-13901:
-

Some of these tests are still failing for me when run locally:
{code}
Running org.apache.hadoop.hive.metastore.TestEmbeddedHiveMetaStore
Tests run: 34, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 91.734 sec <<< 
FAILURE! - in org.apache.hadoop.hive.metastore.TestEmbeddedHiveMetaStore
testPartition(org.apache.hadoop.hive.metastore.TestEmbeddedHiveMetaStore)  Time 
elapsed: 1.749 sec  <<< FAILURE!
junit.framework.AssertionFailedError: null
at junit.framework.Assert.fail(Assert.java:55)
at junit.framework.Assert.assertTrue(Assert.java:22)
at junit.framework.Assert.assertTrue(Assert.java:31)
at junit.framework.TestCase.assertTrue(TestCase.java:201)
at 
org.apache.hadoop.hive.metastore.TestHiveMetaStore.partitionTester(TestHiveMetaStore.java:443)
at 
org.apache.hadoop.hive.metastore.TestHiveMetaStore.testPartition(TestHiveMetaStore.java:146)
{code}
{code}
Running org.apache.hadoop.hive.metastore.TestRemoteHiveMetaStore
Tests run: 34, Failures: 1, Errors: 1, Skipped: 0, Time elapsed: 91.111 sec <<< 
FAILURE! - in org.apache.hadoop.hive.metastore.TestRemoteHiveMetaStore
testTransactionalValidation(org.apache.hadoop.hive.metastore.TestRemoteHiveMetaStore)
  Time elapsed: 0.143 sec  <<< ERROR!
org.apache.hadoop.hive.metastore.api.AlreadyExistsException: Table acidTable 
already exists
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_with_environment_context_result$create_table_with_environment_context_resultStandardScheme.read(ThriftHiveMetastore.java:41480)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_with_environment_context_result$create_table_with_environment_context_resultStandardScheme.read(ThriftHiveMetastore.java:41466)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_with_environment_context_result.read(ThriftHiveMetastore.java:41392)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_create_table_with_environment_context(ThriftHiveMetastore.java:1183)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.create_table_with_environment_context(ThriftHiveMetastore.java:1169)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.create_table_with_environment_context(HiveMetaStoreClient.java:2325)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:738)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:726)
at 
org.apache.hadoop.hive.metastore.TestHiveMetaStore.createTable(TestHiveMetaStore.java:2967)
at 
org.apache.hadoop.hive.metastore.TestHiveMetaStore.testTransactionalValidation(TestHiveMetaStore.java:2897)
{code}

{code}
testPartition(org.apache.hadoop.hive.metastore.TestRemoteHiveMetaStore)  Time 
elapsed: 1.675 sec  <<< FAILURE!
junit.framework.AssertionFailedError: null
at junit.framework.Assert.fail(Assert.java:55)
at junit.framework.Assert.assertTrue(Assert.java:22)
at junit.framework.Assert.assertTrue(Assert.java:31)
at junit.framework.TestCase.assertTrue(TestCase.java:201)
at 
org.apache.hadoop.hive.metastore.TestHiveMetaStore.partitionTester(TestHiveMetaStore.java:443)
at 
org.apache.hadoop.hive.metastore.TestHiveMetaStore.testPartition(TestHiveMetaStore.java:146)
{code}
{code}
testPartition(org.apache.hadoop.hive.metastore.TestSetUGIOnOnlyServer)  Time 
elapsed: 1.771 sec  <<< FAILURE!
junit.framework.AssertionFailedError: null
at junit.framework.Assert.fail(Assert.java:55)
at junit.framework.Assert.assertTrue(Assert.java:22)
at junit.framework.Assert.assertTrue(Assert.java:31)
at junit.framework.TestCase.assertTrue(TestCase.java:201)
at 
org.apache.hadoop.hive.metastore.TestHiveMetaStore.partitionTester(TestHiveMetaStore.java:443)
at 
org.apache.hadoop.hive.metastore.TestHiveMetaStore.testPartition(TestHiveMetaStore.java:146)
{code}

> Hivemetastore add partitions can be slow depending on filesystems
> -
>
> Key: HIVE-13901
> URL: https://issues.apache.org/jira/browse/HIVE-13901
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-13901.1.patch, HIVE-13901.2.patch, 
> HIVE-13901.6.patch
>
>
> Depending on FS, creating external tables & adding partitions can be 
> expensive (e.g msck 

[jira] [Commented] (HIVE-13960) Session timeout may happen before HIVE_SERVER2_IDLE_SESSION_TIMEOUT for back-to-back synchronous operations.

2016-06-20 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340739#comment-15340739
 ] 

zhihai xu commented on HIVE-13960:
--

Yes, renaming pendingCount to activeCalls sounds good to me. Will fix it in a 
follow-up JIRA, HIVE-14067. Thanks for the review, [~thejas]!
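
A minimal sketch of what an activeCalls-style guard could look like (names 
and structure are assumptions, not the committed patch):

{code}
import java.util.concurrent.atomic.AtomicInteger;

// Treat a session as idle only when no synchronous call is in flight, so a
// long-running executeStatement cannot be timed out as "idle".
public class IdleSessionGuard {
  private final AtomicInteger activeCalls = new AtomicInteger(0);
  private volatile long lastIdleTime = 0;

  public void callStarted() {
    activeCalls.incrementAndGet();
    lastIdleTime = 0;  // busy, not idle
  }

  public void callFinished() {
    if (activeCalls.decrementAndGet() == 0) {
      lastIdleTime = System.currentTimeMillis();
    }
  }

  public boolean isTimedOut(long now, long idleTimeoutMs) {
    // A session with in-flight calls is never idle, no matter how long the
    // current operation has been running.
    return activeCalls.get() == 0
        && lastIdleTime > 0
        && now - lastIdleTime > idleTimeoutMs;
  }
}
{code}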

> Session timeout may happen before HIVE_SERVER2_IDLE_SESSION_TIMEOUT for 
> back-to-back synchronous operations.
> 
>
> Key: HIVE-13960
> URL: https://issues.apache.org/jira/browse/HIVE-13960
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: zhihai xu
>Assignee: zhihai xu
> Fix For: 2.2.0
>
> Attachments: HIVE-13960.000.patch
>
>
> Session timeout may happen before 
> HIVE_SERVER2_IDLE_SESSION_TIMEOUT (hive.server2.idle.session.timeout) for 
> back-to-back synchronous operations.
> This issue can happen with the following two operations op1 and op2, where 
> op2 is a synchronous long-running operation that runs right after op1 is 
> closed:
>  
> 1. closeOperation(op1) is called:
> this will set {{lastIdleTime}} to System.currentTimeMillis(), because 
> {{opHandleSet}} becomes empty after {{closeOperation}} removes op1 from 
> {{opHandleSet}}.
> 2. op2 runs for a long time via {{executeStatement}} right after 
> closeOperation(op1) is called.
> If op2 runs for more than HIVE_SERVER2_IDLE_SESSION_TIMEOUT, the session 
> will time out even though op2 is still running.
> We hit this issue when using PyHive to execute a non-async operation. 
> The following is the exception we see:
> {code}
> File "/usr/local/lib/python2.7/dist-packages/pyhive/hive.py", line 126, in 
> close
> _check_status(response)
>   File "/usr/local/lib/python2.7/dist-packages/pyhive/hive.py", line 362, in 
> _check_status
> raise OperationalError(response)
> OperationalError: TCloseSessionResp(status=TStatus(errorCode=0, 
> errorMessage='Session does not exist!', sqlState=None, 
> infoMessages=['*org.apache.hive.service.cli.HiveSQLException:Session does not 
> exist!:12:11', 
> 'org.apache.hive.service.cli.session.SessionManager:closeSession:SessionManager.java:311',
>  'org.apache.hive.service.cli.CLIService:closeSession:CLIService.java:221', 
> 'org.apache.hive.service.cli.thrift.ThriftCLIService:CloseSession:ThriftCLIService.java:471',
>  
> 'org.apache.hive.service.cli.thrift.TCLIService$Processor$CloseSession:getResult:TCLIService.java:1273',
>  
> 'org.apache.hive.service.cli.thrift.TCLIService$Processor$CloseSession:getResult:TCLIService.java:1258',
>  'org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39', 
> 'org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39', 
> 'org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56',
>  
> 'org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:285',
>  
> 'java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1145',
>  
> 'java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:615',
>  'java.lang.Thread:run:Thread.java:745'], statusCode=3))
> TCloseSessionResp(status=TStatus(errorCode=0, errorMessage='Session does not 
> exist!', sqlState=None, 
> infoMessages=['*org.apache.hive.service.cli.HiveSQLException:Session does not 
> exist!:12:11', 
> 'org.apache.hive.service.cli.session.SessionManager:closeSession:SessionManager.java:311',
>  'org.apache.hive.service.cli.CLIService:closeSession:CLIService.java:221', 
> 'org.apache.hive.service.cli.thrift.ThriftCLIService:CloseSession:ThriftCLIService.java:471',
>  
> 'org.apache.hive.service.cli.thrift.TCLIService$Processor$CloseSession:getResult:TCLIService.java:1273',
>  
> 'org.apache.hive.service.cli.thrift.TCLIService$Processor$CloseSession:getResult:TCLIService.java:1258',
>  'org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39', 
> 'org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39', 
> 'org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56',
>  
> 'org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:285',
>  
> 'java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1145',
>  
> 'java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:615',
>  'java.lang.Thread:run:Thread.java:745'], statusCode=3))
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14065) Provide an API for making Hive read-only for a short period

2016-06-20 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340696#comment-15340696
 ] 

Alan Gates commented on HIVE-14065:
---

I'm not clear on what you mean by "taking all the ZooKeeper locks". Can you 
elaborate?

> Provide an API for making Hive read-only for a short period
> ---
>
> Key: HIVE-14065
> URL: https://issues.apache.org/jira/browse/HIVE-14065
> Project: Hive
>  Issue Type: Improvement
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>
> HIVE-7973 added a notification log which allows clients to do incremental 
> replication of the Hive metastore.  However, it is a challenge to get the 
> initial state of the Hive database.  Using existing APIs may give us an 
> inconsistent state.  For example, if a Hive table is renamed while we're 
> loading all tables, we may miss that information.
> The easiest way to fix this would be to provide an API for making Hive 
> read-only for a short period.  This locking API would come with a timeout so 
> that if the locker failed, the system would not stay down.  It would return 
> an ID which uniquely identified the lock instance.  The read-only lock itself 
> could be implemented by taking all the ZooKeeper locks.  The RPC for removing 
> the lock would return back a status indicating whether the lock had timed out 
> before being removed or not.  If it had timed out, we could retry our 
> snapshot loading process with a longer timeout period.
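
A minimal sketch of the API shape described above; everything here (names, 
signatures, semantics) is an assumption drawn from the description, not an 
existing Hive interface:

{code}
public interface MetastoreReadOnlyLock {
  /** Makes the metastore read-only; returns an id for this lock instance. */
  long acquireReadOnlyLock(long timeoutMillis);

  /**
   * Releases the lock identified by lockId. Returns false if the lock had
   * already timed out before removal, in which case the caller should retry
   * its snapshot with a longer timeout.
   */
  boolean releaseReadOnlyLock(long lockId);
}
{code}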



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14066) LLAP: Orc encoded data reader should support complex types

2016-06-20 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14066:
-
Status: Patch Available  (was: Open)

> LLAP: Orc encoded data reader should support complex types
> --
>
> Key: HIVE-14066
> URL: https://issues.apache.org/jira/browse/HIVE-14066
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14066.1.patch
>
>
> Currently the LLAP encoded data reader does not support complex types. Now 
> that ORC supports reading complex vectors, we should support them in LLAP as 
> well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14066) LLAP: Orc encoded data reader should support complex types

2016-06-20 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14066:
-
Attachment: HIVE-14066.1.patch

NOTE: The vectorized operator pipeline does not support complex types yet. 
Although the reader supports reading complex vectors, the Vectorizer will fail 
to vectorize. We use the same test cases from vectorization in MiniLlap; when 
vectorization supports complex types, the encoded data reader should work fine.

> LLAP: Orc encoded data reader should support complex types
> --
>
> Key: HIVE-14066
> URL: https://issues.apache.org/jira/browse/HIVE-14066
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14066.1.patch
>
>
> Currently the LLAP encoded data reader does not support complex types. Now 
> that ORC supports reading complex vectors, we should support them in LLAP as 
> well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14057) Add an option in llapstatus to generate output to a file

2016-06-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340652#comment-15340652
 ] 

Hive QA commented on HIVE-14057:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12811969/HIVE-14057.02.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10237 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_constantPropagateForSubQuery
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hive.jdbc.TestJdbcWithMiniLlap.testLlapInputFormatEndToEnd
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/194/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/194/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-194/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12811969 - PreCommit-HIVE-MASTER-Build

> Add an option in llapstatus to generate output to a file
> 
>
> Key: HIVE-14057
> URL: https://issues.apache.org/jira/browse/HIVE-14057
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14057.01.patch, HIVE-14057.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14062) Changes from HIVE-13502 overwritten by HIVE-13566

2016-06-20 Thread Naveen Gangam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340637#comment-15340637
 ] 

Naveen Gangam commented on HIVE-14062:
--

[~aihuaxu] Could you please re-review and re-commit? Thank you.

> Changes from HIVE-13502 overwritten by HIVE-13566
> -
>
> Key: HIVE-14062
> URL: https://issues.apache.org/jira/browse/HIVE-14062
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 2.1.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
> Attachments: HIVE-14062.1.patch
>
>
> It appears that changes from HIVE-13566 overwrote the changes from 
> HIVE-13502. I will confirm with the author that it was inadvertent before I 
> re-add it. Thanks



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14062) Changes from HIVE-13502 overwritten by HIVE-13566

2016-06-20 Thread Naveen Gangam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-14062:
-
Attachment: HIVE-14062.1.patch

The fix and test are exactly the same as in HIVE-13502.

> Changes from HIVE-13502 overwritten by HIVE-13566
> -
>
> Key: HIVE-14062
> URL: https://issues.apache.org/jira/browse/HIVE-14062
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 2.1.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
> Attachments: HIVE-14062.1.patch
>
>
> It appears that changes from HIVE-13566 overwrote the changes from 
> HIVE-13502. I will confirm with the author that it was inadvertent before I 
> re-add it. Thanks



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14062) Changes from HIVE-13502 overwritten by HIVE-13566

2016-06-20 Thread Naveen Gangam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-14062:
-
Status: Patch Available  (was: Open)

> Changes from HIVE-13502 overwritten by HIVE-13566
> -
>
> Key: HIVE-14062
> URL: https://issues.apache.org/jira/browse/HIVE-14062
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 2.1.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
> Attachments: HIVE-14062.1.patch
>
>
> It appears that changes from HIVE-13566 overwrote the changes from 
> HIVE-13502. I will confirm with the author that it was inadvertent before I 
> re-add it. Thanks



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13930) upgrade Hive to latest Hadoop version

2016-06-20 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-13930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340620#comment-15340620
 ] 

Sergio Peña commented on HIVE-13930:


[~sershe] I don't know why you're getting those dependency issues. The Spark 
file that is downloaded is written to {{itests/thirdparty}}, but I do not see 
any JARs there.

Please don't remove the {{SparkCliDriver}} yet. That is an important test that 
validates hive on spark.

[~xuefuz] Do you have any idea about why this is failing?

> upgrade Hive to latest Hadoop version
> -
>
> Key: HIVE-13930
> URL: https://issues.apache.org/jira/browse/HIVE-13930
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13930.01.patch, HIVE-13930.02.patch, 
> HIVE-13930.03.patch, HIVE-13930.04.patch, HIVE-13930.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13960) Session timeout may happen before HIVE_SERVER2_IDLE_SESSION_TIMEOUT for back-to-back synchronous operations.

2016-06-20 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340617#comment-15340617
 ] 

Thejas M Nair commented on HIVE-13960:
--

Thanks for the patch, [~zxu] and [~jxiang].
From the variable name pendingCount, it's hard to understand what it 
represents. Should we name it activeCalls instead?
If you agree, the change can be done in a follow-up jira.


> Session timeout may happen before HIVE_SERVER2_IDLE_SESSION_TIMEOUT for 
> back-to-back synchronous operations.
> 
>
> Key: HIVE-13960
> URL: https://issues.apache.org/jira/browse/HIVE-13960
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: zhihai xu
>Assignee: zhihai xu
> Fix For: 2.2.0
>
> Attachments: HIVE-13960.000.patch
>
>
> Session timeout may happen before 
> HIVE_SERVER2_IDLE_SESSION_TIMEOUT (hive.server2.idle.session.timeout) for 
> back-to-back synchronous operations.
> This issue can happen with the following two operations op1 and op2, where 
> op2 is a synchronous long-running operation that runs right after op1 is 
> closed:
>  
> 1. closeOperation(op1) is called:
> this will set {{lastIdleTime}} to System.currentTimeMillis(), because 
> {{opHandleSet}} becomes empty after {{closeOperation}} removes op1 from 
> {{opHandleSet}}.
> 2. op2 runs for a long time via {{executeStatement}} right after 
> closeOperation(op1) is called.
> If op2 runs for more than HIVE_SERVER2_IDLE_SESSION_TIMEOUT, the session 
> will time out even though op2 is still running.
> We hit this issue when using PyHive to execute a non-async operation. 
> The following is the exception we see:
> {code}
> File "/usr/local/lib/python2.7/dist-packages/pyhive/hive.py", line 126, in 
> close
> _check_status(response)
>   File "/usr/local/lib/python2.7/dist-packages/pyhive/hive.py", line 362, in 
> _check_status
> raise OperationalError(response)
> OperationalError: TCloseSessionResp(status=TStatus(errorCode=0, 
> errorMessage='Session does not exist!', sqlState=None, 
> infoMessages=['*org.apache.hive.service.cli.HiveSQLException:Session does not 
> exist!:12:11', 
> 'org.apache.hive.service.cli.session.SessionManager:closeSession:SessionManager.java:311',
>  'org.apache.hive.service.cli.CLIService:closeSession:CLIService.java:221', 
> 'org.apache.hive.service.cli.thrift.ThriftCLIService:CloseSession:ThriftCLIService.java:471',
>  
> 'org.apache.hive.service.cli.thrift.TCLIService$Processor$CloseSession:getResult:TCLIService.java:1273',
>  
> 'org.apache.hive.service.cli.thrift.TCLIService$Processor$CloseSession:getResult:TCLIService.java:1258',
>  'org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39', 
> 'org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39', 
> 'org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56',
>  
> 'org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:285',
>  
> 'java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1145',
>  
> 'java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:615',
>  'java.lang.Thread:run:Thread.java:745'], statusCode=3))
> TCloseSessionResp(status=TStatus(errorCode=0, errorMessage='Session does not 
> exist!', sqlState=None, 
> infoMessages=['*org.apache.hive.service.cli.HiveSQLException:Session does not 
> exist!:12:11', 
> 'org.apache.hive.service.cli.session.SessionManager:closeSession:SessionManager.java:311',
>  'org.apache.hive.service.cli.CLIService:closeSession:CLIService.java:221', 
> 'org.apache.hive.service.cli.thrift.ThriftCLIService:CloseSession:ThriftCLIService.java:471',
>  
> 'org.apache.hive.service.cli.thrift.TCLIService$Processor$CloseSession:getResult:TCLIService.java:1273',
>  
> 'org.apache.hive.service.cli.thrift.TCLIService$Processor$CloseSession:getResult:TCLIService.java:1258',
>  'org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39', 
> 'org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39', 
> 'org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56',
>  
> 'org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:285',
>  
> 'java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1145',
>  
> 'java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:615',
>  'java.lang.Thread:run:Thread.java:745'], statusCode=3))
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-13960) Session timeout may happen before HIVE_SERVER2_IDLE_SESSION_TIMEOUT for back-to-back synchronous operations.

2016-06-20 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340617#comment-15340617
 ] 

Thejas M Nair edited comment on HIVE-13960 at 6/20/16 11:00 PM:


Thanks for the patch, [~zxu] and [~jxiang].
From the variable name pendingCount, it's hard to understand what it 
represents. Should we name it activeCalls (or something along those lines) 
instead?
If you agree, the change can be done in a follow-up jira.



was (Author: thejas):
Thanks for the patch, [~zxu] and [~jxiang].
From the variable name pendingCount, it's hard to understand what it 
represents. Should we name it activeCalls instead?
If you agree, the change can be done in a follow-up jira.


> Session timeout may happen before HIVE_SERVER2_IDLE_SESSION_TIMEOUT for 
> back-to-back synchronous operations.
> 
>
> Key: HIVE-13960
> URL: https://issues.apache.org/jira/browse/HIVE-13960
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: zhihai xu
>Assignee: zhihai xu
> Fix For: 2.2.0
>
> Attachments: HIVE-13960.000.patch
>
>
> Session timeout may happen before 
> HIVE_SERVER2_IDLE_SESSION_TIMEOUT (hive.server2.idle.session.timeout) for 
> back-to-back synchronous operations.
> This issue can happen with the following two operations op1 and op2, where 
> op2 is a synchronous long-running operation that runs right after op1 is 
> closed:
>  
> 1. closeOperation(op1) is called:
> this will set {{lastIdleTime}} to System.currentTimeMillis(), because 
> {{opHandleSet}} becomes empty after {{closeOperation}} removes op1 from 
> {{opHandleSet}}.
> 2. op2 runs for a long time via {{executeStatement}} right after 
> closeOperation(op1) is called.
> If op2 runs for more than HIVE_SERVER2_IDLE_SESSION_TIMEOUT, the session 
> will time out even though op2 is still running.
> We hit this issue when using PyHive to execute a non-async operation. 
> The following is the exception we see:
> {code}
> File "/usr/local/lib/python2.7/dist-packages/pyhive/hive.py", line 126, in 
> close
> _check_status(response)
>   File "/usr/local/lib/python2.7/dist-packages/pyhive/hive.py", line 362, in 
> _check_status
> raise OperationalError(response)
> OperationalError: TCloseSessionResp(status=TStatus(errorCode=0, 
> errorMessage='Session does not exist!', sqlState=None, 
> infoMessages=['*org.apache.hive.service.cli.HiveSQLException:Session does not 
> exist!:12:11', 
> 'org.apache.hive.service.cli.session.SessionManager:closeSession:SessionManager.java:311',
>  'org.apache.hive.service.cli.CLIService:closeSession:CLIService.java:221', 
> 'org.apache.hive.service.cli.thrift.ThriftCLIService:CloseSession:ThriftCLIService.java:471',
>  
> 'org.apache.hive.service.cli.thrift.TCLIService$Processor$CloseSession:getResult:TCLIService.java:1273',
>  
> 'org.apache.hive.service.cli.thrift.TCLIService$Processor$CloseSession:getResult:TCLIService.java:1258',
>  'org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39', 
> 'org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39', 
> 'org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56',
>  
> 'org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:285',
>  
> 'java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1145',
>  
> 'java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:615',
>  'java.lang.Thread:run:Thread.java:745'], statusCode=3))
> TCloseSessionResp(status=TStatus(errorCode=0, errorMessage='Session does not 
> exist!', sqlState=None, 
> infoMessages=['*org.apache.hive.service.cli.HiveSQLException:Session does not 
> exist!:12:11', 
> 'org.apache.hive.service.cli.session.SessionManager:closeSession:SessionManager.java:311',
>  'org.apache.hive.service.cli.CLIService:closeSession:CLIService.java:221', 
> 'org.apache.hive.service.cli.thrift.ThriftCLIService:CloseSession:ThriftCLIService.java:471',
>  
> 'org.apache.hive.service.cli.thrift.TCLIService$Processor$CloseSession:getResult:TCLIService.java:1273',
>  
> 'org.apache.hive.service.cli.thrift.TCLIService$Processor$CloseSession:getResult:TCLIService.java:1258',
>  'org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39', 
> 'org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39', 
> 'org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56',
>  
> 'org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:285',
>  
> 'java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1145',
>  
> 'java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:615',
> 

[jira] [Commented] (HIVE-13884) Disallow queries fetching more than a configured number of partitions in PartitionPruner

2016-06-20 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340603#comment-15340603
 ] 

Sergey Shelukhin commented on HIVE-13884:
-

It would fall back to ORM in this case, assuming there was an ORM 
implementation in the original patch.

> Disallow queries fetching more than a configured number of partitions in 
> PartitionPruner
> 
>
> Key: HIVE-13884
> URL: https://issues.apache.org/jira/browse/HIVE-13884
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mohit Sabharwal
>Assignee: Sergio Peña
> Attachments: HIVE-13884.1.patch, HIVE-13884.2.patch, 
> HIVE-13884.3.patch, HIVE-13884.4.patch, HIVE-13884.5.patch, HIVE-13884.6.patch
>
>
> Currently the PartitionPruner requests either all partitions or partitions 
> based on a filter expression. In either scenario, if the number of partitions 
> accessed is large there can be significant memory pressure at the HMS server 
> end.
> We already have a config {{hive.limit.query.max.table.partition}} that 
> enforces limits on the number of partitions that may be scanned per operator. 
> But this check happens after the PartitionPruner has already fetched all 
> partitions.
> We should add an option at the PartitionPruner level to disallow queries that 
> attempt to access a number of partitions beyond a configurable limit.
> Note that {{hive.mapred.mode=strict}} disallows queries without a partition 
> filter in PartitionPruner, but this check accepts any query with a pruning 
> condition, even if the number of partitions fetched is large. In multi-tenant 
> environments, admins could use more control over the number of partitions 
> allowed, based on HMS memory capacity.
> One option is to have the PartitionPruner first fetch the partition names 
> (instead of partition specs) and throw an exception if the number of 
> partitions exceeds the configured value. Otherwise, fetch the partition specs.
> It looks like the existing {{listPartitionNames}} call could be used if 
> extended to take partition filter expressions like the {{getPartitionsByExpr}} 
> call does.
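
A minimal sketch of that two-step option (method names are assumptions, not 
the actual patch): fetch the cheap name list first, enforce the limit, and 
only then fetch the heavyweight specs.

{code}
import java.util.List;

public class PartitionLimitCheck {
  public interface PartitionFetcher<T> {
    List<String> fetchPartitionNames();  // cheap: names only
    List<T> fetchPartitionSpecs();       // expensive: full specs
  }

  public static <T> List<T> fetchWithLimit(PartitionFetcher<T> fetcher,
                                           int maxAllowed) {
    List<String> names = fetcher.fetchPartitionNames();
    if (maxAllowed >= 0 && names.size() > maxAllowed) {
      throw new IllegalStateException("Query would read " + names.size()
          + " partitions, above the configured limit of " + maxAllowed
          + "; add a more selective partition filter");
    }
    return fetcher.fetchPartitionSpecs();
  }
}
{code}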



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13884) Disallow queries fetching more than a configured number of partitions in PartitionPruner

2016-06-20 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340594#comment-15340594
 ] 

Sergio Peña commented on HIVE-13884:


If {{MetastoreDirectSql.getNumPartitionsViaSqlFilter()}} returns an error or 
throws an exception whenever the internal 
{{PartitionFilterGenerator.generateSqlFilter}} fails, then how should we handle 
the partition limit request? There is no data to validate this, and we cannot 
abort the query because of this.

[~sershe] [~mohitsabharwal] Any ideas on this? Should we fix the 
{{generateSqlFilter}} to avoid returning NULL when the filter cannot be formed?

> Disallow queries fetching more than a configured number of partitions in 
> PartitionPruner
> 
>
> Key: HIVE-13884
> URL: https://issues.apache.org/jira/browse/HIVE-13884
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mohit Sabharwal
>Assignee: Sergio Peña
> Attachments: HIVE-13884.1.patch, HIVE-13884.2.patch, 
> HIVE-13884.3.patch, HIVE-13884.4.patch, HIVE-13884.5.patch, HIVE-13884.6.patch
>
>
> Currently the PartitionPruner requests either all partitions or partitions 
> based on a filter expression. In either scenario, if the number of partitions 
> accessed is large there can be significant memory pressure at the HMS server 
> end.
> We already have a config {{hive.limit.query.max.table.partition}} that 
> enforces limits on the number of partitions that may be scanned per operator. 
> But this check happens after the PartitionPruner has already fetched all 
> partitions.
> We should add an option at the PartitionPruner level to disallow queries that 
> attempt to access a number of partitions beyond a configurable limit.
> Note that {{hive.mapred.mode=strict}} disallows queries without a partition 
> filter in PartitionPruner, but this check accepts any query with a pruning 
> condition, even if the number of partitions fetched is large. In multi-tenant 
> environments, admins could use more control over the number of partitions 
> allowed, based on HMS memory capacity.
> One option is to have the PartitionPruner first fetch the partition names 
> (instead of partition specs) and throw an exception if the number of 
> partitions exceeds the configured value. Otherwise, fetch the partition specs.
> It looks like the existing {{listPartitionNames}} call could be used if 
> extended to take partition filter expressions like the {{getPartitionsByExpr}} 
> call does.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14041) llap scripts add hadoop and other libraries from the machine local install to the daemon classpath

2016-06-20 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340584#comment-15340584
 ] 

Siddharth Seth commented on HIVE-14041:
---

Ok, looked into this some more. `hadoop classpath` does not provide the 
LD_LIBRARY_PATH. This is set up separately via template.py, and requires 
hadoop_home to be set (it ends up being "None/lib/native" otherwise).
Somehow, on the cluster I was using to test this, the LD_LIBRARY_PATH was set 
up correctly before invoking runLlapDaemon.sh. A YARN NM export, maybe?

In any case, the native part seems unrelated to this jira and can be 
investigated in a follow-up.
[~gopalv] - please review.

> llap scripts add hadoop and other libraries from the machine local install to 
> the daemon classpath
> --
>
> Key: HIVE-14041
> URL: https://issues.apache.org/jira/browse/HIVE-14041
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14041.01.patch
>
>
> `hadoop classpath` ends up getting added to the classpath of llap daemons. 
> This essentially means picking up the classpath from the local deploy.
> This isn't required since the slider package includes relevant libraries 
> (shipped from the client)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13884) Disallow queries fetching more than a configured number of partitions in PartitionPruner

2016-06-20 Thread Sergio Peña (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340585#comment-15340585
 ] 

Sergio Peña commented on HIVE-13884:


Sometimes {{MetastoreDirectSql.getNumPartitionsViaSqlFilter()}} returns 0 when 
the query filter expression could not be created. That result is a false 
positive for the limit check when the number of partitions is actually large, 
and it causes the query to fetch all partitions.

HIVE-14055 is required for this patch.

> Disallow queries fetching more than a configured number of partitions in 
> PartitionPruner
> 
>
> Key: HIVE-13884
> URL: https://issues.apache.org/jira/browse/HIVE-13884
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mohit Sabharwal
>Assignee: Sergio Peña
> Attachments: HIVE-13884.1.patch, HIVE-13884.2.patch, 
> HIVE-13884.3.patch, HIVE-13884.4.patch, HIVE-13884.5.patch, HIVE-13884.6.patch
>
>
> Currently the PartitionPruner requests either all partitions or partitions 
> based on a filter expression. In either scenario, if the number of partitions 
> accessed is large there can be significant memory pressure at the HMS server 
> end.
> We already have a config {{hive.limit.query.max.table.partition}} that 
> enforces limits on number of partitions that may be scanned per operator. But 
> this check happens after the PartitionPruner has already fetched all 
> partitions.
> We should add an option at PartitionPruner level to disallow queries that 
> attempt to access number of partitions beyond a configurable limit.
> Note that {{hive.mapred.mode=strict}} disallows queries without a partition 
> filter in PartitionPruner, but this check accepts any query with a pruning 
> condition, even if the partitions fetched are large. In multi-tenant 
> environments, admins could use more control w.r.t. the number of partitions 
> allowed based on HMS memory capacity.
> One option is to have PartitionPruner first fetch the partition names 
> (instead of partition specs) and throw an exception if number of partitions 
> exceeds the configured value. Otherwise, fetch the partition specs.
> Looks like the existing {{listPartitionNames}} call could be used if extended 
> to take partition filter expressions like the {{getPartitionsByExpr}} call does.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13809) Hybrid Grace Hash Join memory usage estimation didn't take into account the bloom filter size

2016-06-20 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-13809:
-
   Resolution: Fixed
Fix Version/s: 2.2.0
   2.1.0
   Status: Resolved  (was: Patch Available)

Thanks [~gopalv] for the review. Committed to master and branch-2.1.

> Hybrid Grace Hash Join memory usage estimation didn't take into account the 
> bloom filter size
> -
>
> Key: HIVE-13809
> URL: https://issues.apache.org/jira/browse/HIVE-13809
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Fix For: 2.1.0, 2.2.0
>
> Attachments: HIVE-13809.1.patch
>
>
> Memory estimation is important during hash table loading, because we need to 
> make the decision of whether to load the next hash partition in memory or 
> spill it. If the assumption is there's enough memory but it turns out not the 
> case, we will run into OOM problem.
> Currently hybrid grace hash join memory usage estimation didn't take into 
> account the bloom filter size. In large test cases (TB scale) the bloom 
> filter grows as big as hundreds of MB, big enough to cause estimation error.
> The solution is to count in the bloom filter size into memory estimation.
> Another issue this patch will fix is possible NPE due to object cache reuse 
> during hybrid grace hash join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (HIVE-14064) beeline to auto connect to the HiveServer2

2016-06-20 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar reopened HIVE-14064:


> beeline to auto connect to the HiveServer2
> --
>
> Key: HIVE-14064
> URL: https://issues.apache.org/jira/browse/HIVE-14064
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Minor
>
> Currently one has to give a jdbc:hive2 URL in order for Beeline to connect to 
> a hiveserver2 instance. It would be great if Beeline could get the info somehow 
> (from a properties file at a well-known location?) and connect automatically 
> if the user doesn't specify such a URL. If the properties file is not present, 
> then beeline would expect the user to provide the URL and credentials using 
> !connect or ./beeline -u .. commands.
> While Beeline is flexible (being a mere JDBC client), most environments would 
> have just a single HS2. Having users manually connect to it via either 
> "beeline ~/.propsfile" or -u or !connect statements degrades the user 
> experience.
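
For what it's worth, a minimal sketch of what such an auto-connect lookup could 
look like; the file location, property keys, and class name are all 
hypothetical:

{code}
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

class AutoConnectSketch {
  // Hypothetical well-known location and property keys; not Beeline's API.
  static final Path DEFAULT_PROPS =
      Paths.get(System.getProperty("user.home"), ".beeline", "connection.properties");

  // Returns a JDBC URL built from the properties file, or null if the file is
  // absent, in which case Beeline would fall back to -u / !connect as today.
  static String autoConnectUrl() throws IOException {
    if (!Files.exists(DEFAULT_PROPS)) {
      return null;
    }
    Properties props = new Properties();
    try (FileInputStream in = new FileInputStream(DEFAULT_PROPS.toFile())) {
      props.load(in);
    }
    String host = props.getProperty("hs2.host", "localhost");
    String port = props.getProperty("hs2.port", "10000");
    String db = props.getProperty("hs2.database", "default");
    return "jdbc:hive2://" + host + ":" + port + "/" + db;
  }
}
{code}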



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-14064) beeline to auto connect to the HiveServer2

2016-06-20 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar resolved HIVE-14064.

Resolution: Duplicate

> beeline to auto connect to the HiveServer2
> --
>
> Key: HIVE-14064
> URL: https://issues.apache.org/jira/browse/HIVE-14064
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Minor
>
> Currently one has to give a jdbc:hive2 URL in order for Beeline to connect to 
> a hiveserver2 instance. It would be great if Beeline could get the info somehow 
> (from a properties file at a well-known location?) and connect automatically 
> if the user doesn't specify such a URL. If the properties file is not present, 
> then beeline would expect the user to provide the URL and credentials using 
> !connect or ./beeline -u .. commands.
> While Beeline is flexible (being a mere JDBC client), most environments would 
> have just a single HS2. Having users manually connect to it via either 
> "beeline ~/.propsfile" or -u or !connect statements degrades the user 
> experience.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14038) miscellaneous acid improvements

2016-06-20 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-14038:
--
Attachment: HIVE-14038.branch-1.2.patch

> miscellaneous acid improvements
> ---
>
> Key: HIVE-14038
> URL: https://issues.apache.org/jira/browse/HIVE-14038
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-14038.2.patch, HIVE-14038.3.patch, 
> HIVE-14038.4.patch, HIVE-14038.5.patch, HIVE-14038.branch-1.2.patch, 
> HIVE-14038.patch
>
>
> 1. fix thread names in HouseKeeperServiceBase (currently they are all 
> "org.apache.hadoop.hive.ql.txn.compactor.HouseKeeperServiceBase$1-0")
> 2. dump metastore configs from HiveConf on start up to help record values of 
> properties
> 3. add some tests
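
On item 1, a named ThreadFactory is the usual way to replace the 
anonymous-class default names; a minimal sketch, not the actual 
HouseKeeperServiceBase code:

{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

class NamedHousekeeperSketch {
  // Naming threads via a ThreadFactory avoids the unhelpful
  // "OuterClass$1-0" style names produced by anonymous factories.
  static ScheduledExecutorService newPool(String serviceName) {
    final AtomicInteger count = new AtomicInteger();
    ThreadFactory factory = r -> {
      Thread t = new Thread(r, serviceName + "-" + count.getAndIncrement());
      t.setDaemon(true);
      return t;
    };
    return Executors.newScheduledThreadPool(1, factory);
  }
}
{code}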



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14062) Changes from HIVE-13502 overwritten by HIVE-13566

2016-06-20 Thread Naveen Gangam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340543#comment-15340543
 ] 

Naveen Gangam commented on HIVE-14062:
--

[~pxiong] No problem at all. It's all good. Thanks for the quick confirmation. 

> Changes from HIVE-13502 overwritten by HIVE-13566
> -
>
> Key: HIVE-14062
> URL: https://issues.apache.org/jira/browse/HIVE-14062
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 2.1.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>
> It appears that changes from HIVE-13566 overwrote the changes from HIVE-13502. 
> I will confirm with the author that it was inadvertent before I re-add it. 
> Thanks



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14062) Changes from HIVE-13502 overwritten by HIVE-13566

2016-06-20 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340529#comment-15340529
 ] 

Pengcheng Xiong commented on HIVE-14062:


The changes were lost during rebase; they were not intentional at all. I am 
sorry about that; please add them back. Thanks.

> Changes from HIVE-13502 overwritten by HIVE-13566
> -
>
> Key: HIVE-14062
> URL: https://issues.apache.org/jira/browse/HIVE-14062
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 2.1.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>
> It appears that changes from HIVE-13566 overwrote the changes from HIVE-13502. 
> I will confirm with the author that it was inadvertent before I re-add it. 
> Thanks



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14062) Changes from HIVE-13502 overwritten by HIVE-13566

2016-06-20 Thread Naveen Gangam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340512#comment-15340512
 ] 

Naveen Gangam commented on HIVE-14062:
--

[~pxiong] Could you please take a look at the change in HIVE-13566 and let me 
know whether the overwritten changes were inadvertent (perhaps lost during a 
rebase) or intentional? Thanks

> Changes from HIVE-13502 overwritten by HIVE-13566
> -
>
> Key: HIVE-14062
> URL: https://issues.apache.org/jira/browse/HIVE-14062
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 2.1.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>
> It appears that changes from HIVE-13566 overwrote the changes from HIVE-13502. 
> I will confirm with the author that it was inadvertent before I re-add it. 
> Thanks



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13809) Hybrid Grace Hash Join memory usage estimation didn't take into account the bloom filter size

2016-06-20 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340503#comment-15340503
 ] 

Gopal V commented on HIVE-13809:


LGTM - +1.

The bloom filter sizing needs a revisit: it is pre-allocated based on 
estimates, not on real row counts, so we could allow more false positives at 
higher cardinalities to keep the memory utilization in check.
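
For reference, the standard sizing math behind that trade-off; a minimal 
self-contained sketch, not Hive's actual bloom filter implementation:

{code}
class BloomSizingSketch {
  // Optimal bit count for n expected entries at false-positive rate p:
  //   m = -n * ln(p) / (ln 2)^2, with k = (m / n) * ln 2 hash functions.
  // Raising p at high cardinalities shrinks m, capping memory use.
  static long optimalBits(long expectedEntries, double fpp) {
    return (long) Math.ceil(-expectedEntries * Math.log(fpp)
        / (Math.log(2) * Math.log(2)));
  }

  static int optimalHashes(long expectedEntries, long bits) {
    return Math.max(1,
        (int) Math.round((double) bits / expectedEntries * Math.log(2)));
  }

  public static void main(String[] args) {
    // 100M keys at 5% FPP needs ~74 MB of bits; at 1% it is ~114 MB.
    long n = 100_000_000L;
    System.out.printf("5%% fpp: %d MB%n", optimalBits(n, 0.05) / 8 / 1024 / 1024);
    System.out.printf("1%% fpp: %d MB%n", optimalBits(n, 0.01) / 8 / 1024 / 1024);
  }
}
{code}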

> Hybrid Grace Hash Join memory usage estimation didn't take into account the 
> bloom filter size
> -
>
> Key: HIVE-13809
> URL: https://issues.apache.org/jira/browse/HIVE-13809
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-13809.1.patch
>
>
> Memory estimation is important during hash table loading, because we need to 
> make the decision of whether to load the next hash partition in memory or 
> spill it. If the assumption is there's enough memory but it turns out not the 
> case, we will run into OOM problem.
> Currently hybrid grace hash join memory usage estimation didn't take into 
> account the bloom filter size. In large test cases (TB scale) the bloom 
> filter grows as big as hundreds of MB, big enough to cause estimation error.
> The solution is to count in the bloom filter size into memory estimation.
> Another issue this patch will fix is possible NPE due to object cache reuse 
> during hybrid grace hash join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14060) Hive: Remove bogus "localhost" from Hive splits

2016-06-20 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-14060:
---
Status: Patch Available  (was: Open)

> Hive: Remove bogus "localhost" from Hive splits
> ---
>
> Key: HIVE-14060
> URL: https://issues.apache.org/jira/browse/HIVE-14060
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-14060.1.patch
>
>
> On remote filesystems like Azure, GCP and S3, the splits contain a filler 
> location of "localhost".
> This is worse than having no location information at all - on large clusters 
> yarn waits up to 200 seconds [1] for a heartbeat from "localhost" before 
> allocating a container.
> To speed up this process, the split affinity provider should scrub the bogus 
> "localhost" from the locations and allow for the allocation of "*" containers 
> instead on each heartbeat.
> [1] - yarn.scheduler.capacity.node-locality-delay=40 x heartbeat of 5s
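
A minimal sketch of that scrubbing, assuming a plain String[] of locations 
rather than the actual split affinity provider interface:

{code}
import java.util.Arrays;

class SplitLocationSketch {
  // Drop the filler "localhost" entries; an empty result lets the
  // scheduler allocate "*" (any-host) containers immediately instead of
  // waiting out the node-locality delay.
  static String[] scrubLocations(String[] locations) {
    if (locations == null) {
      return new String[0];
    }
    return Arrays.stream(locations)
        .filter(loc -> loc != null && !"localhost".equalsIgnoreCase(loc))
        .toArray(String[]::new);
  }
}
{code}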



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14060) Hive: Remove bogus "localhost" from Hive splits

2016-06-20 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-14060:
---
Attachment: HIVE-14060.1.patch

> Hive: Remove bogus "localhost" from Hive splits
> ---
>
> Key: HIVE-14060
> URL: https://issues.apache.org/jira/browse/HIVE-14060
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-14060.1.patch
>
>
> On remote filesystems like Azure, GCP and S3, the splits contain a filler 
> location of "localhost".
> This is worse than having no location information at all - on large clusters 
> yarn waits up to 200 seconds [1] for a heartbeat from "localhost" before 
> allocating a container.
> To speed up this process, the split affinity provider should scrub the bogus 
> "localhost" from the locations and allow for the allocation of "*" containers 
> instead on each heartbeat.
> [1] - yarn.scheduler.capacity.node-locality-delay=40 x heartbeat of 5s



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13566) Auto-gather column stats - phase 1

2016-06-20 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340489#comment-15340489
 ] 

Pengcheng Xiong commented on HIVE-13566:


[~vihangk1], sorry about that. Could you add the fix back? Thanks.

> Auto-gather column stats - phase 1
> --
>
> Key: HIVE-13566
> URL: https://issues.apache.org/jira/browse/HIVE-13566
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>  Labels: TODOC2.1
> Fix For: 2.1.0
>
> Attachments: HIVE-13566.01.patch, HIVE-13566.02.patch, 
> HIVE-13566.03.patch
>
>
> This jira adds code and tests for auto-gather column stats. Golden file 
> update will be done in phase 2 - HIVE-11160



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13566) Auto-gather column stats - phase 1

2016-06-20 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340476#comment-15340476
 ] 

Vihang Karajgaonkar commented on HIVE-13566:


Looks like the commit for this Jira removed the fix for HIVE-13502 too.

> Auto-gather column stats - phase 1
> --
>
> Key: HIVE-13566
> URL: https://issues.apache.org/jira/browse/HIVE-13566
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>  Labels: TODOC2.1
> Fix For: 2.1.0
>
> Attachments: HIVE-13566.01.patch, HIVE-13566.02.patch, 
> HIVE-13566.03.patch
>
>
> This jira adds code and tests for auto-gather column stats. Golden file 
> update will be done in phase 2 - HIVE-11160



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13901) Hivemetastore add partitions can be slow depending on filesystems

2016-06-20 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340475#comment-15340475
 ] 

Sergey Shelukhin commented on HIVE-13901:
-

+1

> Hivemetastore add partitions can be slow depending on filesystems
> -
>
> Key: HIVE-13901
> URL: https://issues.apache.org/jira/browse/HIVE-13901
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-13901.1.patch, HIVE-13901.2.patch, 
> HIVE-13901.6.patch
>
>
> Depending on FS, creating external tables & adding partitions can be 
> expensive (e.g msck which adds all partitions).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13901) Hivemetastore add partitions can be slow depending on filesystems

2016-06-20 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340468#comment-15340468
 ] 

Rajesh Balamohan commented on HIVE-13901:
-

This is due to HIVE-14054. With HIVE-14054, all these tests pass on my machine.

> Hivemetastore add partitions can be slow depending on filesystems
> -
>
> Key: HIVE-13901
> URL: https://issues.apache.org/jira/browse/HIVE-13901
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-13901.1.patch, HIVE-13901.2.patch, 
> HIVE-13901.6.patch
>
>
> Depending on FS, creating external tables & adding partitions can be 
> expensive (e.g msck which adds all partitions).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14012) some ColumnVector-s are missing ensureSize

2016-06-20 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14012:

   Resolution: Fixed
Fix Version/s: 2.0.2
   2.1.1
   2.2.0
   1.3.0
   Status: Resolved  (was: Patch Available)

Committed to all the affected branches...

> some ColumnVector-s are missing ensureSize
> --
>
> Key: HIVE-14012
> URL: https://issues.apache.org/jira/browse/HIVE-14012
> Project: Hive
>  Issue Type: Bug
>Reporter: Takahiko Saito
>Assignee: Sergey Shelukhin
> Fix For: 1.3.0, 2.2.0, 2.1.1, 2.0.2
>
> Attachments: HIVE-14012.01.patch, HIVE-14012.01.patch, 
> HIVE-14012.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14024) setAllColumns is called incorrectly after some changes

2016-06-20 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14024:

   Resolution: Fixed
Fix Version/s: 2.0.2
   2.1.1
   2.2.0
   Status: Resolved  (was: Patch Available)

Committed

> setAllColumns is called incorrectly after some changes
> --
>
> Key: HIVE-14024
> URL: https://issues.apache.org/jira/browse/HIVE-14024
> Project: Hive
>  Issue Type: Bug
>Reporter: Takahiko Saito
>Assignee: Sergey Shelukhin
> Fix For: 2.2.0, 2.1.1, 2.0.2
>
> Attachments: HIVE-14024.01.patch, HIVE-14024.patch
>
>
> h/t [~gopalv]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14012) some ColumnVector-s are missing ensureSize

2016-06-20 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340426#comment-15340426
 ] 

Sergey Shelukhin commented on HIVE-14012:
-

All the failures are known

> some ColumnVector-s are missing ensureSize
> --
>
> Key: HIVE-14012
> URL: https://issues.apache.org/jira/browse/HIVE-14012
> Project: Hive
>  Issue Type: Bug
>Reporter: Takahiko Saito
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14012.01.patch, HIVE-14012.01.patch, 
> HIVE-14012.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14038) miscellaneous acid improvements

2016-06-20 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340422#comment-15340422
 ] 

Eugene Koifman commented on HIVE-14038:
---

[~wzheng] could you review please

> miscellaneous acid improvements
> ---
>
> Key: HIVE-14038
> URL: https://issues.apache.org/jira/browse/HIVE-14038
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-14038.2.patch, HIVE-14038.3.patch, 
> HIVE-14038.4.patch, HIVE-14038.patch
>
>
> 1. fix thread names in HouseKeeperServiceBase (currently they are all 
> "org.apache.hadoop.hive.ql.txn.compactor.HouseKeeperServiceBase$1-0")
> 2. dump metastore configs from HiveConf on start up to help record values of 
> properties
> 3. add some tests



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14038) miscellaneous acid improvements

2016-06-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340418#comment-15340418
 ] 

Hive QA commented on HIVE-14038:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12811851/HIVE-14038.3.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10223 tests 
executed
*Failed tests:*
{noformat}
TestMiniTezCliDriver-tez_union_group_by.q-vector_auto_smb_mapjoin_14.q-union_fast_stats.q-and-12-more
 - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_constantPropagateForSubQuery
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hive.jdbc.TestJdbcWithMiniLlap.testLlapInputFormatEndToEnd
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/193/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/193/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-193/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12811851 - PreCommit-HIVE-MASTER-Build

> miscellaneous acid improvements
> ---
>
> Key: HIVE-14038
> URL: https://issues.apache.org/jira/browse/HIVE-14038
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-14038.2.patch, HIVE-14038.3.patch, 
> HIVE-14038.4.patch, HIVE-14038.patch
>
>
> 1. fix thread names in HouseKeeperServiceBase (currently they are all 
> "org.apache.hadoop.hive.ql.txn.compactor.HouseKeeperServiceBase$1-0")
> 2. dump metastore configs from HiveConf on start up to help record values of 
> properties
> 3. add some tests



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13490) Change itests to be part of the main Hive build

2016-06-20 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340368#comment-15340368
 ] 

Zoltan Haindrich commented on HIVE-13490:
-

This whole sentence was confusing.
I just wanted to point out the positive side of this: it will compile the 
integration tests too when enabled, and that might come in handy if someone is 
working on API changes. After a few minutes I ended up removing it because this 
thing doesn't really align with the section's topic.


> Change itests to be part of the main Hive build
> ---
>
> Key: HIVE-13490
> URL: https://issues.apache.org/jira/browse/HIVE-13490
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Zoltan Haindrich
> Fix For: 2.2.0
>
> Attachments: HIVE-13490.01.patch, HIVE-13490.02.patch, 
> HIVE-13490.03.patch
>
>
> Instead of having to build Hive, and then itests separately.
> With IntelliJ, this ends up being loaded as two separate dependencies, and 
> there's a lot of hops involved to make changes.
> Does anyone know why these have been kept separate?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13590) Kerberized HS2 with LDAP auth enabled fails in multi-domain LDAP case

2016-06-20 Thread Sergio Peña (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340353#comment-15340353
 ] 

Sergio Peña commented on HIVE-13590:


Thanks [~ctang.ma]
+1

> Kerberized HS2 with LDAP auth enabled fails in multi-domain LDAP case
> -
>
> Key: HIVE-13590
> URL: https://issues.apache.org/jira/browse/HIVE-13590
> Project: Hive
>  Issue Type: Bug
>  Components: Authentication, Security
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-13590.1.patch, HIVE-13590.1.patch, 
> HIVE-13590.patch, HIVE-13590.patch
>
>
> In a kerberized HS2 with LDAP authentication enabled, an LDAP user usually 
> logs in using a username in the form username@domain in the multi-domain LDAP 
> case. But the login fails if the domain is not in the Hadoop auth_to_local 
> mapping rules; the error is as follows:
> {code}
> Caused by: 
> org.apache.hadoop.security.authentication.util.KerberosName$NoMatchingRule: 
> No rules applied to ct...@mydomain.com
> at 
> org.apache.hadoop.security.authentication.util.KerberosName.getShortName(KerberosName.java:389)
> at org.apache.hadoop.security.User.<init>(User.java:48)
> {code}
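
For illustration, a small sketch of the failure mode using Hadoop's 
{{KerberosName}} directly; the principal, realm, and the extra rule are made up 
for the example:

{code}
import org.apache.hadoop.security.authentication.util.KerberosName;

class AuthToLocalSketch {
  public static void main(String[] args) throws Exception {
    // With only DEFAULT rules, principals from a realm other than the
    // default realm get no mapping and getShortName() throws NoMatchingRule.
    KerberosName.setRules("DEFAULT");
    KerberosName name = new KerberosName("user1@MYDOMAIN.COM");
    // name.getShortName(); // would throw: "No rules applied to user1@..."

    // An explicit rule for the extra domain avoids the failure; the realm
    // and the rule string here are purely illustrative.
    KerberosName.setRules(
        "RULE:[1:$1@$0](.*@MYDOMAIN\\.COM)s/@.*//\nDEFAULT");
    System.out.println(new KerberosName("user1@MYDOMAIN.COM").getShortName());
  }
}
{code}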



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13985) ORC improvements for reducing the file system calls in task side

2016-06-20 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-13985:
-
Attachment: HIVE-13985.6.patch

Addressed [~sershe]'s review comments for master patch.

> ORC improvements for reducing the file system calls in task side
> 
>
> Key: HIVE-13985
> URL: https://issues.apache.org/jira/browse/HIVE-13985
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 1.3.0, 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-13985-branch-1.patch, HIVE-13985-branch-1.patch, 
> HIVE-13985-branch-1.patch, HIVE-13985-branch-1.patch, 
> HIVE-13985-branch-2.1.patch, HIVE-13985.1.patch, HIVE-13985.2.patch, 
> HIVE-13985.3.patch, HIVE-13985.4.patch, HIVE-13985.5.patch, HIVE-13985.6.patch
>
>
> HIVE-13840 fixed some issues with additional file system invocations during 
> split generation. Similarly, this jira will fix issues with additional file 
> system invocations on the task side. To avoid reading footers on the task 
> side, users can set hive.orc.splits.include.file.footer to true which will 
> serialize the orc footers on the splits. But this has issues with serializing 
> unwanted information like column statistics and other metadata which are not 
> really required for reading an orc split on the task side. We can reduce the 
> payload on the orc splits by serializing only the minimum required 
> information (stripe information, types, compression details). This will 
> decrease the payload on the orc splits and can potentially avoid OOMs in 
> application master (AM) during split generation. This jira also addresses 
> other issues concerning the AM cache. The local cache used by the AM is a soft 
> reference cache, which can introduce unpredictability across multiple runs of 
> the same query. We can cache the serialized footer in the local cache and also 
> use a strong reference cache, which should avoid memory pressure and will have 
> better predictability.
> One other improvement that we can do is when 
> hive.orc.splits.include.file.footer is set to false, on the task side we make 
> one additional file system call to know the size of the file. If we can 
> serialize the file length in the orc split this can be avoided.
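
A sketch of the kind of trimmed-down split payload described above; the class 
and field names are invented for illustration, not the actual ORC split 
classes:

{code}
import java.io.Serializable;
import java.util.List;

// Invented value class, for illustration: carry only what the task side
// needs to read its stripes (stripe layout, schema, compression, file
// length), leaving column statistics and other footer metadata behind.
class SlimOrcSplitInfo implements Serializable {
  static class StripeInfo implements Serializable {
    final long offset, length, numRows;
    StripeInfo(long offset, long length, long numRows) {
      this.offset = offset; this.length = length; this.numRows = numRows;
    }
  }

  final List<StripeInfo> stripes;   // where each stripe lives in the file
  final String schema;              // serialized type description
  final String compressionKind;     // e.g. "ZLIB"
  final int compressionBufferSize;
  final long fileLength;            // avoids an extra getFileStatus() call

  SlimOrcSplitInfo(List<StripeInfo> stripes, String schema,
      String compressionKind, int compressionBufferSize, long fileLength) {
    this.stripes = stripes;
    this.schema = schema;
    this.compressionKind = compressionKind;
    this.compressionBufferSize = compressionBufferSize;
    this.fileLength = fileLength;
  }
}
{code}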



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14041) llap scripts add hadoop and other libraries from the machine local install to the daemon classpath

2016-06-20 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-14041:
--
Status: Open  (was: Patch Available)

> llap scripts add hadoop and other libraries from the machine local install to 
> the daemon classpath
> --
>
> Key: HIVE-14041
> URL: https://issues.apache.org/jira/browse/HIVE-14041
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14041.01.patch
>
>
> `hadoop classpath` ends up getting added to the classpath of llap daemons. 
> This essentially means picking up the classpath from the local deploy.
> This isn't required since the slider package includes relevant libraries 
> (shipped from the client)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14041) llap scripts add hadoop and other libraries from the machine local install to the daemon classpath

2016-06-20 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340347#comment-15340347
 ] 

Siddharth Seth commented on HIVE-14041:
---

From talking with [~gopalv] - hadoop classpath was making native libs 
available. Will make some changes to the patch for the same.

> llap scripts add hadoop and other libraries from the machine local install to 
> the daemon classpath
> --
>
> Key: HIVE-14041
> URL: https://issues.apache.org/jira/browse/HIVE-14041
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14041.01.patch
>
>
> `hadoop classpath` ends up getting added to the classpath of llap daemons. 
> This essentially means picking up the classpath from the local deploy.
> This isn't required since the slider package includes relevant libraries 
> (shipped from the client)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14038) miscellaneous acid improvements

2016-06-20 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-14038:
--
Attachment: HIVE-14038.4.patch

> miscellaneous acid improvements
> ---
>
> Key: HIVE-14038
> URL: https://issues.apache.org/jira/browse/HIVE-14038
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-14038.2.patch, HIVE-14038.3.patch, 
> HIVE-14038.4.patch, HIVE-14038.patch
>
>
> 1. fix thread names in HouseKeeperServiceBase (currently they are all 
> "org.apache.hadoop.hive.ql.txn.compactor.HouseKeeperServiceBase$1-0")
> 2. dump metastore configs from HiveConf on start up to help record values of 
> properties
> 3. add some tests



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14055) directSql - getting the number of partitions is broken

2016-06-20 Thread Sergio Peña (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340243#comment-15340243
 ] 

Sergio Peña commented on HIVE-14055:


What about throwing a checked exception? I agree with you about invalid values. 
I see that a NULL value can be returned in case the filter could not be formed 
correctly. If NULL means an error, then a checked exception would be better 
handled by developers, wouldn't it?

> directSql - getting the number of partitions is broken
> --
>
> Key: HIVE-14055
> URL: https://issues.apache.org/jira/browse/HIVE-14055
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14055.patch
>
>
> Noticed while looking at something else. If the filter cannot be pushed down, 
> it just returns 0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13441) LLAPIF: security and signed fragments

2016-06-20 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340226#comment-15340226
 ] 

Sergey Shelukhin commented on HIVE-13441:
-

[~jdere] is there documentation for external access? We could add some 
information there.

> LLAPIF: security and signed fragments
> -
>
> Key: HIVE-13441
> URL: https://issues.apache.org/jira/browse/HIVE-13441
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>  Labels: llap
> Fix For: 2.2.0
>
>
> Allows external clients to get securely signed splits from HS2, and submit 
> them to LLAP without running as a privileged user; LLAP will verify the 
> splits before running.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


  1   2   >