[jira] [Commented] (HIVE-15741) Faster unsafe byte array comparisons

2017-01-26 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15842376#comment-15842376
 ] 

Gopal V commented on HIVE-15741:


[~teddy.choi]: StringExpr::equal() was written to be much faster for the 
common case where the strings aren't equal.

The most common case for long strings is URLs, which usually differ in their 
suffix and in length (len != len, and the last byte is different).

This patch prevents that case from taking the fast path (which was added to 
support common-crawl and similar clickstream data-streams).

The Unsafe code is only faster if the strings are equal or differ in the prefix.
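The fast path being described can be sketched as follows. This is an 
illustrative reimplementation of the idea (length check first, then a 
backwards scan so suffix-different URLs bail out on the first probe), not 
Hive's actual StringExpr.equal code:

```java
// Sketch of the fast-path idea: reject unequal strings as early as possible.
// URL-like data usually differs in length or in the suffix, so check the
// length first and then scan from the last byte backwards.
public class SuffixFirstEquals {
    public static boolean equal(byte[] a, int aStart, int aLen,
                                byte[] b, int bStart, int bLen) {
        if (aLen != bLen) {
            return false;     // cheapest possible rejection
        }
        for (int i = aLen - 1; i >= 0; i--) {
            if (a[aStart + i] != b[bStart + i]) {
                return false; // suffix-different inputs exit on the first probe
            }
        }
        return true;
    }
}
```

An Unsafe/word-wise comparison, by contrast, starts from the front, which is 
why it only wins when the strings are equal or differ in the prefix.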

> Faster unsafe byte array comparisons
> 
>
> Key: HIVE-15741
> URL: https://issues.apache.org/jira/browse/HIVE-15741
> Project: Hive
>  Issue Type: Improvement
>Reporter: Teddy Choi
>Assignee: Teddy Choi
>Priority: Minor
> Attachments: HIVE-15741.1.patch
>
>
> Byte array comparison is heavily used in joins and string conditions. The pure 
> Java implementation is simple but not performant. An implementation with 
> Unsafe#getLong is much faster. It's already implemented in 
> org.apache.hadoop.io.WritableComparator#compare. The WritableComparator class 
> handles exceptional cases, including different endianness and no access to 
> Unsafe, and it has been used for many years in production.
> This patch will replace pure Java byte array comparisons with safe and faster 
> Unsafe-based ones to get more performance.
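The word-at-a-time comparison the description refers to can be sketched with 
java.nio.ByteBuffer standing in for Unsafe#getLong (an illustration, not 
WritableComparator's actual code). Reading 8 big-endian bytes as a long and 
comparing the longs unsigned preserves lexicographic byte order:

```java
import java.nio.ByteBuffer;

// Sketch of a word-at-a-time lexicographic byte comparison. ByteBuffer is
// big-endian by default, so Long.compareUnsigned on two 8-byte words gives
// the same ordering as an unsigned byte-by-byte comparison.
public class LongwiseCompare {
    public static int compare(byte[] a, byte[] b) {
        int minLen = Math.min(a.length, b.length);
        ByteBuffer bufA = ByteBuffer.wrap(a);
        ByteBuffer bufB = ByteBuffer.wrap(b);
        int i = 0;
        for (; i + 8 <= minLen; i += 8) {        // 8 bytes per step
            long la = bufA.getLong(i);
            long lb = bufB.getLong(i);
            if (la != lb) {
                return Long.compareUnsigned(la, lb) < 0 ? -1 : 1;
            }
        }
        for (; i < minLen; i++) {                // compare the tail byte by byte
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) return diff;
        }
        return a.length - b.length;              // shorter array sorts first
    }
}
```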



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15669) LLAP: Improve aging in shortest job first scheduler

2017-01-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15842371#comment-15842371
 ] 

Hive QA commented on HIVE-15669:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12849060/HIVE-15669.3.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10988 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=114)

[bucketmapjoin3.q,load_dyn_part5.q,union_date.q,cbo_gby.q,auto_join31.q,auto_sortmerge_join_1.q,join_cond_pushdown_unqual1.q,ppd_outer_join3.q,bucket_map_join_spark3.q,union28.q,statsfs.q,escape_sortby1.q,leftsemijoin.q,union_remove_6.q,join29.q]
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=93)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3209/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3209/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3209/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12849060 - PreCommit-HIVE-Build

> LLAP: Improve aging in shortest job first scheduler
> ---
>
> Key: HIVE-15669
> URL: https://issues.apache.org/jira/browse/HIVE-15669
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Rajesh Balamohan
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-15669.1.patch, HIVE-15669.1.patch, 
> HIVE-15669.2.patch, HIVE-15669.3.patch
>
>
> Under high concurrency, some jobs can gets starved for longer time when 
> hive.llap.task.scheduler.locality.delay is set to -1 (infinitely wait for 
> locality).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-15473) Progress Bar on Beeline client

2017-01-26 Thread anishek (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15842353#comment-15842353
 ] 

anishek edited comment on HIVE-15473 at 1/27/17 7:16 AM:
-

There are a few observations / limitations that [~thejas] cited while 
reviewing this. Writing down the reasoning here, and the steps for how we can 
move forward.

Given that we use SynchronizedHandler for the client on the beeline side, only 
one operation / API call at a time can be in execution from a single beeline 
session to hiveserver2. The current flow of how the progress bar is updated on 
the client side is:

Thread 1 -- does statement execution: this is achieved by calling 
GetOperationStatus for the operation from beeline until the execution of the 
operation is complete. The server-side implementation of GetOperationStatus 
uses a timeout mechanism (which waits for the query execution to finish) 
before it sends the status to the client. The timeout value is decided by a 
step function, which for long-running queries can lead to an approximate wait 
time of 5 seconds per call to GetOperationStatus.
Thread 2 -- prints query logs and progress logs.

*Problem Space:*
# Since the client synchronizes the various API calls, only one call from 
either Thread 1 or Thread 2 is executed at a time. The notion of trying to 
project concurrent execution capability in the beeline code is therefore 
misleading, and with the current patch the progress bar / query log updates 
can be delayed by at least 5+ seconds ( _I don't think we can avoid this 
anyway, as I will discuss later_ ). 
# Additionally, since no *order* is maintained among threads requesting 
synchronization on an object, there is a possibility that Thread 1 gets the 
next lock on the object without Thread 2 getting a chance to obtain it, thus 
leading to long delays in updating the query log or progress log ( _I am not 
sure how this would happen for long-running queries: while Thread 1 is 
executing, Thread 2 would already have blocked on the object's monitor. Once 
Thread 1 completes, and before it comes around the while loop in_ 
{code}
HiveStatement.waitForOperationToComplete()
{code}
_Thread 2 should start executing; it seems highly improbable that Thread 1 
completes, executes additional statements, and gets the lock again before 
Thread 2 gets a chance to acquire it_ )

So in summary:
* Prevent multi-threaded code in beeline for interactions with hiveserver2, 
since no concurrency is supported by the Thrift protocol, unless we move to 
ThriftHttpCliService using an HTTP-based connection, or use a non-blocking 
Thrift server for the binary protocol on the server side.
* Address the issue of responsiveness if we can.

*Solution Space:*
Since concurrent execution is not supported, programming to that effect should 
be avoided in the beeline client. Hence, we strive to remove the 
multi-threaded code from the beeline side by merging the query log and 
progress bar log into the GetOperationStatus API. This would still not address 
the responsiveness issue indicated in 1. above, since GetOperationStatus will 
use the wait time before responding to calls from the beeline side, unless we 
decide to remove the wait or reduce it to a default value of, say, 500 
milliseconds. I am not sure why the step function is used -- _to prevent the 
server from wasting CPU resources on non-critical operations?_ This will 
address 2. above, though, since we are going to get all the information in a 
single call. 

*Implementation Considerations:*
# Merge the QueryLog and ProgressBarLog request / response into 
GetOperationStatus.
# To get this working we have to extend HiveStatement to include a few 
non-JDBC-compliant setters (one interface for displaying the progress bar, 
another for displaying query logs) -- the default implementations for these 
will be _do nothing_ implementations.
# Have setters on HiveStatement for both interfaces, used by beeline to 
provide the required implementations.
# As part of the HiveStatement execute(*) call, we create the appropriate 
request if custom implementations of the above interfaces are provided. 
# There may be an additional function signature for GetOperationStatus that we 
need to create for backward compatibility reasons.
# _Not related to the above_: make sure we pass the vertex progress as a 
string (for progress bar display) and the query progress as a custom enum for 
decision making (with implementations on the server side to map from 
execution-engine state to our generic enum state).
 
If we are too worried about the responsiveness of the progress bar, or about 
*2. in Problem Space* being a major impediment to hive usage, we should go 
with the new implementation proposal; otherwise we just additionally implement 
*6. in Implementation Considerations*.
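The proposed single-threaded flow can be sketched as follows. This is an 
illustration only: StatusResponse, OperationClient, and a GetOperationStatus 
call that also carries log and progress deltas are hypothetical names for this 
sketch, not Hive's actual Thrift API.

```java
// Hypothetical single-threaded polling loop: one GetOperationStatus-style
// call returns completion status plus any pending log and progress lines,
// so no second thread (and no lock ordering between threads) is needed.
public class SingleThreadPolling {
    // Assumed combined response: status + incremental log/progress output.
    static final class StatusResponse {
        final boolean finished;
        final String logDelta;
        final String progressDelta;
        StatusResponse(boolean finished, String logDelta, String progressDelta) {
            this.finished = finished;
            this.logDelta = logDelta;
            this.progressDelta = progressDelta;
        }
    }

    interface OperationClient {
        StatusResponse getOperationStatus(); // one RPC carries everything
    }

    // Poll until the operation completes; all display updates happen on the
    // calling thread. Returns the number of RPCs made.
    static int pollUntilDone(OperationClient client, StringBuilder display) {
        int calls = 0;
        StatusResponse r;
        do {
            r = client.getOperationStatus();
            calls++;
            display.append(r.logDelta).append(r.progressDelta);
        } while (!r.finished);
        return calls;
    }
}
```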




was (Author: anishek):
The

[jira] [Commented] (HIVE-15743) vectorized text parsing: speed up double parse

2017-01-26 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15842355#comment-15842355
 ] 

Gopal V commented on HIVE-15743:


The CPU cache counters for this issue is available here 

http://people.apache.org/~gopalv/llap-perf.tar.bz2

> vectorized text parsing: speed up double parse
> --
>
> Key: HIVE-15743
> URL: https://issues.apache.org/jira/browse/HIVE-15743
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
> Attachments: tpch-without.png
>
>
> {noformat}
> Double.parseDouble(
> new String(bytes, fieldStart, fieldLength, 
> StandardCharsets.UTF_8));{noformat}
> This takes ~25% of the query time in some cases.
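One way such an allocation could be avoided (a sketch only, not the HIVE-15743 
patch) is to parse plain decimal ASCII directly from the byte range and fall 
back to the allocating path for anything unusual. Note that a production 
parser must reproduce Double.parseDouble's rounding exactly; this sketch only 
guarantees that for simple, exactly representable inputs.

```java
import java.nio.charset.StandardCharsets;

// Sketch: parse bytes[start..start+len) as a double without allocating a
// String for the common "plain decimal" case; exponents, NaN, or anything
// unusual fall back to the allocating Double.parseDouble path.
public class BytesDoubleParser {
    public static double parse(byte[] bytes, int start, int len) {
        int i = start, end = start + len;
        boolean negative = false;
        if (i < end && (bytes[i] == '-' || bytes[i] == '+')) {
            negative = bytes[i] == '-';
            i++;
        }
        double value = 0;          // integer part
        double frac = 0, scale = 1; // fractional digits and their scale
        boolean seenDot = false, seenDigit = false;
        for (; i < end; i++) {
            byte b = bytes[i];
            if (b >= '0' && b <= '9') {
                seenDigit = true;
                if (seenDot) { scale *= 10; frac = frac * 10 + (b - '0'); }
                else         { value = value * 10 + (b - '0'); }
            } else if (b == '.' && !seenDot) {
                seenDot = true;
            } else {
                return slowPath(bytes, start, len); // exponent or unusual form
            }
        }
        if (!seenDigit) {
            return slowPath(bytes, start, len);
        }
        double result = value + frac / scale;
        return negative ? -result : result;
    }

    private static double slowPath(byte[] bytes, int start, int len) {
        return Double.parseDouble(new String(bytes, start, len, StandardCharsets.UTF_8));
    }
}
```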



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)



[jira] [Commented] (HIVE-15708) Upgrade calcite version to 1.11

2017-01-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15842329#comment-15842329
 ] 

Hive QA commented on HIVE-15708:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12849582/HIVE-15708.04.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 94 failed/errored test(s), 10987 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=122)

[auto_sortmerge_join_13.q,join4.q,join35.q,udf_percentile.q,join_reorder3.q,subquery_in.q,auto_join19.q,stats14.q,vectorization_15.q,union7.q,vectorization_nested_udf.q,vector_groupby_3.q,vectorized_ptf.q,auto_join2.q,groupby1_map_skew.q]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ambiguitycheck] 
(batchId=67)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_select] 
(batchId=56)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[avro_date] (batchId=9)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[avro_timestamp] 
(batchId=27)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cast1] (batchId=69)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cast_on_constant] 
(batchId=22)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_auto_join1] 
(batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_outer_join_ppr] 
(batchId=6)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_udaf_percentile_approx_23]
 (batchId=32)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[char_cast] (batchId=81)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[constantfolding] 
(batchId=68)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[constprog2] (batchId=12)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[date_1] (batchId=73)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[date_udf] (batchId=29)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[decimal_2] (batchId=57)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[druid_basic2] 
(batchId=10)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[druid_intervals] 
(batchId=21)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[druid_timeseries] 
(batchId=54)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[druid_topn] (batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[filter_cond_pushdown] 
(batchId=55)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[fold_eq_with_case_when] 
(batchId=74)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[fouter_join_ppr] 
(batchId=30)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[index_auto_unused] 
(batchId=36)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[interval_alt] (batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join_merging] 
(batchId=73)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[louter_join_ppr] 
(batchId=39)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mergejoins] (batchId=2)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ops_comparison] 
(batchId=48)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[orc_ppd_char] 
(batchId=10)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[outer_join_ppr] 
(batchId=18)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_ppd_char] 
(batchId=9)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_ppd_date] 
(batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_ppd_decimal] 
(batchId=9)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_ppd_timestamp] 
(batchId=51)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_ppd_varchar] 
(batchId=11)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_date] 
(batchId=13)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_timestamp] 
(batchId=33)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_type_check] 
(batchId=71)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_type_in_plan] 
(batchId=65)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_outer_join1] 
(batchId=39)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[router_join_ppr] 
(batchId=73)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[timestamp] (batchId=27)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[timestamp_comparison2] 
(batchId=34)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[type_conversions_1] 
(batchId=70)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udaf_percentile_approx_23]
 (batchId=31)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf3] (batchId=19)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_hour] (batch

[jira] [Commented] (HIVE-15717) Class "org.apache.hive.beeline.Rows.Row" constructor is CPU consuming due to exception handling

2017-01-26 Thread Tao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15842313#comment-15842313
 ] 

Tao Li commented on HIVE-15717:
---

[~thejas] Just submitted another iteration.

> Class "org.apache.hive.beeline.Rows.Row" constructor is CPU consuming due to 
> exception handling
> ---
>
> Key: HIVE-15717
> URL: https://issues.apache.org/jira/browse/HIVE-15717
> Project: Hive
>  Issue Type: Improvement
>Reporter: Tao Li
>Assignee: Tao Li
> Attachments: HIVE-15717.1.patch, HIVE-15717.2.patch, Screen Shot 
> 2017-01-24 at 3.11.09 PM.png, Screen Shot 2017-01-24 at 3.15.14 PM.png
>
>
> The CPU cost comes from the exception handling around the 3 method calls 
> (rowDeleted, rowInserted and rowUpdated). The implementations of these 
> methods in the org.apache.hive.jdbc.HiveBaseResultSet class just throw 
> SQLException("Method not supported"), i.e. there are no real implementations.
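The kind of fix this suggests can be sketched as follows: probe the 
unsupported methods once, cache the result, and skip the throw/catch cost on 
every subsequent row. RowStatusProbe is a hypothetical helper for 
illustration, not the actual HIVE-15717 patch:

```java
import java.sql.ResultSet;
import java.sql.SQLException;

// Sketch: pay the "Method not supported" exception cost at most once per
// ResultSet instead of once per row, by caching whether rowDeleted() works.
public class RowStatusProbe {
    private Boolean supported; // null = not probed yet

    public boolean isDeleted(ResultSet rs) {
        if (Boolean.FALSE.equals(supported)) {
            return false;              // cached: skip the exception entirely
        }
        try {
            boolean deleted = rs.rowDeleted();
            supported = Boolean.TRUE;
            return deleted;
        } catch (SQLException e) {
            supported = Boolean.FALSE; // exception cost paid only once
            return false;
        }
    }
}
```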



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15717) Class "org.apache.hive.beeline.Rows.Row" constructor is CPU consuming due to exception handling

2017-01-26 Thread Tao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Li updated HIVE-15717:
--
Attachment: HIVE-15717.2.patch

> Class "org.apache.hive.beeline.Rows.Row" constructor is CPU consuming due to 
> exception handling
> ---
>
> Key: HIVE-15717
> URL: https://issues.apache.org/jira/browse/HIVE-15717
> Project: Hive
>  Issue Type: Improvement
>Reporter: Tao Li
>Assignee: Tao Li
> Attachments: HIVE-15717.1.patch, HIVE-15717.2.patch, Screen Shot 
> 2017-01-24 at 3.11.09 PM.png, Screen Shot 2017-01-24 at 3.15.14 PM.png
>
>
> The CPU cost comes from the exception handling around the 3 method calls 
> (rowDeleted, rowInserted and rowUpdated). The implementations of these 
> methods in the org.apache.hive.jdbc.HiveBaseResultSet class just throw 
> SQLException("Method not supported"), i.e. there are no real implementations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15550) fix arglist logging in schematool

2017-01-26 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-15550:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> fix arglist logging in schematool
> -
>
> Key: HIVE-15550
> URL: https://issues.apache.org/jira/browse/HIVE-15550
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline
>Affects Versions: 2.1.1
>Reporter: anishek
>Assignee: anishek
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-15550.1.patch, HIVE-15550.1.patch
>
>
> In DEBUG mode schemaTool prints the password to log file.
> This is also seen if the user includes --verbose option.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14573) Vectorization: Implement StringExpr::find()

2017-01-26 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-14573:
---
Assignee: Teddy Choi

> Vectorization: Implement StringExpr::find() 
> 
>
> Key: HIVE-14573
> URL: https://issues.apache.org/jira/browse/HIVE-14573
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Teddy Choi
>
> Currently, the LIKE expression implementation is a dumb StringExpr::equals() 
> loop.
> For an input of N bytes and a pattern of M bytes, this has a complexity of 
> O((N-M)*M), which is not an issue with small patterns or small inputs.
> The pattern matching is currently optimized for matches, while in clickstream 
> data the opposite is true in general.
> From the common crawl data, the following run will go through the same
> {code}
> select count(1) from uservisits_orc_data where useragent like "%Opera%" and 
> searchword LIKE "%fruit%";
> {code}
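A find() that rejects non-matching rows faster than the naive loop can be 
sketched with Boyer-Moore-Horspool. This is an illustration of one possible 
approach, not Hive's eventual StringExpr::find implementation:

```java
// Sketch: Boyer-Moore-Horspool substring search over raw bytes. On a
// mismatch the window skips ahead by a precomputed shift keyed on the last
// byte of the window, so non-matching clickstream rows are rejected in far
// fewer than (N-M)*M comparisons on average.
public class HorspoolFind {
    // Returns the index of the first occurrence of pattern in input, or -1.
    public static int find(byte[] input, byte[] pattern) {
        int n = input.length, m = pattern.length;
        if (m == 0) return 0;
        if (m > n) return -1;
        int[] shift = new int[256];
        java.util.Arrays.fill(shift, m);          // default: skip whole pattern
        for (int i = 0; i < m - 1; i++) {
            shift[pattern[i] & 0xff] = m - 1 - i; // distance from end of pattern
        }
        int pos = 0;
        while (pos <= n - m) {
            int j = m - 1;
            while (j >= 0 && input[pos + j] == pattern[j]) {
                j--;
            }
            if (j < 0) return pos;                // full match at pos
            pos += shift[input[pos + m - 1] & 0xff];
        }
        return -1;
    }
}
```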



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15550) fix arglist logging in schematool

2017-01-26 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15842300#comment-15842300
 ] 

Thejas M Nair commented on HIVE-15550:
--

The test failures are unrelated (there were previous runs with the same patch 
to compare), and the failed tests are either tracked in the umbrella jira or 
have cleared up.
Patch committed to master.
Thanks Anishek!

> fix arglist logging in schematool
> -
>
> Key: HIVE-15550
> URL: https://issues.apache.org/jira/browse/HIVE-15550
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline
>Affects Versions: 2.1.1
>Reporter: anishek
>Assignee: anishek
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-15550.1.patch, HIVE-15550.1.patch
>
>
> In DEBUG mode schemaTool prints the password to log file.
> This is also seen if the user includes --verbose option.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15666) Select query with view adds base table partition as direct input in spark engine

2017-01-26 Thread Niklaus Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15841905#comment-15841905
 ] 

Niklaus Xiao commented on HIVE-15666:
-

[~aihuaxu] I tried 1.3.0-SNAPSHOT and didn't see the issue when using the MR 
engine; I'll try the latest version.

> Select query with view adds base table partition as direct input in spark 
> engine
> 
>
> Key: HIVE-15666
> URL: https://issues.apache.org/jira/browse/HIVE-15666
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 1.3.0
>Reporter: Niklaus Xiao
>Assignee: Aihua Xu
> Attachments: TestViewEntityInSparkEngine.patch
>
>
> repro steps:
> {code}
> set hive.execution.engine=spark;
> create table base(id int) partitioned by (dt string);
> alter table base add partition(dt='2017');
> create view view1 as select * from base where id < 10;
> select * from view1;
> {code}
> It requires access not only to view1 but also to the base@dt=2017 
> partition, which should not be required.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15711) Flaky TestEmbeddedThriftBinaryCLIService.testTaskStatus

2017-01-26 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-15711:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

The change is test-only and the changed tests are passing. The other unrelated 
failures have cleared up in recent runs; the rest are tracked in the umbrella jira.

Committed to master.
Thanks Anishek!

> Flaky TestEmbeddedThriftBinaryCLIService.testTaskStatus
> ---
>
> Key: HIVE-15711
> URL: https://issues.apache.org/jira/browse/HIVE-15711
> Project: Hive
>  Issue Type: Test
>  Components: Hive
>Affects Versions: 2.1.1
>Reporter: anishek
>Assignee: anishek
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-15711.1.patch
>
>
> The above test is flaky and keeps failing in local build environments. 
> Fix it to prevent intermittent failures.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15712) new HiveConf in SQLOperation.getSerDe() impacts CPU on hiveserver2

2017-01-26 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-15712:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

The failures in TestSparkCliDriver setup are unrelated, as verified by running 
locally.
Thanks for the patch [~anishek]!


> new HiveConf in SQLOperation.getSerDe() impacts CPU on hiveserver2
> --
>
> Key: HIVE-15712
> URL: https://issues.apache.org/jira/browse/HIVE-15712
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 2.1.1
>Reporter: anishek
>Assignee: anishek
> Fix For: 2.2.0
>
> Attachments: HIVE-15712.1.patch, HIVE-15712.1.patch
>
>
> In an internal performance test with about 10 concurrent users, we found 
> that about 18% of CPU on hiveserver2 is spent creating new HiveConf() 
> instances in SQLOperation.getSerDe().
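The general remedy for this kind of hotspot (a sketch of the pattern, not necessarily the committed patch) is to construct the expensive object once and reuse it, e.g. through a small lazy holder:

```java
import java.util.function.Supplier;

// Generic lazy holder: builds the expensive value (e.g. a conf object) on
// first use and hands back the same instance on every later call.
public class LazyConf<T> {
    private final Supplier<T> factory;
    private T cached;

    public LazyConf(Supplier<T> factory) {
        this.factory = factory;
    }

    public synchronized T get() {
        if (cached == null) {
            cached = factory.get();  // runs at most once
        }
        return cached;
    }
}
```

In SQLOperation terms, the per-call `new HiveConf()` would be replaced by `lazyConf.get()` constructed once per operation (or copied from a shared template), which is where the 18% CPU goes away.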



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15713) add ldap authentication related configuration to restricted list

2017-01-26 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-15713:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch committed to master.
Thanks for the patch [~anishek]

> add ldap authentication related configuration to restricted list
> 
>
> Key: HIVE-15713
> URL: https://issues.apache.org/jira/browse/HIVE-15713
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 2.1.1
>Reporter: anishek
>Assignee: anishek
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-15713.1.patch, HIVE-15713.1.patch
>
>
> The LDAP configuration parameters below should be added to the 
> restricted list of configuration parameters so that users can't change them 
> per session. 
> hive.server2.authentication.ldap.baseDN
> hive.server2.authentication.ldap.url
> hive.server2.authentication.ldap.Domain
> hive.server2.authentication.ldap.groupDNPattern
> hive.server2.authentication.ldap.groupFilter
> hive.server2.authentication.ldap.userDNPattern
> hive.server2.authentication.ldap.userFilter
> hive.server2.authentication.ldap.groupMembershipKey
> hive.server2.authentication.ldap.userMembershipKey
> hive.server2.authentication.ldap.groupClassKey
> hive.server2.authentication.ldap.customLDAPQuery
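For reference, the restricted list is driven by hive.conf.restricted.list, so an administrator can get a similar effect via hive-site.xml; an illustrative fragment (showing only two of the keys above):

```xml
<property>
  <name>hive.conf.restricted.list</name>
  <value>hive.server2.authentication.ldap.baseDN,hive.server2.authentication.ldap.url</value>
  <description>Comma-separated list of keys users cannot override per session.</description>
</property>
```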



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15712) new HiveConf in SQLOperation.getSerDe() impacts CPU on hiveserver2

2017-01-26 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15841008#comment-15841008
 ] 

Thejas M Nair commented on HIVE-15712:
--

Verifying the TestSparkCliDriver failures.
The rest of them are now accounted for in HIVE-15058.

> new HiveConf in SQLOperation.getSerDe() impacts CPU on hiveserver2
> --
>
> Key: HIVE-15712
> URL: https://issues.apache.org/jira/browse/HIVE-15712
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 2.1.1
>Reporter: anishek
>Assignee: anishek
> Fix For: 2.2.0
>
> Attachments: HIVE-15712.1.patch, HIVE-15712.1.patch
>
>
> In an internal performance test with about 10 concurrent users, we found 
> that about 18% of CPU on hiveserver2 is spent creating new HiveConf() 
> instances in SQLOperation.getSerDe().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15288) Flaky test: TestMiniTezCliDriver.testCliDriver[explainuser_3]

2017-01-26 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840966#comment-15840966
 ] 

Thejas M Nair commented on HIVE-15288:
--

This is still failing - 

{code}
Running: diff -a 
/home/hiveptest/104.154.128.167-hiveptest-1/apache-github-source-source/itests/qtest/target/qfile-results/clientpositive/explainuser_3.q.out
 
/home/hiveptest/104.154.128.167-hiveptest-1/apache-github-source-source/ql/src/test/results/clientpositive/tez/explainuser_3.q.out
34c34
< Select Operator [SEL_7] (rows=16 width=107)
---
> Select Operator [SEL_7] (rows=16 width=106)
38c38
< Select Operator [SEL_5] (rows=16 width=107)
---
> Select Operator [SEL_5] (rows=16 width=106)
40c40
<   TableScan [TS_0] (rows=16 width=107)
---
>   TableScan [TS_0] (rows=16 width=106)

{code}

> Flaky test: TestMiniTezCliDriver.testCliDriver[explainuser_3]
> -
>
> Key: HIVE-15288
> URL: https://issues.apache.org/jira/browse/HIVE-15288
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Anthony Hsu
>Assignee: Pengcheng Xiong
>
> explainuser_3.q sometimes fails with the following diff:
> {noformat}
> 34c34
> < Select Operator [SEL_7] (rows=16 width=106)
> ---
> > Select Operator [SEL_7] (rows=16 width=107)
> 38c38
> < Select Operator [SEL_5] (rows=16 width=106)
> ---
> > Select Operator [SEL_5] (rows=16 width=107)
> 40c40
> <   TableScan [TS_0] (rows=16 width=106)
> ---
> >   TableScan [TS_0] (rows=16 width=107)
> {noformat}
> It was also previously reported as flaky in HIVE-14689.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (HIVE-15288) Flaky test: TestMiniTezCliDriver.testCliDriver[explainuser_3]

2017-01-26 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair reopened HIVE-15288:
--

> Flaky test: TestMiniTezCliDriver.testCliDriver[explainuser_3]
> -
>
> Key: HIVE-15288
> URL: https://issues.apache.org/jira/browse/HIVE-15288
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Anthony Hsu
>Assignee: Pengcheng Xiong
>
> explainuser_3.q sometimes fails with the following diff:
> {noformat}
> 34c34
> < Select Operator [SEL_7] (rows=16 width=106)
> ---
> > Select Operator [SEL_7] (rows=16 width=107)
> 38c38
> < Select Operator [SEL_5] (rows=16 width=106)
> ---
> > Select Operator [SEL_5] (rows=16 width=107)
> 40c40
> <   TableScan [TS_0] (rows=16 width=106)
> ---
> >   TableScan [TS_0] (rows=16 width=107)
> {noformat}
> It was also previously reported as flaky in HIVE-14689.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15699) Failing tests : encryption_join_with_different_encryption_keys, generatehfiles_require_family_path

2017-01-26 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840962#comment-15840962
 ] 

Thejas M Nair commented on HIVE-15699:
--

[~pxiong]
testCliDriver_encryption_join_with_different_encryption_keys is still failing. 
Can you please take a look?
Looks like the golden file hasn't been updated.

https://builds.apache.org/job/PreCommit-HIVE-Build/3204/testReport/org.apache.hadoop.hive.cli/TestEncryptedHDFSCliDriver/testCliDriver_encryption_join_with_different_encryption_keys_/

> Failing tests : encryption_join_with_different_encryption_keys, 
> generatehfiles_require_family_path
> --
>
> Key: HIVE-15699
> URL: https://issues.apache.org/jira/browse/HIVE-15699
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Thejas M Nair
>Assignee: Pengcheng Xiong
>
> Failing for 84 test runs as of 
> https://builds.apache.org/job/PreCommit-HIVE-Build/3108/testReport/
>  
> org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
>   
>  
> org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver[generatehfiles_require_family_path]
>   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15735) In some cases, view objects inside a view do not have parents

2017-01-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840960#comment-15840960
 ] 

Hive QA commented on HIVE-15735:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12849581/HIVE-15735.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 11003 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_varchar_simple]
 (batchId=153)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=93)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3207/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3207/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3207/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12849581 - PreCommit-HIVE-Build

> In some cases, view objects inside a view do not have parents
> -
>
> Key: HIVE-15735
> URL: https://issues.apache.org/jira/browse/HIVE-15735
> Project: Hive
>  Issue Type: Bug
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
> Attachments: HIVE-15735.1.patch
>
>
> This causes Sentry to throw a "No valid privileges" error:
> Error: Error while compiling statement: FAILED: SemanticException No valid 
> privileges.
> To reproduce:
> Enable sentry:
> create table t1( i int);
> create view v1 as select * from t1;
> create view v2 as select * from v1 union all select * from v1;
> If the user does not have read permission on t1 and v1, the query
> select * from v2;
> will fail with:
> Error: Error while compiling statement: FAILED: SemanticException No valid 
> privileges
>  User foo does not have privileges for QUERY
>  The required privileges: 
> Server=server1->Db=database2->Table=v1->action=select; 
> (state=42000,code=4)
> Sentry should not check v1's permission, since v1 has at least one parent (v2).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (HIVE-15699) Failing tests : encryption_join_with_different_encryption_keys, generatehfiles_require_family_path

2017-01-26 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair reopened HIVE-15699:
--

> Failing tests : encryption_join_with_different_encryption_keys, 
> generatehfiles_require_family_path
> --
>
> Key: HIVE-15699
> URL: https://issues.apache.org/jira/browse/HIVE-15699
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Thejas M Nair
>Assignee: Pengcheng Xiong
>
> Failing for 84 test runs as of 
> https://builds.apache.org/job/PreCommit-HIVE-Build/3108/testReport/
>  
> org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
>   
>  
> org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver[generatehfiles_require_family_path]
>   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15743) vectorized text parsing: speed up double parse

2017-01-26 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840955#comment-15840955
 ] 

Matt McCline commented on HIVE-15743:
-

Wow that is a high percentage!

Ya, I looked at some of those Java runtime classes recently (when I was doing 
FastDecimal) and it is doable. All numeric-related characters are UTF-8. 
There is considerable magic code around exponents, etc. in them, though, and 
because of that I suspect few people are brave enough to change it. So, our 
transformed version would be "relatively safe".

> vectorized text parsing: speed up double parse
> --
>
> Key: HIVE-15743
> URL: https://issues.apache.org/jira/browse/HIVE-15743
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
> Attachments: tpch-without.png
>
>
> {noformat}
> Double.parseDouble(
> new String(bytes, fieldStart, fieldLength, 
> StandardCharsets.UTF_8));{noformat}
> This takes ~25% of the query time in some cases.
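A minimal sketch of the byte-level fast path being discussed (hypothetical helper, not Hive code): handle the plain sign/digits/dot case directly from the UTF-8 bytes and fall back to the allocating Double.parseDouble for everything else. Note that dividing by a power of ten can differ from Double.parseDouble in the last ulp, which is exactly why a production version would port FloatingDecimal's correctly-rounded logic instead.

```java
import java.nio.charset.StandardCharsets;

public class ByteDoubleParser {
    // Fast path for plain decimals (optional sign, digits, optional '.').
    // Exponents, NaN/Infinity, or inputs long enough to overflow the long
    // accumulator all take the slow String-allocating path.
    public static double parse(byte[] bytes, int start, int len) {
        int i = start, end = start + len;
        boolean negative = false;
        if (i < end && (bytes[i] == '-' || bytes[i] == '+')) {
            negative = bytes[i] == '-';
            i++;
        }
        long digits = 0;
        int nDigits = 0, scale = 0;
        boolean seenDot = false;
        for (; i < end; i++) {
            byte b = bytes[i];
            if (b >= '0' && b <= '9') {
                if (++nDigits > 18) return slowPath(bytes, start, len); // long overflow risk
                digits = digits * 10 + (b - '0');
                if (seenDot) scale++;
            } else if (b == '.' && !seenDot) {
                seenDot = true;
            } else {
                return slowPath(bytes, start, len); // exponent, NaN, Infinity, garbage
            }
        }
        if (nDigits == 0) return slowPath(bytes, start, len);
        double value = digits / Math.pow(10, scale);
        return negative ? -value : value;
    }

    private static double slowPath(byte[] bytes, int start, int len) {
        return Double.parseDouble(new String(bytes, start, len, StandardCharsets.UTF_8));
    }
}
```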



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15744) Flaky test: TestPerfCliDriver.query23, query14

2017-01-26 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840947#comment-15840947
 ] 

Thejas M Nair commented on HIVE-15744:
--

It looks like the line where these warnings are being printed is moving around.


> Flaky test: TestPerfCliDriver.query23, query14
> --
>
> Key: HIVE-15744
> URL: https://issues.apache.org/jira/browse/HIVE-15744
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.2.0
>Reporter: Thejas M Nair
>
> There is some flakiness in these tests -
> https://builds.apache.org/job/PreCommit-HIVE-Build/3206/testReport/org.apache.hadoop.hive.cli/TestPerfCliDriver/testCliDriver_query23_/
> https://builds.apache.org/job/PreCommit-HIVE-Build/3204/testReport/org.apache.hadoop.hive.cli/TestPerfCliDriver/testCliDriver_query14_/
> The diff looks like this - 
> {code}
> Running: diff -a 
> /home/hiveptest/130.211.230.155-hiveptest-1/apache-github-source-source/itests/qtest/target/qfile-results/clientpositive/query14.q.out
>  
> /home/hiveptest/130.211.230.155-hiveptest-1/apache-github-source-source/ql/src/test/results/clientpositive/perf/query14.q.out
> 0a1,2
> > Warning: Shuffle Join MERGEJOIN[916][tables = [$hdt$_1, $hdt$_2]] in Stage 
> > 'Reducer 114' is a cross product
> > Warning: Shuffle Join MERGEJOIN[917][tables = [$hdt$_1, $hdt$_2, $hdt$_0]] 
> > in Stage 'Reducer 115' is a cross product
> 5,6d6
> < Warning: Shuffle Join MERGEJOIN[916][tables = [$hdt$_1, $hdt$_2]] in Stage 
> 'Reducer 114' is a cross product
> < Warning: Shuffle Join MERGEJOIN[917][tables = [$hdt$_1, $hdt$_2, $hdt$_0]] 
> in Stage 'Reducer 115' is a cross product
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-15743) vectorized text parsing: speed up double parse

2017-01-26 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840937#comment-15840937
 ] 

Sergey Shelukhin edited comment on HIVE-15743 at 1/27/17 4:21 AM:
--

cc [~mmccline]

We can probably just c/p parts of FloatingDecimal - merge the parsing and 
doubleValue, and change them to operate on byte array. It only needs to 
recognize like 6-8 letters aside from normal numeric stuff (nan and infinity 
aside), so we should be safe since we always use utf8


was (Author: sershe):
cc [~mmccline]

We can probably just c/p parts of FloatingDecimal - merge the parsing and 
doubleValue, and change them to operate on byte array. It only needs to 
recognize like 6-8 letters aside from normal numeric stuff (nan and infinity 
aside), so we should be safe since we always use utf8

> vectorized text parsing: speed up double parse
> --
>
> Key: HIVE-15743
> URL: https://issues.apache.org/jira/browse/HIVE-15743
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
> Attachments: tpch-without.png
>
>
> {noformat}
> Double.parseDouble(
> new String(bytes, fieldStart, fieldLength, 
> StandardCharsets.UTF_8));{noformat}
> This takes ~25% of the query time in some cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-15743) vectorized text parsing: speed up double parse

2017-01-26 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840937#comment-15840937
 ] 

Sergey Shelukhin edited comment on HIVE-15743 at 1/27/17 4:21 AM:
--

cc [~mmccline]

We can probably just c/p parts of FloatingDecimal - merge the parsing and 
doubleValue, and change them to operate on byte array. It only needs to 
recognize like 6-8 letters aside from normal numeric stuff (nan and infinity 
aside), so we should be safe since we always use utf8


was (Author: sershe):
cc [~mmccline]

We can probably just c/p parts of FloatingDecimal - merge the parsing and 
doubleValue, and change them to operate on byte array. It only needs to 
recognize like 6-8 letters aside from normal numeric stuff, so we should be 
safe since we always use utf8

> vectorized text parsing: speed up double parse
> --
>
> Key: HIVE-15743
> URL: https://issues.apache.org/jira/browse/HIVE-15743
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
> Attachments: tpch-without.png
>
>
> {noformat}
> Double.parseDouble(
> new String(bytes, fieldStart, fieldLength, 
> StandardCharsets.UTF_8));{noformat}
> This takes ~25% of the query time in some cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15734) LazySimpleDeserializeRead.readField needs to catch IllegalArgumentException

2017-01-26 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840935#comment-15840935
 ] 

Thejas M Nair commented on HIVE-15734:
--

As agreed in the dev mailing list and documented in the wiki, please follow 
this process if there are failed tests - 
https://cwiki.apache.org/confluence/display/Hive/HowToCommit#HowToCommit-PreCommitruns,andcommittingpatches
Specifically -
- Test runs may not be clean due to issues in the patch itself, or due to 
flaky tests.
- If the failure is identified to be a flaky test, before committing, cite 
the JIRA which covers the flaky test (tracked under HIVE-15058). Create a new 
one if a JIRA does not already exist.

> LazySimpleDeserializeRead.readField needs to catch IllegalArgumentException
> ---
>
> Key: HIVE-15734
> URL: https://issues.apache.org/jira/browse/HIVE-15734
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: HIVE-15734.01.patch
>
>
> E.g. java.sql.Date.valueOf can throw that exception if it encounters a parse 
> error for a date.
> With changes to CHAR padding, I think this may be why 
> schema_evol_text_vec_part.q is failing.
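The fix the title describes reduces to this defensive pattern (illustrative helper; the real LazySimpleDeserializeRead carries more per-field state): treat an unparseable date as a null field instead of letting the exception escape the read loop.

```java
import java.sql.Date;

public class SafeDateParse {
    // java.sql.Date.valueOf throws IllegalArgumentException on malformed
    // input; return null so the caller can mark the column value as null.
    public static Date parseOrNull(String s) {
        try {
            return Date.valueOf(s);
        } catch (IllegalArgumentException e) {
            return null;
        }
    }
}
```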



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15743) vectorized text parsing: speed up double parse

2017-01-26 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840937#comment-15840937
 ] 

Sergey Shelukhin commented on HIVE-15743:
-

cc [~mmccline]

We can probably just c/p parts of FloatingDecimal - merge the parsing and 
doubleValue, and change them to operate on byte array. It only needs to 
recognize like 6-8 letters aside from normal numeric stuff, so we should be 
safe since we always use utf8

> vectorized text parsing: speed up double parse
> --
>
> Key: HIVE-15743
> URL: https://issues.apache.org/jira/browse/HIVE-15743
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
> Attachments: tpch-without.png
>
>
> {noformat}
> Double.parseDouble(
> new String(bytes, fieldStart, fieldLength, 
> StandardCharsets.UTF_8));{noformat}
> This takes ~25% of the query time in some cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15546) Optimize Utilities.getInputPaths() so each listStatus of a partition is done in parallel

2017-01-26 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840934#comment-15840934
 ] 

Thejas M Nair commented on HIVE-15546:
--

As agreed in the dev mailing list and documented in the wiki, please follow 
this process if there are failed tests - 
https://cwiki.apache.org/confluence/display/Hive/HowToCommit#HowToCommit-PreCommitruns,andcommittingpatches
Specifically -
- Test runs may not be clean due to issues in the patch itself, or due to 
flaky tests.
- If the failure is identified to be a flaky test, before committing, cite 
the JIRA which covers the flaky test (tracked under HIVE-15058). Create a new 
one if a JIRA does not already exist.

> Optimize Utilities.getInputPaths() so each listStatus of a partition is done 
> in parallel
> 
>
> Key: HIVE-15546
> URL: https://issues.apache.org/jira/browse/HIVE-15546
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Fix For: 2.2.0
>
> Attachments: HIVE-15546.1.patch, HIVE-15546.2.patch, 
> HIVE-15546.3.patch, HIVE-15546.4.patch, HIVE-15546.5.patch, HIVE-15546.6.patch
>
>
> When running on blobstores (like S3) where metadata operations (like 
> listStatus) are costly, Utilities.getInputPaths() can add significant 
> overhead when setting up the input paths for an MR / Spark / Tez job.
> The method performs a listStatus on all input paths in order to check if the 
> path is empty. If the path is empty, a dummy file is created for the given 
> partition. This is all done sequentially. This can be really slow when there 
> are a lot of empty partitions. Even when all partitions have input data, this 
> can take a long time.
> We should either:
> (1) Just remove the logic to check if each input path is empty, and handle 
> any edge cases accordingly.
> (2) Multi-thread the listStatus calls
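Option (2) can be sketched with a plain ExecutorService; the Predicate below stands in for the per-path FileSystem.listStatus emptiness check, which would otherwise require a Hadoop dependency:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

public class ParallelEmptyCheck {
    // Runs the per-path "is this input path empty?" check concurrently
    // instead of sequentially; result preserves input order.
    public static Map<String, Boolean> checkAll(List<String> paths,
                                                Predicate<String> isEmpty,
                                                int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Map<String, Future<Boolean>> futures = new LinkedHashMap<>();
            for (String p : paths) {
                futures.put(p, pool.submit(() -> isEmpty.test(p)));
            }
            Map<String, Boolean> result = new LinkedHashMap<>();
            for (Map.Entry<String, Future<Boolean>> e : futures.entrySet()) {
                result.put(e.getKey(), e.getValue().get()); // propagates failures
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }
}
```

With an object-store filesystem, each check is dominated by request latency, so N paths on a pool of size T roughly costs ceil(N/T) round trips instead of N.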



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15629) Set DDLTask’s exception with its subtask’s exception

2017-01-26 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840932#comment-15840932
 ] 

Thejas M Nair commented on HIVE-15629:
--

As agreed in the dev mailing list and documented in the wiki, please follow 
this process if there are failed tests - 
https://cwiki.apache.org/confluence/display/Hive/HowToCommit#HowToCommit-PreCommitruns,andcommittingpatches
Specifically -
- Test runs may not be clean due to issues in the patch itself, or due to 
flaky tests.
- If the failure is identified to be a flaky test, before committing, cite 
the JIRA which covers the flaky test (tracked under HIVE-15058). Create a new 
one if a JIRA does not already exist.

> Set DDLTask’s exception with its subtask’s exception
> 
>
> Key: HIVE-15629
> URL: https://issues.apache.org/jira/browse/HIVE-15629
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-15629.000.patch
>
>
> Set DDLTask’s exception with its subtask’s exception, So the exception from 
> subtask in DDLTask can be propagated to TaskRunner.
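The improvement boils down to the parent task recording its subtask's exception instead of dropping it; a minimal sketch with hypothetical class names (the real DDLTask/TaskRunner interaction has more machinery):

```java
import java.util.concurrent.Callable;

public class ParentTask {
    private Throwable exception;

    public Throwable getException() {
        return exception;
    }

    // Runs a subtask; on failure, records the subtask's exception on the
    // parent so the runner can surface the real root cause, and returns a
    // non-zero code to signal failure.
    public int run(Callable<Integer> subtask) {
        try {
            return subtask.call();
        } catch (Exception e) {
            this.exception = e; // propagate instead of swallowing
            return 1;
        }
    }
}
```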



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15743) vectorized text parsing: speed up double parse

2017-01-26 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15743:

Attachment: tpch-without.png

> vectorized text parsing: speed up double parse
> --
>
> Key: HIVE-15743
> URL: https://issues.apache.org/jira/browse/HIVE-15743
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
> Attachments: tpch-without.png
>
>
> {noformat}
> Double.parseDouble(
> new String(bytes, fieldStart, fieldLength, 
> StandardCharsets.UTF_8));{noformat}
> This takes ~25% of the query time in some cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15485) Investigate the DoAs failure in HoS

2017-01-26 Thread Chaoyu Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoyu Tang updated HIVE-15485:
---
Attachment: HIVE-15485.1.patch

Thanks [~jxiang]. Please take a look to see if it is what you suggested. The 
keytab + ";" do not have to be in two argv. Thanks

> Investigate the DoAs failure in HoS
> ---
>
> Key: HIVE-15485
> URL: https://issues.apache.org/jira/browse/HIVE-15485
> Project: Hive
>  Issue Type: Bug
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-15485.1.patch, HIVE-15485.patch
>
>
> With DoAs enabled, HoS failed with following errors:
> {code}
> Exception in thread "main" org.apache.hadoop.security.AccessControlException: 
> systest tries to renew a token with renewer hive
>   at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:484)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7543)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:555)
>   at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:674)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:999)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2141)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2137)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1783)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2135)
> {code}
> It is related to the change from HIVE-14383. It looks like that SparkSubmit 
> logs in Kerberos with passed in hive principal/keytab and then tries to 
> create a hdfs delegation token for user systest with renewer hive.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15714) backport HIVE-11985 (and HIVE-12601) to branch-1

2017-01-26 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840921#comment-15840921
 ] 

Sergey Shelukhin commented on HIVE-15714:
-

I'll try to run them locally, but it looks like many others failed for 
unrelated reasons, like MapJoinMemory... etc.

> backport HIVE-11985 (and HIVE-12601) to branch-1
> 
>
> Key: HIVE-15714
> URL: https://issues.apache.org/jira/browse/HIVE-15714
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15714-branch-1.patch
>
>
> Backport HIVE-11985 (and HIVE-12601) to branch-1



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15714) backport HIVE-11985 (and HIVE-12601) to branch-1

2017-01-26 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840920#comment-15840920
 ] 

Sergey Shelukhin commented on HIVE-15714:
-

Many test failures are due to build issues, not timeouts
{noformat}
main:
 [exec] + /bin/pwd
 [exec] + BASE_DIR=./target
 [exec] + HIVE_ROOT=./target/../../../
 [exec] + DOWNLOAD_DIR=./../thirdparty
 [exec] + mkdir -p ./../thirdparty
 [exec] 
/home/hiveptest/130.211.215.128-hiveptest-1/apache-github-branch-1-source/itests/qtest-spark
 [exec] + download 
http://d3jw87u4immizc.cloudfront.net/spark-tarball/spark-1.5.0-bin-hadoop2-without-hive.tgz
 spark
 [exec] + 
url=http://d3jw87u4immizc.cloudfront.net/spark-tarball/spark-1.5.0-bin-hadoop2-without-hive.tgz
 [exec] + finalName=spark
 [exec] ++ basename 
http://d3jw87u4immizc.cloudfront.net/spark-tarball/spark-1.5.0-bin-hadoop2-without-hive.tgz
 [exec] + tarName=spark-1.5.0-bin-hadoop2-without-hive.tgz
 [exec] + rm -rf ./target/spark
 [exec] + [[ ! -f ./../thirdparty/spark-1.5.0-bin-hadoop2-without-hive.tgz 
]]
 [exec] + tar -zxf ./../thirdparty/spark-1.5.0-bin-hadoop2-without-hive.tgz 
-C ./target
 [exec] + mv ./target/spark-1.5.0-bin-hadoop2-without-hive ./target/spark
 [exec] + cp -f ./target/../../..//data/conf/spark/log4j.properties 
./target/spark/conf/
 [exec] + sed '/package /d' 
/data/hiveptest/working/apache-github-branch-1-source/itests/../contrib/src/java/org/apache/hadoop/hive/contrib/udf/example/UDFExampleAdd.java
 [exec] sed: can't read 
/data/hiveptest/working/apache-github-branch-1-source/itests/../contrib/src/java/org/apache/hadoop/hive/contrib/udf/example/UDFExampleAdd.java:
 No such file or directory
 [exec] + javac -cp 
/data/hiveptest/working/maven/org/apache/hive/hive-exec/1.3.0-SNAPSHOT/hive-exec-1.3.0-SNAPSHOT.jar
 /tmp/UDFExampleAdd.java -d /tmp
 [exec] + jar -cf /tmp/udfexampleadd-1.0.jar -C /tmp UDFExampleAdd.class
 [exec] /tmp/UDFExampleAdd.class : no such file or directory
[INFO] 
[INFO] BUILD FAILURE
{noformat}

> backport HIVE-11985 (and HIVE-12601) to branch-1
> 
>
> Key: HIVE-15714
> URL: https://issues.apache.org/jira/browse/HIVE-15714
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15714-branch-1.patch
>
>
> Backport HIVE-11985 (and HIVE-12601) to branch-1



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15732) add the ability to restrict configuration for the queries submitted to HS2 (Tez pool)

2017-01-26 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15732:

Attachment: HIVE-15732.01.patch

Updated the patch to address the feedback

> add the ability to restrict configuration for the queries submitted to HS2 
> (Tez pool)
> -
>
> Key: HIVE-15732
> URL: https://issues.apache.org/jira/browse/HIVE-15732
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15732.01.patch, HIVE-15732.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15732) add the ability to restrict configuration for the queries submitted to HS2 (Tez pool)

2017-01-26 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840910#comment-15840910
 ] 

Sergey Shelukhin commented on HIVE-15732:
-

[~sseth] that's hive.conf.restricted.list, and it is entirely immutable.

> add the ability to restrict configuration for the queries submitted to HS2 
> (Tez pool)
> -
>
> Key: HIVE-15732
> URL: https://issues.apache.org/jira/browse/HIVE-15732
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15732.patch
>
>






[jira] [Commented] (HIVE-15653) Some ALTER TABLE commands drop table stats

2017-01-26 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840902#comment-15840902
 ] 

Pengcheng Xiong commented on HIVE-15653:


LGTM, pending tests run. Thanks for your patch!

> Some ALTER TABLE commands drop table stats
> --
>
> Key: HIVE-15653
> URL: https://issues.apache.org/jira/browse/HIVE-15653
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.1.0
>Reporter: Alexander Behm
>Assignee: Chaoyu Tang
>Priority: Critical
> Attachments: HIVE-15653.1.patch, HIVE-15653.2.patch, 
> HIVE-15653.3.patch, HIVE-15653.4.patch, HIVE-15653.patch
>
>
> Some ALTER TABLE commands drop the table stats. That may make sense for some 
> ALTER TABLE operations, but certainly not for others. Personally, I think 
> ALTER TABLE should only change what was requested by the user without any 
> side effects that may be unclear to users. In particular, collecting stats 
> can be an expensive operation so it's rather inconvenient for users if they 
> get wiped accidentally.
> Repro:
> {code}
> create table t (i int);
> insert into t values(1);
> analyze table t compute statistics;
> alter table t set tblproperties('test'='test');
> hive> describe formatted t;
> OK
> # col_namedata_type   comment 
>
> i int 
>
> # Detailed Table Information   
> Database: default  
> Owner:abehm
> CreateTime:   Tue Jan 17 18:13:34 PST 2017 
> LastAccessTime:   UNKNOWN  
> Protect Mode: None 
> Retention:0
> Location: hdfs://localhost:20500/test-warehouse/t  
> Table Type:   MANAGED_TABLE
> Table Parameters:  
>   COLUMN_STATS_ACCURATE   false   
>   last_modified_byabehm   
>   last_modified_time  1484705748  
>   numFiles1   
>   numRows -1  
>   rawDataSize -1  
>   testtest
>   totalSize   2   
>   transient_lastDdlTime   1484705748  
>
> # Storage Information  
> SerDe Library:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe  
>  
> InputFormat:  org.apache.hadoop.mapred.TextInputFormat 
> OutputFormat: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat   
> Compressed:   No   
> Num Buckets:  -1   
> Bucket Columns:   []   
> Sort Columns: []   
> Storage Desc Params:   
>   serialization.format1   
> Time taken: 0.169 seconds, Fetched: 34 row(s)
> {code}
> The same behavior can be observed with several other ALTER TABLE commands.





[jira] [Updated] (HIVE-15672) LLAP text cache: improve first query perf II

2017-01-26 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15672:

Status: Patch Available  (was: Open)

> LLAP text cache: improve first query perf II
> 
>
> Key: HIVE-15672
> URL: https://issues.apache.org/jira/browse/HIVE-15672
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15672.01.patch, HIVE-15672.02.patch
>
>
> 4) Send VRB to the pipeline and write ORC in parallel (in background).





[jira] [Updated] (HIVE-15672) LLAP text cache: improve first query perf II

2017-01-26 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15672:

Attachment: HIVE-15672.02.patch

Updated the async caching to avoid locking up the IO thread, instead of doing 
everything on that thread.

> LLAP text cache: improve first query perf II
> 
>
> Key: HIVE-15672
> URL: https://issues.apache.org/jira/browse/HIVE-15672
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15672.01.patch, HIVE-15672.02.patch
>
>
> 4) Send VRB to the pipeline and write ORC in parallel (in background).





[jira] [Updated] (HIVE-15653) Some ALTER TABLE commands drop table stats

2017-01-26 Thread Chaoyu Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoyu Tang updated HIVE-15653:
---
Attachment: HIVE-15653.4.patch

Additional changes to take care of the COLUMN_STATS state when renaming a column. 
[~pxiong] could you review it? Currently, when a column is renamed, its stats are 
deleted, so its COLUMN_STATS state should not exist. 
In columnStatsUpdateForStatsOptimizer_2.q, after ALTER TABLE calendar CHANGE 
year year1 INT, the stats state will become COLUMN_STATS_ACCURATE 
{\"BASIC_STATS\":\"true\"} because the old column "year" has been dropped but 
stats for the new column "year1" have not been created. Does that make sense?
Actually, I am going to file a separate JIRA for the column rename; in that 
case, we should preserve the stats instead of deleting them.

> Some ALTER TABLE commands drop table stats
> --
>
> Key: HIVE-15653
> URL: https://issues.apache.org/jira/browse/HIVE-15653
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.1.0
>Reporter: Alexander Behm
>Assignee: Chaoyu Tang
>Priority: Critical
> Attachments: HIVE-15653.1.patch, HIVE-15653.2.patch, 
> HIVE-15653.3.patch, HIVE-15653.4.patch, HIVE-15653.patch
>
>
> Some ALTER TABLE commands drop the table stats. That may make sense for some 
> ALTER TABLE operations, but certainly not for others. Personally, I think 
> ALTER TABLE should only change what was requested by the user without any 
> side effects that may be unclear to users. In particular, collecting stats 
> can be an expensive operation so it's rather inconvenient for users if they 
> get wiped accidentally.
> Repro:
> {code}
> create table t (i int);
> insert into t values(1);
> analyze table t compute statistics;
> alter table t set tblproperties('test'='test');
> hive> describe formatted t;
> OK
> # col_namedata_type   comment 
>
> i int 
>
> # Detailed Table Information   
> Database: default  
> Owner:abehm
> CreateTime:   Tue Jan 17 18:13:34 PST 2017 
> LastAccessTime:   UNKNOWN  
> Protect Mode: None 
> Retention:0
> Location: hdfs://localhost:20500/test-warehouse/t  
> Table Type:   MANAGED_TABLE
> Table Parameters:  
>   COLUMN_STATS_ACCURATE   false   
>   last_modified_byabehm   
>   last_modified_time  1484705748  
>   numFiles1   
>   numRows -1  
>   rawDataSize -1  
>   testtest
>   totalSize   2   
>   transient_lastDdlTime   1484705748  
>
> # Storage Information  
> SerDe Library:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe  
>  
> InputFormat:  org.apache.hadoop.mapred.TextInputFormat 
> OutputFormat: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat   
> Compressed:   No   
> Num Buckets:  -1   
> Bucket Columns:   []   
> Sort Columns: []   
> Storage Desc Params:   
>   serialization.format1   
> Time taken: 0.169 seconds, Fetched: 34 row(s)
> {code}
> The same behavior can be observed with several other ALTER TABLE commands.





[jira] [Commented] (HIVE-15700) BytesColumnVector can get stuck trying to resize byte buffer

2017-01-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840869#comment-15840869
 ] 

Hive QA commented on HIVE-15700:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12849572/HIVE-15700.3.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 11003 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_varchar_simple]
 (batchId=153)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=93)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=223)
org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testTupleInBagInTupleInBag[2]
 (batchId=173)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3206/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3206/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3206/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12849572 - PreCommit-HIVE-Build

> BytesColumnVector can get stuck trying to resize byte buffer
> 
>
> Key: HIVE-15700
> URL: https://issues.apache.org/jira/browse/HIVE-15700
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-15700.1.patch, HIVE-15700.2.patch, 
> HIVE-15700.3.patch
>
>
> While looking at HIVE-15698, I hit an issue where one of the reducers was stuck 
> in the following stack trace:
> {noformat}
> Thread 12735: (state = IN_JAVA)
>  - 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.increaseBufferSpace(int)
>  @bci=22, line=245 (Compiled frame; information may be imprecise)
>  - org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setVal(int, 
> byte[], int, int) @bci=18, line=150 (Interpreted frame)
>  - 
> org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.storeRowColumn(org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch,
>  int, int, boolean) @bci=536, line=442 (Compiled frame)
>  - 
> org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.deserialize(org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch,
>  int) @bci=110, line=761 (Interpreted frame)
>  - 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(org.apache.hadoop.io.BytesWritable,
>  java.lang.Iterable, byte) @bci=184, line=444 (Interpreted frame)
>  - org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecordVector() 
> @bci=119, line=388 (Interpreted frame)
>  - org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord() @bci=8, 
> line=239 (Interpreted frame)
>  - org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run() @bci=124, 
> line=319 (Interpreted frame)
>  - 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(java.util.Map,
>  java.util.Map) @bci=30, line=185 (Interpreted frame)
>  - org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(java.util.Map, 
> java.util.Map) @bci=159, line=168 (Interpreted frame)
>  - org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run() @bci=65, 
> line=370 (Interpreted frame)
>  - org.apache.tez.runtime.task.TaskRunner2Callable$1.run() @bci=133, line=73 
> (Interpreted frame)
>  - org.apache.tez.runtime.task.TaskRunner2Callable$1.run() @bci=1, line=61 
> (Interpreted frame)
>  - 
> java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction,
>  java.security.AccessControlContext) @bci=0 (Compiled frame)
>  - javax.security.auth.Subject.doAs(javax.security.auth.Subject, 
> java.security.PrivilegedExceptionAction) @bci=42, line=422 (Interpreted frame)
>  - 
> org.apache.hadoop.security.UserGroupInformation.doAs(java.security.PrivilegedExceptionAction)
>  @bci=14, line=1724 (Interpreted frame)
>  - org.apache.tez.runtime.task.TaskRunner2Callable.callInternal() @bci=38, 
> line=61 (Interpreted frame)
>  - org.apache.tez.runtime.task.TaskRunner2Callable.callInternal() @bci=1, 
> line=37 (Interpreted frame)
>  - org.apache.tez.common.CallableWithNdc.call() @bci=8, line=36 (Interpreted 
> frame)
>  - java.
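The hang above sits in BytesColumnVector.increaseBufferSpace. A plausible failure mode for this class of bug, sketched below under that assumption (this is illustrative, not the actual Hive patch; names like nextCapacity are invented), is a "double until big enough" loop whose int capacity overflows to a negative value and therefore never reaches the requested size:

```java
// Illustrative sketch: why a naive doubling loop can spin forever,
// and an overflow-safe alternative that clamps instead of wrapping.
public class BufferGrowth {
    // Naive growth (buggy): doubling an int can overflow to a negative
    // value, so "cap < needed" may stay true forever.
    //   while (cap < needed) cap *= 2;   // -> infinite loop on overflow

    // Overflow-safe growth: detect that doubling would overflow and clamp.
    static int nextCapacity(int current, int needed) {
        int cap = Math.max(current, 1);
        while (cap < needed) {
            if (cap > Integer.MAX_VALUE / 2) {  // doubling would overflow
                return Integer.MAX_VALUE;        // clamp to the max array-ish size
            }
            cap *= 2;
        }
        return cap;
    }

    public static void main(String[] args) {
        System.out.println(nextCapacity(16, 1000));        // 1024
        System.out.println(nextCapacity(16, 1500000000));  // clamped: 2147483647
    }
}
```

The key design point is that the overflow check happens before the multiplication, so the loop variable can never go negative.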

[jira] [Updated] (HIVE-15388) HiveParser spends lots of time in parsing queries with lots of "("

2017-01-26 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15388:
---
Attachment: HIVE-15388.02.patch

> HiveParser spends lots of time in parsing queries with lots of "("
> --
>
> Key: HIVE-15388
> URL: https://issues.apache.org/jira/browse/HIVE-15388
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Rajesh Balamohan
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15388.01.patch, HIVE-15388.02.patch, 
> hive-15388.stacktrace.txt
>
>
> Branch: apache-master (applicable to previous releases as well)
> Queries generated via tools can have lots of "(" for "AND/OR" conditions. 
> This causes huge delays in the parsing phase when the number of expressions is 
> high.
> e.g
> {noformat}
> SELECT `iata`,
>`airport`,
>`city`,
>`state`,
>`country`,
>`lat`,
>`lon`
> FROM airports
> WHERE 
> ((`airports`.`airport`
>  = "Thigpen"
>   
>   OR `airports`.`airport` = "Astoria Regional")
>   
>  OR `airports`.`airport` = "Warsaw Municipal")
>   
> OR `airports`.`airport` = "John F Kennedy Memorial")
>  
> OR `airports`.`airport` = "Hall-Miller Municipal")
> 
> OR `airports`.`airport` = "Atqasuk")
>OR 
> `airports`.`airport` = "William B Hartsfield-Atlanta Intl")
>   OR 
> `airports`.`airport` = "Artesia Municipal")
>  OR 
> `airports`.`airport` = "Outagamie County Regional")
> OR 
> `airports`.`airport` = "Watertown Municipal")
>OR 
> `airports`.`airport` = "Augusta State")
>   OR 
> `airports`.`airport` = "Aurora Municipal")
>  OR 
> `airports`.`airport` = "Alakanuk")
> OR 
> `airports`.`airport` = "Austin Municipal")
>OR 
> `airports`.`airport` = "Auburn Municipal")
>   OR 
> `airports`.`airport` = "Auburn-Opelik")
>  OR 
> `airports`.`airport` = "Austin-Bergstrom International")
> OR 
> `airports`.`airport` = "Wausau Municipal")
>OR 
> `airports`.`airport` = "Mecklenburg-Brunswick Regional")
>   OR 
> `airports`.`airport` = "Alva Regional")
>  OR 
> `airports`.`airport` = "Asheville Regional")
> OR 
> `airports`.`airport` = "Avon Park Municipal")
>OR 
> `airports`.`airport` = "Wilkes-Barre/Scranton Intl")
>   OR 
> `airports`.`airport` = "Marana Northwest Regional")
>  OR 
> `airports`.`airport` = "Catalina")
> OR 
> `airports`.`airport` = "Washington Municipal")
>OR 
> `airports`.`airport` = "Wainwright")
>   OR `airports`.`airport` 
> = "West Memphis Municipal")
>  OR `airports`.`airport` 
> = "Arlington Municipal")
> OR `airports`.`airport` = 
> "Algona Municipal")
>OR `airports`.`airport` = 
> "Chandler")
>   OR `airports`.`airport` = 
> "Altus Municipal")
>  OR `airports`.`airport` = 
> "N
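The quoted WHERE clause shows the left-nested shape that query generators typically emit: one extra leading "(" per predicate, with all closing parens trailing. A minimal sketch of such a generator, useful for reproducing the parse-time blowup (the builder and its names are illustrative, not from Hive):

```java
// Illustrative sketch: build the deeply left-nested OR chain that
// stresses the parser. N values produce N-1 leading "(" characters.
import java.util.List;

public class NestedOrQuery {
    static String buildWhere(String column, List<String> values) {
        StringBuilder sb = new StringBuilder();
        // All opening parens up front, one per extra predicate.
        for (int i = 1; i < values.size(); i++) sb.append('(');
        sb.append(column).append(" = \"").append(values.get(0)).append('"');
        // Each further predicate closes one level: ... OR col = "v")
        for (int i = 1; i < values.size(); i++) {
            sb.append(" OR ").append(column).append(" = \"")
              .append(values.get(i)).append("\")");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildWhere("`airports`.`airport`",
                List.of("Thigpen", "Astoria Regional", "Warsaw Municipal")));
    }
}
```

Rewriting such chains as `col IN (...)` avoids the nesting entirely, which is one reason the parse cost depends on how the tool emits the predicate.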

[jira] [Updated] (HIVE-15388) HiveParser spends lots of time in parsing queries with lots of "("

2017-01-26 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15388:
---
Status: Patch Available  (was: Open)

> HiveParser spends lots of time in parsing queries with lots of "("
> --
>
> Key: HIVE-15388
> URL: https://issues.apache.org/jira/browse/HIVE-15388
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Rajesh Balamohan
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15388.01.patch, HIVE-15388.02.patch, 
> hive-15388.stacktrace.txt
>
>
> Branch: apache-master (applicable with previous releases as well)
> Queries generated via tools can have lots of "(" for "AND/OR" conditions. 
> This causes huge delays in the parsing phase when the number of expressions is 
> high.
> e.g
> {noformat}
> SELECT `iata`,
>`airport`,
>`city`,
>`state`,
>`country`,
>`lat`,
>`lon`
> FROM airports
> WHERE 
> ((`airports`.`airport`
>  = "Thigpen"
>   
>   OR `airports`.`airport` = "Astoria Regional")
>   
>  OR `airports`.`airport` = "Warsaw Municipal")
>   
> OR `airports`.`airport` = "John F Kennedy Memorial")
>  
> OR `airports`.`airport` = "Hall-Miller Municipal")
> 
> OR `airports`.`airport` = "Atqasuk")
>OR 
> `airports`.`airport` = "William B Hartsfield-Atlanta Intl")
>   OR 
> `airports`.`airport` = "Artesia Municipal")
>  OR 
> `airports`.`airport` = "Outagamie County Regional")
> OR 
> `airports`.`airport` = "Watertown Municipal")
>OR 
> `airports`.`airport` = "Augusta State")
>   OR 
> `airports`.`airport` = "Aurora Municipal")
>  OR 
> `airports`.`airport` = "Alakanuk")
> OR 
> `airports`.`airport` = "Austin Municipal")
>OR 
> `airports`.`airport` = "Auburn Municipal")
>   OR 
> `airports`.`airport` = "Auburn-Opelik")
>  OR 
> `airports`.`airport` = "Austin-Bergstrom International")
> OR 
> `airports`.`airport` = "Wausau Municipal")
>OR 
> `airports`.`airport` = "Mecklenburg-Brunswick Regional")
>   OR 
> `airports`.`airport` = "Alva Regional")
>  OR 
> `airports`.`airport` = "Asheville Regional")
> OR 
> `airports`.`airport` = "Avon Park Municipal")
>OR 
> `airports`.`airport` = "Wilkes-Barre/Scranton Intl")
>   OR 
> `airports`.`airport` = "Marana Northwest Regional")
>  OR 
> `airports`.`airport` = "Catalina")
> OR 
> `airports`.`airport` = "Washington Municipal")
>OR 
> `airports`.`airport` = "Wainwright")
>   OR `airports`.`airport` 
> = "West Memphis Municipal")
>  OR `airports`.`airport` 
> = "Arlington Municipal")
> OR `airports`.`airport` = 
> "Algona Municipal")
>OR `airports`.`airport` = 
> "Chandler")
>   OR `airports`.`airport` = 
> "Altus Municipal")
>  OR `airports`.`airport` = 

[jira] [Updated] (HIVE-15388) HiveParser spends lots of time in parsing queries with lots of "("

2017-01-26 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15388:
---
Status: Open  (was: Patch Available)

> HiveParser spends lots of time in parsing queries with lots of "("
> --
>
> Key: HIVE-15388
> URL: https://issues.apache.org/jira/browse/HIVE-15388
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Rajesh Balamohan
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15388.01.patch, HIVE-15388.02.patch, 
> hive-15388.stacktrace.txt
>
>
> Branch: apache-master (applicable with previous releases as well)
> Queries generated via tools can have lots of "(" for "AND/OR" conditions. 
> This causes huge delays in the parsing phase when the number of expressions is 
> high.
> e.g
> {noformat}
> SELECT `iata`,
>`airport`,
>`city`,
>`state`,
>`country`,
>`lat`,
>`lon`
> FROM airports
> WHERE 
> ((`airports`.`airport`
>  = "Thigpen"
>   
>   OR `airports`.`airport` = "Astoria Regional")
>   
>  OR `airports`.`airport` = "Warsaw Municipal")
>   
> OR `airports`.`airport` = "John F Kennedy Memorial")
>  
> OR `airports`.`airport` = "Hall-Miller Municipal")
> 
> OR `airports`.`airport` = "Atqasuk")
>OR 
> `airports`.`airport` = "William B Hartsfield-Atlanta Intl")
>   OR 
> `airports`.`airport` = "Artesia Municipal")
>  OR 
> `airports`.`airport` = "Outagamie County Regional")
> OR 
> `airports`.`airport` = "Watertown Municipal")
>OR 
> `airports`.`airport` = "Augusta State")
>   OR 
> `airports`.`airport` = "Aurora Municipal")
>  OR 
> `airports`.`airport` = "Alakanuk")
> OR 
> `airports`.`airport` = "Austin Municipal")
>OR 
> `airports`.`airport` = "Auburn Municipal")
>   OR 
> `airports`.`airport` = "Auburn-Opelik")
>  OR 
> `airports`.`airport` = "Austin-Bergstrom International")
> OR 
> `airports`.`airport` = "Wausau Municipal")
>OR 
> `airports`.`airport` = "Mecklenburg-Brunswick Regional")
>   OR 
> `airports`.`airport` = "Alva Regional")
>  OR 
> `airports`.`airport` = "Asheville Regional")
> OR 
> `airports`.`airport` = "Avon Park Municipal")
>OR 
> `airports`.`airport` = "Wilkes-Barre/Scranton Intl")
>   OR 
> `airports`.`airport` = "Marana Northwest Regional")
>  OR 
> `airports`.`airport` = "Catalina")
> OR 
> `airports`.`airport` = "Washington Municipal")
>OR 
> `airports`.`airport` = "Wainwright")
>   OR `airports`.`airport` 
> = "West Memphis Municipal")
>  OR `airports`.`airport` 
> = "Arlington Municipal")
> OR `airports`.`airport` = 
> "Algona Municipal")
>OR `airports`.`airport` = 
> "Chandler")
>   OR `airports`.`airport` = 
> "Altus Municipal")
>  OR `airports`.`airport` = 

[jira] [Commented] (HIVE-15722) LLAP: Avoid marking a query as complete if the AMReporter runs into an error

2017-01-26 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840845#comment-15840845
 ] 

Prasanth Jayachandran commented on HIVE-15722:
--

lgtm, +1

> LLAP: Avoid marking a query as complete if the AMReporter runs into an error
> 
>
> Key: HIVE-15722
> URL: https://issues.apache.org/jira/browse/HIVE-15722
> Project: Hive
>  Issue Type: Bug
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-15722.01.patch, HIVE-15722.02.patch, 
> HIVE-15722.03.patch, HIVE-15722.04.patch
>
>
> When the AMReporter runs into an error (typically intermittent), we end up 
> killing all fragments on the daemon. This is done by marking the query as 
> complete.
> The AM would continue to try scheduling on this node - which would lead to 
> task failures if the daemon structures are updated.
> Instead of clearing the structures, it's better to kill the fragments, and 
> let a queryComplete call come in from the AM.
> Later, we could make enhancements in the AM to avoid such nodes. That's not 
> simple though, since the AM will not find out what happened due to the 
> communication failure from the daemon.
> Leads to 
> {code}
> org.apache.hadoop.ipc.RemoteException(java.lang.RuntimeException): Dag 
> query16 already complete. Rejecting fragment [Map 7, 29, 0]
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.QueryTracker.registerFragment(QueryTracker.java:149)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.ContainerRunnerImpl.submitWork(ContainerRunnerImpl.java:226)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon.submitWork(LlapDaemon.java:487)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.LlapProtocolServerImpl.submitWork(LlapProtocolServerImpl.java:101)
>   at 
> org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$LlapDaemonProtocol$2.callBlockingMethod(LlapDaemonProtocolProtos.java:16728)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)
> {code}





[jira] [Updated] (HIVE-15722) LLAP: Avoid marking a query as complete if the AMReporter runs into an error

2017-01-26 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-15722:
--
Attachment: HIVE-15722.04.patch

Fixed.

> LLAP: Avoid marking a query as complete if the AMReporter runs into an error
> 
>
> Key: HIVE-15722
> URL: https://issues.apache.org/jira/browse/HIVE-15722
> Project: Hive
>  Issue Type: Bug
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-15722.01.patch, HIVE-15722.02.patch, 
> HIVE-15722.03.patch, HIVE-15722.04.patch
>
>
> When the AMReporter runs into an error (typically intermittent), we end up 
> killing all fragments on the daemon. This is done by marking the query as 
> complete.
> The AM would continue to try scheduling on this node - which would lead to 
> task failures if the daemon structures are updated.
> Instead of clearing the structures, it's better to kill the fragments, and 
> let a queryComplete call come in from the AM.
> Later, we could make enhancements in the AM to avoid such nodes. That's not 
> simple though, since the AM will not find out what happened due to the 
> communication failure from the daemon.
> Leads to 
> {code}
> org.apache.hadoop.ipc.RemoteException(java.lang.RuntimeException): Dag 
> query16 already complete. Rejecting fragment [Map 7, 29, 0]
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.QueryTracker.registerFragment(QueryTracker.java:149)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.ContainerRunnerImpl.submitWork(ContainerRunnerImpl.java:226)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon.submitWork(LlapDaemon.java:487)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.LlapProtocolServerImpl.submitWork(LlapProtocolServerImpl.java:101)
>   at 
> org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$LlapDaemonProtocol$2.callBlockingMethod(LlapDaemonProtocolProtos.java:16728)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)
> {code}





[jira] [Commented] (HIVE-15712) new HiveConf in SQLOperation.getSerDe() impacts CPU on hiveserver2

2017-01-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840839#comment-15840839
 ] 

Hive QA commented on HIVE-15712:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12849565/HIVE-15712.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 11002 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_char_simple]
 (batchId=147)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=93)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=223)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[skewjoinopt4] 
(batchId=106)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_count_distinct]
 (batchId=106)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3205/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3205/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3205/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12849565 - PreCommit-HIVE-Build

> new HiveConf in SQLOperation.getSerDe() impacts CPU on hiveserver2
> --
>
> Key: HIVE-15712
> URL: https://issues.apache.org/jira/browse/HIVE-15712
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 2.1.1
>Reporter: anishek
>Assignee: anishek
> Fix For: 2.2.0
>
> Attachments: HIVE-15712.1.patch, HIVE-15712.1.patch
>
>
> During an internal performance test with about 10 concurrent users, we found 
> that about 18% of CPU on hiveserver2 is spent creating a new HiveConf() 
> in SQLOperation.getSerDe().
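The general fix pattern for this kind of hotspot, sketched below as an assumption about the approach rather than Hive's actual code (ConfCache, loadExpensiveConf, and the use of Properties as a stand-in for HiveConf are all illustrative), is to build the expensive configuration object once and hand out cheap per-call copies instead of re-parsing config files on every call:

```java
// Illustrative sketch: cache an expensive-to-build config template
// and serve cheap clones per request instead of rebuilding it each time.
import java.util.Properties;

public class ConfCache {
    static int parseCount = 0;  // how many times the expensive load ran

    // Stand-in for an expensive constructor that re-reads XML config files.
    static Properties loadExpensiveConf() {
        parseCount++;
        Properties p = new Properties();
        p.setProperty("hive.execution.engine", "tez");
        return p;
    }

    // Lazily built, shared template (double-checked locking on a volatile).
    private static volatile Properties template;

    static Properties get() {
        Properties t = template;
        if (t == null) {
            synchronized (ConfCache.class) {
                if (template == null) template = loadExpensiveConf();
                t = template;
            }
        }
        return (Properties) t.clone();  // cheap copy; callers may mutate freely
    }

    public static void main(String[] args) {
        Properties a = get();
        Properties b = get();
        a.setProperty("x", "1");                // mutation does not leak to b
        System.out.println(parseCount);         // 1: parsed once
        System.out.println(b.getProperty("x")); // null
    }
}
```

Handing out clones keeps the per-operation isolation that constructing a fresh object provided, while paying the parse cost only once.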





[jira] [Updated] (HIVE-15693) LLAP: cached threadpool in AMReporter creates too many threads leading to OOM

2017-01-26 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-15693:

Attachment: HIVE-15693.4.patch

Thanks [~sseth]. Will commit it after the Jenkins results.
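The OOM below comes from a cached thread pool, which creates a new thread whenever all existing ones are busy, so a burst of taskKilled() callbacks can exhaust native threads. A sketch of the usual remedy, under the assumption that a bounded pool is acceptable here (pool sizes and the class name are illustrative, not the actual patch):

```java
// Illustrative sketch: replace an unbounded cached pool with a bounded
// ThreadPoolExecutor so burst load queues up instead of spawning threads.
import java.util.concurrent.*;

public class BoundedReporterPool {
    static ExecutorService newBoundedPool(int maxThreads) {
        // Unbounded variant that can OOM under load:
        //   Executors.newCachedThreadPool();  // no upper limit on threads
        return new ThreadPoolExecutor(
                1, maxThreads,
                60L, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>(1000),            // burst work waits here
                new ThreadPoolExecutor.CallerRunsPolicy()); // backpressure when full
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = newBoundedPool(4);
        Future<Integer> f = pool.submit(() -> 41 + 1);
        System.out.println(f.get()); // 42
        pool.shutdown();
    }
}
```

One subtlety of ThreadPoolExecutor worth noting: it only grows past the core size once the queue is full, so the queue capacity and max thread count have to be tuned together.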

> LLAP: cached threadpool in AMReporter creates too many threads leading to OOM
> -
>
> Key: HIVE-15693
> URL: https://issues.apache.org/jira/browse/HIVE-15693
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Critical
> Attachments: HIVE-15693.1.patch, HIVE-15693.2.patch, 
> HIVE-15693.3.patch, HIVE-15693.4.patch
>
>
> branch: master
> {noformat}
> 2017-01-22T19:52:42,470 WARN  [IPC Server handler 3 on 34642 ()] 
> org.apache.hadoop.ipc.Server: IPC Server handler 3 on 34642, call 
> org.apache.hadoop.hive.llap.protocol.LlapProtocolBlockingPB.submitWork 
> ...Call#17257 Retry#0
> java.lang.OutOfMemoryError: unable to create new native thread
> at java.lang.Thread.start0(Native Method) ~[?:1.8.0_77]
> at java.lang.Thread.start(Thread.java:714) [?:1.8.0_77]
> at 
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:950)
>  ~[?:1.8.0_77]
> at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1368) 
> ~[?:1.8.0_77]
> at 
> com.google.common.util.concurrent.MoreExecutors$ListeningDecorator.execute(MoreExecutors.java:480)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:61)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.llap.daemon.impl.AMReporter.taskKilled(AMReporter.java:231)
>  ~[hive-llap-server-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.llap.daemon.impl.ContainerRunnerImpl$KilledTaskHandlerImpl.taskKilled(ContainerRunnerImpl.java:501)
>  ~[hive-llap-server-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> {noformat}





[jira] [Updated] (HIVE-15703) HiveSubQRemoveRelBuilder should use Hive's own factories

2017-01-26 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15703:
---
Assignee: Vineet Garg  (was: Pengcheng Xiong)

> HiveSubQRemoveRelBuilder should use Hive's own factories
> 
>
> Key: HIVE-15703
> URL: https://issues.apache.org/jira/browse/HIVE-15703
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Vineet Garg
> Attachments: HIVE-15703.01.patch
>
>






[jira] [Updated] (HIVE-15727) Add pre insert work to give storage handler the possibility to perform pre insert checking

2017-01-26 Thread slim bouguerra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-15727:
--
Attachment: HIVE-15727.2.patch

> Add pre insert work to give storage handler the possibility to perform pre 
> insert checking
> --
>
> Key: HIVE-15727
> URL: https://issues.apache.org/jira/browse/HIVE-15727
> Project: Hive
>  Issue Type: Sub-task
>  Components: Druid integration
>Reporter: slim bouguerra
>Assignee: slim bouguerra
> Fix For: 2.2.0
>
> Attachments: HIVE-15727.2.patch, HIVE-15727.patch
>
>
> Add a pre-insert work stage to give the storage handler the possibility to 
> perform pre-insert checking. For instance, for the Druid storage handler this 
> will block the INSERT INTO statement.



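The pre-insert check described above can be sketched as follows. This is a purely illustrative example with hypothetical names (InsertMode, preInsertCheck), not Hive's actual storage handler API: it shows a handler vetoing append-style inserts, as the Druid handler would for INSERT INTO.

```java
public class PreInsertCheck {
    // Hypothetical insert modes; Hive's real plan/work classes differ.
    enum InsertMode { INTO, OVERWRITE }

    // A pre-insert hook in the spirit described above: reject INSERT INTO
    // (append) while allowing other modes.
    static void preInsertCheck(InsertMode mode) {
        if (mode == InsertMode.INTO) {
            throw new UnsupportedOperationException(
                "INSERT INTO is not supported by this storage handler");
        }
    }

    // Helper that reports whether a mode passes the check.
    static boolean allowed(InsertMode mode) {
        try {
            preInsertCheck(mode);
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        assert allowed(InsertMode.OVERWRITE);
        assert !allowed(InsertMode.INTO);
    }
}
```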


[jira] [Updated] (HIVE-15741) Faster unsafe byte array comparisons

2017-01-26 Thread Teddy Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi updated HIVE-15741:
--
Status: Patch Available  (was: Open)

> Faster unsafe byte array comparisons
> 
>
> Key: HIVE-15741
> URL: https://issues.apache.org/jira/browse/HIVE-15741
> Project: Hive
>  Issue Type: Improvement
>Reporter: Teddy Choi
>Assignee: Teddy Choi
>Priority: Minor
> Attachments: HIVE-15741.1.patch
>
>
> Byte array comparison is heavily used in joins and string conditions. Pure 
> Java implementation is simple but not performant. An implementation with 
> Unsafe#getLong is much faster. It's already implemented in 
> org.apache.hadoop.io.WritableComparator#compare. The WritableComparator class 
> handles exceptional cases, including a different endian and no access to 
> Unsafe, and it was used for many years in production.
> This patch will replace pure Java byte array comparisons with safe and faster 
> unsafe ones to get more performance.





[jira] [Updated] (HIVE-15741) Faster unsafe byte array comparisons

2017-01-26 Thread Teddy Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi updated HIVE-15741:
--
Attachment: HIVE-15741.1.patch

Created a patch.
- Internal byte array comparisons: ByteStream, LazySimpleDeserializeRead, 
WriteBuffers
- String comparisons: AbstractFilterStringColLikeStringScalar, StringExpr, 
UDFLike

> Faster unsafe byte array comparisons
> 
>
> Key: HIVE-15741
> URL: https://issues.apache.org/jira/browse/HIVE-15741
> Project: Hive
>  Issue Type: Improvement
>Reporter: Teddy Choi
>Assignee: Teddy Choi
>Priority: Minor
> Attachments: HIVE-15741.1.patch
>
>
> Byte array comparison is heavily used in joins and string conditions. Pure 
> Java implementation is simple but not performant. An implementation with 
> Unsafe#getLong is much faster. It's already implemented in 
> org.apache.hadoop.io.WritableComparator#compare. The WritableComparator class 
> handles exceptional cases, including a different endian and no access to 
> Unsafe, and it was used for many years in production.
> This patch will replace pure Java byte array comparisons with safe and faster 
> unsafe ones to get more performance.



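The word-at-a-time comparison the description refers to can be sketched safely, with ByteBuffer standing in for Unsafe#getLong. This illustrates the technique only; it is not Hive's or Hadoop's actual implementation, which additionally deals with native endianness and the availability of Unsafe.

```java
import java.nio.ByteBuffer;

public class WordCompare {
    // Lexicographic byte-array comparison done 8 bytes at a time.
    // Reading longs big-endian makes an unsigned long comparison agree
    // with unsigned byte-by-byte order, so only the tail needs a loop.
    static int compare(byte[] a, byte[] b) {
        int minLen = Math.min(a.length, b.length);
        int wordLen = minLen & ~7; // largest multiple of 8 <= minLen
        ByteBuffer ba = ByteBuffer.wrap(a); // ByteBuffer defaults to big-endian
        ByteBuffer bb = ByteBuffer.wrap(b);
        for (int i = 0; i < wordLen; i += 8) {
            long la = ba.getLong(i);
            long lb = bb.getLong(i);
            if (la != lb) {
                return Long.compareUnsigned(la, lb) < 0 ? -1 : 1;
            }
        }
        for (int i = wordLen; i < minLen; i++) { // byte-wise tail
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length; // shared prefix: shorter sorts first
    }

    public static void main(String[] args) {
        assert compare("abcdefgh1".getBytes(), "abcdefgh2".getBytes()) < 0;
        assert compare("same".getBytes(), "same".getBytes()) == 0;
    }
}
```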


[jira] [Commented] (HIVE-15722) LLAP: Avoid marking a query as complete if the AMReporter runs into an error

2017-01-26 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840793#comment-15840793
 ] 

Prasanth Jayachandran commented on HIVE-15722:
--

The finally block still unlocks the writeLock.

> LLAP: Avoid marking a query as complete if the AMReporter runs into an error
> 
>
> Key: HIVE-15722
> URL: https://issues.apache.org/jira/browse/HIVE-15722
> Project: Hive
>  Issue Type: Bug
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-15722.01.patch, HIVE-15722.02.patch, 
> HIVE-15722.03.patch
>
>
> When the AMReporter runs into an error (typically intermittent), we end up 
> killing all fragments on the daemon. This is done by marking the query as 
> complete.
> The AM would continue to try scheduling on this node - which would lead to 
> task failures if the daemon structures are updated.
> Instead of clearing the structures, it's better to kill the fragments, and 
> let a queryComplete call come in from the AM.
> Later, we could make enhancements in the AM to avoid such nodes. That's not 
> simple though, since the AM will not find out what happened due to the 
> communication failure from the daemon.
> Leads to 
> {code}
> org.apache.hadoop.ipc.RemoteException(java.lang.RuntimeException): Dag 
> query16 already complete. Rejecting fragment [Map 7, 29, 0]
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.QueryTracker.registerFragment(QueryTracker.java:149)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.ContainerRunnerImpl.submitWork(ContainerRunnerImpl.java:226)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon.submitWork(LlapDaemon.java:487)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.LlapProtocolServerImpl.submitWork(LlapProtocolServerImpl.java:101)
>   at 
> org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$LlapDaemonProtocol$2.callBlockingMethod(LlapDaemonProtocolProtos.java:16728)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)
> {code}





[jira] [Commented] (HIVE-15703) HiveSubQRemoveRelBuilder should use Hive's own factories

2017-01-26 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840792#comment-15840792
 ] 

Pengcheng Xiong commented on HIVE-15703:


OK, then I will continue to investigate the failing tests.

> HiveSubQRemoveRelBuilder should use Hive's own factories
> 
>
> Key: HIVE-15703
> URL: https://issues.apache.org/jira/browse/HIVE-15703
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15703.01.patch
>
>






[jira] [Commented] (HIVE-15160) Can't order by an unselected column

2017-01-26 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840789#comment-15840789
 ] 

Pengcheng Xiong commented on HIVE-15160:


[~vgarg], I will upload a new patch soon.

> Can't order by an unselected column
> ---
>
> Key: HIVE-15160
> URL: https://issues.apache.org/jira/browse/HIVE-15160
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15160.01.patch
>
>
> If a grouping key hasn't been selected, Hive complains. For comparison, 
> Postgres does not.
> Example. Notice i_item_id is not selected:
> {code}
> select  i_item_desc
>,i_category
>,i_class
>,i_current_price
>,sum(cs_ext_sales_price) as itemrevenue
>,sum(cs_ext_sales_price)*100/sum(sum(cs_ext_sales_price)) over
>(partition by i_class) as revenueratio
>  from catalog_sales
>  ,item
>  ,date_dim
>  where cs_item_sk = i_item_sk
>and i_category in ('Jewelry', 'Sports', 'Books')
>and cs_sold_date_sk = d_date_sk
>  and d_date between cast('2001-01-12' as date)
>   and (cast('2001-01-12' as date) + 30 days)
>  group by i_item_id
>  ,i_item_desc
>  ,i_category
>  ,i_class
>  ,i_current_price
>  order by i_category
>  ,i_class
>  ,i_item_id
>  ,i_item_desc
>  ,revenueratio
> limit 100;
> {code}





[jira] [Updated] (HIVE-14007) Replace ORC module with ORC release

2017-01-26 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-14007:
-
Attachment: HIVE-14007.patch

Ok, this patch handles the recent changes that have gone into ORC and 
storage-api. It also addresses some test failures. The vast majority of the 
test differences are because ORC was adding extra bytes in some circumstances. 
With that bug fixed, the sizes of the files and streams inside the files change.

> Replace ORC module with ORC release
> ---
>
> Key: HIVE-14007
> URL: https://issues.apache.org/jira/browse/HIVE-14007
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.2.0
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 2.2.0
>
> Attachments: HIVE-14007.patch, HIVE-14007.patch, HIVE-14007.patch, 
> HIVE-14007.patch, HIVE-14007.patch, HIVE-14007.patch, HIVE-14007.patch, 
> HIVE-14007.patch
>
>
> This completes moving the core ORC reader & writer to the ORC project.





[jira] [Commented] (HIVE-15739) Incorrect exception message in PartExprEvalUtils

2017-01-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840776#comment-15840776
 ] 

Hive QA commented on HIVE-15739:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12849580/HIVE-15739.1.patch.txt

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10998 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_varchar_simple]
 (batchId=153)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=93)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=223)
org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver.org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver
 (batchId=230)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3204/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3204/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3204/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12849580 - PreCommit-HIVE-Build

> Incorrect exception message in PartExprEvalUtils
> 
>
> Key: HIVE-15739
> URL: https://issues.apache.org/jira/browse/HIVE-15739
> Project: Hive
>  Issue Type: Bug
>Reporter: Mark Wagner
>Assignee: Mark Wagner
>Priority: Minor
> Attachments: HIVE-15739.1.patch.txt
>
>
> The check is on partSpec, not partProps:
> {noformat}
> if (partSpec.size() != partKeyTypes.length) {
> throw new HiveException("Internal error : Partition Spec size, " + 
> partProps.size() +
> " doesn't match partition key definition size, " + 
> partKeyTypes.length);
> }
> {noformat}



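A self-contained sketch of the corrected check: since the condition tests partSpec.size(), the message should report that same value rather than partProps.size(). HiveException is replaced with IllegalStateException here to keep the example standalone.

```java
import java.util.Collections;
import java.util.Map;

public class PartSpecCheck {
    // Corrected version of the snippet above: report partSpec.size(),
    // the value actually compared, in the error message.
    static void check(Map<String, String> partSpec, String[] partKeyTypes) {
        if (partSpec.size() != partKeyTypes.length) {
            throw new IllegalStateException("Internal error : Partition Spec size, "
                + partSpec.size() + " doesn't match partition key definition size, "
                + partKeyTypes.length);
        }
    }

    // Helper: true when the sizes mismatch (i.e. check() throws).
    static boolean mismatches(Map<String, String> partSpec, String[] partKeyTypes) {
        try {
            check(partSpec, partKeyTypes);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, String> spec = Collections.singletonMap("ds", "2017-01-26");
        assert !mismatches(spec, new String[]{"string"});
        assert mismatches(spec, new String[]{"string", "int"});
    }
}
```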


[jira] [Updated] (HIVE-15722) LLAP: Avoid marking a query as complete if the AMReporter runs into an error

2017-01-26 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-15722:
--
Attachment: HIVE-15722.03.patch

Good catch. Moved to a readLock for the getRegisteredFragments method.

Added a note to registerFragment on why a readLock is used. It's primarily to 
protect against concurrent execution with queryComplete; the internal data 
structure modifications are on concurrent structures and are meant to work with 
parallel registrations.
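The locking scheme described here can be sketched as follows; the names are illustrative, not the actual QueryTracker code. Registrations share the read lock (the map itself is concurrent, so parallel registrations are safe), while queryComplete takes the write lock to exclude them.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class QueryRegistry {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final ConcurrentHashMap<String, String> fragments = new ConcurrentHashMap<>();
    private volatile boolean complete = false;

    // Read lock: many registrations may proceed in parallel; the
    // concurrent map handles the actual mutation safely.
    boolean registerFragment(String id, String desc) {
        lock.readLock().lock();
        try {
            if (complete) {
                return false; // reject fragments after queryComplete
            }
            fragments.put(id, desc);
            return true;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Write lock: excludes concurrent registration while completing.
    void queryComplete() {
        lock.writeLock().lock();
        try {
            complete = true;
            fragments.clear();
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        QueryRegistry r = new QueryRegistry();
        assert r.registerFragment("f1", "Map 1");
        r.queryComplete();
        assert !r.registerFragment("f2", "Map 2");
    }
}
```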

> LLAP: Avoid marking a query as complete if the AMReporter runs into an error
> 
>
> Key: HIVE-15722
> URL: https://issues.apache.org/jira/browse/HIVE-15722
> Project: Hive
>  Issue Type: Bug
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-15722.01.patch, HIVE-15722.02.patch, 
> HIVE-15722.03.patch
>
>
> When the AMReporter runs into an error (typically intermittent), we end up 
> killing all fragments on the daemon. This is done by marking the query as 
> complete.
> The AM would continue to try scheduling on this node - which would lead to 
> task failures if the daemon structures are updated.
> Instead of clearing the structures, it's better to kill the fragments, and 
> let a queryComplete call come in from the AM.
> Later, we could make enhancements in the AM to avoid such nodes. That's not 
> simple though, since the AM will not find out what happened due to the 
> communication failure from the daemon.
> Leads to 
> {code}
> org.apache.hadoop.ipc.RemoteException(java.lang.RuntimeException): Dag 
> query16 already complete. Rejecting fragment [Map 7, 29, 0]
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.QueryTracker.registerFragment(QueryTracker.java:149)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.ContainerRunnerImpl.submitWork(ContainerRunnerImpl.java:226)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon.submitWork(LlapDaemon.java:487)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.LlapProtocolServerImpl.submitWork(LlapProtocolServerImpl.java:101)
>   at 
> org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$LlapDaemonProtocol$2.callBlockingMethod(LlapDaemonProtocolProtos.java:16728)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)
> {code}





[jira] [Commented] (HIVE-15472) JDBC: Standalone jar is missing ZK dependencies

2017-01-26 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840762#comment-15840762
 ] 

Thejas M Nair commented on HIVE-15472:
--

I think master is fine. Most development is happening there.

> JDBC: Standalone jar is missing ZK dependencies
> ---
>
> Key: HIVE-15472
> URL: https://issues.apache.org/jira/browse/HIVE-15472
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 2.2.0
>Reporter: Gopal V
>Assignee: Tao Li
> Fix For: 2.2.0
>
> Attachments: HIVE-15472.1.patch, HIVE-15472.2.patch
>
>
> {code}
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/apache/curator/RetryPolicy
>   at org.apache.hive.jdbc.Utils.configureConnParams(Utils.java:514)
>   at org.apache.hive.jdbc.Utils.parseURL(Utils.java:434)
>   at org.apache.hive.jdbc.HiveConnection.(HiveConnection.java:132)
>   at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:107)
>   at java.sql.DriverManager.getConnection(DriverManager.java:664)
>   at java.sql.DriverManager.getConnection(DriverManager.java:247)
>   at JDBCExecutor.getConnection(JDBCExecutor.java:65)
>   at JDBCExecutor.executeStatement(JDBCExecutor.java:104)
>   at JDBCExecutor.executeSQLFile(JDBCExecutor.java:81)
>   at JDBCExecutor.main(JDBCExecutor.java:183)
> Caused by: java.lang.ClassNotFoundException: org.apache.curator.RetryPolicy
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> {code}





[jira] [Commented] (HIVE-15693) LLAP: cached threadpool in AMReporter creates too many threads leading to OOM

2017-01-26 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840758#comment-15840758
 ] 

Siddharth Seth commented on HIVE-15693:
---

+1. If we can override maxThreads (based on numExecutors), I think that should 
be mentioned in the description of the property before committing.

> LLAP: cached threadpool in AMReporter creates too many threads leading to OOM
> -
>
> Key: HIVE-15693
> URL: https://issues.apache.org/jira/browse/HIVE-15693
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Critical
> Attachments: HIVE-15693.1.patch, HIVE-15693.2.patch, 
> HIVE-15693.3.patch
>
>
> branch: master
> {noformat}
> 2017-01-22T19:52:42,470 WARN  [IPC Server handler 3 on 34642 ()] 
> org.apache.hadoop.ipc.Server: IPC Server handler 3 on 34642, call 
> org.apache.hadoop.hive.llap.protocol.LlapProtocolBlockingPB.submitWork 
> ...Call#17257 Retry#0
> java.lang.OutOfMemoryError: unable to create new native thread
> at java.lang.Thread.start0(Native Method) ~[?:1.8.0_77]
> at java.lang.Thread.start(Thread.java:714) [?:1.8.0_77]
> at 
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:950)
>  ~[?:1.8.0_77]
> at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1368) 
> ~[?:1.8.0_77]
> at 
> com.google.common.util.concurrent.MoreExecutors$ListeningDecorator.execute(MoreExecutors.java:480)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:61)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.llap.daemon.impl.AMReporter.taskKilled(AMReporter.java:231)
>  ~[hive-llap-server-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.llap.daemon.impl.ContainerRunnerImpl$KilledTaskHandlerImpl.taskKilled(ContainerRunnerImpl.java:501)
>  ~[hive-llap-server-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> {noformat}



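The fix under discussion, replacing an unbounded cached pool with one capped at a configurable maxThreads, can be sketched like this. Sizing the cap from numExecutors is an assumption from the comment above, not the patch's exact logic.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedPool {
    // Unlike Executors.newCachedThreadPool(), which spawns a new thread
    // whenever none is idle (the OOM above), this pool never exceeds
    // maxThreads: extra work queues instead of creating threads.
    // Core size equals max size because a ThreadPoolExecutor with an
    // unbounded queue never grows past its core size; idle threads are
    // still reclaimed via allowCoreThreadTimeOut.
    static ExecutorService create(int maxThreads) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            maxThreads, maxThreads,
            60L, TimeUnit.SECONDS,
            new LinkedBlockingQueue<>());
        pool.allowCoreThreadTimeOut(true);
        return pool;
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = (ThreadPoolExecutor) create(4);
        assert pool.getMaximumPoolSize() == 4;
        pool.shutdown();
    }
}
```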


[jira] [Commented] (HIVE-15472) JDBC: Standalone jar is missing ZK dependencies

2017-01-26 Thread Tao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840725#comment-15840725
 ] 

Tao Li commented on HIVE-15472:
---

[~thejas] Thanks! Do we want to commit this change to hive1 as well?

> JDBC: Standalone jar is missing ZK dependencies
> ---
>
> Key: HIVE-15472
> URL: https://issues.apache.org/jira/browse/HIVE-15472
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 2.2.0
>Reporter: Gopal V
>Assignee: Tao Li
> Fix For: 2.2.0
>
> Attachments: HIVE-15472.1.patch, HIVE-15472.2.patch
>
>
> {code}
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/apache/curator/RetryPolicy
>   at org.apache.hive.jdbc.Utils.configureConnParams(Utils.java:514)
>   at org.apache.hive.jdbc.Utils.parseURL(Utils.java:434)
>   at org.apache.hive.jdbc.HiveConnection.(HiveConnection.java:132)
>   at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:107)
>   at java.sql.DriverManager.getConnection(DriverManager.java:664)
>   at java.sql.DriverManager.getConnection(DriverManager.java:247)
>   at JDBCExecutor.getConnection(JDBCExecutor.java:65)
>   at JDBCExecutor.executeStatement(JDBCExecutor.java:104)
>   at JDBCExecutor.executeSQLFile(JDBCExecutor.java:81)
>   at JDBCExecutor.main(JDBCExecutor.java:183)
> Caused by: java.lang.ClassNotFoundException: org.apache.curator.RetryPolicy
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> {code}





[jira] [Commented] (HIVE-15713) add ldap authentication related configuration to restricted list

2017-01-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840713#comment-15840713
 ] 

Hive QA commented on HIVE-15713:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12849567/HIVE-15713.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10972 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=108)

[auto_join30.q,timestamp_null.q,union32.q,join16.q,groupby_ppr.q,bucketmapjoin7.q,smb_mapjoin_18.q,join19.q,vector_varchar_4.q,union6.q,cbo_subq_in.q,vectorization_part.q,sample8.q,vectorized_timestamp_funcs.q,join_star.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=120)

[groupby4_noskew.q,groupby3_map_skew.q,join_cond_pushdown_2.q,union19.q,union24.q,union_remove_5.q,groupby7_noskew_multi_single_reducer.q,vectorization_1.q,index_auto_self_join.q,auto_smb_mapjoin_14.q,script_env_var2.q,pcr.q,auto_join_filters.q,join0.q,join37.q]
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=93)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3203/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3203/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3203/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12849567 - PreCommit-HIVE-Build

> add ldap authentication related configuration to restricted list
> 
>
> Key: HIVE-15713
> URL: https://issues.apache.org/jira/browse/HIVE-15713
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 2.1.1
>Reporter: anishek
>Assignee: anishek
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-15713.1.patch, HIVE-15713.1.patch
>
>
> Various ldap configuration parameters as below should be added to the 
> restricted list of configuration parameters such that users cant change them 
> per session. 
> hive.server2.authentication.ldap.baseDN
> hive.server2.authentication.ldap.url
> hive.server2.authentication.ldap.Domain
> hive.server2.authentication.ldap.groupDNPattern
> hive.server2.authentication.ldap.groupFilter
> hive.server2.authentication.ldap.userDNPattern
> hive.server2.authentication.ldap.userFilter
> hive.server2.authentication.ldap.groupMembershipKey
> hive.server2.authentication.ldap.userMembershipKey
> hive.server2.authentication.ldap.groupClassKey
> hive.server2.authentication.ldap.customLDAPQuery





[jira] [Updated] (HIVE-15698) Vectorization support for min/max/bloomfilter runtime filtering

2017-01-26 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-15698:
--
Attachment: HIVE-15698.3.patch

Updating patch per review comments

> Vectorization support for min/max/bloomfilter runtime filtering
> ---
>
> Key: HIVE-15698
> URL: https://issues.apache.org/jira/browse/HIVE-15698
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-15698.1.patch, HIVE-15698.2.patch, 
> HIVE-15698.3.patch
>
>
> Enable vectorized execution for HIVE-15269.





[jira] [Commented] (HIVE-15160) Can't order by an unselected column

2017-01-26 Thread Vineet Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840675#comment-15840675
 ] 

Vineet Garg commented on HIVE-15160:


[~pxiong] Does your patch work only for the non-CBO path?

> Can't order by an unselected column
> ---
>
> Key: HIVE-15160
> URL: https://issues.apache.org/jira/browse/HIVE-15160
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15160.01.patch
>
>
> If a grouping key hasn't been selected, Hive complains. For comparison, 
> Postgres does not.
> Example. Notice i_item_id is not selected:
> {code}
> select  i_item_desc
>,i_category
>,i_class
>,i_current_price
>,sum(cs_ext_sales_price) as itemrevenue
>,sum(cs_ext_sales_price)*100/sum(sum(cs_ext_sales_price)) over
>(partition by i_class) as revenueratio
>  from catalog_sales
>  ,item
>  ,date_dim
>  where cs_item_sk = i_item_sk
>and i_category in ('Jewelry', 'Sports', 'Books')
>and cs_sold_date_sk = d_date_sk
>  and d_date between cast('2001-01-12' as date)
>   and (cast('2001-01-12' as date) + 30 days)
>  group by i_item_id
>  ,i_item_desc
>  ,i_category
>  ,i_class
>  ,i_current_price
>  order by i_category
>  ,i_class
>  ,i_item_id
>  ,i_item_desc
>  ,revenueratio
> limit 100;
> {code}





[jira] [Updated] (HIVE-15740) Include hive-hcatalog-core.jar and hive-hcatalog-server-extensions.jar in binary distribution

2017-01-26 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-15740:
--
Attachment: HIVE-15740.1.patch

> Include hive-hcatalog-core.jar and hive-hcatalog-server-extensions.jar in 
> binary distribution
> -
>
> Key: HIVE-15740
> URL: https://issues.apache.org/jira/browse/HIVE-15740
> Project: Hive
>  Issue Type: Bug
>  Components: distribution
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-15740.1.patch
>
>
> Currently both jars are in hcatalog/share/hcatalog and not in the classpath. 
> A metastore using DbNotificationListener will fail with a ClassNotFoundException.





[jira] [Updated] (HIVE-15740) Include hive-hcatalog-core.jar and hive-hcatalog-server-extensions.jar in binary distribution

2017-01-26 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-15740:
--
Status: Patch Available  (was: Open)

> Include hive-hcatalog-core.jar and hive-hcatalog-server-extensions.jar in 
> binary distribution
> -
>
> Key: HIVE-15740
> URL: https://issues.apache.org/jira/browse/HIVE-15740
> Project: Hive
>  Issue Type: Bug
>  Components: distribution
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-15740.1.patch
>
>
> Currently both jars are in hcatalog/share/hcatalog and not in the classpath. 
> A metastore using DbNotificationListener will fail with a ClassNotFoundException.





[jira] [Updated] (HIVE-15693) LLAP: cached threadpool in AMReporter creates too many threads leading to OOM

2017-01-26 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-15693:

Status: Patch Available  (was: Open)

> LLAP: cached threadpool in AMReporter creates too many threads leading to OOM
> -
>
> Key: HIVE-15693
> URL: https://issues.apache.org/jira/browse/HIVE-15693
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Critical
> Attachments: HIVE-15693.1.patch, HIVE-15693.2.patch, 
> HIVE-15693.3.patch
>
>
> branch: master
> {noformat}
> 2017-01-22T19:52:42,470 WARN  [IPC Server handler 3 on 34642 ()] 
> org.apache.hadoop.ipc.Server: IPC Server handler 3 on 34642, call 
> org.apache.hadoop.hive.llap.protocol.LlapProtocolBlockingPB.submitWork 
> ...Call#17257 Retry#0
> java.lang.OutOfMemoryError: unable to create new native thread
> at java.lang.Thread.start0(Native Method) ~[?:1.8.0_77]
> at java.lang.Thread.start(Thread.java:714) [?:1.8.0_77]
> at 
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:950)
>  ~[?:1.8.0_77]
> at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1368) 
> ~[?:1.8.0_77]
> at 
> com.google.common.util.concurrent.MoreExecutors$ListeningDecorator.execute(MoreExecutors.java:480)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:61)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.llap.daemon.impl.AMReporter.taskKilled(AMReporter.java:231)
>  ~[hive-llap-server-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.llap.daemon.impl.ContainerRunnerImpl$KilledTaskHandlerImpl.taskKilled(ContainerRunnerImpl.java:501)
>  ~[hive-llap-server-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> {noformat}





[jira] [Updated] (HIVE-15693) LLAP: cached threadpool in AMReporter creates too many threads leading to OOM

2017-01-26 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-15693:

Attachment: HIVE-15693.3.patch

Attaching .3 version of the patch.

> LLAP: cached threadpool in AMReporter creates too many threads leading to OOM
> -
>
> Key: HIVE-15693
> URL: https://issues.apache.org/jira/browse/HIVE-15693
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Critical
> Attachments: HIVE-15693.1.patch, HIVE-15693.2.patch, 
> HIVE-15693.3.patch
>
>
> branch: master
> {noformat}
> 2017-01-22T19:52:42,470 WARN  [IPC Server handler 3 on 34642 ()] 
> org.apache.hadoop.ipc.Server: IPC Server handler 3 on 34642, call 
> org.apache.hadoop.hive.llap.protocol.LlapProtocolBlockingPB.submitWork 
> ...Call#17257 Retry#0
> java.lang.OutOfMemoryError: unable to create new native thread
> at java.lang.Thread.start0(Native Method) ~[?:1.8.0_77]
> at java.lang.Thread.start(Thread.java:714) [?:1.8.0_77]
> at 
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:950)
>  ~[?:1.8.0_77]
> at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1368) 
> ~[?:1.8.0_77]
> at 
> com.google.common.util.concurrent.MoreExecutors$ListeningDecorator.execute(MoreExecutors.java:480)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:61)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.llap.daemon.impl.AMReporter.taskKilled(AMReporter.java:231)
>  ~[hive-llap-server-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.llap.daemon.impl.ContainerRunnerImpl$KilledTaskHandlerImpl.taskKilled(ContainerRunnerImpl.java:501)
>  ~[hive-llap-server-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15693) LLAP: cached threadpool in AMReporter creates too many threads leading to OOM

2017-01-26 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-15693:

Status: Open  (was: Patch Available)

> LLAP: cached threadpool in AMReporter creates too many threads leading to OOM
> -
>
> Key: HIVE-15693
> URL: https://issues.apache.org/jira/browse/HIVE-15693
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Critical
> Attachments: HIVE-15693.1.patch, HIVE-15693.2.patch
>
>
> branch: master
> {noformat}
> 2017-01-22T19:52:42,470 WARN  [IPC Server handler 3 on 34642 ()] 
> org.apache.hadoop.ipc.Server: IPC Server handler 3 on 34642, call 
> org.apache.hadoop.hive.llap.protocol.LlapProtocolBlockingPB.submitWork 
> ...Call#17257 Retry#0
> java.lang.OutOfMemoryError: unable to create new native thread
> at java.lang.Thread.start0(Native Method) ~[?:1.8.0_77]
> at java.lang.Thread.start(Thread.java:714) [?:1.8.0_77]
> at 
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:950)
>  ~[?:1.8.0_77]
> at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1368) 
> ~[?:1.8.0_77]
> at 
> com.google.common.util.concurrent.MoreExecutors$ListeningDecorator.execute(MoreExecutors.java:480)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:61)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.llap.daemon.impl.AMReporter.taskKilled(AMReporter.java:231)
>  ~[hive-llap-server-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.llap.daemon.impl.ContainerRunnerImpl$KilledTaskHandlerImpl.taskKilled(ContainerRunnerImpl.java:501)
>  ~[hive-llap-server-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15693) LLAP: cached threadpool in AMReporter creates too many threads leading to OOM

2017-01-26 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-15693:

Attachment: HIVE-15693.2.patch

Attaching .2 version addressing review comments.

> LLAP: cached threadpool in AMReporter creates too many threads leading to OOM
> -
>
> Key: HIVE-15693
> URL: https://issues.apache.org/jira/browse/HIVE-15693
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Critical
> Attachments: HIVE-15693.1.patch, HIVE-15693.2.patch
>
>
> branch: master
> {noformat}
> 2017-01-22T19:52:42,470 WARN  [IPC Server handler 3 on 34642 ()] 
> org.apache.hadoop.ipc.Server: IPC Server handler 3 on 34642, call 
> org.apache.hadoop.hive.llap.protocol.LlapProtocolBlockingPB.submitWork 
> ...Call#17257 Retry#0
> java.lang.OutOfMemoryError: unable to create new native thread
> at java.lang.Thread.start0(Native Method) ~[?:1.8.0_77]
> at java.lang.Thread.start(Thread.java:714) [?:1.8.0_77]
> at 
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:950)
>  ~[?:1.8.0_77]
> at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1368) 
> ~[?:1.8.0_77]
> at 
> com.google.common.util.concurrent.MoreExecutors$ListeningDecorator.execute(MoreExecutors.java:480)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:61)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.llap.daemon.impl.AMReporter.taskKilled(AMReporter.java:231)
>  ~[hive-llap-server-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.llap.daemon.impl.ContainerRunnerImpl$KilledTaskHandlerImpl.taskKilled(ContainerRunnerImpl.java:501)
>  ~[hive-llap-server-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15724) getPrimaryKeys and getForeignKeys in metastore does not normalize db and table name

2017-01-26 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840629#comment-15840629
 ] 

Daniel Dai commented on HIVE-15724:
---

[~ashutoshc], mind reviewing this?

> getPrimaryKeys and getForeignKeys in metastore does not normalize db and 
> table name
> ---
>
> Key: HIVE-15724
> URL: https://issues.apache.org/jira/browse/HIVE-15724
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-15724.1.patch, HIVE-15724.2.patch
>
>
> In the db, everything is lower case. When we retrieve constraints, we need 
> to normalize the dbname/tablename. Otherwise, the following sample script fails:
> alter table Table9 add constraint pk1 primary key (a) disable novalidate;
> ALTER TABLE Table9 drop constraint pk1;
> Error message: InvalidObjectException(message:The constraint: pk1 does not 
> exist for the associated table: default.Table9
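The normalization described above can be sketched as follows. This is an illustrative standalone helper, not the actual metastore code; the real fix lives in the getPrimaryKeys/getForeignKeys paths:

```java
import java.util.Locale;

// Illustrative sketch: lower-case db/table identifiers before comparing or
// looking them up, mirroring how the metastore stores them.
public class IdentifierNormalizer {
    // Hypothetical helper name, for illustration only.
    public static String normalize(String identifier) {
        return identifier == null ? null : identifier.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // "default.Table9" from the error message above normalizes cleanly.
        System.out.println(normalize("default.Table9")); // prints "default.table9"
    }
}
```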



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15693) LLAP: cached threadpool in AMReporter creates too many threads leading to OOM

2017-01-26 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840626#comment-15840626
 ] 

Siddharth Seth commented on HIVE-15693:
---

Maybe we can have -1/0 as a value where we auto-determine the thread count, with 
any other value being an override.
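The suggestion above could be sketched like this. The class and method names are illustrative (not Hive's actual AMReporter code), and the auto-determine heuristic is an assumption; the point is that a bounded pool avoids the unbounded thread growth of Executors.newCachedThreadPool():

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedReporterPool {
    // Configured value <= 0 means "auto-determine"; a positive value overrides.
    public static int resolveThreads(int configured) {
        return configured <= 0
            ? Math.max(2, Runtime.getRuntime().availableProcessors() / 2)
            : configured;
    }

    public static ExecutorService create(int configured) {
        int threads = resolveThreads(configured);
        // Fixed upper bound; queued work waits instead of spawning new threads,
        // so the daemon cannot OOM on native thread creation.
        return new ThreadPoolExecutor(threads, threads,
            60L, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());
    }

    public static void main(String[] args) {
        ExecutorService pool = create(0);
        pool.submit(() -> System.out.println("task ran"));
        pool.shutdown();
    }
}
```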

> LLAP: cached threadpool in AMReporter creates too many threads leading to OOM
> -
>
> Key: HIVE-15693
> URL: https://issues.apache.org/jira/browse/HIVE-15693
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Critical
> Attachments: HIVE-15693.1.patch
>
>
> branch: master
> {noformat}
> 2017-01-22T19:52:42,470 WARN  [IPC Server handler 3 on 34642 ()] 
> org.apache.hadoop.ipc.Server: IPC Server handler 3 on 34642, call 
> org.apache.hadoop.hive.llap.protocol.LlapProtocolBlockingPB.submitWork 
> ...Call#17257 Retry#0
> java.lang.OutOfMemoryError: unable to create new native thread
> at java.lang.Thread.start0(Native Method) ~[?:1.8.0_77]
> at java.lang.Thread.start(Thread.java:714) [?:1.8.0_77]
> at 
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:950)
>  ~[?:1.8.0_77]
> at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1368) 
> ~[?:1.8.0_77]
> at 
> com.google.common.util.concurrent.MoreExecutors$ListeningDecorator.execute(MoreExecutors.java:480)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:61)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.llap.daemon.impl.AMReporter.taskKilled(AMReporter.java:231)
>  ~[hive-llap-server-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.llap.daemon.impl.ContainerRunnerImpl$KilledTaskHandlerImpl.taskKilled(ContainerRunnerImpl.java:501)
>  ~[hive-llap-server-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15713) add ldap authentication related configuration to restricted list

2017-01-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840612#comment-15840612
 ] 

Hive QA commented on HIVE-15713:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12849567/HIVE-15713.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 11002 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=93)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=223)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=223)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3202/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3202/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3202/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12849567 - PreCommit-HIVE-Build

> add ldap authentication related configuration to restricted list
> 
>
> Key: HIVE-15713
> URL: https://issues.apache.org/jira/browse/HIVE-15713
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 2.1.1
>Reporter: anishek
>Assignee: anishek
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-15713.1.patch, HIVE-15713.1.patch
>
>
> Various ldap configuration parameters as below should be added to the 
> restricted list of configuration parameters such that users can't change them 
> per session. 
> hive.server2.authentication.ldap.baseDN
> hive.server2.authentication.ldap.url
> hive.server2.authentication.ldap.Domain
> hive.server2.authentication.ldap.groupDNPattern
> hive.server2.authentication.ldap.groupFilter
> hive.server2.authentication.ldap.userDNPattern
> hive.server2.authentication.ldap.userFilter
> hive.server2.authentication.ldap.groupMembershipKey
> hive.server2.authentication.ldap.userMembershipKey
> hive.server2.authentication.ldap.groupClassKey
> hive.server2.authentication.ldap.customLDAPQuery
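The shape of such a check can be sketched as below. Hive implements this mechanism via the hive.conf.restricted.list property in HiveConf; this standalone class is only an illustration of the behavior, with a partial parameter list:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class RestrictedListCheck {
    // Subset of the parameters listed above, for illustration.
    private static final Set<String> RESTRICTED = new HashSet<>(Arrays.asList(
        "hive.server2.authentication.ldap.baseDN",
        "hive.server2.authentication.ldap.url",
        "hive.server2.authentication.ldap.userFilter"));

    // Reject a session-level "set" of any restricted parameter.
    public static void verifyModifiable(String key) {
        if (RESTRICTED.contains(key)) {
            throw new IllegalArgumentException(
                "Cannot modify " + key + " at runtime: it is in the restricted list");
        }
    }
}
```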



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15705) Event replication for constraints

2017-01-26 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-15705:
--
Attachment: HIVE-15705.2.patch

Rebase with master.

> Event replication for constraints
> -
>
> Key: HIVE-15705
> URL: https://issues.apache.org/jira/browse/HIVE-15705
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-15705.1.patch, HIVE-15705.2.patch
>
>
> Make event replication for primary key and foreign key work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15680) Incorrect results when hive.optimize.index.filter=true and same ORC table is referenced twice in query

2017-01-26 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-15680:
---
Attachment: HIVE-15680.4.patch

Fixed NPEs in LLAP tests, uploaded new patch, and updated 
[RB|https://reviews.apache.org/r/55816/].

> Incorrect results when hive.optimize.index.filter=true and same ORC table is 
> referenced twice in query
> --
>
> Key: HIVE-15680
> URL: https://issues.apache.org/jira/browse/HIVE-15680
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15680.1.patch, HIVE-15680.2.patch, 
> HIVE-15680.3.patch, HIVE-15680.4.patch
>
>
> To repro:
> {noformat}
> set hive.optimize.index.filter=true;
> create table test_table(number int) stored as ORC;
> -- Two insertions will create two files, with one stripe each
> insert into table test_table VALUES (1);
> insert into table test_table VALUES (2);
> -- This should and does return 2 records
> select * from test_table;
> -- These should and do each return 1 record
> select * from test_table where number = 1;
> select * from test_table where number = 2;
> -- This should return 2 records but only returns 1 record
> select * from test_table where number = 1
> union all
> select * from test_table where number = 2;
> {noformat}
> What's happening is only the last predicate is being pushed down.
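The failure mode described above can be reduced to a toy example: when two scans of the same table publish their pushdown predicate under one shared key, the second write clobbers the first, so only the last predicate reaches the readers. The key name here is illustrative, not Hive's actual configuration key:

```java
import java.util.HashMap;
import java.util.Map;

public class LastWriterWins {
    public static void main(String[] args) {
        Map<String, String> jobConf = new HashMap<>();
        jobConf.put("example.sarg.pushdown", "number = 1"); // first branch of the union
        jobConf.put("example.sarg.pushdown", "number = 2"); // second branch overwrites it
        // Both scans now read the same (last) predicate, dropping rows from
        // the first branch.
        System.out.println(jobConf.get("example.sarg.pushdown")); // prints "number = 2"
    }
}
```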



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15680) Incorrect results when hive.optimize.index.filter=true and same ORC table is referenced twice in query

2017-01-26 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-15680:
---
Status: Patch Available  (was: Open)

> Incorrect results when hive.optimize.index.filter=true and same ORC table is 
> referenced twice in query
> --
>
> Key: HIVE-15680
> URL: https://issues.apache.org/jira/browse/HIVE-15680
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15680.1.patch, HIVE-15680.2.patch, 
> HIVE-15680.3.patch, HIVE-15680.4.patch
>
>
> To repro:
> {noformat}
> set hive.optimize.index.filter=true;
> create table test_table(number int) stored as ORC;
> -- Two insertions will create two files, with one stripe each
> insert into table test_table VALUES (1);
> insert into table test_table VALUES (2);
> -- This should and does return 2 records
> select * from test_table;
> -- These should and do each return 1 record
> select * from test_table where number = 1;
> select * from test_table where number = 2;
> -- This should return 2 records but only returns 1 record
> select * from test_table where number = 1
> union all
> select * from test_table where number = 2;
> {noformat}
> What's happening is only the last predicate is being pushed down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15680) Incorrect results when hive.optimize.index.filter=true and same ORC table is referenced twice in query

2017-01-26 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-15680:
---
Status: Open  (was: Patch Available)

> Incorrect results when hive.optimize.index.filter=true and same ORC table is 
> referenced twice in query
> --
>
> Key: HIVE-15680
> URL: https://issues.apache.org/jira/browse/HIVE-15680
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15680.1.patch, HIVE-15680.2.patch, 
> HIVE-15680.3.patch
>
>
> To repro:
> {noformat}
> set hive.optimize.index.filter=true;
> create table test_table(number int) stored as ORC;
> -- Two insertions will create two files, with one stripe each
> insert into table test_table VALUES (1);
> insert into table test_table VALUES (2);
> -- This should and does return 2 records
> select * from test_table;
> -- These should and do each return 1 record
> select * from test_table where number = 1;
> select * from test_table where number = 2;
> -- This should return 2 records but only returns 1 record
> select * from test_table where number = 1
> union all
> select * from test_table where number = 2;
> {noformat}
> What's happening is only the last predicate is being pushed down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15669) LLAP: Improve aging in shortest job first scheduler

2017-01-26 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840584#comment-15840584
 ] 

Rajesh Balamohan commented on HIVE-15669:
-

Thanks for the patch [~prasanth_j]. 

LGTM. +1.


> LLAP: Improve aging in shortest job first scheduler
> ---
>
> Key: HIVE-15669
> URL: https://issues.apache.org/jira/browse/HIVE-15669
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Rajesh Balamohan
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-15669.1.patch, HIVE-15669.1.patch, 
> HIVE-15669.2.patch, HIVE-15669.3.patch
>
>
> Under high concurrency, some jobs can get starved for a long time when 
> hive.llap.task.scheduler.locality.delay is set to -1 (wait infinitely for 
> locality).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15712) new HiveConf in SQLOperation.getSerDe() impacts CPU on hiveserver2

2017-01-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840538#comment-15840538
 ] 

Hive QA commented on HIVE-15712:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12849565/HIVE-15712.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 11002 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_char_simple]
 (batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_varchar_simple]
 (batchId=153)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=93)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=223)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=223)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3201/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3201/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3201/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12849565 - PreCommit-HIVE-Build

> new HiveConf in SQLOperation.getSerDe() impacts CPU on hiveserver2
> --
>
> Key: HIVE-15712
> URL: https://issues.apache.org/jira/browse/HIVE-15712
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 2.1.1
>Reporter: anishek
>Assignee: anishek
> Fix For: 2.2.0
>
> Attachments: HIVE-15712.1.patch, HIVE-15712.1.patch
>
>
> In internal performance tests with about 10 concurrent users, we found 
> that about 18% of the CPU on hiveserver2 is spent creating new HiveConf() 
> instances in SQLOperation.getSerDe().
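The usual remedy for this kind of hotspot is to construct the expensive configuration object once and reuse it, rather than rebuilding it on every call. A minimal sketch, with a stand-in "Config" class instead of the real HiveConf (whose constructor parses configuration files):

```java
public class ConfigCache {
    public static class Config { /* imagine costly config-file parsing here */ }

    private volatile Config cached;

    // Lazily build the config once; later calls reuse the same instance.
    public Config get() {
        Config c = cached;
        if (c == null) {
            synchronized (this) {
                if (cached == null) {
                    cached = new Config(); // pay the construction cost once
                }
                c = cached;
            }
        }
        return c;
    }
}
```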



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15708) Upgrade calcite version to 1.11

2017-01-26 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-15708:

Attachment: HIVE-15708.04.patch

Josh's 04.patch

> Upgrade calcite version to 1.11
> ---
>
> Key: HIVE-15708
> URL: https://issues.apache.org/jira/browse/HIVE-15708
> Project: Hive
>  Issue Type: Task
>  Components: CBO, Logical Optimizer
>Affects Versions: 2.2.0
>Reporter: Ashutosh Chauhan
>Assignee: Remus Rusanu
> Attachments: HIVE-15708.01.patch, HIVE-15708.02.patch, 
> HIVE-15708.03.patch, HIVE-15708.04.patch
>
>
> Currently we are on 1.10. Need to upgrade the Calcite version to 1.11.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15735) In some cases, view objects inside a view do not have parents

2017-01-26 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-15735:

Attachment: HIVE-15735.1.patch

Make sure a view inside another view has a chance to be assigned a parent by 
doing the same parent search as for an ordinary table.

> In some cases, view objects inside a view do not have parents
> -
>
> Key: HIVE-15735
> URL: https://issues.apache.org/jira/browse/HIVE-15735
> Project: Hive
>  Issue Type: Bug
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
> Attachments: HIVE-15735.1.patch
>
>
> This causes Sentry to throw a "No valid privileges" error:
> Error: Error while compiling statement: FAILED: SemanticException No valid 
> privileges.
> To reproduce:
> Enable sentry:
> create table t1( i int);
> create view v1 as select * from t1;
> create view v2 as select * from v1 union all select * from v1;
> If the user does not have read permission on t1 and v1, the query
> select * from v2;  
> This will fail with:
> Error: Error while compiling statement: FAILED: SemanticException No valid 
> privileges
>  User foo does not have privileges for QUERY
>  The required privileges: 
> Server=server1->Db=database2->Table=v1->action=select; 
> (state=42000,code=4)
> Sentry should not check v1's permission, since v1 has at least one parent (v2).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15735) In some cases, view objects inside a view do not have parents

2017-01-26 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-15735:

Status: Patch Available  (was: Open)

Need code review.

> In some cases, view objects inside a view do not have parents
> -
>
> Key: HIVE-15735
> URL: https://issues.apache.org/jira/browse/HIVE-15735
> Project: Hive
>  Issue Type: Bug
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
> Attachments: HIVE-15735.1.patch
>
>
> This causes Sentry to throw a "No valid privileges" error:
> Error: Error while compiling statement: FAILED: SemanticException No valid 
> privileges.
> To reproduce:
> Enable sentry:
> create table t1( i int);
> create view v1 as select * from t1;
> create view v2 as select * from v1 union all select * from v1;
> If the user does not have read permission on t1 and v1, the query
> select * from v2;  
> This will fail with:
> Error: Error while compiling statement: FAILED: SemanticException No valid 
> privileges
>  User foo does not have privileges for QUERY
>  The required privileges: 
> Server=server1->Db=database2->Table=v1->action=select; 
> (state=42000,code=4)
> Sentry should not check v1's permission, since v1 has at least one parent (v2).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15708) Upgrade calcite version to 1.11

2017-01-26 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840517#comment-15840517
 ] 

Josh Elser commented on HIVE-15708:
---

bq.  I've tweaked the .03 patch and uploaded a .04 with a fix.

Just kidding. I apparently lost my karma to attach files on the HIVE project :)

http://home.apache.org/~elserj/HIVE-15708.04.patch

> Upgrade calcite version to 1.11
> ---
>
> Key: HIVE-15708
> URL: https://issues.apache.org/jira/browse/HIVE-15708
> Project: Hive
>  Issue Type: Task
>  Components: CBO, Logical Optimizer
>Affects Versions: 2.2.0
>Reporter: Ashutosh Chauhan
>Assignee: Remus Rusanu
> Attachments: HIVE-15708.01.patch, HIVE-15708.02.patch, 
> HIVE-15708.03.patch
>
>
> Currently we are on 1.10. Need to upgrade the Calcite version to 1.11.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15708) Upgrade calcite version to 1.11

2017-01-26 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840514#comment-15840514
 ] 

Josh Elser commented on HIVE-15708:
---

[~rusanu] had pinged me off-jira (as I'm behind some of the packaging changes 
in Avatica 1.9.0).

I think the errors that were being seen with .03 were related to duplicate 
avatica-core (non-shaded) and avatica (shaded) jars being included on the 
classpath. I've tweaked the .03 patch and uploaded a .04 with a fix.

I'm still seeing a test failure, but it seems like a legitimate test failure 
instead of a classpath issue :)

{noformat}
Running org.apache.hadoop.hive.cli.TestMinimrCliDriver
Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 222.128 sec <<< 
FAILURE! - in org.apache.hadoop.hive.cli.TestMinimrCliDriver
testCliDriver[filter_cond_pushdown](org.apache.hadoop.hive.cli.TestMinimrCliDriver)
  Time elapsed: 3.371 sec  <<< FAILURE!
java.lang.AssertionError:
Unexpected exception java.lang.AssertionError: Client execution failed with 
error code = 10001 running

EXPLAIN
SELECT t1.key
FROM cbo_t1 t1
JOIN (
  SELECT t2.key
  FROM cbo_t2 t2
  JOIN (SELECT * FROM cbo_t3 t3 WHERE c_int=1) t3 ON t2.key=t3.c_int
  WHERE ((t2.key=t3.key) AND (t2.c_float + t3.c_float > 2)) OR
  ((t2.key=t3.key) AND (t2.c_int + t3.c_int > 2))) t4 ON 
t1.key=t4.keyfname=filter_cond_pushdown.q
See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, or 
check ./ql/target/surefire-reports or ./itests/qtest/target/surefire-reports/ 
for specific test cases logs.
at org.junit.Assert.fail(Assert.java:88)
at org.apache.hadoop.hive.ql.QTestUtil.failed(QTestUtil.java:2225)
at 
org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:176)
at 
org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:104)
at 
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver(TestMinimrCliDriver.java:59)
{noformat}

With a corresponding:

{noformat}
2017-01-26T13:35:58,930 ERROR [6bc10e68-b722-4d5c-9b9e-a7baa9315e0d main] 
parse.CalcitePlanner: org.apache.hadoop.hive.ql.parse.SemanticException: Line 
5:5 Table not found 'cbo_t1'
  at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1989)
  at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1917)
  at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:10937)
  at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10988)
  at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:275)
  at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
  at 
org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:129)
  at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
  at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:513)
  at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1305)
  at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1445)
  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1225)
  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1215)
  at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
  at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
  at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
  at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336)
  at 
org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1373)
  at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:1347)
  at 
org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:173)
  at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:104)
  at 
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver(TestMinimrCliDriver.java:59)
{noformat}

> Upgrade calcite version to 1.11
> ---
>
> Key: HIVE-15708
> URL: https://issues.apache.org/jira/browse/HIVE-15708
> Project: Hive
>  Issue Type: Task
>  Components: CBO, Logical Optimizer
>Affects Versions: 2.2.0
>Reporter: Ashutosh Chauhan
>Assignee: Remus Rusanu
> Attachments: HIVE-15708.01.patch, HIVE-15708.02.patch, 
> HIVE-15708.03.patch
>
>
> Currently we are on 1.10. Need to upgrade the Calcite version to 1.11.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15739) Incorrect exception message in PartExprEvalUtils

2017-01-26 Thread Mark Wagner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840500#comment-15840500
 ] 

Mark Wagner commented on HIVE-15739:


Attached patch for master.

> Incorrect exception message in PartExprEvalUtils
> 
>
> Key: HIVE-15739
> URL: https://issues.apache.org/jira/browse/HIVE-15739
> Project: Hive
>  Issue Type: Bug
>Reporter: Mark Wagner
>Assignee: Mark Wagner
>Priority: Minor
> Attachments: HIVE-15739.1.patch.txt
>
>
> The check is on partSpec, not partProps:
> {noformat}
> if (partSpec.size() != partKeyTypes.length) {
> throw new HiveException("Internal error : Partition Spec size, " + 
> partProps.size() +
> " doesn't match partition key definition size, " + 
> partKeyTypes.length);
> }
> {noformat}
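Since the guard tests partSpec, the message should presumably report partSpec.size() rather than partProps.size(). A corrected sketch of the check, using IllegalStateException in place of HiveException so it is self-contained:

```java
import java.util.Map;

public class PartSpecCheck {
    // The collection being checked and the size reported in the message
    // now agree (both refer to partSpec).
    public static void check(Map<String, String> partSpec, String[] partKeyTypes) {
        if (partSpec.size() != partKeyTypes.length) {
            throw new IllegalStateException("Internal error : Partition Spec size, "
                + partSpec.size()
                + " doesn't match partition key definition size, "
                + partKeyTypes.length);
        }
    }
}
```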



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15739) Incorrect exception message in PartExprEvalUtils

2017-01-26 Thread Mark Wagner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Wagner updated HIVE-15739:
---
Attachment: HIVE-15739.1.patch.txt

> Incorrect exception message in PartExprEvalUtils
> 
>
> Key: HIVE-15739
> URL: https://issues.apache.org/jira/browse/HIVE-15739
> Project: Hive
>  Issue Type: Bug
>Reporter: Mark Wagner
>Assignee: Mark Wagner
>Priority: Minor
> Attachments: HIVE-15739.1.patch.txt
>
>
> The check is on partSpec, not partProps:
> {noformat}
> if (partSpec.size() != partKeyTypes.length) {
> throw new HiveException("Internal error : Partition Spec size, " + 
> partProps.size() +
> " doesn't match partition key definition size, " + 
> partKeyTypes.length);
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15739) Incorrect exception message in PartExprEvalUtils

2017-01-26 Thread Mark Wagner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Wagner updated HIVE-15739:
---
Status: Patch Available  (was: Open)

> Incorrect exception message in PartExprEvalUtils
> 
>
> Key: HIVE-15739
> URL: https://issues.apache.org/jira/browse/HIVE-15739
> Project: Hive
>  Issue Type: Bug
>Reporter: Mark Wagner
>Assignee: Mark Wagner
>Priority: Minor
> Attachments: HIVE-15739.1.patch.txt
>
>
> The check is on partSpec, not partProps:
> {noformat}
> if (partSpec.size() != partKeyTypes.length) {
> throw new HiveException("Internal error : Partition Spec size, " + 
> partProps.size() +
> " doesn't match partition key definition size, " + 
> partKeyTypes.length);
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15736) Add unit tests to Utilities.getInputSummary() method for multi-threading cases

2017-01-26 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-15736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840483#comment-15840483
 ] 

Sergio Peña commented on HIVE-15736:


[~navis] [~ashutoshc] You wrote and reviewed HIVE-3990, and it seems to have a 
couple of bugs in getInputSummary() regarding the InputEstimator. Could you 
help me review that part to confirm the fix is correct, and to check whether 
the lines commented out in getInputSummary() in the patch were ever useful? 
They don't appear to be, but that might be another bug.

> Add unit tests to Utilities.getInputSummary() method for multi-threading cases
> --
>
> Key: HIVE-15736
> URL: https://issues.apache.org/jira/browse/HIVE-15736
> Project: Hive
>  Issue Type: Test
>  Components: Query Planning
>Reporter: Sergio Peña
>Assignee: Sergio Peña
>Priority: Minor
> Attachments: HIVE-15736.1.patch
>
>
> The {{Utilities.getInputSummary}} method has a configuration to use multiple 
> threads to get the content summary of tables and partitions. This 
> configuration variable, {{mapred.dfsclient.parallelism.max}}, is disabled by 
> default, and there are no tests that validate the behavior when multiple 
> threads are used.
> This JIRA adds tests for this method with multiple threads and fixes any 
> issues found.
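The pattern under test can be sketched as follows. The class name and the per-path computation are hypothetical stand-ins; only the pool-and-aggregate shape mirrors what getInputSummary does when the parallelism configuration allows more than one thread:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class InputSummarySketch {
    // Compute per-path "summaries" on a bounded pool and aggregate the results.
    // The real code queries the FileSystem for each table/partition path; here
    // the per-path work is a stand-in so the sketch stays self-contained.
    static long summarize(List<String> paths, int maxThreads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, maxThreads));
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (String path : paths) {
                futures.add(pool.submit(() -> (long) path.length())); // stand-in FS call
            }
            long total = 0;
            for (Future<Long> f : futures) {
                total += f.get(); // propagates any per-path failure
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```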



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15736) Add unit tests to Utilities.getInputSummary() method for multi-threading cases

2017-01-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840471#comment-15840471
 ] 

Hive QA commented on HIVE-15736:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12849558/HIVE-15736.1.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 11006 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part]
 (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_varchar_simple]
 (batchId=153)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=93)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3200/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3200/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3200/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12849558 - PreCommit-HIVE-Build

> Add unit tests to Utilities.getInputSummary() method for multi-threading cases
> --
>
> Key: HIVE-15736
> URL: https://issues.apache.org/jira/browse/HIVE-15736
> Project: Hive
>  Issue Type: Test
>  Components: Query Planning
>Reporter: Sergio Peña
>Assignee: Sergio Peña
>Priority: Minor
> Attachments: HIVE-15736.1.patch
>
>
> The {{Utilities.getInputSummary}} method has a configuration to use multiple 
> threads to get the content summary of tables and partitions. This 
> configuration variable, {{mapred.dfsclient.parallelism.max}}, is disabled by 
> default, and there are no tests that validate the behavior when multiple 
> threads are used.
> This JIRA adds tests for this method with multiple threads and fixes any 
> issues found.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15703) HiveSubQRemoveRelBuilder should use Hive's own factories

2017-01-26 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840467#comment-15840467
 ] 

Ashutosh Chauhan commented on HIVE-15703:
-

HIVE-15737 will need HIVE-15708, which may not arrive soon. So it makes sense 
to land this in the meantime.

> HiveSubQRemoveRelBuilder should use Hive's own factories
> 
>
> Key: HIVE-15703
> URL: https://issues.apache.org/jira/browse/HIVE-15703
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15703.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15485) Investigate the DoAs failure in HoS

2017-01-26 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840465#comment-15840465
 ] 

Jimmy Xiang commented on HIVE-15485:


For doAs, add kinit etc. to the beginning of the list; for the other case, add 
the principal etc. at the end. If you are concerned about performance, would a 
LinkedList be better than an ArrayList here?

By the way, should keyTabFile + ";" be two argvs?
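On the LinkedList question, the trade-off is that prepending to an ArrayList shifts every element (add(0, x) is O(n) per insert), while LinkedList.addFirst is O(1); for a handful of argv entries either is negligible, though. A hypothetical sketch (names are illustrative, not the patch's code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedList;
import java.util.List;

public class ArgvPrepend {
    // Prepend entries in order, so head[0] ends up first in the result.
    static List<String> prepend(List<String> argv, String... head) {
        Deque<String> out = new LinkedList<>(argv);
        for (int i = head.length - 1; i >= 0; i--) {
            out.addFirst(head[i]); // O(1) per element on a LinkedList
        }
        return new ArrayList<>(out);
    }
}
```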

> Investigate the DoAs failure in HoS
> ---
>
> Key: HIVE-15485
> URL: https://issues.apache.org/jira/browse/HIVE-15485
> Project: Hive
>  Issue Type: Bug
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-15485.patch
>
>
> With DoAs enabled, HoS failed with the following errors:
> {code}
> Exception in thread "main" org.apache.hadoop.security.AccessControlException: 
> systest tries to renew a token with renewer hive
>   at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:484)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7543)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:555)
>   at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:674)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:999)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2141)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2137)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1783)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2135)
> {code}
> It is related to the change from HIVE-14383. It looks like SparkSubmit logs 
> in to Kerberos with the passed-in hive principal/keytab and then tries to 
> create an HDFS delegation token for user systest with renewer hive.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15700) BytesColumnVector can get stuck trying to resize byte buffer

2017-01-26 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-15700:
--
Attachment: HIVE-15700.3.patch

> BytesColumnVector can get stuck trying to resize byte buffer
> 
>
> Key: HIVE-15700
> URL: https://issues.apache.org/jira/browse/HIVE-15700
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-15700.1.patch, HIVE-15700.2.patch, 
> HIVE-15700.3.patch
>
>
> While looking at HIVE-15698, hit an issue where one of the reducers was stuck 
> in the following stack trace:
> {noformat}
> Thread 12735: (state = IN_JAVA)
>  - 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.increaseBufferSpace(int)
>  @bci=22, line=245 (Compiled frame; information may be imprecise)
>  - org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setVal(int, 
> byte[], int, int) @bci=18, line=150 (Interpreted frame)
>  - 
> org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.storeRowColumn(org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch,
>  int, int, boolean) @bci=536, line=442 (Compiled frame)
>  - 
> org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.deserialize(org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch,
>  int) @bci=110, line=761 (Interpreted frame)
>  - 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(org.apache.hadoop.io.BytesWritable,
>  java.lang.Iterable, byte) @bci=184, line=444 (Interpreted frame)
>  - org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecordVector() 
> @bci=119, line=388 (Interpreted frame)
>  - org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord() @bci=8, 
> line=239 (Interpreted frame)
>  - org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run() @bci=124, 
> line=319 (Interpreted frame)
>  - 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(java.util.Map,
>  java.util.Map) @bci=30, line=185 (Interpreted frame)
>  - org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(java.util.Map, 
> java.util.Map) @bci=159, line=168 (Interpreted frame)
>  - org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run() @bci=65, 
> line=370 (Interpreted frame)
>  - org.apache.tez.runtime.task.TaskRunner2Callable$1.run() @bci=133, line=73 
> (Interpreted frame)
>  - org.apache.tez.runtime.task.TaskRunner2Callable$1.run() @bci=1, line=61 
> (Interpreted frame)
>  - 
> java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction,
>  java.security.AccessControlContext) @bci=0 (Compiled frame)
>  - javax.security.auth.Subject.doAs(javax.security.auth.Subject, 
> java.security.PrivilegedExceptionAction) @bci=42, line=422 (Interpreted frame)
>  - 
> org.apache.hadoop.security.UserGroupInformation.doAs(java.security.PrivilegedExceptionAction)
>  @bci=14, line=1724 (Interpreted frame)
>  - org.apache.tez.runtime.task.TaskRunner2Callable.callInternal() @bci=38, 
> line=61 (Interpreted frame)
>  - org.apache.tez.runtime.task.TaskRunner2Callable.callInternal() @bci=1, 
> line=37 (Interpreted frame)
>  - org.apache.tez.common.CallableWithNdc.call() @bci=8, line=36 (Interpreted 
> frame)
>  - java.util.concurrent.FutureTask.run() @bci=42, line=266 (Interpreted frame)
>  - 
> java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker)
>  @bci=95, line=1142 (Interpreted frame)
>  - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=617 
> (Interpreted frame)
>  - java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)
> {noformat}
> The reducer's input was 167 binary values of ~9MB each, coming from the 
> previous map job. Per [~gopalv], the BytesColumnVector is stuck trying to 
> reallocate and copy all of these values into the same memory buffer.
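The stuck-loop risk comes from growing a shared buffer by small increments while repeatedly copying its contents; the usual remedy is geometric growth with an overflow guard. A minimal sketch of that idea (names are hypothetical, not the actual patch):

```java
public class BufferGrowth {
    // Largest array size that is safe to allocate on most JVMs.
    static final int MAX_ARRAY = Integer.MAX_VALUE - 8;

    // Double the capacity until the requested size fits, so N large appends
    // cost O(N) amortized copying instead of O(N^2).
    static int nextCapacity(int current, int needed) {
        if (needed > MAX_ARRAY) {
            throw new IllegalArgumentException("requested size too large: " + needed);
        }
        long cap = Math.max(1, current);
        while (cap < needed) {
            cap <<= 1;
        }
        return (int) Math.min(cap, MAX_ARRAY);
    }
}
```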



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15703) HiveSubQRemoveRelBuilder should use Hive's own factories

2017-01-26 Thread Vineet Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840416#comment-15840416
 ] 

Vineet Garg commented on HIVE-15703:


HIVE-15737 will get rid of {{HiveSubQRemoveRelBuilder}}

> HiveSubQRemoveRelBuilder should use Hive's own factories
> 
>
> Key: HIVE-15703
> URL: https://issues.apache.org/jira/browse/HIVE-15703
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15703.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15700) BytesColumnVector can get stuck trying to resize byte buffer

2017-01-26 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840415#comment-15840415
 ] 

Jason Dere commented on HIVE-15700:
---

The failure in schema_evol_text_vec_part looks like it may be due to HIVE-15734.
TestVectorStringExpressions.testLoadBytesColumnVectorByValueLargeData needs an 
adjustment in the test: the new reallocation logic does allocate a new buffer, 
but not one as large as the test expects.

> BytesColumnVector can get stuck trying to resize byte buffer
> 
>
> Key: HIVE-15700
> URL: https://issues.apache.org/jira/browse/HIVE-15700
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-15700.1.patch, HIVE-15700.2.patch
>
>
> While looking at HIVE-15698, hit an issue where one of the reducers was stuck 
> in the following stack trace:
> {noformat}
> Thread 12735: (state = IN_JAVA)
>  - 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.increaseBufferSpace(int)
>  @bci=22, line=245 (Compiled frame; information may be imprecise)
>  - org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setVal(int, 
> byte[], int, int) @bci=18, line=150 (Interpreted frame)
>  - 
> org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.storeRowColumn(org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch,
>  int, int, boolean) @bci=536, line=442 (Compiled frame)
>  - 
> org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.deserialize(org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch,
>  int) @bci=110, line=761 (Interpreted frame)
>  - 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(org.apache.hadoop.io.BytesWritable,
>  java.lang.Iterable, byte) @bci=184, line=444 (Interpreted frame)
>  - org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecordVector() 
> @bci=119, line=388 (Interpreted frame)
>  - org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord() @bci=8, 
> line=239 (Interpreted frame)
>  - org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run() @bci=124, 
> line=319 (Interpreted frame)
>  - 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(java.util.Map,
>  java.util.Map) @bci=30, line=185 (Interpreted frame)
>  - org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(java.util.Map, 
> java.util.Map) @bci=159, line=168 (Interpreted frame)
>  - org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run() @bci=65, 
> line=370 (Interpreted frame)
>  - org.apache.tez.runtime.task.TaskRunner2Callable$1.run() @bci=133, line=73 
> (Interpreted frame)
>  - org.apache.tez.runtime.task.TaskRunner2Callable$1.run() @bci=1, line=61 
> (Interpreted frame)
>  - 
> java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction,
>  java.security.AccessControlContext) @bci=0 (Compiled frame)
>  - javax.security.auth.Subject.doAs(javax.security.auth.Subject, 
> java.security.PrivilegedExceptionAction) @bci=42, line=422 (Interpreted frame)
>  - 
> org.apache.hadoop.security.UserGroupInformation.doAs(java.security.PrivilegedExceptionAction)
>  @bci=14, line=1724 (Interpreted frame)
>  - org.apache.tez.runtime.task.TaskRunner2Callable.callInternal() @bci=38, 
> line=61 (Interpreted frame)
>  - org.apache.tez.runtime.task.TaskRunner2Callable.callInternal() @bci=1, 
> line=37 (Interpreted frame)
>  - org.apache.tez.common.CallableWithNdc.call() @bci=8, line=36 (Interpreted 
> frame)
>  - java.util.concurrent.FutureTask.run() @bci=42, line=266 (Interpreted frame)
>  - 
> java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker)
>  @bci=95, line=1142 (Interpreted frame)
>  - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=617 
> (Interpreted frame)
>  - java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)
> {noformat}
> The reducer's input was 167 binary values of ~9MB each, coming from the 
> previous map job. Per [~gopalv], the BytesColumnVector is stuck trying to 
> reallocate and copy all of these values into the same memory buffer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15703) HiveSubQRemoveRelBuilder should use Hive's own factories

2017-01-26 Thread Vineet Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840408#comment-15840408
 ] 

Vineet Garg commented on HIVE-15703:


HiveSubQRemoveRelBuilder kept the default factories instead of Hive's factories 
for a reason. I don't recall the exact reason, but I remember it produced wrong 
plans. Anyway, we plan to get rid of this class and replace it with RelBuilder, 
so I don't think the change is worthwhile.

> HiveSubQRemoveRelBuilder should use Hive's own factories
> 
>
> Key: HIVE-15703
> URL: https://issues.apache.org/jira/browse/HIVE-15703
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15703.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15711) Flaky TestEmbeddedThriftBinaryCLIService.testTaskStatus

2017-01-26 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840398#comment-15840398
 ] 

Thejas M Nair commented on HIVE-15711:
--

+1


> Flaky TestEmbeddedThriftBinaryCLIService.testTaskStatus
> ---
>
> Key: HIVE-15711
> URL: https://issues.apache.org/jira/browse/HIVE-15711
> Project: Hive
>  Issue Type: Test
>  Components: Hive
>Affects Versions: 2.1.1
>Reporter: anishek
>Assignee: anishek
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-15711.1.patch
>
>
> The above test is flaky and keeps failing in local build environments. Fix 
> it to prevent intermittent failures.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15713) add ldap authentication related configuration to restricted list

2017-01-26 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-15713:
-
Assignee: anishek  (was: Thejas M Nair)

> add ldap authentication related configuration to restricted list
> 
>
> Key: HIVE-15713
> URL: https://issues.apache.org/jira/browse/HIVE-15713
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 2.1.1
>Reporter: anishek
>Assignee: anishek
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-15713.1.patch, HIVE-15713.1.patch
>
>
> The various LDAP configuration parameters below should be added to the 
> restricted list of configuration parameters so that users can't change them 
> per session. 
> hive.server2.authentication.ldap.baseDN
> hive.server2.authentication.ldap.url
> hive.server2.authentication.ldap.Domain
> hive.server2.authentication.ldap.groupDNPattern
> hive.server2.authentication.ldap.groupFilter
> hive.server2.authentication.ldap.userDNPattern
> hive.server2.authentication.ldap.userFilter
> hive.server2.authentication.ldap.groupMembershipKey
> hive.server2.authentication.ldap.userMembershipKey
> hive.server2.authentication.ldap.groupClassKey
> hive.server2.authentication.ldap.customLDAPQuery
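The restriction mechanism can be sketched as follows (class and method names are hypothetical; HiveConf implements the real check): a session-level set of a restricted key is rejected.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class RestrictedConfSketch {
    // Two of the parameters listed above, as a stand-in restricted list.
    static final Set<String> RESTRICTED = new HashSet<>(Arrays.asList(
        "hive.server2.authentication.ldap.baseDN",
        "hive.server2.authentication.ldap.url"));

    // Reject session-level overrides of restricted keys; allow the rest.
    static void set(Map<String, String> sessionConf, String key, String value) {
        if (RESTRICTED.contains(key)) {
            throw new IllegalArgumentException(
                "Cannot modify " + key + " at runtime");
        }
        sessionConf.put(key, value);
    }
}
```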



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15734) LazySimpleDeserializeRead.readField needs to catch IllegalArgumentException

2017-01-26 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-15734:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> LazySimpleDeserializeRead.readField needs to catch IllegalArgumentException
> ---
>
> Key: HIVE-15734
> URL: https://issues.apache.org/jira/browse/HIVE-15734
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: HIVE-15734.01.patch
>
>
> E.g. java.sql.Date.valueOf can throw that exception if it encounters a parse 
> error for a date.
> With changes to CHAR padding, I think this may be why 
> schema_evol_text_vec_part.q is failing.
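The failure mode and fix can be sketched like this (the helper name is hypothetical; the real change is inside LazySimpleDeserializeRead.readField): {{java.sql.Date.valueOf}} throws IllegalArgumentException on malformed input, and the deserializer should treat that as a null field rather than let the exception escape.

```java
import java.sql.Date;

public class SafeDateField {
    // Treat a malformed date as a null field instead of failing the row.
    static Date parseDateOrNull(String field) {
        try {
            return Date.valueOf(field); // throws IllegalArgumentException on bad input
        } catch (IllegalArgumentException e) {
            return null;
        }
    }
}
```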



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15734) LazySimpleDeserializeRead.readField needs to catch IllegalArgumentException

2017-01-26 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840386#comment-15840386
 ] 

Matt McCline commented on HIVE-15734:
-

Committed to master.

> LazySimpleDeserializeRead.readField needs to catch IllegalArgumentException
> ---
>
> Key: HIVE-15734
> URL: https://issues.apache.org/jira/browse/HIVE-15734
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: HIVE-15734.01.patch
>
>
> E.g. java.sql.Date.valueOf can throw that exception if it encounters a parse 
> error for a date.
> With changes to CHAR padding, I think this may be why 
> schema_evol_text_vec_part.q is failing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15713) add ldap authentication related configuration to restricted list

2017-01-26 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840383#comment-15840383
 ] 

Thejas M Nair commented on HIVE-15713:
--

+1
Reattaching the file to kick off tests once again.
build #3202

> add ldap authentication related configuration to restricted list
> 
>
> Key: HIVE-15713
> URL: https://issues.apache.org/jira/browse/HIVE-15713
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 2.1.1
>Reporter: anishek
>Assignee: anishek
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-15713.1.patch, HIVE-15713.1.patch
>
>
> The various LDAP configuration parameters below should be added to the 
> restricted list of configuration parameters so that users can't change them 
> per session. 
> hive.server2.authentication.ldap.baseDN
> hive.server2.authentication.ldap.url
> hive.server2.authentication.ldap.Domain
> hive.server2.authentication.ldap.groupDNPattern
> hive.server2.authentication.ldap.groupFilter
> hive.server2.authentication.ldap.userDNPattern
> hive.server2.authentication.ldap.userFilter
> hive.server2.authentication.ldap.groupMembershipKey
> hive.server2.authentication.ldap.userMembershipKey
> hive.server2.authentication.ldap.groupClassKey
> hive.server2.authentication.ldap.customLDAPQuery



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


  1   2   >