[jira] [Commented] (HIVE-13424) Refactoring the code to pass a QueryState object rather than HiveConf object

2016-04-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231678#comment-15231678
 ] 

Hive QA commented on HIVE-13424:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12797330/HIVE-13424.3.patch

{color:green}SUCCESS:{color} +1 due to 14 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 389 failed/errored test(s), 9967 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.org.apache.hadoop.hive.cli.TestMiniTezCliDriver
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_alter_merge_2_orc
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_alter_merge_orc
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_alter_merge_stats_orc
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_join0
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_join1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_join21
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_join29
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_join30
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_join_filters
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_join_nulls
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_10
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_11
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_12
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_13
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_14
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_15
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_16
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_3
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_4
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_5
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_6
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_7
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_8
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_9
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_bucket2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_bucket3
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_bucket4
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_bucket_map_join_tez1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_bucket_map_join_tez2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_bucketpruning1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_gby
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_gby_empty
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_join
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_limit
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_semijoin
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_simple_select
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_stats
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_subq_exists
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_subq_in
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_subq_not_in
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_udf_udaf
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_union
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_views
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_windowing
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_column_names_with_leading_and_trailing_spaces
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_constprog_dpp
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_constprog_semijoin
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_correlationoptimizer1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_count
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_create_merge_compressed
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cross_join

[jira] [Comment Edited] (HIVE-13457) Create HS2 REST API endpoints for monitoring information

2016-04-07 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231656#comment-15231656
 ] 

Thejas M Nair edited comment on HIVE-13457 at 4/8/16 5:21 AM:
--

That's a great idea. 
I have also been thinking that creating and exposing codahale/dropwizard 
[health check|https://dropwizard.github.io/metrics/3.1.0/manual/healthchecks/] 
metrics would be very useful for monitoring. That way, monitoring tools such as 
Ambari can raise an alert that includes the specific problem being encountered.

For example, for HS2, health checks (returning OK or an error message) could 
cover the following; a minimal sketch of such a check follows the list:
 * metastore persistence
 * filesystem
 * thread capacity
 * memory usage
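
As a rough, illustrative sketch of the idea (not part of any patch): wrap a 
cheap probe, e.g. a metastore round trip, in a Dropwizard HealthCheck so the 
failure message can be surfaced directly in an alert. The probe passed in 
below is a stand-in, not real Hive code.

{code}
import com.codahale.metrics.health.HealthCheck;
import com.codahale.metrics.health.HealthCheckRegistry;
import java.util.concurrent.Callable;

// Minimal sketch: any probe that throws on failure becomes a named health check.
public class PingHealthCheck extends HealthCheck {
  private final Callable<Void> ping; // stand-in probe supplied by the caller

  public PingHealthCheck(Callable<Void> ping) {
    this.ping = ping;
  }

  @Override
  protected Result check() {
    try {
      ping.call();                      // e.g. a cheap metastore or filesystem call
      return Result.healthy();
    } catch (Exception e) {
      return Result.unhealthy("Probe failed: " + e.getMessage());
    }
  }

  public static void main(String[] args) {
    HealthCheckRegistry registry = new HealthCheckRegistry();
    // The "metastore" probe here is a placeholder; a real one would do a metastore call.
    registry.register("metastore", new PingHealthCheck(() -> null));
    registry.runHealthChecks().forEach((name, result) ->
        System.out.println(name + ": " + (result.isHealthy() ? "OK" : result.getMessage())));
  }
}
{code}

A monitoring tool could then poll the registry output and include the unhealthy 
message in the alert it raises.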



was (Author: thejas):
That's a great idea. 
I have also been thinking that creating and exposing codahale/dropwizard health 
check metrics would be very useful for monitoring. That way, monitoring tools 
such as Ambari can raise an alert that includes the specific problem being encountered.

For example, for HS2, health checks (returning OK or an error message) could cover:
 * metastore persistence
 * filesystem
 * thread capacity
 * memory usage


> Create HS2 REST API endpoints for monitoring information
> 
>
> Key: HIVE-13457
> URL: https://issues.apache.org/jira/browse/HIVE-13457
> Project: Hive
>  Issue Type: Improvement
>Reporter: Szehon Ho
>
> Similar to what is exposed in the HS2 web UI in HIVE-12338, it would be nice if 
> other UIs like admin tools or Hue could access and display this information as 
> well.  Hence, we will create some REST endpoints to expose this information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13457) Create HS2 REST API endpoints for monitoring information

2016-04-07 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231656#comment-15231656
 ] 

Thejas M Nair commented on HIVE-13457:
--

That's a great idea. 
I have also been thinking that creating and exposing codahale/dropwizard health 
check metrics would be very useful for monitoring. That way, monitoring tools 
such as Ambari can raise an alert that includes the specific problem being encountered.

For example, for HS2, health checks (returning OK or an error message) could cover:
 * metastore persistence
 * filesystem
 * thread capacity
 * memory usage


> Create HS2 REST API endpoints for monitoring information
> 
>
> Key: HIVE-13457
> URL: https://issues.apache.org/jira/browse/HIVE-13457
> Project: Hive
>  Issue Type: Improvement
>Reporter: Szehon Ho
>
> Similar to what is exposed in the HS2 web UI in HIVE-12338, it would be nice if 
> other UIs like admin tools or Hue could access and display this information as 
> well.  Hence, we will create some REST endpoints to expose this information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13434) BaseSemanticAnalyzer.unescapeSQLString doesn't unescape \u0000 style character literals.

2016-04-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231584#comment-15231584
 ] 

Hive QA commented on HIVE-13434:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12797313/HIVE-13434.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 22 failed/errored test(s), 9982 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket4
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket5
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket6
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_disable_merge_for_bucketing
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_map_operators
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_num_buckets
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_reducers_power_two
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_list_bucket_dml_10
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_orc_merge1
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_orc_merge2
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_orc_merge9
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_orc_merge_diff_fs
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_reduce_deduplicate
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join1
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join2
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join3
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join4
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join5
org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService.testForcedLocalityPreemption
org.apache.hive.hcatalog.mapreduce.TestHCatMultiOutputFormat.org.apache.hive.hcatalog.mapreduce.TestHCatMultiOutputFormat
org.apache.hive.service.TestHS2ImpersonationWithRemoteMS.org.apache.hive.service.TestHS2ImpersonationWithRemoteMS
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7506/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7506/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7506/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 22 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12797313 - PreCommit-HIVE-TRUNK-Build

> BaseSemanticAnalyzer.unescapeSQLString doesn't unescape \u0000 style 
> character literals.
> 
>
> Key: HIVE-13434
> URL: https://issues.apache.org/jira/browse/HIVE-13434
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 2.1.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
> Attachments: HIVE-13434.1.patch
>
>
> The BaseSemanticAnalyzer.unescapeSQLString method may have a fault. When "\u0061"-style 
> character literals are passed to the method, they are not unescaped 
> successfully.
> In the Spark SQL project, we referenced the same unescaping logic and noticed this 
> issue (SPARK-14426).
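
For illustration only (this is not Hive's actual implementation), correct 
handling of a \uXXXX literal means turning the four hex digits after \u into a 
single character, e.g. "\u0061" becomes "a":

{code}
// Minimal sketch of unicode-escape unescaping; not the BaseSemanticAnalyzer code.
public class UnicodeUnescapeSketch {
  public static String unescapeUnicode(String s) {
    StringBuilder sb = new StringBuilder(s.length());
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c == '\\' && i + 5 < s.length() && s.charAt(i + 1) == 'u') {
        // parse the four hex digits of the escape and emit the code unit
        sb.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
        i += 5;
      } else {
        sb.append(c);
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(unescapeUnicode("\\u0061bc")); // prints "abc"
  }
}
{code}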



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13459) Cassandra Hive throws "Unable to find partitioner class 'org.apache.cassandra.dht.Murmur3Partitioner'"

2016-04-07 Thread yb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yb updated HIVE-13459:
--
Description: 
Using Hive to execute a select statement on Cassandra, it throws the following error:
hive> select * from genericquantity;
OK
Failed with exception java.io.IOException:java.lang.RuntimeException: 
org.apache.cassandra.exceptions.ConfigurationException: Unable to find 
partitioner class 'org.apache.cassandra.dht.Murmur3Partitioner'
Time taken: 0.518 seconds

middle:hive-1.2.0-hadoop-2.6.0-cassandra-2.1.6.jar

> Cassandra Hive throws "Unable to find partitioner class 
> 'org.apache.cassandra.dht.Murmur3Partitioner'"
> --
>
> Key: HIVE-13459
> URL: https://issues.apache.org/jira/browse/HIVE-13459
> Project: Hive
>  Issue Type: Bug
>Reporter: yb
>
> Using Hive to execute a select statement on Cassandra, it throws the following 
> error:
> hive> select * from genericquantity;
> OK
> Failed with exception java.io.IOException:java.lang.RuntimeException: 
> org.apache.cassandra.exceptions.ConfigurationException: Unable to find 
> partitioner class 'org.apache.cassandra.dht.Murmur3Partitioner'
> Time taken: 0.518 seconds
> middle:hive-1.2.0-hadoop-2.6.0-cassandra-2.1.6.jar



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-6535) JDBC: async wait should happen during fetch for results

2016-04-07 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231519#comment-15231519
 ] 

Vaibhav Gumashta commented on HIVE-6535:


Discussed with [~thejas], and it makes sense to implement this as a non-standard 
API call. It looks like there is an expectation that Statement#execute be a 
blocking call.

> JDBC: async wait should happen during fetch for results
> ---
>
> Key: HIVE-6535
> URL: https://issues.apache.org/jira/browse/HIVE-6535
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, JDBC
>Affects Versions: 0.14.0, 1.2.1, 2.0.0
>Reporter: Thejas M Nair
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-6535.1.patch, HIVE-6535.2.patch
>
>
> The Hive JDBC client waits for query completion during the execute() call. It would 
> be better to block in the JDBC driver for completion when the results are being 
> fetched.
> This way the application using the Hive JDBC driver can do other tasks while 
> asynchronous query execution is happening, until it needs to fetch the result 
> set.
>  
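
For context, a sketch of the calling pattern this change would enable, using 
only the standard JDBC API (the connection URL and query below are placeholders):

{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// If execute() returned as soon as the query was submitted, the application
// could do other work and only block once it starts reading the ResultSet.
public class AsyncFetchSketch {
  public static void main(String[] args) throws Exception {
    try (Connection conn =
             DriverManager.getConnection("jdbc:hive2://localhost:10000/default");
         Statement stmt = conn.createStatement()) {
      stmt.execute("select count(*) from some_table"); // today: blocks until the query finishes
      doOtherWork();                                    // goal: overlap this with query execution
      try (ResultSet rs = stmt.getResultSet()) {        // blocking would move here, at fetch time
        while (rs.next()) {
          System.out.println(rs.getLong(1));
        }
      }
    }
  }

  private static void doOtherWork() { /* application-side work */ }
}
{code}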



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10176) skip.header.line.count causes values to be skipped when performing insert values

2016-04-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231496#comment-15231496
 ] 

Hive QA commented on HIVE-10176:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12797309/HIVE-10176.10.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9978 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.testSparkQuery
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7505/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7505/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7505/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12797309 - PreCommit-HIVE-TRUNK-Build

> skip.header.line.count causes values to be skipped when performing insert 
> values
> 
>
> Key: HIVE-10176
> URL: https://issues.apache.org/jira/browse/HIVE-10176
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Wenbo Wang
>Assignee: Vladyslav Pavlenko
> Attachments: HIVE-10176.1.patch, HIVE-10176.10.patch, 
> HIVE-10176.2.patch, HIVE-10176.3.patch, HIVE-10176.4.patch, 
> HIVE-10176.5.patch, HIVE-10176.6.patch, HIVE-10176.7.patch, 
> HIVE-10176.8.patch, HIVE-10176.9.patch, data
>
>
> When inserting values in to tables with TBLPROPERTIES 
> ("skip.header.line.count"="1") the first value listed is also skipped. 
> create table test (row int, name string) TBLPROPERTIES 
> ("skip.header.line.count"="1"); 
> load data local inpath '/root/data' into table test;
> insert into table test values (1, 'a'), (2, 'b'), (3, 'c');
> (1, 'a') isn't inserted into the table. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13395) Lost Update problem in ACID

2016-04-07 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-13395:
--
Priority: Blocker  (was: Critical)

> Lost Update problem in ACID
> ---
>
> Key: HIVE-13395
> URL: https://issues.apache.org/jira/browse/HIVE-13395
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.2.0, 2.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
>
> ACID users can run into Lost Update problem.
> In Hive 1.2, Driver.recordValidTxns() (which records the snapshot to use for 
> the query) is called in Driver.compile().
> Now suppose two concurrent "update T set x = x + 1" statements are executed.  (For 
> simplicity, assume there is exactly 1 row in T.)
> What can happen is that both compile at the same time (more precisely before 
> acquireLocksAndOpenTxn() in runInternal() is called) and thus will lock in 
> the same snapshot, say the value of x = 7 in this snapshot.
> Now 1 will get the lock on the row, the second will block.  
> Now 1, makes x = 8 and commits.
> Now 2 proceeds and makes x = 8 again since in its snapshot x is still 7.
> This specific issue is solved in Hive 1.3/2.0 (HIVE-11077 which is a large 
> patch that deals with multi-statement txns) by moving recordValidTxns() after 
> locks are acquired which reduces the likelihood of this but doesn't eliminate 
> the problem.
> 
> Even in 1.3 version of the code, you could have the same issue.  Assume the 
> same 2 queries:
> Both start a txn, say txnid 9 and 10.  Say 10 gets the lock first, 9 blocks.
> 10 updates the row (so x = 8) and thus ReaderKey.currentTransactionId=10.
> 10 commits.
> Now 9 can proceed and it will get a snapshot that includes 10, i.e. it will 
> see x = 8 and it will write x = 9, but it will set 
> ReaderKey.currentTransactionId = 9.  Thus when merge logic runs, it will see 
> x = 8 is the later version of this row, i.e. lost update.
> The problem is that locks alone are insufficient for MVCC architecture.  
> 
> At lower level Row ID has (originalTransactionId, rowid, bucket id, 
> currentTransactionId) and since on update/delete we do a table scan, we could 
> check that we are about to write a row with currentTransactionId < 
> (currentTransactionId of row we've read) and fail the query.  Currently, 
> currentTransactionId is not surfaced at higher level where this check can be 
> made.
> This would not work (efficiently) longer term, where we want to support fast 
> update on a user-defined PK via streaming ingest.
> Also, this would not work with multi statement txns since in that case we'd 
> lock in the snapshot at the start of the txn, but then 2nd, 3rd etc queries 
> would use the same snapshot and the locks for these queries would be acquired 
> after the snapshot is locked in so this would be the same situation as pre 
> HIVE-11077.
> 
>  
> A more robust solution (commonly used with MVCC) is to keep track of the start 
> and commit time (logical counter) of each transaction to detect if two txns 
> overlap.  The 2nd part is to keep track of the write-set, i.e. which data (rows, 
> partitions, whatever the appropriate level of granularity is) were modified by 
> any txn, and if 2 txns overlap in time and wrote the same element, abort the later 
> one.  This is called the first-committer-wins rule.  This requires a metastore DB schema 
> change.
> It would be most convenient to use the same sequence for txnId, start and 
> commit time (in which case txnid=start time).  In this case we'd need to add 
> 1 field to the TXNS table.  The complication here is that we'll be using elements 
> of the sequence faster, and they are used as part of the file name of delta and 
> base dirs and are currently limited to 7 digits, which can be exceeded.  So this 
> would require some thought about handling upgrade/migration.
> Also, write-set tracking requires either additional metastore table or 
> keeping info in HIVE_LOCKS around longer with new state.
> 
> In the short term, on SQL side of things we could (in auto commit mode only)
> acquire the locks first and then open the txn AND update these locks with txn 
> id.
> This implies another Thrift change to pass in lockId to openTxn.
> The same would not work for Streaming API since it opens several txns at once 
> and then acquires locks for each.
> (Not sure if that is an issue or not since Streaming only does Insert).
> Either way this feels hacky.
> 
> Here is one simple example why we need Write-Set tracking for multi-statement 
> txns
> Consider transactions T ~1~ and T ~2~:
> T ~1~: r ~1~\[x] -> w ~1~\[y] -> c ~1~ 
> T ~2~: w ~2~\[x] -> w ~2~\[y] -> c ~2~  
> Suppose the order of operations is r ~1~\[x] w ~2~\[x]; then a 
> conventional R/W lock manager w/o MVCC will block the write from T ~2~. 
> With MVCC we don't want 
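
Purely as an illustration of the first-committer-wins rule described above (all 
names and data structures here are made up for the example; this is not the 
metastore schema):

{code}
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// A committing txn is aborted if a txn that overlapped it in time has already
// committed a write to the same element.
public class WriteSetSketch {
  static class TxnInfo {
    final long startTime;
    long commitTime = -1;
    final Set<String> writeSet = new HashSet<>();
    TxnInfo(long startTime) { this.startTime = startTime; }
  }

  private final Map<Long, TxnInfo> txns = new HashMap<>();
  private long clock = 0;

  long begin() {
    long id = ++clock;                 // txnid doubles as the start timestamp
    txns.put(id, new TxnInfo(id));
    return id;
  }

  void write(long txnId, String element) {
    txns.get(txnId).writeSet.add(element);
  }

  boolean commit(long txnId) {
    TxnInfo me = txns.get(txnId);
    for (TxnInfo other : txns.values()) {
      boolean overlaps = other.commitTime > me.startTime;  // other committed after I started
      if (other != me && overlaps && !Collections.disjoint(other.writeSet, me.writeSet)) {
        return false;                                      // abort: would be a lost update
      }
    }
    me.commitTime = ++clock;
    return true;
  }

  public static void main(String[] args) {
    WriteSetSketch ws = new WriteSetSketch();
    long t1 = ws.begin(), t2 = ws.begin();
    ws.write(t1, "T/row1");
    ws.write(t2, "T/row1");
    System.out.println(ws.commit(t1)); // true: first committer wins
    System.out.println(ws.commit(t2)); // false: overlapping write to the same row
  }
}
{code}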

[jira] [Updated] (HIVE-13187) hiveserver2 can suppress OOM errors in some cases

2016-04-07 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-13187:
--
Target Version/s: 2.1.0  (was: 2.0.1)

> hiveserver2 can suppress OOM errors in some cases
> -
>
> Key: HIVE-13187
> URL: https://issues.apache.org/jira/browse/HIVE-13187
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Siddharth Seth
>Priority: Critical
>
> Affects at least branch-2.
> See trace in https://issues.apache.org/jira/browse/HIVE-13176
> This looks to be in src/java/org/apache/hadoop/hive/ql/exec/TaskRunner.java.
> That catches Throwable in the thread and sends it further up. There are no 
> checks to see if this is an Error or a general Exception - Errors end up 
> getting suppressed, instead of killing HiveServer2. This is on the processing 
> threads.
> It looks like the Handler threads have some kind of OOM checker on them.
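
For illustration only (not the actual TaskRunner code; runTask() and 
reportFailure() are stand-ins), the kind of Error-vs-Exception check the 
description is asking for when catching Throwable on a processing thread:

{code}
public class ThrowableHandlingSketch {
  public static void main(String[] args) {
    try {
      runTask();
    } catch (Throwable t) {
      if (t instanceof Error) {
        throw (Error) t;   // fatal JVM condition (e.g. OutOfMemoryError): don't swallow it
      }
      reportFailure(t);    // ordinary exceptions: report as a task failure instead
    }
  }

  private static void runTask() {
    throw new RuntimeException("simulated task failure");
  }

  private static void reportFailure(Throwable t) {
    System.err.println("Task failed: " + t);
  }
}
{code}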



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13176) OutOfMemoryError : GC overhead limit exceeded

2016-04-07 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-13176:
--
Target Version/s: 2.1.0  (was: 2.0.1)

> OutOfMemoryError :  GC overhead limit exceeded
> --
>
> Key: HIVE-13176
> URL: https://issues.apache.org/jira/browse/HIVE-13176
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Kavan Suresh
>Assignee: Siddharth Seth
> Attachments: dataNucleus.png, fs.png, shutdownhook.png
>
>
> Detected leaks while testing hiveserver2 concurrency setup with LLAP.
> 2016-02-26T12:50:58,131 ERROR [HiveServer2-Background-Pool: Thread-311030]: 
> operation.Operation (SQLOperation.java:run(230)) - Error running hive query:
> org.apache.hive.service.cli.HiveSQLException: Error while processing 
> statement: FAILED: Execution Error, return code -101 from 
> org.apache.hadoop.hive.ql.exec.StatsTask. GC overhead limit exceeded
> at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:333)
>  ~[hive-jdbc-2.0.0.2.3.5.1-36-standalone.jar:2.0.0.2.3.5.1-36]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:177)
>  ~[hive-jdbc-2.0.0.2.3.5.1-36-standalone.jar:2.0.0.2.3.5.1-36]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:73)
>  ~[hive-jdbc-2.0.0.2.3.5.1-36-standalone.jar:2.0.0.2.3.5.1-36]
> at 
> org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:227)
>  [hive-jdbc-2.0.0.2.3.5.1-36-standalone.jar:2.0.0.2.3.5.1-36]
> at java.security.AccessController.doPrivileged(Native Method) 
> ~[?:1.8.0_45]
> at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_45]
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>  [hadoop-common-2.7.1.2.3.5.1-36.jar:?]
> at 
> org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:239)
>  [hive-jdbc-2.0.0.2.3.5.1-36-standalone.jar:2.0.0.2.3.5.1-36]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> [?:1.8.0_45]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> [?:1.8.0_45]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [?:1.8.0_45]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [?:1.8.0_45]
> at java.lang.Thread.run(Thread.java:745) [?:1.8.0_45]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13360) Refactoring Hive Authorization

2016-04-07 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231417#comment-15231417
 ] 

Thejas M Nair commented on HIVE-13360:
--

Regarding the change to move the IP address from the query context object 
(HiveAuthzContext/QueryContext) to HiveAuthenticationProvider: I don't think 
that is the right place for it.

In HS2 HTTP mode, when proxies and Knox servers sit between the end user and HS2, 
every request for a single session does not have to come via a single IP address. 
The current assumption in the Hive code base is that the IP address is valid for the 
entire session. But that is more of a bug.

Also, HIVE-12777 provides the ability to serialize the session handle 
(equivalent to a JDBC connection identifier) and restore the session from that. 
The restoration could in theory happen from another machine with a different IP 
address.

Considering this, the correct longer-term place for passing the IP address to 
authorization plugins is HiveAuthzContext/QueryContext.
Also, QueryContext is not the best name for the class, as it is passed for 
metastore API calls as well (HiveAuthorizer.filterListCmdObjects); IMO, 
something like "ActionContext" would be more appropriate.

However, I don't think it's worth changing the name at the cost of changing the 
API.


> Refactoring Hive Authorization
> --
>
> Key: HIVE-13360
> URL: https://issues.apache.org/jira/browse/HIVE-13360
> Project: Hive
>  Issue Type: Sub-task
>  Components: Security
>Affects Versions: 2.0.0
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.1.0
>
> Attachments: HIVE-13360.01.patch, HIVE-13360.02.patch, 
> HIVE-13360.03.patch, HIVE-13360.04.patch, HIVE-13360.final.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-4841) Add partition level hook to HiveMetaHook

2016-04-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231413#comment-15231413
 ] 

Hive QA commented on HIVE-4841:
---



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12690295/HIVE-4841.4.patch.txt

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7504/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7504/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7504/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-7504/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ cd apache-github-source-source
+ git fetch origin
From https://github.com/apache/hive
   7e0b08c..fee6669  master -> origin/master
+ git reset --hard HEAD
HEAD is now at 7e0b08c HIVE-13360: Refactoring Hive Authorization (Pengcheng 
Xiong, reviewed by Ashutosh Chauhan)
+ git clean -f -d
Removing ql/src/java/org/apache/hadoop/hive/ql/Driver.java.orig
+ git checkout master
Already on 'master'
Your branch is behind 'origin/master' by 2 commits, and can be fast-forwarded.
+ git reset --hard origin/master
HEAD is now at fee6669 HIVE-13333: StatsOptimizer throws ClassCastException 
(Pengcheng Xiong, reviewed by Ashutosh Chauhan)
+ git merge --ff-only origin/master
Already up-to-date.
+ git gc
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12690295 - PreCommit-HIVE-TRUNK-Build

> Add partition level hook to HiveMetaHook
> 
>
> Key: HIVE-4841
> URL: https://issues.apache.org/jira/browse/HIVE-4841
> Project: Hive
>  Issue Type: Improvement
>  Components: StorageHandler
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
> Attachments: HIVE-4841.4.patch.txt, HIVE-4841.D11673.1.patch, 
> HIVE-4841.D11673.2.patch, HIVE-4841.D11673.3.patch
>
>
> The current HiveMetaHook provides hooks for tables only. With a partition-level 
> hook, external storages could also be revised to exploit PPR.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13421) Propagate job progress in operation status

2016-04-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231409#comment-15231409
 ] 

Hive QA commented on HIVE-13421:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12797272/HIVE-13421.02.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 9980 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.org.apache.hadoop.hive.cli.TestMiniTezCliDriver
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cross_product_check_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge_incompat1
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7503/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7503/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7503/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12797272 - PreCommit-HIVE-TRUNK-Build

> Propagate job progress in operation status
> --
>
> Key: HIVE-13421
> URL: https://issues.apache.org/jira/browse/HIVE-13421
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajat Khandelwal
>Assignee: Rajat Khandelwal
> Attachments: HIVE-13421.01.patch, HIVE-13421.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13333) StatsOptimizer throws ClassCastException

2016-04-07 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13333:
---
Fix Version/s: 2.1.0

> StatsOptimizer throws ClassCastException
> 
>
> Key: HIVE-13333
> URL: https://issues.apache.org/jira/browse/HIVE-13333
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 2.0.0
>Reporter: Ashutosh Chauhan
>Assignee: Pengcheng Xiong
> Fix For: 2.1.0
>
> Attachments: HIVE-13333.01.patch, HIVE-13333.02.patch, 
> HIVE-13333.03.patch
>
>
> mvn test -Dtest=TestCliDriver -Dtest.output.overwrite=true 
> -Dqfile=cbo_rp_udf_udaf.q -Dhive.compute.query.using.stats=true repros the 
> issue.
> In StatsOptimizer with return path on, we may have aggr($f0), aggr($f1) in GBY
> and then select aggr($f1), aggr($f0) in SEL.
> Thus we need to use colExp to find out which position 
> corresponds to which position.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13333) StatsOptimizer throws ClassCastException

2016-04-07 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231385#comment-15231385
 ] 

Pengcheng Xiong commented on HIVE-13333:


Manually reran all the failed tests; cannot reproduce. Pushed to master. Thanks 
[~ashutoshc] for the review.

> StatsOptimizer throws ClassCastException
> 
>
> Key: HIVE-13333
> URL: https://issues.apache.org/jira/browse/HIVE-13333
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 2.0.0
>Reporter: Ashutosh Chauhan
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13333.01.patch, HIVE-13333.02.patch, 
> HIVE-13333.03.patch
>
>
> mvn test -Dtest=TestCliDriver -Dtest.output.overwrite=true 
> -Dqfile=cbo_rp_udf_udaf.q -Dhive.compute.query.using.stats=true repros the 
> issue.
> In StatsOptimizer with return path on, we may have aggr($f0), aggr($f1) in GBY
> and then select aggr($f1), aggr($f0) in SEL.
> Thus we need to use colExp to find out which position 
> corresponds to which position.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13333) StatsOptimizer throws ClassCastException

2016-04-07 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13333:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> StatsOptimizer throws ClassCastException
> 
>
> Key: HIVE-13333
> URL: https://issues.apache.org/jira/browse/HIVE-13333
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 2.0.0
>Reporter: Ashutosh Chauhan
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13333.01.patch, HIVE-13333.02.patch, 
> HIVE-13333.03.patch
>
>
> mvn test -Dtest=TestCliDriver -Dtest.output.overwrite=true 
> -Dqfile=cbo_rp_udf_udaf.q -Dhive.compute.query.using.stats=true repros the 
> issue.
> In StatsOptimizer with return path on, we may have aggr($f0), aggr($f1) in GBY
> and then select aggr($f1), aggr($f0) in SEL.
> Thus we need to use colExp to find out which position 
> corresponds to which position.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13439) JDBC: provide a way to retrieve GUID to query Yarn ATS

2016-04-07 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231381#comment-15231381
 ] 

Thejas M Nair commented on HIVE-13439:
--

+1

> JDBC: provide a way to retrieve GUID to query Yarn ATS
> --
>
> Key: HIVE-13439
> URL: https://issues.apache.org/jira/browse/HIVE-13439
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-13439.1.patch, HIVE-13439.2.patch
>
>
> HIVE-9673 added support for passing base64 encoded operation handles to ATS. 
> We should add a method on the client side to retrieve that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9660) store end offset of compressed data for RG in RowIndex in ORC

2016-04-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-9660:
---
Attachment: HIVE-9660.07.patch

> store end offset of compressed data for RG in RowIndex in ORC
> -
>
> Key: HIVE-9660
> URL: https://issues.apache.org/jira/browse/HIVE-9660
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-9660.01.patch, HIVE-9660.02.patch, 
> HIVE-9660.03.patch, HIVE-9660.04.patch, HIVE-9660.05.patch, 
> HIVE-9660.06.patch, HIVE-9660.07.patch, HIVE-9660.07.patch, HIVE-9660.patch, 
> HIVE-9660.patch
>
>
> Right now the end offset is estimated, which in some cases results in tons of 
> extra data being read.
> We can add a separate array to RowIndex (positions_v2?) that stores number of 
> compressed buffers for each RG, or end offset, or something, to remove this 
> estimation magic



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13341) Stats state is not captured correctly: differentiate load table and create table

2016-04-07 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13341:
---
Status: Patch Available  (was: Open)

> Stats state is not captured correctly: differentiate load table and create 
> table
> 
>
> Key: HIVE-13341
> URL: https://issues.apache.org/jira/browse/HIVE-13341
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer, Statistics
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13341.01.patch, HIVE-13341.02.patch, 
> HIVE-13341.03.patch, HIVE-13341.04.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13341) Stats state is not captured correctly: differentiate load table and create table

2016-04-07 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13341:
---
Attachment: HIVE-13341.04.patch

> Stats state is not captured correctly: differentiate load table and create 
> table
> 
>
> Key: HIVE-13341
> URL: https://issues.apache.org/jira/browse/HIVE-13341
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer, Statistics
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13341.01.patch, HIVE-13341.02.patch, 
> HIVE-13341.03.patch, HIVE-13341.04.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13341) Stats state is not captured correctly: differentiate load table and create table

2016-04-07 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13341:
---
Status: Open  (was: Patch Available)

> Stats state is not captured correctly: differentiate load table and create 
> table
> 
>
> Key: HIVE-13341
> URL: https://issues.apache.org/jira/browse/HIVE-13341
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer, Statistics
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13341.01.patch, HIVE-13341.02.patch, 
> HIVE-13341.03.patch, HIVE-13341.04.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9660) store end offset of compressed data for RG in RowIndex in ORC

2016-04-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-9660:
---
Attachment: HIVE-9660.07.patch

Addressing review comments. The biggest change is the index-to-kind change for 
lengths tracking.

> store end offset of compressed data for RG in RowIndex in ORC
> -
>
> Key: HIVE-9660
> URL: https://issues.apache.org/jira/browse/HIVE-9660
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-9660.01.patch, HIVE-9660.02.patch, 
> HIVE-9660.03.patch, HIVE-9660.04.patch, HIVE-9660.05.patch, 
> HIVE-9660.06.patch, HIVE-9660.07.patch, HIVE-9660.patch, HIVE-9660.patch
>
>
> Right now the end offset is estimated, which in some cases results in tons of 
> extra data being read.
> We can add a separate array to RowIndex (positions_v2?) that stores number of 
> compressed buffers for each RG, or end offset, or something, to remove this 
> estimation magic



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13429) Tool to remove dangling scratch dir

2016-04-07 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-13429:
--
Attachment: (was: HIVE-13429.2.patch)

> Tool to remove dangling scratch dir
> ---
>
> Key: HIVE-13429
> URL: https://issues.apache.org/jira/browse/HIVE-13429
> Project: Hive
>  Issue Type: Improvement
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-13429.1.patch, HIVE-13429.2.patch
>
>
> We have seen cases where users leave the scratch dir behind, eventually 
> eating up HDFS storage. This could happen when a VM restarts and leaves 
> no chance for Hive to run its shutdown hook. This is applicable for both HiveCli 
> and HiveServer2. Here we provide an external tool to clear dead scratch dirs 
> as needed.
> We need a way to identify which scratch dir is in use. We will rely on the HDFS 
> write lock for that. Here is how the HDFS write lock works:
> 1. An HDFS client opens an HDFS file for write and only closes it at the time of 
> shutdown
> 2. A cleanup process can try to open the HDFS file for write. If the client holding 
> this file is still running, we will get an exception. Otherwise, we know the 
> client is dead
> 3. If the HDFS client dies without closing the HDFS file, the NN will reclaim the 
> lease after 10 min, i.e., the HDFS file held by the dead client is writable 
> again after 10 min
> So here is how we remove a dangling scratch directory in Hive:
> 1. HiveCli/HiveServer2 opens a well-named lock file in the scratch directory and 
> only closes it when we are about to drop the scratch directory
> 2. A command line tool, cleardanglingscratchdir, will check every scratch 
> directory and try to open the lock file for write. If it does not get an exception, 
> the owner is dead and we can safely remove the scratch directory
> 3. The 10 min window means it is possible a HiveCli/HiveServer2 is dead but 
> we still cannot reclaim the scratch directory for another 10 min. But this 
> should be tolerable
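
A rough sketch of the liveness check described above, using only the Hadoop 
FileSystem API (this is not the actual tool; the lock file name "inuse.lck" is 
an assumption for the example):

{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Try to open the scratch dir's lock file for write: if the original client
// still holds the HDFS lease, append() fails and the dir is treated as in use.
public class ScratchDirCheckSketch {
  public static boolean isDangling(FileSystem fs, Path scratchDir) {
    Path lockFile = new Path(scratchDir, "inuse.lck"); // assumed lock file name
    try {
      fs.append(lockFile).close();  // succeeds only if no live client holds the lease
      return true;                  // owner is dead; scratch dir can be removed
    } catch (IOException e) {
      return false;                 // lease still held (or not yet expired): keep the dir
    }
  }

  public static void main(String[] args) throws IOException {
    FileSystem fs = FileSystem.get(new Configuration());
    Path dir = new Path(args[0]);
    if (isDangling(fs, dir)) {
      fs.delete(dir, true);         // recursively remove the dangling scratch dir
    }
  }
}
{code}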



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13429) Tool to remove dangling scratch dir

2016-04-07 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-13429:
--
Attachment: HIVE-13429.2.patch

> Tool to remove dangling scratch dir
> ---
>
> Key: HIVE-13429
> URL: https://issues.apache.org/jira/browse/HIVE-13429
> Project: Hive
>  Issue Type: Improvement
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-13429.1.patch, HIVE-13429.2.patch
>
>
> We have seen cases where users leave the scratch dir behind, eventually 
> eating up HDFS storage. This could happen when a VM restarts and leaves 
> no chance for Hive to run its shutdown hook. This is applicable for both HiveCli 
> and HiveServer2. Here we provide an external tool to clear dead scratch dirs 
> as needed.
> We need a way to identify which scratch dir is in use. We will rely on the HDFS 
> write lock for that. Here is how the HDFS write lock works:
> 1. An HDFS client opens an HDFS file for write and only closes it at the time of 
> shutdown
> 2. A cleanup process can try to open the HDFS file for write. If the client holding 
> this file is still running, we will get an exception. Otherwise, we know the 
> client is dead
> 3. If the HDFS client dies without closing the HDFS file, the NN will reclaim the 
> lease after 10 min, i.e., the HDFS file held by the dead client is writable 
> again after 10 min
> So here is how we remove a dangling scratch directory in Hive:
> 1. HiveCli/HiveServer2 opens a well-named lock file in the scratch directory and 
> only closes it when we are about to drop the scratch directory
> 2. A command line tool, cleardanglingscratchdir, will check every scratch 
> directory and try to open the lock file for write. If it does not get an exception, 
> the owner is dead and we can safely remove the scratch directory
> 3. The 10 min window means it is possible a HiveCli/HiveServer2 is dead but 
> we still cannot reclaim the scratch directory for another 10 min. But this 
> should be tolerable



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13429) Tool to remove dangling scratch dir

2016-04-07 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-13429:
--
Attachment: HIVE-13429.2.patch

> Tool to remove dangling scratch dir
> ---
>
> Key: HIVE-13429
> URL: https://issues.apache.org/jira/browse/HIVE-13429
> Project: Hive
>  Issue Type: Improvement
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-13429.1.patch, HIVE-13429.2.patch
>
>
> We have seen cases where users leave the scratch dir behind, eventually 
> eating up HDFS storage. This could happen when a VM restarts and leaves 
> no chance for Hive to run its shutdown hook. This is applicable for both HiveCli 
> and HiveServer2. Here we provide an external tool to clear dead scratch dirs 
> as needed.
> We need a way to identify which scratch dir is in use. We will rely on the HDFS 
> write lock for that. Here is how the HDFS write lock works:
> 1. An HDFS client opens an HDFS file for write and only closes it at the time of 
> shutdown
> 2. A cleanup process can try to open the HDFS file for write. If the client holding 
> this file is still running, we will get an exception. Otherwise, we know the 
> client is dead
> 3. If the HDFS client dies without closing the HDFS file, the NN will reclaim the 
> lease after 10 min, i.e., the HDFS file held by the dead client is writable 
> again after 10 min
> So here is how we remove a dangling scratch directory in Hive:
> 1. HiveCli/HiveServer2 opens a well-named lock file in the scratch directory and 
> only closes it when we are about to drop the scratch directory
> 2. A command line tool, cleardanglingscratchdir, will check every scratch 
> directory and try to open the lock file for write. If it does not get an exception, 
> the owner is dead and we can safely remove the scratch directory
> 3. The 10 min window means it is possible a HiveCli/HiveServer2 is dead but 
> we still cannot reclaim the scratch directory for another 10 min. But this 
> should be tolerable



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13452) StatsOptimizer should return no rows on empty table with group by

2016-04-07 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231208#comment-15231208
 ] 

Ashutosh Chauhan commented on HIVE-13452:
-

Yeah.. the difference is that MySQL/Postgres treat a constant in group by 
expressions as a positional reference into the select list, in which case it doesn't make sense. 
In Hive, you can get either behavior via the {{hive.groupby.orderby.position.alias}} 
config. However, the important point here is that even queries like {{select 
count(*) from t1 group by c1;}} should return no result set for an empty table. 
group by 1 essentially means: treat all rows as one grouping, so for an 
empty table, group by 1 should return no rows, while a plain select count(*) from t1 
should return a row with value 0.

> StatsOptimizer should return no rows on empty table with group by
> -
>
> Key: HIVE-13452
> URL: https://issues.apache.org/jira/browse/HIVE-13452
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Reporter: Ashutosh Chauhan
>Assignee: Pengcheng Xiong
>
> {code}
> create table t1 (a int);
> analyze table t1 compute statistics;
> analyze table t1 compute statistics for columns;
> select count(1) from t1 group by 1;
> set hive.compute.query.using.stats=true;
> select count(1) from t1 group by 1;
> {code}
> In both cases the result set should be empty. However, with StatsOptimizer on, 
> Hive returns one row with value 0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13405) Fix Connection Leak in OrcRawRecordMerger

2016-04-07 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231199#comment-15231199
 ] 

Prasanth Jayachandran commented on HIVE-13405:
--

Yeah. This will work from JDK 7 onwards. +1

> Fix Connection Leak in OrcRawRecordMerger
> -
>
> Key: HIVE-13405
> URL: https://issues.apache.org/jira/browse/HIVE-13405
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.0.0
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
> Attachments: HIVE-13405.patch
>
>
> In OrcRawRecordMerger.getLastFlushLength, if the opened stream throws an 
> IOException on .available() or on .readLong(), the function will exit without 
> closing the stream.
> This patch adds a try-with-resources to fix this.
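
A sketch of the shape of such a fix (not the actual patch): with 
try-with-resources the stream is closed even when available() or readLong() 
throws.

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical reader of a file of longs; the point is only the resource handling.
public final class FlushLengthSketch {
  static long getLastFlushLength(FileSystem fs, Path deltaFile) throws IOException {
    long result = -1;
    try (FSDataInputStream stream = fs.open(deltaFile)) {
      while (stream.available() > 0) {
        result = stream.readLong();
      }
    } // stream.close() runs here on both the normal and the exceptional path
    return result;
  }

  private FlushLengthSketch() {}
}
{code}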



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-13282) GroupBy and select operator encounter ArrayIndexOutOfBoundsException

2016-04-07 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231193#comment-15231193
 ] 

Vikram Dixit K edited comment on HIVE-13282 at 4/7/16 10:05 PM:


Yes. We can move this out to 2.1.0. This only happens in case of reduce side 
SMB in tez. We have a simple workaround right now that will address this 
(disable SMB join in this case). The real fix would take a lot of refactoring 
of the code, which is more suited for master than a maintenance release.


was (Author: vikram.dixit):
Yes. We can move this out to 2.1.0. This only happens in case of reduce side 
SMB in tez. We have a simple workaround right now that will address this. The 
real fix would take a lot of refactoring of the code, which is more suited for 
master than a maintenance release.

> GroupBy and select operator encounter ArrayIndexOutOfBoundsException
> 
>
> Key: HIVE-13282
> URL: https://issues.apache.org/jira/browse/HIVE-13282
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.2.1, 2.0.0, 2.1.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
>
> The group by and select operators run into the ArrayIndexOutOfBoundsException 
> when they incorrectly initialize themselves with tag 0 but the incoming tag 
> id is different.
> {code}
> select count(*) from
> (select rt1.id from
> (select t1.key as id, t1.value as od from tab t1 group by key, value) rt1) vt1
> join
> (select rt2.id from
> (select t2.key as id, t2.value as od from tab_part t2 group by key, value) 
> rt2) vt2
> where vt1.id=vt2.id;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13240) GroupByOperator: Drop the hash aggregates when closing operator

2016-04-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-13240:

Attachment: HIVE-13240.03.patch

The same patch, to get the logs for test failures if any, to see if they are 
related

> GroupByOperator: Drop the hash aggregates when closing operator
> ---
>
> Key: HIVE-13240
> URL: https://issues.apache.org/jira/browse/HIVE-13240
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.3.0, 1.2.1, 2.0.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-13240.03.patch, HIVE-13240.1.patch, 
> HIVE-13240.2.patch
>
>
> GroupByOperator holds onto the Hash aggregates accumulated when the plan is 
> cached.
> Drop the hashAggregates in case of error during forwarding to the next 
> operator.
> Added for PTF, TopN and all GroupBy cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13282) GroupBy and select operator encounter ArrayIndexOutOfBoundsException

2016-04-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-13282:

Target Version/s: 1.2.2, 2.1.0  (was: 1.2.2, 2.1.0, 2.0.1)

> GroupBy and select operator encounter ArrayIndexOutOfBoundsException
> 
>
> Key: HIVE-13282
> URL: https://issues.apache.org/jira/browse/HIVE-13282
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.2.1, 2.0.0, 2.1.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
>
> The group by and select operators run into the ArrayIndexOutOfBoundsException 
> when they incorrectly initialize themselves with tag 0 but the incoming tag 
> id is different.
> {code}
> select count(*) from
> (select rt1.id from
> (select t1.key as id, t1.value as od from tab t1 group by key, value) rt1) vt1
> join
> (select rt2.id from
> (select t2.key as id, t2.value as od from tab_part t2 group by key, value) 
> rt2) vt2
> where vt1.id=vt2.id;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13176) OutOfMemoryError : GC overhead limit exceeded

2016-04-07 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231194#comment-15231194
 ] 

Sergey Shelukhin commented on HIVE-13176:
-

This is targeting 2.0.1 but has no patch. Should it be moved out to 2.1.0?

> OutOfMemoryError :  GC overhead limit exceeded
> --
>
> Key: HIVE-13176
> URL: https://issues.apache.org/jira/browse/HIVE-13176
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Kavan Suresh
>Assignee: Siddharth Seth
> Attachments: dataNucleus.png, fs.png, shutdownhook.png
>
>
> Detected leaks while testing hiveserver2 concurrency setup with LLAP.
> 2016-02-26T12:50:58,131 ERROR [HiveServer2-Background-Pool: Thread-311030]: 
> operation.Operation (SQLOperation.java:run(230)) - Error running hive query:
> org.apache.hive.service.cli.HiveSQLException: Error while processing 
> statement: FAILED: Execution Error, return code -101 from 
> org.apache.hadoop.hive.ql.exec.StatsTask. GC overhead limit exceeded
> at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:333)
>  ~[hive-jdbc-2.0.0.2.3.5.1-36-standalone.jar:2.0.0.2.3.5.1-36]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:177)
>  ~[hive-jdbc-2.0.0.2.3.5.1-36-standalone.jar:2.0.0.2.3.5.1-36]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:73)
>  ~[hive-jdbc-2.0.0.2.3.5.1-36-standalone.jar:2.0.0.2.3.5.1-36]
> at 
> org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:227)
>  [hive-jdbc-2.0.0.2.3.5.1-36-standalone.jar:2.0.0.2.3.5.1-36]
> at java.security.AccessController.doPrivileged(Native Method) 
> ~[?:1.8.0_45]
> at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_45]
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>  [hadoop-common-2.7.1.2.3.5.1-36.jar:?]
> at 
> org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:239)
>  [hive-jdbc-2.0.0.2.3.5.1-36-standalone.jar:2.0.0.2.3.5.1-36]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> [?:1.8.0_45]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> [?:1.8.0_45]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [?:1.8.0_45]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [?:1.8.0_45]
> at java.lang.Thread.run(Thread.java:745) [?:1.8.0_45]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13370) Add test for HIVE-11470

2016-04-07 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231191#comment-15231191
 ] 

Sergey Shelukhin commented on HIVE-13370:
-

Removing 2.0.1 target. Please feel free to commit to branch-2 anyway and fix 
for 2.0.1 if this happens before the release.

> Add test for HIVE-11470
> ---
>
> Key: HIVE-13370
> URL: https://issues.apache.org/jira/browse/HIVE-13370
> Project: Hive
>  Issue Type: Bug
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
>Priority: Minor
> Attachments: HIVE-13370.patch
>
>
> HIVE-11470 added capability to handle NULL dynamic partitioning keys 
> properly. However, it did not add a test for the case; we should have one so 
> we don't have future regressions of the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13370) Add test for HIVE-11470

2016-04-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-13370:

Target Version/s: 1.3.0, 1.2.2, 2.1.0  (was: 1.3.0, 1.2.2, 2.1.0, 2.0.1)

> Add test for HIVE-11470
> ---
>
> Key: HIVE-13370
> URL: https://issues.apache.org/jira/browse/HIVE-13370
> Project: Hive
>  Issue Type: Bug
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
>Priority: Minor
> Attachments: HIVE-13370.patch
>
>
> HIVE-11470 added capability to handle NULL dynamic partitioning keys 
> properly. However, it did not add a test for the case; we should have one so 
> we don't have future regressions of the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13405) Fix Connection Leak in OrcRawRecordMerger

2016-04-07 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231189#comment-15231189
 ] 

Sergey Shelukhin commented on HIVE-13405:
-

+0.9. [~prasanth_j] does this make sense?

> Fix Connection Leak in OrcRawRecordMerger
> -
>
> Key: HIVE-13405
> URL: https://issues.apache.org/jira/browse/HIVE-13405
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.0.0
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
> Attachments: HIVE-13405.patch
>
>
> In OrcRawRecordMerger.getLastFlushLength, if the opened stream throws an 
> IOException on .available() or on .readLong(), the function will exit without 
> closing the stream.
> This patch adds a try-with-resources to fix this.
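
A minimal sketch of the try-with-resources pattern described above; the stream 
handling is illustrative and does not reflect the actual ORC reader API:
{code}
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

public class CloseOnErrorDemo {
  static long readLastFlushLength(InputStream raw) throws IOException {
    // the stream is closed even if available() or readLong() throws
    try (DataInputStream in = new DataInputStream(raw)) {
      return in.available() >= 8 ? in.readLong() : -1;
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] eightBytes = {0, 0, 0, 0, 0, 0, 0, 42};
    System.out.println(readLastFlushLength(new ByteArrayInputStream(eightBytes)));
  }
}
{code}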



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13282) GroupBy and select operator encounter ArrayIndexOutOfBoundsException

2016-04-07 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231180#comment-15231180
 ] 

Sergey Shelukhin commented on HIVE-13282:
-

This is targeting 2.0.1 but has no patch. Should it be moved out to 2.1.0?

> GroupBy and select operator encounter ArrayIndexOutOfBoundsException
> 
>
> Key: HIVE-13282
> URL: https://issues.apache.org/jira/browse/HIVE-13282
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.2.1, 2.0.0, 2.1.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
>
> The group by and select operators run into the ArrayIndexOutOfBoundsException 
> when they incorrectly initialize themselves with tag 0 but the incoming tag 
> id is different.
> {code}
> select count(*) from
> (select rt1.id from
> (select t1.key as id, t1.value as od from tab t1 group by key, value) rt1) vt1
> join
> (select rt2.id from
> (select t2.key as id, t2.value as od from tab_part t2 group by key, value) 
> rt2) vt2
> where vt1.id=vt2.id;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13408) Issue appending HIVE_QUERY_ID without checking if the prefix already exists

2016-04-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-13408:

Status: Patch Available  (was: Open)

> Issue appending HIVE_QUERY_ID without checking if the prefix already exists
> ---
>
> Key: HIVE-13408
> URL: https://issues.apache.org/jira/browse/HIVE-13408
> Project: Hive
>  Issue Type: Bug
>  Components: Shims
>Affects Versions: 2.0.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Attachments: HIVE-13408.1.patch, HIVE-13408.2.patch
>
>
> {code}
> We are resetting the hadoop caller context to HIVE_QUERY_ID:HIVE_QUERY_ID:
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13187) hiveserver2 can suppress OOM errors in some cases

2016-04-07 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231179#comment-15231179
 ] 

Sergey Shelukhin commented on HIVE-13187:
-

This is targeting 2.0.1 but has no patch. Should it be moved out to 2.1.0?

> hiveserver2 can suppress OOM errors in some cases
> -
>
> Key: HIVE-13187
> URL: https://issues.apache.org/jira/browse/HIVE-13187
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Siddharth Seth
>Priority: Critical
>
> Affects at least branch-2.
> See trace in https://issues.apache.org/jira/browse/HIVE-13176
> This looks to be in src/java/org/apache/hadoop/hive/ql/exec/TaskRunner.java.
> That catches Throwable in the thread and sends it further up. There are no 
> checks to see whether this is an Error or a general Exception - Errors end up 
> getting suppressed instead of killing HiveServer2. This is on the processing 
> threads.
> It looks like the Handler threads have some kind of OOM checker on them.
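
As an aside, a standalone sketch of the distinction the description draws: let 
JVM-level Errors (such as OutOfMemoryError) propagate instead of suppressing them 
along with ordinary exceptions. Names are illustrative, not the TaskRunner code:
{code}
public class ErrorAwareRunner {
  static void runTask(Runnable work) {
    try {
      work.run();
    } catch (Exception e) {
      // recoverable failure: report it and keep the server alive
      System.err.println("Task failed: " + e);
    } catch (Error e) {
      // fatal error: do not suppress it, let it terminate the process
      throw e;
    }
  }

  public static void main(String[] args) {
    runTask(() -> { throw new RuntimeException("ordinary failure"); });
    runTask(() -> { throw new OutOfMemoryError("simulated OOM"); });  // propagates
  }
}
{code}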



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13255) FloatTreeReader.nextVector is expensive

2016-04-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-13255:

Fix Version/s: 2.0.1
   2.1.0

> FloatTreeReader.nextVector is expensive 
> 
>
> Key: HIVE-13255
> URL: https://issues.apache.org/jira/browse/HIVE-13255
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.1.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Fix For: 2.1.0, 2.0.1
>
> Attachments: HIVE-13255.1.patch, HIVE-13255.2.patch, 
> bytecode-size-after.png, bytecode-size-before.png, float-reader-perf.png, 
> q1-bottleneck.png, q1-warm-perf-map.png
>
>
> Some TPCDS queries on 1TB scale show FloatTreeReader in profile samples. It 
> is most likely because of multiple branching and polymorphic dispatch in 
> FloatTreeReader.nextVector() implementation. See attached image for sampling 
> profile output.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13417) Some vector operators return "OP" as name

2016-04-07 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-13417:
--
   Resolution: Fixed
Fix Version/s: 2.1.0
   Status: Resolved  (was: Patch Available)

> Some vector operators return "OP" as name
> -
>
> Key: HIVE-13417
> URL: https://issues.apache.org/jira/browse/HIVE-13417
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
> Fix For: 2.1.0
>
> Attachments: HIVE-13417.1.patch, HIVE-13417.2.patch, 
> HIVE-13417.3.patch, HIVE-13417.4.patch
>
>
> Select/Group by/Filter/etc need to return the same name whether they are the 
> regular or the vector operators. If they don't, the regular path matching in 
> our optimizer code doesn't work on them.
> From the code it looks like an attempt was made to follow this - unfortunately 
> getOperatorName is static and polymorphism doesn't work on these functions.
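
A standalone illustration of why a static getOperatorName() defeats polymorphism; 
the classes below are hypothetical stand-ins, not the actual operator hierarchy:
{code}
class BaseOp {
  static String getOperatorName() { return "OP"; }
}

class VectorSelectOp extends BaseOp {
  // static methods are hidden, not overridden
  static String getOperatorName() { return "SEL"; }
}

public class StaticDispatchDemo {
  public static void main(String[] args) {
    BaseOp op = new VectorSelectOp();
    // resolved against the declared type at compile time, so this prints "OP"
    System.out.println(op.getOperatorName());
  }
}
{code}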



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13417) Some vector operators return "OP" as name

2016-04-07 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231151#comment-15231151
 ] 

Gunther Hagleitner commented on HIVE-13417:
---

Committed to master.

> Some vector operators return "OP" as name
> -
>
> Key: HIVE-13417
> URL: https://issues.apache.org/jira/browse/HIVE-13417
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
> Fix For: 2.1.0
>
> Attachments: HIVE-13417.1.patch, HIVE-13417.2.patch, 
> HIVE-13417.3.patch, HIVE-13417.4.patch
>
>
> Select/Group by/Filter/etc need to return the same name whether they are the 
> regular or the vector operators. If they don't, the regular path matching in 
> our optimizer code doesn't work on them.
> From the code it looks like an attempt was made to follow this - unfortunately 
> getOperatorName is static and polymorphism doesn't work on these functions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13417) Some vector operators return "OP" as name

2016-04-07 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231125#comment-15231125
 ] 

Gunther Hagleitner commented on HIVE-13417:
---

Test failure is unrelated.

> Some vector operators return "OP" as name
> -
>
> Key: HIVE-13417
> URL: https://issues.apache.org/jira/browse/HIVE-13417
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
> Attachments: HIVE-13417.1.patch, HIVE-13417.2.patch, 
> HIVE-13417.3.patch, HIVE-13417.4.patch
>
>
> Select/Group by/Filter/etc need to return the same name whether they are the 
> regular or the vector operators. If they don't, the regular path matching in 
> our optimizer code doesn't work on them.
> From the code it looks like an attempt was made to follow this - unfortunately 
> getOperatorName is static and polymorphism doesn't work on these functions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13417) Some vector operators return "OP" as name

2016-04-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231107#comment-15231107
 ] 

Hive QA commented on HIVE-13417:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12797255/HIVE-13417.4.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9980 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7502/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7502/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7502/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12797255 - PreCommit-HIVE-TRUNK-Build

> Some vector operators return "OP" as name
> -
>
> Key: HIVE-13417
> URL: https://issues.apache.org/jira/browse/HIVE-13417
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
> Attachments: HIVE-13417.1.patch, HIVE-13417.2.patch, 
> HIVE-13417.3.patch, HIVE-13417.4.patch
>
>
> Select/Group by/Filter/etc need to return the same name whether they are the 
> regular or the vector operators. If they don't, the regular path matching in 
> our optimizer code doesn't work on them.
> From the code it looks like an attempt was made to follow this - unfortunately 
> getOperatorName is static and polymorphism doesn't work on these functions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13455) JDBC: disable UT for Statement.cancel (TestJdbcDriver2#testQueryCancel)

2016-04-07 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231099#comment-15231099
 ] 

Vaibhav Gumashta commented on HIVE-13455:
-

The test calls fail() in a spawned thread and JUnit doesn't fail the test due 
to that. Will need to address that as well before re-enabling.
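
A minimal sketch of the pattern needed before re-enabling the test: capture a 
failure raised on a spawned thread and rethrow it on the main thread, since an 
AssertionError thrown on a background thread does not fail a JUnit test. This is 
a generic illustration, not the TestJdbcDriver2 code:
{code}
import java.util.concurrent.atomic.AtomicReference;

public class BackgroundFailureDemo {
  public static void main(String[] args) throws Exception {
    AtomicReference<Throwable> failure = new AtomicReference<>();
    Thread worker = new Thread(() -> {
      try {
        // ... exercise Statement.cancel() here ...
        throw new AssertionError("simulated failure in background thread");
      } catch (Throwable t) {
        failure.set(t);    // record instead of losing it on the worker thread
      }
    });
    worker.start();
    worker.join();
    if (failure.get() != null) {
      // rethrow on the calling thread so the test framework sees it
      throw new AssertionError("background thread failed", failure.get());
    }
  }
}
{code}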

> JDBC: disable UT for Statement.cancel (TestJdbcDriver2#testQueryCancel)
> ---
>
> Key: HIVE-13455
> URL: https://issues.apache.org/jira/browse/HIVE-13455
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, JDBC
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-13455.1.patch
>
>
> JDBC Statement.cancel doesn't seem to work. The related UT is also flaky as a 
> result. We should disable it till we fix it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13420) Clarify HS2 WebUI Query 'Elapsed TIme'

2016-04-07 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-13420:
-
Issue Type: Sub-task  (was: Improvement)
Parent: HIVE-12338

> Clarify HS2 WebUI Query 'Elapsed TIme'
> --
>
> Key: HIVE-13420
> URL: https://issues.apache.org/jira/browse/HIVE-13420
> Project: Hive
>  Issue Type: Sub-task
>  Components: Diagnosability
>Affects Versions: 2.0.0
>Reporter: Szehon Ho
>Assignee: Szehon Ho
> Attachments: Elapsed Time.png, HIVE-13420.2.patch, HIVE-13420.patch, 
> Patched UI.2.png, Patched UI.png
>
>
> Today the "Queries" section of the WebUI shows SQLOperations that are not 
> closed.
> Elapsed time is thus a bit confusing: people might take this to mean query 
> runtime, but actually it is the time since the operation was opened.  The query 
> may be finished, but the operation is not closed.  Perhaps another timer column 
> is needed, showing the runtime of the query, to reduce this confusion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13455) JDBC: disable UT for Statement.cancel (TestJdbcDriver2#testQueryCancel)

2016-04-07 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-13455:

Attachment: HIVE-13455.1.patch

> JDBC: disable UT for Statement.cancel (TestJdbcDriver2#testQueryCancel)
> ---
>
> Key: HIVE-13455
> URL: https://issues.apache.org/jira/browse/HIVE-13455
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, JDBC
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-13455.1.patch
>
>
> JDBC Statement.cancel doesn't seem to work. The related UT is also flaky as a 
> result. We should disable it till we fix it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13455) JDBC: disable UT for Statement.cancel (TestJdbcDriver2#testQueryCancel)

2016-04-07 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-13455:

Summary: JDBC: disable UT for Statement.cancel 
(TestJdbcDriver2#testQueryCancel)  (was: JDBC: disable UT for Statement.cancel)

> JDBC: disable UT for Statement.cancel (TestJdbcDriver2#testQueryCancel)
> ---
>
> Key: HIVE-13455
> URL: https://issues.apache.org/jira/browse/HIVE-13455
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, JDBC
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>
> JDBC Statement.cancel doesn't seem to work. The related UT is also flaky as a 
> result. We should disable it till we fix it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13452) StatsOptimizer should return no rows on empty table with group by

2016-04-07 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231060#comment-15231060
 ] 

Pengcheng Xiong commented on HIVE-13452:


mysql
{code}
Database changed
mysql> create table t1 (a int);
Query OK, 0 rows affected (0.02 sec)

mysql> select count(1) from t1 group by 1;
ERROR 1056 (42000): Can't group on 'count(1)'
mysql> select count(1) from t1;
+--+
| count(1) |
+--+
|0 |
+--+
1 row in set (0.00 sec)
{code}

> StatsOptimizer should return no rows on empty table with group by
> -
>
> Key: HIVE-13452
> URL: https://issues.apache.org/jira/browse/HIVE-13452
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Reporter: Ashutosh Chauhan
>Assignee: Pengcheng Xiong
>
> {code}
> create table t1 (a int);
> analyze table t1 compute statistics;
> analyze table t1 compute statistics for columns;
> select count(1) from t1 group by 1;
> set hive.compute.query.using.stats=true;
> select count(1) from t1 group by 1;
> {code}
> In both cases the result set should be empty. However, with statsoptimizer on, 
> Hive returns one row with value 0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12741) HS2 ShutdownHookManager holds extra of Driver instance in master/branch-2.0

2016-04-07 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-12741:
--
Fix Version/s: 1.3.0

> HS2 ShutdownHookManager holds extra of Driver instance in master/branch-2.0
> ---
>
> Key: HIVE-12741
> URL: https://issues.apache.org/jira/browse/HIVE-12741
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-12741.1.patch
>
>
> HIVE-12187 was meant to fix the described memory leak, however because of 
> interaction with HIVE-12187 in branch-2.0/master, the fix fails to take 
> effect.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13456) JDBC: fix Statement.cancel

2016-04-07 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-13456:

Description: JDBC Statement.cancel is supposed to work by cancelling the 
underlying execution and freeing resources. However, in my testing, I see it 
failing in some runs for the same query.
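
For context, a minimal sketch of the usage being exercised: cancel() is called 
from a second thread while the first thread is blocked in execute(). The JDBC URL 
and table name are placeholders, not a working endpoint:
{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class CancelDemo {
  public static void main(String[] args) throws Exception {
    try (Connection conn =
             DriverManager.getConnection("jdbc:hive2://localhost:10000/default");
         Statement stmt = conn.createStatement()) {
      Thread canceller = new Thread(() -> {
        try {
          Thread.sleep(5000);   // let the query start
          stmt.cancel();        // should abort the running execution
        } catch (InterruptedException | SQLException e) {
          e.printStackTrace();
        }
      });
      canceller.start();
      stmt.execute("select count(*) from some_large_table");
      canceller.join();
    }
  }
}
{code}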

> JDBC: fix Statement.cancel
> --
>
> Key: HIVE-13456
> URL: https://issues.apache.org/jira/browse/HIVE-13456
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, JDBC
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>
> JDBC Statement.cancel is supposed to work by cancelling the underlying 
> execution and freeing resources. However, in my testing, I see it failing in 
> some runs for the same query.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13452) StatsOptimizer should return no rows on empty table with group by

2016-04-07 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231065#comment-15231065
 ] 

Pengcheng Xiong commented on HIVE-13452:


postgres
{code}
dbtmp=# create table t1 (a int);
CREATE TABLE
dbtmp=# select count(1) from t1 group by 1;
ERROR:  aggregates not allowed in GROUP BY clause
LINE 1: select count(1) from t1 group by 1;
   ^
dbtmp=# select count(1) from t1;
 count
---
 0
(1 row)
{code}

> StatsOptimizer should return no rows on empty table with group by
> -
>
> Key: HIVE-13452
> URL: https://issues.apache.org/jira/browse/HIVE-13452
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Reporter: Ashutosh Chauhan
>Assignee: Pengcheng Xiong
>
> {code}
> create table t1 (a int);
> analyze table t1 compute statistics;
> analyze table t1 compute statistics for columns;
> select count(1) from t1 group by 1;
> set hive.compute.query.using.stats=true;
> select count(1) from t1 group by 1;
> {code}
> In both cases the result set should be empty. However, with statsoptimizer on, 
> Hive returns one row with value 0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12741) HS2 ShutdownHookManager holds extra of Driver instance in master/branch-2.0

2016-04-07 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231059#comment-15231059
 ] 

Daniel Dai commented on HIVE-12741:
---

Also pushed to branch-1.

> HS2 ShutdownHookManager holds extra of Driver instance in master/branch-2.0
> ---
>
> Key: HIVE-12741
> URL: https://issues.apache.org/jira/browse/HIVE-12741
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-12741.1.patch
>
>
> HIVE-12187 was meant to fix the described memory leak, however because of 
> interaction with HIVE-12187 in branch-2.0/master, the fix fails to take 
> effect.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13439) JDBC: provide a way to retrieve GUID to query Yarn ATS

2016-04-07 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-13439:

Attachment: HIVE-13439.2.patch

> JDBC: provide a way to retrieve GUID to query Yarn ATS
> --
>
> Key: HIVE-13439
> URL: https://issues.apache.org/jira/browse/HIVE-13439
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-13439.1.patch, HIVE-13439.2.patch
>
>
> HIVE-9673 added support for passing base64 encoded operation handles to ATS. 
> We should add a method on the client side to retrieve that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12741) HS2 ShutdownHookManager holds extra of Driver instance in master/branch-2.0

2016-04-07 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231046#comment-15231046
 ] 

Daniel Dai commented on HIVE-12741:
---

This affects branch-1 as well. The reason for this leak is that Driver.compile is 
nested, and we only invoke destroy once in this case:
{code}
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:402)
at 
org.apache.hadoop.hive.ql.optimizer.IndexUtils.createRootTask(IndexUtils.java:223)
at 
org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler.getIndexBuilderMapRedTask(CompactIndexHandler.java:151)
at 
org.apache.hadoop.hive.ql.index.TableBasedIndexHandler.getIndexBuilderMapRedTask(TableBasedIndexHandler.java:108)
at 
org.apache.hadoop.hive.ql.index.TableBasedIndexHandler.generateIndexBuildTaskList(TableBasedIndexHandler.java:92)
at 
org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.getIndexBuilderMapRed(DDLSemanticAnalyzer.java:1228)
at 
org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeAlterIndexRebuild(DDLSemanticAnalyzer.java:1175)
at 
org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:408)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:464)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:318)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1194)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1188)
at 
org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:110)
at 
org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:181)
at 
org.apache.hive.service.cli.operation.Operation.run(Operation.java:257)
at 
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:419)
at 
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:406)
at sun.reflect.GeneratedMethodAccessor26.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at 
org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
at 
org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
at 
org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
at 
org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
at com.sun.proxy.$Proxy20.executeStatementAsync(Unknown Source)
at 
org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:276)
at 
org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:486)
at 
org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1317)
at 
org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1302)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at 
org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}

> HS2 ShutdownHookManager holds extra of Driver instance in master/branch-2.0
> ---
>
> Key: HIVE-12741
> URL: https://issues.apache.org/jira/browse/HIVE-12741
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 2.0.0
>
> Attachments: HIVE-12741.1.patch
>
>
> HIVE-12187 was meant to fix the described memory leak, however because of 
> interaction with HIVE-12187 in branch-2.0/master, the fix fails to take 
> effect.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13360) Refactoring Hive Authorization

2016-04-07 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13360:
---
Affects Version/s: 2.0.0

> Refactoring Hive Authorization
> --
>
> Key: HIVE-13360
> URL: https://issues.apache.org/jira/browse/HIVE-13360
> Project: Hive
>  Issue Type: Sub-task
>  Components: Security
>Affects Versions: 2.0.0
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.1.0
>
> Attachments: HIVE-13360.01.patch, HIVE-13360.02.patch, 
> HIVE-13360.03.patch, HIVE-13360.04.patch, HIVE-13360.final.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13360) Refactoring Hive Authorization

2016-04-07 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231022#comment-15231022
 ] 

Pengcheng Xiong commented on HIVE-13360:


Manually reran all the test cases that failed; could not reproduce any of them. Pushed 
to master. Thanks [~ashutoshc] for the review.

> Refactoring Hive Authorization
> --
>
> Key: HIVE-13360
> URL: https://issues.apache.org/jira/browse/HIVE-13360
> Project: Hive
>  Issue Type: Sub-task
>  Components: Security
>Affects Versions: 2.0.0
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.1.0
>
> Attachments: HIVE-13360.01.patch, HIVE-13360.02.patch, 
> HIVE-13360.03.patch, HIVE-13360.04.patch, HIVE-13360.final.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13360) Refactoring Hive Authorization

2016-04-07 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13360:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Refactoring Hive Authorization
> --
>
> Key: HIVE-13360
> URL: https://issues.apache.org/jira/browse/HIVE-13360
> Project: Hive
>  Issue Type: Sub-task
>  Components: Security
>Affects Versions: 2.0.0
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.1.0
>
> Attachments: HIVE-13360.01.patch, HIVE-13360.02.patch, 
> HIVE-13360.03.patch, HIVE-13360.04.patch, HIVE-13360.final.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13360) Refactoring Hive Authorization

2016-04-07 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13360:
---
Attachment: HIVE-13360.final.patch

> Refactoring Hive Authorization
> --
>
> Key: HIVE-13360
> URL: https://issues.apache.org/jira/browse/HIVE-13360
> Project: Hive
>  Issue Type: Sub-task
>  Components: Security
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.1.0
>
> Attachments: HIVE-13360.01.patch, HIVE-13360.02.patch, 
> HIVE-13360.03.patch, HIVE-13360.04.patch, HIVE-13360.final.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-12968) genNotNullFilterForJoinSourcePlan: needs to merge predicates into the multi-AND

2016-04-07 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan reassigned HIVE-12968:
---

Assignee: Ashutosh Chauhan  (was: Gopal V)

> genNotNullFilterForJoinSourcePlan: needs to merge predicates into the 
> multi-AND
> ---
>
> Key: HIVE-12968
> URL: https://issues.apache.org/jira/browse/HIVE-12968
> Project: Hive
>  Issue Type: Improvement
>  Components: Logical Optimizer
>Affects Versions: 2.1.0
>Reporter: Gopal V
>Assignee: Ashutosh Chauhan
>Priority: Minor
> Attachments: HIVE-12968.1.patch, HIVE-12968.2.patch, 
> HIVE-12968.3.patch, HIVE-12968.4.patch, HIVE-12968.5.patch, 
> HIVE-12968.6.patch, HIVE-12968.7.patch
>
>
> {code}
> predicate: ((cbigint is not null and cint is not null) and cint BETWEEN 
> 100 AND 300) (type: boolean)
> {code}
> does not fold the IS_NULL on cint, because of the structure of the AND clause.
> For example, see {{tez_dynpart_hashjoin_1.q}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12959) LLAP: Add task scheduler timeout when no nodes are alive

2016-04-07 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231010#comment-15231010
 ] 

Prasanth Jayachandran commented on HIVE-12959:
--

[~sseth] Could you please review? This patch needs tez-0.8.3-SNAPSHOT for 
compilation. 

> LLAP: Add task scheduler timeout when no nodes are alive
> 
>
> Key: HIVE-12959
> URL: https://issues.apache.org/jira/browse/HIVE-12959
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-12959.1.patch, HIVE-12959.2.patch
>
>
> When there are no llap daemons running, the task scheduler should have a timeout 
> to fail the query instead of waiting forever. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13455) JDBC: disable UT for Statement.cancel

2016-04-07 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-13455:

Component/s: JDBC
 HiveServer2

> JDBC: disable UT for Statement.cancel
> -
>
> Key: HIVE-13455
> URL: https://issues.apache.org/jira/browse/HIVE-13455
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, JDBC
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>
> JDBC Statement.cancel doesn't seem to work. The related UT is also flaky as a 
> result. We should disable it till we fix it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12959) LLAP: Add task scheduler timeout when no nodes are alive

2016-04-07 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-12959:
-
Attachment: HIVE-12959.2.patch

> LLAP: Add task scheduler timeout when no nodes are alive
> 
>
> Key: HIVE-12959
> URL: https://issues.apache.org/jira/browse/HIVE-12959
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-12959.1.patch, HIVE-12959.2.patch
>
>
> When there are no llap daemons running, the task scheduler should have a timeout 
> to fail the query instead of waiting forever. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13455) JDBC: disable UT for Statement.cancel

2016-04-07 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-13455:

Affects Version/s: 1.2.1
   2.0.0

> JDBC: disable UT for Statement.cancel
> -
>
> Key: HIVE-13455
> URL: https://issues.apache.org/jira/browse/HIVE-13455
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>
> JDBC Statement.cancel doesn't seem to work. The related UT is also flaky as a 
> result. We should disable it till we fix it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13413) add a llapstatus command line tool

2016-04-07 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-13413:
--
Attachment: HIVE-13413.03.patch

Updated patch with review comments addressed. Thanks for the reviews.

> add a llapstatus command line tool
> --
>
> Key: HIVE-13413
> URL: https://issues.apache.org/jira/browse/HIVE-13413
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-13413.01.patch, HIVE-13413.02.patch, 
> HIVE-13413.03.patch, appComplete, invalidApp, oneContainerDown, running, 
> starting
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13450) LLAP: wire encryption for HDFS

2016-04-07 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230959#comment-15230959
 ] 

Sergey Shelukhin commented on HIVE-13450:
-

I thought we weren't sure if it would work in LLAP... are we? In that case it's a 
no-op, I guess.

> LLAP: wire encryption for HDFS
> --
>
> Key: HIVE-13450
> URL: https://issues.apache.org/jira/browse/HIVE-13450
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Siddharth Seth
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13438) Add a service check script for llap

2016-04-07 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230955#comment-15230955
 ] 

Sergey Shelukhin commented on HIVE-13438:
-

I meant executing it as a separate command in llapservicedriver, so that the 
user could do it. I guess it's not necessary for some cases, should be ok

> Add a service check script for llap
> ---
>
> Key: HIVE-13438
> URL: https://issues.apache.org/jira/browse/HIVE-13438
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.1.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Attachments: HIVE-13438.1.patch, HIVE-13438.2.patch
>
>
> We want to have a test script that can be run by an installer such as ambari 
> that makes sure that the service is up and running.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13342) Improve logging in llap decider for llap

2016-04-07 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230950#comment-15230950
 ] 

Vikram Dixit K commented on HIVE-13342:
---

[~sseth] Yes. The other log lines should tell us which operator interferes with 
running in llap. I changed the exception to use the right configuration 
variable from HiveConf. However, there is currently no way to get the values a 
configuration can take from code.

I think it is better not to add more configuration to enable/disable the mode = 
all behavior. If the user is not sure whether they can run in llap, they need to 
use mode = auto. The mode = all behavior only skips further checking of whether 
the query can be run in llap. If, under mode = all, the query cannot be run in 
llap because some parts of the plan cannot run in it, it makes sense to stop the 
user from proceeding. If you feel strongly about needing the flag, I can add 
one, but I am not convinced at this point in time.
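
For reference, a minimal illustration of the two modes discussed above, assuming 
hive.llap.execution.mode is the setting in play here:
{code}
-- "all" rejects plans that cannot run entirely in llap; "auto" lets the
-- decider fall back for the parts that cannot.
set hive.llap.execution.mode=auto;
{code}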

> Improve logging in llap decider for llap
> 
>
> Key: HIVE-13342
> URL: https://issues.apache.org/jira/browse/HIVE-13342
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.1.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Attachments: HIVE-13342.1.patch, HIVE-13342.2.patch
>
>
> Currently we do not log our decisions with respect to llap: are we running 
> everything in llap mode or only parts of the plan? We need more logging. 
> Also, if llap mode is all but for some reason we cannot run the work in llap 
> mode, fail and throw an exception advising the user to change the mode to auto.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13287) Add logic to estimate stats for IN operator

2016-04-07 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230934#comment-15230934
 ] 

Jesus Camacho Rodriguez commented on HIVE-13287:


Some new q files might need to be updated still.

> Add logic to estimate stats for IN operator
> ---
>
> Key: HIVE-13287
> URL: https://issues.apache.org/jira/browse/HIVE-13287
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 2.1.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-13287.01.patch, HIVE-13287.02.patch, 
> HIVE-13287.patch
>
>
> Currently, the IN operator is handled by the default case, which reduces the 
> input rows to half. This may lead to wrong estimates for the number of rows 
> produced by Filter operators.
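
A rough standalone sketch of a distinct-value based estimate for "col IN (v1, ..., vk)", 
in contrast to the fixed halving described above. The formula is an assumption for 
illustration only, not the logic in the attached patch:
{code}
public class InSelectivityDemo {
  // selectivity of an IN list is roughly min(1, k / NDV) under uniformity
  static long estimateRows(long inputRows, long columnNdv, int inListSize) {
    if (inputRows <= 0 || columnNdv <= 0) {
      return 0;
    }
    double selectivity = Math.min(1.0, (double) inListSize / columnNdv);
    return Math.round(inputRows * selectivity);
  }

  public static void main(String[] args) {
    // 1M rows, 1000 distinct values, 3-element IN list -> about 3000 rows
    System.out.println(estimateRows(1_000_000L, 1_000L, 3));
  }
}
{code}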



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13287) Add logic to estimate stats for IN operator

2016-04-07 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230932#comment-15230932
 ] 

Jesus Camacho Rodriguez commented on HIVE-13287:


I have uploaded a new patch; in short, the original patch had the problem 
that it was taking the original number of columns as zero in some cases (from 
evaluatedRowCount). The new patch solves that issue.

> Add logic to estimate stats for IN operator
> ---
>
> Key: HIVE-13287
> URL: https://issues.apache.org/jira/browse/HIVE-13287
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 2.1.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-13287.01.patch, HIVE-13287.02.patch, 
> HIVE-13287.patch
>
>
> Currently, the IN operator is handled by the default case, which reduces the 
> input rows to half. This may lead to wrong estimates for the number of rows 
> produced by Filter operators.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13320) Apply HIVE-11544 to explicit conversions as well as implicit ones

2016-04-07 Thread Nita Dembla (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nita Dembla updated HIVE-13320:
---
Attachment: HIVE-13320.2.patch

> Apply HIVE-11544 to explicit conversions as well as implicit ones
> -
>
> Key: HIVE-13320
> URL: https://issues.apache.org/jira/browse/HIVE-13320
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 1.3.0, 1.2.1, 2.0.0, 2.1.0
>Reporter: Gopal V
>Assignee: Nita Dembla
> Attachments: HIVE-13320.1.patch, HIVE-13320.2.patch, 
> HIVE-13320.2.patch
>
>
> Parsing 1 million blank values through cast(x as int) is 3x slower than 
> parsing a valid single digit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (HIVE-13287) Add logic to estimate stats for IN operator

2016-04-07 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-13287 started by Jesus Camacho Rodriguez.
--
> Add logic to estimate stats for IN operator
> ---
>
> Key: HIVE-13287
> URL: https://issues.apache.org/jira/browse/HIVE-13287
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 2.1.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-13287.01.patch, HIVE-13287.patch
>
>
> Currently, the IN operator is handled by the default case, which reduces the 
> input rows to half. This may lead to wrong estimates for the number of rows 
> produced by Filter operators.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13287) Add logic to estimate stats for IN operator

2016-04-07 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-13287:
---
Status: Patch Available  (was: In Progress)

> Add logic to estimate stats for IN operator
> ---
>
> Key: HIVE-13287
> URL: https://issues.apache.org/jira/browse/HIVE-13287
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 2.1.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-13287.01.patch, HIVE-13287.patch
>
>
> Currently, the IN operator is handled by the default case, which reduces the 
> input rows to half. This may lead to wrong estimates for the number of rows 
> produced by Filter operators.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13287) Add logic to estimate stats for IN operator

2016-04-07 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-13287:
---
Status: Open  (was: Patch Available)

> Add logic to estimate stats for IN operator
> ---
>
> Key: HIVE-13287
> URL: https://issues.apache.org/jira/browse/HIVE-13287
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 2.1.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-13287.01.patch, HIVE-13287.patch
>
>
> Currently, the IN operator is handled by the default case, which reduces the 
> input rows to half. This may lead to wrong estimates for the number of rows 
> produced by Filter operators.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13413) add a llapstatus command line tool

2016-04-07 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230841#comment-15230841
 ] 

Prasanth Jayachandran commented on HIVE-13413:
--

LGTM, +1. Left minor comments in RB.

> add a llapstatus command line tool
> --
>
> Key: HIVE-13413
> URL: https://issues.apache.org/jira/browse/HIVE-13413
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-13413.01.patch, HIVE-13413.02.patch, appComplete, 
> invalidApp, oneContainerDown, running, starting
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13413) add a llapstatus command line tool

2016-04-07 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-13413:
--
Attachment: HIVE-13413.02.patch

Thanks for the review. Updated patch attached.

bq. Create a follow-up? I guess at this point this tool is just used as health 
check/status of daemons. Per daemon configurations are obtained via JMX?
Done

bq. daemon webaddress/status page currently shows Error 404. Is that part of 
this jira or another?
This is being added in HIVE-13398

bq. populateAppStatusFromLlapRegistry(). do we need to create new Configuration 
object? reuse already created object?
Creating a new instance, since we're modifying a field to set the instance 
name. Don't want to modify the original configuration used by the class.

bq. llapExtraInstances.add(llapInstance); This line add nulls to the list 
right? I don't see it used anywhere other than logging. use boolean instead?
This was not supposed to be adding LlapInstances. Changed to add containerId. 
While that's not used - it could be useful for logging in the future.

bq. nit: remove deadcode. // String nmUrl = (String) 
containerParams.get("hostUrl");
Done

bq. wow. Map>>
That was painful to deal with :(

> add a llapstatus command line tool
> --
>
> Key: HIVE-13413
> URL: https://issues.apache.org/jira/browse/HIVE-13413
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-13413.01.patch, HIVE-13413.02.patch, appComplete, 
> invalidApp, oneContainerDown, running, starting
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-6090) Audit logs for HiveServer2

2016-04-07 Thread HeeSoo Kim (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230735#comment-15230735
 ] 

HeeSoo Kim commented on HIVE-6090:
--

[~thiruvel] Would you check the error in your last patch?
Is anyone reviewing this ticket?

> Audit logs for HiveServer2
> --
>
> Key: HIVE-6090
> URL: https://issues.apache.org/jira/browse/HIVE-6090
> Project: Hive
>  Issue Type: Improvement
>  Components: Diagnosability, HiveServer2
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>  Labels: audit, hiveserver
> Attachments: HIVE-6090.1.WIP.patch, HIVE-6090.1.patch, 
> HIVE-6090.3.patch, HIVE-6090.4.patch, HIVE-6090.patch
>
>
> HiveMetastore has audit logs, and we would like to audit all queries or requests 
> to HiveServer2 also. This will help in understanding how the APIs were used, 
> which queries were submitted, which users, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12049) Provide an option to write serialized thrift objects in final tasks

2016-04-07 Thread Rohit Dholakia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohit Dholakia updated HIVE-12049:
--
Attachment: HIVE-12049.18.patch

> Provide an option to write serialized thrift objects in final tasks
> ---
>
> Key: HIVE-12049
> URL: https://issues.apache.org/jira/browse/HIVE-12049
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Reporter: Rohit Dholakia
>Assignee: Rohit Dholakia
> Attachments: HIVE-12049.1.patch, HIVE-12049.11.patch, 
> HIVE-12049.12.patch, HIVE-12049.13.patch, HIVE-12049.14.patch, 
> HIVE-12049.15.patch, HIVE-12049.16.patch, HIVE-12049.17.patch, 
> HIVE-12049.18.patch, HIVE-12049.2.patch, HIVE-12049.3.patch, 
> HIVE-12049.4.patch, HIVE-12049.5.patch, HIVE-12049.6.patch, 
> HIVE-12049.7.patch, HIVE-12049.9.patch, new-driver-profiles.png, 
> old-driver-profiles.png
>
>
> For each fetch request to HiveServer2, we pay the penalty of deserializing 
> the row objects and translating them into a different representation suitable 
> for the RPC transfer. In moderate to high concurrency scenarios, this can 
> result in significant CPU and memory wastage. By having each task write the 
> appropriate thrift objects to the output files, HiveServer2 can simply stream 
> a batch of rows on the wire without incurring any of the additional cost of 
> deserialization and translation. 
> This can be implemented by writing a new SerDe, which the FileSinkOperator 
> can use to write thrift formatted row batches to the output file. Using the 
> pluggable property of the {{hive.query.result.fileformat}}, we can set it to 
> use SequenceFile and write a batch of thrift formatted rows as a value blob. 
> The FetchTask can now simply read the blob and send it over the wire. On the 
> client side, the *DBC driver can read the blob and since it is already 
> formatted in the way it expects, it can continue building the ResultSet the 
> way it does in the current implementation.
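
As a usage note, the pluggable result format mentioned above is selected per 
session; the exact SerDe that writes thrift row batches is the subject of this 
patch and is not named here:
{code}
set hive.query.result.fileformat=SequenceFile;
{code}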



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-13451) LLAP: wire encryption for shuffle

2016-04-07 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth resolved HIVE-13451.
---
Resolution: Duplicate
  Assignee: (was: Siddharth Seth)

> LLAP: wire encryption for shuffle
> -
>
> Key: HIVE-13451
> URL: https://issues.apache.org/jira/browse/HIVE-13451
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13438) Add a service check script for llap

2016-04-07 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230648#comment-15230648
 ] 

Vikram Dixit K commented on HIVE-13438:
---

Fixed the error.

[~sershe] Can you elaborate a little more? Are you suggesting that we could run 
the query as part of starting the service itself? We could add that too but we 
still need something to run end-to-end (from starting the shell onwards).

> Add a service check script for llap
> ---
>
> Key: HIVE-13438
> URL: https://issues.apache.org/jira/browse/HIVE-13438
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.1.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Attachments: HIVE-13438.1.patch, HIVE-13438.2.patch
>
>
> We want to have a test script that can be run by an installer such as ambari 
> that makes sure that the service is up and running.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13450) LLAP: wire encryption for HDFS

2016-04-07 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230642#comment-15230642
 ] 

Siddharth Seth commented on HIVE-13450:
---

[~sershe] - I believe this will be handled by HDFS configuration, with certain 
paths set up to use encryption. Is there anything specific to be done?

> LLAP: wire encryption for HDFS
> --
>
> Key: HIVE-13450
> URL: https://issues.apache.org/jira/browse/HIVE-13450
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Siddharth Seth
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13446) LLAP: set default management protocol acls to deny all

2016-04-07 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230638#comment-15230638
 ] 

Siddharth Seth commented on HIVE-13446:
---

We could also ensure that the user connecting is the same user that the process 
is running as. Only HiveServer should have access to the management protocol at 
the moment.

> LLAP: set default management protocol acls to deny all
> --
>
> Key: HIVE-13446
> URL: https://issues.apache.org/jira/browse/HIVE-13446
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>
> The user needs to set the acls.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13438) Add a service check script for llap

2016-04-07 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-13438:
--
Attachment: HIVE-13438.2.patch

> Add a service check script for llap
> ---
>
> Key: HIVE-13438
> URL: https://issues.apache.org/jira/browse/HIVE-13438
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.1.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Attachments: HIVE-13438.1.patch, HIVE-13438.2.patch
>
>
> We want to have a test script that can be run by an installer such as ambari 
> that makes sure that the service is up and running.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13415) Decouple Sessions from thrift binary transport

2016-04-07 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230621#comment-15230621
 ] 

Szehon Ho commented on HIVE-13415:
--

This just seems to remove it without making it configurable?

> Decouple Sessions from thrift binary transport
> --
>
> Key: HIVE-13415
> URL: https://issues.apache.org/jira/browse/HIVE-13415
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Rajat Khandelwal
>Assignee: Rajat Khandelwal
> Attachments: HIVE-13415.01.patch
>
>
> Current behaviour is:
> * Open a thrift binary transport
> * create a session
> * close the transport
> Then the session gets closed. Consequently, all the operations running in the 
> session are also killed.
> In contrast, if you open an HTTP transport and then close it, the enclosing 
> sessions are not closed.
> Having transport and sessions tightly coupled seems like a bad design; I'd 
> like to fix this.
> The issue that introduced it is 
> [HIVE-9601|https://github.com/apache/hive/commit/48bea00c48853459af64b4ca9bfdc3e821c4ed82]
>  Relevant discussions at 
> [here|https://issues.apache.org/jira/browse/HIVE-11485?focusedCommentId=15223546=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15223546],
>  
> [here|https://issues.apache.org/jira/browse/HIVE-11485?focusedCommentId=15223827=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15223827]
>  and mentioned links on those comments. 
> Another thing that seems like a slightly bad design is this line of code in 
> ThriftBinaryCLIService:
> {noformat}
> server.setServerEventHandler(serverEventHandler);
> {noformat}
> However, serverEventHandler is defined by the base class and has no users 
> except one subclass (ThriftBinaryCLIService), which violates the separation 
> of concerns. 
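
To make the coupling described in the quoted issue more concrete, here is a 
hedged sketch (not Hive's actual implementation) of a Thrift server event 
handler whose deleteContext callback closes the session when the binary 
transport drops; the SessionCloser hook is hypothetical.

{code}
// Hedged sketch, assuming libthrift's TServerEventHandler interface: a
// per-connection context is created on connect and the session is closed when
// the connection goes away, which is the transport/session coupling at issue.
import org.apache.thrift.protocol.TProtocol;
import org.apache.thrift.server.ServerContext;
import org.apache.thrift.server.TServerEventHandler;
import org.apache.thrift.transport.TTransport;

class SessionPerConnectionHandler implements TServerEventHandler {
  /** Hypothetical hook standing in for the session manager. */
  interface SessionCloser { void closeSessionFor(ServerContext ctx); }

  private final SessionCloser closer;
  SessionPerConnectionHandler(SessionCloser closer) { this.closer = closer; }

  @Override public void preServe() { }

  @Override public ServerContext createContext(TProtocol input, TProtocol output) {
    return null;  // a real handler would allocate a per-connection context here
  }

  @Override public void deleteContext(ServerContext ctx, TProtocol input, TProtocol output) {
    // Connection closed: today the session dies with it; HIVE-13415 wants this
    // behaviour decoupled or made configurable.
    closer.closeSessionFor(ctx);
  }

  @Override public void processContext(ServerContext ctx, TTransport in, TTransport out) { }
}
{code}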



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13420) Clarify HS2 WebUI Query 'Elapsed TIme'

2016-04-07 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-13420:
-
Status: Patch Available  (was: Open)

> Clarify HS2 WebUI Query 'Elapsed TIme'
> --
>
> Key: HIVE-13420
> URL: https://issues.apache.org/jira/browse/HIVE-13420
> Project: Hive
>  Issue Type: Improvement
>  Components: Diagnosability
>Affects Versions: 2.0.0
>Reporter: Szehon Ho
>Assignee: Szehon Ho
> Attachments: Elapsed Time.png, HIVE-13420.2.patch, HIVE-13420.patch, 
> Patched UI.2.png, Patched UI.png
>
>
> Today the "Queries" section of the WebUI shows SQLOperations that are not yet 
> closed.
> Elapsed time is thus a bit confusing: people might take it to mean query 
> runtime, but it is actually the time since the operation was opened.  The 
> query may be finished while the operation is still open.  Perhaps another 
> timer column showing the runtime of the query is needed to reduce this 
> confusion.
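
As an aside, here is a tiny sketch of the distinction the description draws; 
the class below is hypothetical, not the HS2 WebUI code, and simply separates 
"time since the operation was opened" from "time until the query finished".

{code}
// Hypothetical illustration: "elapsed" keeps growing while the operation is
// open, whereas "runtime" stops once the query itself has finished.
final class OperationTimes {
  private final long beginTimeMs;        // operation opened
  private volatile long queryEndTimeMs;  // 0 while the query is still running

  OperationTimes(long beginTimeMs) { this.beginTimeMs = beginTimeMs; }

  void markQueryFinished() { this.queryEndTimeMs = System.currentTimeMillis(); }

  long elapsedMs() { return System.currentTimeMillis() - beginTimeMs; }

  long runtimeMs() {
    long end = (queryEndTimeMs == 0) ? System.currentTimeMillis() : queryEndTimeMs;
    return end - beginTimeMs;
  }
}
{code}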



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11981) ORC Schema Evolution Issues (Vectorized, ACID, and Non-Vectorized)

2016-04-07 Thread Qiuzhuang Lian (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230452#comment-15230452
 ] 

Qiuzhuang Lian commented on HIVE-11981:
---

For more info: when we compact the table in Hive 1.2, we see TreeReaderFactory 
errors as reported in this issue:

https://issues.apache.org/jira/browse/HIVE-13432

> ORC Schema Evolution Issues (Vectorized, ACID, and Non-Vectorized)
> --
>
> Key: HIVE-11981
> URL: https://issues.apache.org/jira/browse/HIVE-11981
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Transactions
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
>  Labels: TODOC2.0
> Fix For: 2.0.0
>
> Attachments: HIVE-11981.01.patch, HIVE-11981.02.patch, 
> HIVE-11981.03.patch, HIVE-11981.05.patch, HIVE-11981.06.patch, 
> HIVE-11981.07.patch, HIVE-11981.08.patch, HIVE-11981.09.patch, 
> HIVE-11981.091.patch, HIVE-11981.092.patch, HIVE-11981.093.patch, 
> HIVE-11981.094.patch, HIVE-11981.095.patch, HIVE-11981.096.patch, 
> HIVE-11981.097.patch, HIVE-11981.098.patch, HIVE-11981.099.patch, 
> HIVE-11981.0991.patch, HIVE-11981.0992.patch, ORC Schema Evolution Issues.docx
>
>
> High priority issues with schema evolution for the ORC file format.
> Schema evolution here is limited to adding new columns and a few cases of 
> column type-widening (e.g. int to bigint).
> Renaming columns, deleting columns, moving columns and other schema evolution 
> were not pursued due to lack of importance and lack of time.  Also, it 
> appears much more sophisticated metadata would be needed to support them.
> The biggest issues for users have been adding new columns for ACID tables 
> (HIVE-11421 Support Schema evolution for ACID tables) and vectorization 
> (HIVE-10598 Vectorization borks when column is added to table).
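
For readers less familiar with the scope, the sketch below issues the two 
supported kinds of evolution (adding a column, widening int to bigint) as 
HiveQL over JDBC; the table and column names and the endpoint are made up for 
illustration.

{code}
// Hedged sketch of the two in-scope schema-evolution cases, expressed as
// HiveQL sent over JDBC. Table/column names and the endpoint are assumptions.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class OrcSchemaEvolutionExample {
  public static void main(String[] args) throws Exception {
    String url = "jdbc:hive2://localhost:10000/default";  // assumed HS2 endpoint
    try (Connection conn = DriverManager.getConnection(url);
         Statement stmt = conn.createStatement()) {
      // Case 1: append a new column to an ORC table.
      stmt.execute("ALTER TABLE orc_tbl ADD COLUMNS (new_col STRING)");
      // Case 2: widen a column type, e.g. INT to BIGINT.
      stmt.execute("ALTER TABLE orc_tbl CHANGE COLUMN cnt cnt BIGINT");
    }
  }
}
{code}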



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13420) Clarify HS2 WebUI Query 'Elapsed TIme'

2016-04-07 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230328#comment-15230328
 ] 

Aihua Xu commented on HIVE-13420:
-

The patch looks good. +1.

> Clarify HS2 WebUI Query 'Elapsed TIme'
> --
>
> Key: HIVE-13420
> URL: https://issues.apache.org/jira/browse/HIVE-13420
> Project: Hive
>  Issue Type: Improvement
>  Components: Diagnosability
>Affects Versions: 2.0.0
>Reporter: Szehon Ho
>Assignee: Szehon Ho
> Attachments: Elapsed Time.png, HIVE-13420.2.patch, HIVE-13420.patch, 
> Patched UI.2.png, Patched UI.png
>
>
> Today the "Queries" section of the WebUI shows SQLOperations that are not yet 
> closed.
> Elapsed time is thus a bit confusing: people might take it to mean query 
> runtime, but it is actually the time since the operation was opened.  The 
> query may be finished while the operation is still open.  Perhaps another 
> timer column showing the runtime of the query is needed to reduce this 
> confusion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-13427) Update committer list

2016-04-07 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu resolved HIVE-13427.
-
Resolution: Fixed

Committed. Thanks Szehon.

> Update committer list
> -
>
> Key: HIVE-13427
> URL: https://issues.apache.org/jira/browse/HIVE-13427
> Project: Hive
>  Issue Type: Bug
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Minor
> Attachments: HIVE-13427.patch
>
>
> Please update committer list:
> Name: Aihua Xu
> Apache ID: aihuaxu
> Organization: Cloudera
> Name: Yongzhi Chen
> Apache ID: ychena
> Organization: Cloudera



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13235) Insert from select generates incorrect result when hive.optimize.constant.propagation is on

2016-04-07 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230290#comment-15230290
 ] 

Aihua Xu commented on HIVE-13235:
-

[~ashutoshc] I don't have a final solution yet. It seems my solution would fix 
the issue but also break valid constant propagation. I think it's headed in the 
right direction: for select operators, an alias and an internal name are not 
enough. We should also keep the original column name when the expression maps 
to a table column (e.g., select col1 as alias). The parent ops would only see 
col1 while the child ops would only see alias. Right now, we ignore col1 and 
always use alias.

I'm working on it, but it seems to need bigger changes. I will create an RB 
when it's ready.
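
To illustrate the idea in the comment above, here is a hypothetical sketch of 
the extra piece of information being proposed; it is not Hive's actual column 
descriptor, just the three names a select-operator column would need to carry.

{code}
// Hypothetical illustration of the proposal: track the original table column
// alongside the internal name and the alias, so constant propagation can tell
// "expression aliased as p1" apart from "partition column p1".
final class SelectColumnInfo {
  final String internalName;  // e.g. _col0, used inside the operator tree
  final String alias;         // what child operators see, e.g. "p1"
  final String tableColumn;   // what parent operators see, e.g. "c2";
                              // null if not a direct table-column reference

  SelectColumnInfo(String internalName, String alias, String tableColumn) {
    this.internalName = internalName;
    this.alias = alias;
    this.tableColumn = tableColumn;
  }
}
{code}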

> Insert from select generates incorrect result when 
> hive.optimize.constant.propagation is on
> ---
>
> Key: HIVE-13235
> URL: https://issues.apache.org/jira/browse/HIVE-13235
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-13235.1.patch, HIVE-13235.2.patch, 
> HIVE-13235.3.patch
>
>
> The following query returns an incorrect result when constant optimization is 
> turned on. The subquery happens to have an alias p1 that is the same as the 
> input partition column name, and the constant optimizer incorrectly replaces 
> it with the constant.
> When the constant optimizer is turned off, we get the correct result.
> {noformat}
> set hive.cbo.enable=false;
> set hive.optimize.constant.propagation = true;
> create table t1(c1 string, c2 double) partitioned by (p1 string, p2 string);
> create table t2(p1 double, c2 string);
> insert into table t1 partition(p1='40', p2='p2') values('c1', 0.0);
> INSERT OVERWRITE TABLE t2  select if((c2 = 0.0), c2, '0') as p1, 2 as p2 from 
> t1 where c1 = 'c1' and p1 = '40';
> select * from t2;
> 40   2
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11351) Column Found in more than One Tables/Subqueries

2016-04-07 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230278#comment-15230278
 ] 

Aihua Xu commented on HIVE-11351:
-

This does look like the same issue as HIVE-13235. This fix would probably work, 
but it would break many valid constant propagations. I'm still working on it.

> Column Found in more than One Tables/Subqueries
> ---
>
> Key: HIVE-11351
> URL: https://issues.apache.org/jira/browse/HIVE-11351
> Project: Hive
>  Issue Type: Bug
> Environment: HIVE 1.1.0
>Reporter: MK
>Assignee: Alina Abramova
> Attachments: HIVE-11351-branch-1.0.patch
>
>
> When executing this script:
> INSERT overwrite TABLE tmp.tmp_dim_cpttr_categ1
>SELECT DISTINCT cur.categ_id   AS categ_id,
>cur.categ_code AS categ_code,
>cur.categ_name AS categ_name,
>cur.categ_parnt_id AS categ_parnt_id,
>par.categ_name AS categ_parnt_name,
>cur.mc_site_id AS mc_site_id
>FROM   tmp.tmp_dim_cpttr_categ cur
>LEFT   OUTER JOIN tmp.tmp_dim_cpttr_categ par
>ON cur.categ_parnt_id = par.categ_id;
> an error occurs: SemanticException Column categ_name Found in more than One 
> Tables/Subqueries
> When the alias categ_name is changed to categ_name_cur, it executes 
> successfully.
> INSERT overwrite TABLE tmp.tmp_dim_cpttr_categ1
>SELECT DISTINCT cur.categ_id   AS categ_id,
>cur.categ_code AS categ_code,
>cur.categ_name AS categ_name_cur,
>cur.categ_parnt_id AS categ_parnt_id,
>par.categ_name AS categ_parnt_name,
>cur.mc_site_id AS mc_site_id
>FROM   tmp.tmp_dim_cpttr_categ cur
>LEFT   OUTER JOIN tmp.tmp_dim_cpttr_categ par
>ON cur.categ_parnt_id = par.categ_id;
> This happens after we upgraded Hive from 0.10 to 1.1.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9534) incorrect result set for query that projects a windowed aggregate

2016-04-07 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230269#comment-15230269
 ] 

Aihua Xu commented on HIVE-9534:


[~leftylev] I just updated the doc. 

I also filed HIVE-13453 to track the improvement to address the current 
limitation.

> incorrect result set for query that projects a windowed aggregate
> -
>
> Key: HIVE-9534
> URL: https://issues.apache.org/jira/browse/HIVE-9534
> Project: Hive
>  Issue Type: Bug
>  Components: PTF-Windowing
>Reporter: N Campbell
>Assignee: Aihua Xu
> Fix For: 2.1.0
>
> Attachments: HIVE-9534.1.patch, HIVE-9534.2.patch, HIVE-9534.3.patch, 
> HIVE-9534.4.patch
>
>
> The result set returned by Hive has one row instead of 5.
> {code}
> select avg(distinct tsint.csint) over () from tsint 
> create table  if not exists TSINT (RNUM int , CSINT smallint)
>  ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' 
>  STORED AS TEXTFILE;
> 0|\N
> 1|-1
> 2|0
> 3|1
> 4|10
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-13453) Support ORDER BY and windowing clause in partitioning clause with distinct function

2016-04-07 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu reassigned HIVE-13453:
---

Assignee: Aihua Xu  (was: Harish Butani)

> Support ORDER BY and windowing clause in partitioning clause with distinct 
> function
> ---
>
> Key: HIVE-13453
> URL: https://issues.apache.org/jira/browse/HIVE-13453
> Project: Hive
>  Issue Type: Sub-task
>  Components: PTF-Windowing
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>
> Currently, the distinct function in a partitioning clause doesn't support 
> ORDER BY and windowing clauses for performance reasons. Explore an efficient 
> way to support them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13453) Support ORDER BY and windowing clause in partitioning clause with distinct function

2016-04-07 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-13453:

Fix Version/s: (was: 2.1.0)

> Support ORDER BY and windowing clause in partitioning clause with distinct 
> function
> ---
>
> Key: HIVE-13453
> URL: https://issues.apache.org/jira/browse/HIVE-13453
> Project: Hive
>  Issue Type: Sub-task
>  Components: PTF-Windowing
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>
> Currently, the distinct function in a partitioning clause doesn't support 
> ORDER BY and windowing clauses for performance reasons. Explore an efficient 
> way to support them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13400) Following up HIVE-12481, add retry for Zookeeper service discovery

2016-04-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230099#comment-15230099
 ] 

Hive QA commented on HIVE-13400:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12797180/HIVE-13400.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 9964 tests executed
*Failed tests:*
{noformat}
TestMiniTezCliDriver-vector_distinct_2.q-load_dyn_part2.q-join1.q-and-12-more - 
did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.ql.security.TestMetastoreAuthorizationProvider.testSimplePrivileges
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProvider.testSimplePrivileges
org.apache.hadoop.hive.thrift.TestHadoopAuthBridge23.testSaslWithHiveMetaStore
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7497/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7497/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7497/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12797180 - PreCommit-HIVE-TRUNK-Build

> Following up HIVE-12481, add retry for Zookeeper service discovery
> --
>
> Key: HIVE-13400
> URL: https://issues.apache.org/jira/browse/HIVE-13400
> Project: Hive
>  Issue Type: Improvement
>  Components: JDBC
>Affects Versions: 2.1.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-13400.1.patch, HIVE-13400.2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12077) MSCK Repair table should fix partitions in batches

2016-04-07 Thread Chinna Rao Lalam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinna Rao Lalam updated HIVE-12077:

Status: Patch Available  (was: Open)

The batch size can be configured for the MSCK REPAIR command with the newly 
introduced property "hive.msck.repair.batch.size".
If the value is greater than zero, the repair executes in batches of the 
configured size.
The default value is zero, which means the repair executes in a single call 
rather than in batches.
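
A small sketch of the batching behaviour described above, with a hypothetical 
addPartitions callback standing in for the metastore bulk-add; it is not the 
patch itself.

{code}
// Hedged sketch of hive.msck.repair.batch.size semantics: batchSize <= 0 keeps
// the single bulk call, batchSize > 0 splits the work into metastore calls of
// at most batchSize partitions. addPartitions is a hypothetical stand-in.
import java.util.List;
import java.util.function.Consumer;

final class MsckBatcher {
  static <T> void addInBatches(List<T> parts, int batchSize, Consumer<List<T>> addPartitions) {
    if (batchSize <= 0) {
      addPartitions.accept(parts);  // current behaviour: one bulk call
      return;
    }
    for (int i = 0; i < parts.size(); i += batchSize) {
      int end = Math.min(i + batchSize, parts.size());
      addPartitions.accept(parts.subList(i, end));  // one call per batch
    }
  }
}
{code}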

> MSCK Repair table should fix partitions in batches 
> ---
>
> Key: HIVE-12077
> URL: https://issues.apache.org/jira/browse/HIVE-12077
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Ryan P
>Assignee: Chinna Rao Lalam
> Attachments: HIVE-12077.1.patch, HIVE-12077.2.patch, 
> HIVE-12077.3.patch
>
>
> If a user attempts to run MSCK REPAIR TABLE on a directory with a large 
> number of untracked partitions, HMS will OOME. I suspect this is because it 
> attempts one large bulk load in an effort to save time. Ultimately this can 
> lead to a collection so large that HMS eventually runs out of memory. 
> Instead, I suggest that Hive include a configurable batch size that HMS can 
> use to break up the load. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12077) MSCK Repair table should fix partitions in batches

2016-04-07 Thread Chinna Rao Lalam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinna Rao Lalam updated HIVE-12077:

Attachment: HIVE-12077.3.patch

> MSCK Repair table should fix partitions in batches 
> ---
>
> Key: HIVE-12077
> URL: https://issues.apache.org/jira/browse/HIVE-12077
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Ryan P
>Assignee: Chinna Rao Lalam
> Attachments: HIVE-12077.1.patch, HIVE-12077.2.patch, 
> HIVE-12077.3.patch
>
>
> If a user attempts to run MSCK REPAIR TABLE on a directory with a large 
> number of untracked partitions, HMS will OOME. I suspect this is because it 
> attempts one large bulk load in an effort to save time. Ultimately this can 
> lead to a collection so large that HMS eventually runs out of memory. 
> Instead, I suggest that Hive include a configurable batch size that HMS can 
> use to break up the load. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12077) MSCK Repair table should fix partitions in batches

2016-04-07 Thread Chinna Rao Lalam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinna Rao Lalam updated HIVE-12077:

Attachment: HIVE-12077.2.patch

> MSCK Repair table should fix partitions in batches 
> ---
>
> Key: HIVE-12077
> URL: https://issues.apache.org/jira/browse/HIVE-12077
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Ryan P
>Assignee: Chinna Rao Lalam
> Attachments: HIVE-12077.1.patch, HIVE-12077.2.patch
>
>
> If a user attempts to run MSCK REPAIR TABLE on a directory with a large 
> number of untracked partitions, HMS will OOME. I suspect this is because it 
> attempts one large bulk load in an effort to save time. Ultimately this can 
> lead to a collection so large that HMS eventually runs out of memory. 
> Instead, I suggest that Hive include a configurable batch size that HMS can 
> use to break up the load. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11981) ORC Schema Evolution Issues (Vectorized, ACID, and Non-Vectorized)

2016-04-07 Thread Qiuzhuang Lian (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229977#comment-15229977
 ] 

Qiuzhuang Lian commented on HIVE-11981:
---

Nope, our table doesn't have any STRUCT column. Here is the DDL:

CREATE TABLE `my_orc_table`(
  `id` string, 
  `store_no_from` string, 
  `store_name_from` string, 
  `store_no_to` string, 
  `store_name_to` string, 
  `order_unit_no_from` string, 
  `order_unit_name_from` string, 
  `order_unit_no_to` string, 
  `order_unit_name_to` string, 
  `store_no` string, 
  `store_name` string, 
  `company_no` string, 
  `order_unit_no` string, 
  `order_unit_name` string, 
  `item_no` string, 
  `item_code` string, 
  `item_name` string, 
  `brand_no` string, 
  `brand_name` string, 
  `category_no` string, 
  `sku_no` string, 
  `size_no` string, 
  `size_kind` string, 
  `bill_no` string, 
  `status` tinyint, 
  `bill_type` int, 
  `in_out_flag` tinyint, 
  `ref_bill_no` string, 
  `ref_bill_type` int, 
  `biz_type` int, 
  `account_type` tinyint, 
  `bill_date` date, 
  `cost` decimal(12,2), 
  `balance_offset` int, 
  `balance_qty` int, 
  `factory_in_offset` int, 
  `factory_in_qty` int, 
  `factory_in_diff_offset` int, 
  `factory_in_diff_qty` int, 
  `transit_in_offset` int, 
  `transit_in_qty` int, 
  `transit_out_offset` int, 
  `transit_out_qty` int, 
  `in_diff_offset` int, 
  `in_diff_qty` int, 
  `out_diff_offset` int, 
  `out_diff_qty` int, 
  `transit_in_account_offset` int, 
  `transit_in_account_qty` int, 
  `transit_out_account_offset` int, 
  `transit_out_account_qty` int, 
  `in_diff_account_offset` int, 
  `in_diff_account_qty` int, 
  `out_diff_account_offset` int, 
  `out_diff_account_qty` int, 
  `lock_offset` int, 
  `lock_qty` int, 
  `occupied_offset` int, 
  `occupied_qty` int, 
  `backup_offset` int, 
  `backup_qty` int, 
  `guest_bad_offset` int, 
  `guest_bad_qty` int, 
  `original_bad_offset` int, 
  `original_bad_qty` int, 
  `bad_transit_offset` int, 
  `bad_transit_qty` int, 
  `bad_diff_offset` int, 
  `bad_diff_qty` int, 
  `return_offset` int, 
  `return_qty` int, 
  `borrow_offset` int, 
  `borrow_qty` int, 
  `create_time` timestamp, 
  `create_timestamp` timestamp, 
  `update_time` timestamp, 
  `sharding_flag` string, 
  `yw_update_time` timestamp, 
  `hive_create_time` timestamp, 
  `biz_date` int)
CLUSTERED BY ( 
  id) 
INTO 10 BUCKETS
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://nn:9000/hive/warehouse/lqz.db/my_orc_table'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\"}', 
  'last_modified_by'='hive', 
  'last_modified_time'='1460015324', 
  'numFiles'='23', 
  'numRows'='33828471', 
  'orc.compress'='SNAPPY', 
  'orc.create.index'='true', 
  'orc.stripe.size'='67108864', 
  'rawDataSize'='92332902940', 
  'totalSize'='1474582939', 
  'transactional'='true', 
  'transient_lastDdlTime'='1460015745')

> ORC Schema Evolution Issues (Vectorized, ACID, and Non-Vectorized)
> --
>
> Key: HIVE-11981
> URL: https://issues.apache.org/jira/browse/HIVE-11981
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Transactions
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
>  Labels: TODOC2.0
> Fix For: 2.0.0
>
> Attachments: HIVE-11981.01.patch, HIVE-11981.02.patch, 
> HIVE-11981.03.patch, HIVE-11981.05.patch, HIVE-11981.06.patch, 
> HIVE-11981.07.patch, HIVE-11981.08.patch, HIVE-11981.09.patch, 
> HIVE-11981.091.patch, HIVE-11981.092.patch, HIVE-11981.093.patch, 
> HIVE-11981.094.patch, HIVE-11981.095.patch, HIVE-11981.096.patch, 
> HIVE-11981.097.patch, HIVE-11981.098.patch, HIVE-11981.099.patch, 
> HIVE-11981.0991.patch, HIVE-11981.0992.patch, ORC Schema Evolution Issues.docx
>
>
> High priority issues with schema evolution for the ORC file format.
> Schema evolution here is limited to adding new columns and a few cases of 
> column type-widening (e.g. int to bigint).
> Renaming columns, deleting columns, moving columns and other schema evolution 
> were not pursued due to lack of importance and lack of time.  Also, it 
> appears much more sophisticated metadata would be needed to support them.
> The biggest issues for users have been adding new columns for ACID tables 
> (HIVE-11421 Support Schema evolution for ACID tables) and vectorization 
> (HIVE-10598 Vectorization borks when column is added to table).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11981) ORC Schema Evolution Issues (Vectorized, ACID, and Non-Vectorized)

2016-04-07 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229965#comment-15229965
 ] 

Matt McCline commented on HIVE-11981:
-

[~qiuzhuang] What is the DDL for the table?  Does it have a column of type 
STRUCT?

> ORC Schema Evolution Issues (Vectorized, ACID, and Non-Vectorized)
> --
>
> Key: HIVE-11981
> URL: https://issues.apache.org/jira/browse/HIVE-11981
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Transactions
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
>  Labels: TODOC2.0
> Fix For: 2.0.0
>
> Attachments: HIVE-11981.01.patch, HIVE-11981.02.patch, 
> HIVE-11981.03.patch, HIVE-11981.05.patch, HIVE-11981.06.patch, 
> HIVE-11981.07.patch, HIVE-11981.08.patch, HIVE-11981.09.patch, 
> HIVE-11981.091.patch, HIVE-11981.092.patch, HIVE-11981.093.patch, 
> HIVE-11981.094.patch, HIVE-11981.095.patch, HIVE-11981.096.patch, 
> HIVE-11981.097.patch, HIVE-11981.098.patch, HIVE-11981.099.patch, 
> HIVE-11981.0991.patch, HIVE-11981.0992.patch, ORC Schema Evolution Issues.docx
>
>
> High priority issues with schema evolution for the ORC file format.
> Schema evolution here is limited to adding new columns and a few cases of 
> column type-widening (e.g. int to bigint).
> Renaming columns, deleting columns, moving columns and other schema evolution 
> were not pursued due to lack of importance and lack of time.  Also, it 
> appears much more sophisticated metadata would be needed to support them.
> The biggest issues for users have been adding new columns for ACID tables 
> (HIVE-11421 Support Schema evolution for ACID tables) and vectorization 
> (HIVE-10598 Vectorization borks when column is added to table).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12878) Support Vectorization for TEXTFILE and other formats

2016-04-07 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-12878:

Status: Patch Available  (was: In Progress)

> Support Vectorization for TEXTFILE and other formats
> 
>
> Key: HIVE-12878
> URL: https://issues.apache.org/jira/browse/HIVE-12878
> Project: Hive
>  Issue Type: New Feature
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-12878.01.patch, HIVE-12878.02.patch, 
> HIVE-12878.03.patch, HIVE-12878.04.patch, HIVE-12878.05.patch, 
> HIVE-12878.06.patch, HIVE-12878.07.patch
>
>
> Support vectorizing when the input format is TEXTFILE and other formats for 
> better Map Vertex performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12878) Support Vectorization for TEXTFILE and other formats

2016-04-07 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-12878:

Attachment: HIVE-12878.07.patch

> Support Vectorization for TEXTFILE and other formats
> 
>
> Key: HIVE-12878
> URL: https://issues.apache.org/jira/browse/HIVE-12878
> Project: Hive
>  Issue Type: New Feature
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-12878.01.patch, HIVE-12878.02.patch, 
> HIVE-12878.03.patch, HIVE-12878.04.patch, HIVE-12878.05.patch, 
> HIVE-12878.06.patch, HIVE-12878.07.patch
>
>
> Support vectorizing when the input format is TEXTFILE and other formats for 
> better Map Vertex performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12878) Support Vectorization for TEXTFILE and other formats

2016-04-07 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-12878:

Status: In Progress  (was: Patch Available)

> Support Vectorization for TEXTFILE and other formats
> 
>
> Key: HIVE-12878
> URL: https://issues.apache.org/jira/browse/HIVE-12878
> Project: Hive
>  Issue Type: New Feature
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-12878.01.patch, HIVE-12878.02.patch, 
> HIVE-12878.03.patch, HIVE-12878.04.patch, HIVE-12878.05.patch, 
> HIVE-12878.06.patch
>
>
> Support vectorizing when the input format is TEXTFILE and other formats for 
> better Map Vertex performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

