[jira] [Commented] (HIVE-5795) Hive should be able to skip header and footer rows when reading data file for a table

2015-12-16 Thread Sivanesan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061630#comment-15061630
 ] 

Sivanesan commented on HIVE-5795:
-

But it was the other way around: the header was skipped correctly when the file 
size was less than the block size, but a random detail record was skipped when 
the file size was larger than the block size.

My assumption: while using CombineHiveInputFormat, a record around the end of 
the first block is skipped.
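The suspected failure mode can be illustrated with a small simulation (a hypothetical file/split layout, not Hive code): if each split's reader unconditionally skips "header" lines without checking whether its split starts at file offset 0, the record at the start of every later block is silently dropped.

```python
# Hypothetical simulation of the suspected bug (not Hive code): a text file
# is split at block boundaries, and each split's reader skips `skip_header`
# lines even when the split does not begin at the start of the file.

def read_split(lines, skip_header):
    # Naive reader: unconditionally drops the first `skip_header` lines of
    # whatever split it was handed; correct only for the split at offset 0.
    return lines[skip_header:]

def read_file_naive(lines, block_size, skip_header):
    records = []
    for start in range(0, len(lines), block_size):
        split = lines[start:start + block_size]
        records.extend(read_split(split, skip_header))
    return records

lines = ["HEADER"] + [f"row{i}" for i in range(1, 7)]  # 7 lines total
# File smaller than one "block": only the real header is skipped.
ok = read_file_naive(lines, block_size=10, skip_header=1)
# File spanning two "blocks": the record at the start of the second block
# is lost, matching the "random detail record skipped" symptom.
bad = read_file_naive(lines, block_size=4, skip_header=1)
```

Under this model, a correct reader would skip header lines only in the split whose offset is 0 (and footer lines only in the last split).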

> Hive should be able to skip header and footer rows when reading data file for 
> a table
> -
>
> Key: HIVE-5795
> URL: https://issues.apache.org/jira/browse/HIVE-5795
> Project: Hive
>  Issue Type: New Feature
>Reporter: Shuaishuai Nie
>Assignee: Shuaishuai Nie
>  Labels: TODOC13
> Fix For: 0.13.0
>
> Attachments: HIVE-5795.1.patch, HIVE-5795.2.patch, HIVE-5795.3.patch, 
> HIVE-5795.4.patch, HIVE-5795.5.patch
>
>
> Hive should be able to skip header and footer lines when reading a data file 
> for a table. This way, users don't need to pre-process data generated by 
> other applications with a header or footer and can use the file directly for 
> table operations.
> To implement this, the idea is to add new properties to the table description 
> that define the number of header and footer lines, and to skip those lines 
> when reading records from the record reader. A DDL example for creating a 
> table with a header and footer looks like this:
> {code}
> Create external table testtable (name string, message string) row format 
> delimited fields terminated by '\t' lines terminated by '\n' location 
> '/testtable' tblproperties ("skip.header.line.count"="1", 
> "skip.footer.line.count"="2");
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11355) Hive on tez: memory manager for sort buffers (input/output) and operators

2015-12-16 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061617#comment-15061617
 ] 

Hive QA commented on HIVE-11355:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12778139/HIVE-11355.7.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 184 failed/errored test(s), 9949 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
TestSparkCliDriver-timestamp_lazy.q-bucketsortoptimize_insert_4.q-date_udf.q-and-12-more
 - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_order2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union9
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_constprog_dpp
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_llapdecider
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_mrr
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_dynpart_hashjoin_1
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_dynpart_hashjoin_2
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_join
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_join_hash
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_join_tests
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_joins_explain
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_self_join
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_smb_1
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_smb_main
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_union
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_union_group_by
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_vector_dynpart_hashjoin_1
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_vector_dynpart_hashjoin_2
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_join_part_col_char
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vectorized_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_join21
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_join29
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_join30
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_join_filters
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_join_nulls
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_10
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_6
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_gby
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_join
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_limit
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_semijoin
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_simple_select
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_stats
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_subq_exists
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_subq_in
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_subq_not_in
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_union
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_views
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_constprog_dpp
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_correlationoptimizer1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cross_join
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cross_product_check_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_filter_join_breaktask
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_filter_join_breaktask2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_join0
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_join1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_join_nullsafe
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_leftsemijoin
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_limit_pushdown
org.apache.hadoop.hive.cli.Tes

[jira] [Commented] (HIVE-11487) Add getNumPartitionsByFilter api in metastore api

2015-12-16 Thread Akshay Goyal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061564#comment-15061564
 ] 

Akshay Goyal commented on HIVE-11487:
-

Sorry, I missed adding the generated files in the patch. Updated.

> Add getNumPartitionsByFilter api in metastore api
> -
>
> Key: HIVE-11487
> URL: https://issues.apache.org/jira/browse/HIVE-11487
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Amareshwari Sriramadasu
>Assignee: Akshay Goyal
> Attachments: HIVE-11487.01.patch, HIVE-11487.02.patch, 
> HIVE-11487.03.patch, HIVE-11487.04.patch
>
>
> Adding an API for getting the number of partitions matching a filter will be 
> more optimal when we are only interested in the count. getAllPartitions 
> constructs all the partition objects, which can be time consuming and is not 
> required.
> Here is a commit we pushed in a forked repo in our organization - 
> https://github.com/inmobi/hive/commit/68b3534d3e6c4d978132043cec668798ed53e444.
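The trade-off motivating the new API can be shown with a toy model (hypothetical classes and names, not the actual metastore API): a getAllPartitions-style path materializes a full object per matching partition just to count them, while a count-only call never constructs them.

```python
# Toy illustration of the getNumPartitionsByFilter motivation (hypothetical
# classes, not the Hive metastore API): counting matches directly avoids
# building a Partition object per match.

class Partition:
    instances_built = 0  # track how many expensive objects were constructed

    def __init__(self, values):
        Partition.instances_built += 1
        self.values = values  # stand-in for an expensive-to-build object

def get_all_partitions(part_values, pred):
    # getAllPartitions-style: build every matching Partition object.
    return [Partition(v) for v in part_values if pred(v)]

def get_num_partitions_by_filter(part_values, pred):
    # Count-only path: no Partition objects are constructed at all.
    return sum(1 for v in part_values if pred(v))

part_values = [{"dt": f"2015-12-{d:02d}"} for d in range(1, 32)]
pred = lambda v: v["dt"] >= "2015-12-16"

n_fast = get_num_partitions_by_filter(part_values, pred)  # builds nothing
Partition.instances_built = 0
n_slow = len(get_all_partitions(part_values, pred))       # builds one per match
```

Both paths agree on the count; only the second pays the object-construction cost.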



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11487) Add getNumPartitionsByFilter api in metastore api

2015-12-16 Thread Akshay Goyal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akshay Goyal updated HIVE-11487:

Attachment: HIVE-11487.04.patch

> Add getNumPartitionsByFilter api in metastore api
> -
>
> Key: HIVE-11487
> URL: https://issues.apache.org/jira/browse/HIVE-11487
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Amareshwari Sriramadasu
>Assignee: Akshay Goyal
> Attachments: HIVE-11487.01.patch, HIVE-11487.02.patch, 
> HIVE-11487.03.patch, HIVE-11487.04.patch
>
>
> Adding an API for getting the number of partitions matching a filter will be 
> more optimal when we are only interested in the count. getAllPartitions 
> constructs all the partition objects, which can be time consuming and is not 
> required.
> Here is a commit we pushed in a forked repo in our organization - 
> https://github.com/inmobi/hive/commit/68b3534d3e6c4d978132043cec668798ed53e444.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12590) Repeated UDAFs with literals can produce incorrect result

2015-12-16 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-12590:

Attachment: HIVE-12590.3.patch

> Repeated UDAFs with literals can produce incorrect result
> -
>
> Key: HIVE-12590
> URL: https://issues.apache.org/jira/browse/HIVE-12590
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 1.0.1, 1.1.1, 1.2.1, 2.0.0
>Reporter: Laljo John Pullokkaran
>Assignee: Ashutosh Chauhan
>Priority: Critical
> Attachments: HIVE-12590.2.patch, HIVE-12590.3.patch, HIVE-12590.patch
>
>
> Repeated UDAFs with literals could produce a wrong result.
> This is not a common use case, but it is nevertheless a bug.
> hive> select max('pants'), max('pANTS') from t1 group by key;
>  Total MapReduce CPU Time Spent: 0 msec
> OK
> pANTS pANTS
> pANTS pANTS
> pANTS pANTS
> pANTS pANTS
> pANTS pANTS
> Time taken: 296.252 seconds, Fetched: 5 row(s)
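A plausible simplified model of the bug (illustrative only, not the actual Hive planner code): if the planner deduplicates aggregate calls using a key that ignores literal case, `max('pants')` and `max('pANTS')` collapse into one computed column and a single literal's casing wins for both, which is consistent with the `pANTS pANTS` output above.

```python
# Simplified model of how repeated UDAFs over case-differing literals could
# collapse (hypothetical, not the actual planner code). With a
# case-insensitive dedup key the two calls share one slot, and the later
# literal overwrites the earlier one for both output columns.

def plan_aggregates(agg_calls, case_sensitive):
    computed = {}  # dedup map: key -> literal actually evaluated
    keys = []
    for func, literal in agg_calls:
        key = (func, literal if case_sensitive else literal.lower())
        computed[key] = literal  # later duplicates overwrite earlier ones
        keys.append(key)
    return [computed[k] for k in keys]

calls = [("max", "pants"), ("max", "pANTS")]
buggy = plan_aggregates(calls, case_sensitive=False)  # both columns collapse
fixed = plan_aggregates(calls, case_sensitive=True)   # calls stay distinct
```

Whether the first or last literal survives the collision is an implementation detail; the point is that two distinct aggregate calls are planned as one.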



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-12698) Remove exposure to internal privilege and principal classes in HiveAuthorizer

2015-12-16 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061531#comment-15061531
 ] 

Thejas M Nair edited comment on HIVE-12698 at 12/17/15 5:41 AM:


[~sershe] I think we should get this interface update into 2.0.0 as well. This 
improves on what is currently in the branch, which is an unreleased change.



was (Author: thejas):
[~sershe] I think we should get this interface update into 2.0.0 as well. This 
improves on what is currently in the branch.


> Remove exposure to internal privilege and principal classes in HiveAuthorizer
> -
>
> Key: HIVE-12698
> URL: https://issues.apache.org/jira/browse/HIVE-12698
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-12698.1.patch, HIVE-12698.2.patch
>
>
> The changes in HIVE-11179 expose several internal classes to 
> HiveAuthorization implementations. These include PrivilegeObjectDesc, 
> PrivilegeDesc, PrincipalDesc and AuthorizationUtils.
> We should avoid exposing that to all Authorization implementations, but also 
> make the ability to customize the mapping of internal classes to the public 
> api classes possible for Apache Sentry (incubating).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12698) Remove exposure to internal privilege and principal classes in HiveAuthorizer

2015-12-16 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061531#comment-15061531
 ] 

Thejas M Nair commented on HIVE-12698:
--

[~sershe] I think we should get this interface update into 2.0.0 as well. This 
improves on what is currently in the branch.


> Remove exposure to internal privilege and principal classes in HiveAuthorizer
> -
>
> Key: HIVE-12698
> URL: https://issues.apache.org/jira/browse/HIVE-12698
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-12698.1.patch, HIVE-12698.2.patch
>
>
> The changes in HIVE-11179 expose several internal classes to 
> HiveAuthorization implementations. These include PrivilegeObjectDesc, 
> PrivilegeDesc, PrincipalDesc and AuthorizationUtils.
> We should avoid exposing that to all Authorization implementations, but also 
> make the ability to customize the mapping of internal classes to the public 
> api classes possible for Apache Sentry (incubating).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12698) Remove exposure to internal privilege and principal classes in HiveAuthorizer

2015-12-16 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-12698:
-
Attachment: HIVE-12698.2.patch

Updated patch to address comments

> Remove exposure to internal privilege and principal classes in HiveAuthorizer
> -
>
> Key: HIVE-12698
> URL: https://issues.apache.org/jira/browse/HIVE-12698
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-12698.1.patch, HIVE-12698.2.patch
>
>
> The changes in HIVE-11179 expose several internal classes to 
> HiveAuthorization implementations. These include PrivilegeObjectDesc, 
> PrivilegeDesc, PrincipalDesc and AuthorizationUtils.
> We should avoid exposing that to all Authorization implementations, but also 
> make the ability to customize the mapping of internal classes to the public 
> api classes possible for Apache Sentry (incubating).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12698) Remove exposure to internal privilege and principal classes in HiveAuthorizer

2015-12-16 Thread Dapeng Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061520#comment-15061520
 ] 

Dapeng Sun commented on HIVE-12698:
---

Sorry for the repeated messages; there was something wrong with my web browser.

> Remove exposure to internal privilege and principal classes in HiveAuthorizer
> -
>
> Key: HIVE-12698
> URL: https://issues.apache.org/jira/browse/HIVE-12698
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-12698.1.patch
>
>
> The changes in HIVE-11179 expose several internal classes to 
> HiveAuthorization implementations. These include PrivilegeObjectDesc, 
> PrivilegeDesc, PrincipalDesc and AuthorizationUtils.
> We should avoid exposing that to all Authorization implementations, but also 
> make the ability to customize the mapping of internal classes to the public 
> api classes possible for Apache Sentry (incubating).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12698) Remove exposure to internal privilege and principal classes in HiveAuthorizer

2015-12-16 Thread Dapeng Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061509#comment-15061509
 ] 

Dapeng Sun commented on HIVE-12698:
---

I agree with you; we can also fix getHivePrincipal() in {{DDLTask}}.

> Remove exposure to internal privilege and principal classes in HiveAuthorizer
> -
>
> Key: HIVE-12698
> URL: https://issues.apache.org/jira/browse/HIVE-12698
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-12698.1.patch
>
>
> The changes in HIVE-11179 expose several internal classes to 
> HiveAuthorization implementations. These include PrivilegeObjectDesc, 
> PrivilegeDesc, PrincipalDesc and AuthorizationUtils.
> We should avoid exposing that to all Authorization implementations, but also 
> make the ability to customize the mapping of internal classes to the public 
> api classes possible for Apache Sentry (incubating).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)



[jira] [Commented] (HIVE-12698) Remove exposure to internal privilege and principal classes in HiveAuthorizer

2015-12-16 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061473#comment-15061473
 ] 

Ferdinand Xu commented on HIVE-12698:
-

Agree. It would be great to have
{code}
public HivePrincipal getHivePrincipal(PrincipalDesc principal) 
{code}
which is the same as
{code}
public HivePrivilegeObject getHivePrivilegeObject(PrivilegeObjectDesc 
privSubjectDesc)
throws HiveException;
{code}

> Remove exposure to internal privilege and principal classes in HiveAuthorizer
> -
>
> Key: HIVE-12698
> URL: https://issues.apache.org/jira/browse/HIVE-12698
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-12698.1.patch
>
>
> The changes in HIVE-11179 expose several internal classes to 
> HiveAuthorization implementations. These include PrivilegeObjectDesc, 
> PrivilegeDesc, PrincipalDesc and AuthorizationUtils.
> We should avoid exposing that to all Authorization implementations, but also 
> make the ability to customize the mapping of internal classes to the public 
> api classes possible for Apache Sentry (incubating).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12698) Remove exposure to internal privilege and principal classes in HiveAuthorizer

2015-12-16 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061454#comment-15061454
 ] 

Thejas M Nair commented on HIVE-12698:
--

Also, I noticed that DDLTask.java still calls 
AuthorizationUtils.getHivePrincipal in a few places. I guess that needs to be 
fixed as well.


> Remove exposure to internal privilege and principal classes in HiveAuthorizer
> -
>
> Key: HIVE-12698
> URL: https://issues.apache.org/jira/browse/HIVE-12698
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-12698.1.patch
>
>
> The changes in HIVE-11179 expose several internal classes to 
> HiveAuthorization implementations. These include PrivilegeObjectDesc, 
> PrivilegeDesc, PrincipalDesc and AuthorizationUtils.
> We should avoid exposing that to all Authorization implementations, but also 
> make the ability to customize the mapping of internal classes to the public 
> api classes possible for Apache Sentry (incubating).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12698) Remove exposure to internal privilege and principal classes in HiveAuthorizer

2015-12-16 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061452#comment-15061452
 ] 

Thejas M Nair commented on HIVE-12698:
--

[~dapengsun] [~Ferd] 
I am wondering if the HiveAuthorizationTranslator API should have methods that 
operate on a single element instead of a list of them. That seems more generic 
and captures the basic logic - 
ie instead of -
{code}
public List<HivePrincipal> getHivePrincipals(List<PrincipalDesc> principals)
{code}
have - 

{code}
public HivePrincipal getHivePrincipal(PrincipalDesc principal)
{code}
What are your thoughts?
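The single-element shape being discussed can be sketched as follows (a hypothetical Python rendering of the Java interface, for illustration only): the translator defines the one-element conversion, and the list form is derived from it, so an implementation such as Sentry's only overrides the basic mapping logic.

```python
# Hypothetical sketch of the proposed translator shape (illustrative Python,
# not the actual Java interface): only the single-element method carries the
# mapping logic; the list method is mechanically derived from it.

class PrincipalDesc:   # stand-in for the internal Hive class
    def __init__(self, name, kind):
        self.name, self.kind = name, kind

class HivePrincipal:   # stand-in for the public authorization API class
    def __init__(self, name, kind):
        self.name, self.kind = name, kind

class HiveAuthorizationTranslator:
    def get_hive_principal(self, desc):
        # Single-element mapping: the one method an implementation overrides.
        return HivePrincipal(desc.name, desc.kind)

    def get_hive_principals(self, descs):
        # List form expressed in terms of the single-element method.
        return [self.get_hive_principal(d) for d in descs]

translator = HiveAuthorizationTranslator()
out = translator.get_hive_principals([PrincipalDesc("alice", "USER"),
                                      PrincipalDesc("admins", "ROLE")])
```

With this shape, customizing the internal-to-public mapping means overriding one small method rather than re-implementing list plumbing.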


> Remove exposure to internal privilege and principal classes in HiveAuthorizer
> -
>
> Key: HIVE-12698
> URL: https://issues.apache.org/jira/browse/HIVE-12698
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-12698.1.patch
>
>
> The changes in HIVE-11179 expose several internal classes to 
> HiveAuthorization implementations. These include PrivilegeObjectDesc, 
> PrivilegeDesc, PrincipalDesc and AuthorizationUtils.
> We should avoid exposing that to all Authorization implementations, but also 
> make the ability to customize the mapping of internal classes to the public 
> api classes possible for Apache Sentry (incubating).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12684) NPE in stats annotation when all values in decimal column are NULLs

2015-12-16 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-12684:
-
Attachment: HIVE-12684.3.patch

Reuploading for precommit tests.

> NPE in stats annotation when all values in decimal column are NULLs
> ---
>
> Key: HIVE-12684
> URL: https://issues.apache.org/jira/browse/HIVE-12684
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.3.0, 2.0.0, 2.1.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-12684.1.patch, HIVE-12684.2.patch, 
> HIVE-12684.3.patch, HIVE-12684.3.patch
>
>
> When all column values are null for a decimal column and column stats exist, 
> the AnnotateWithStatistics optimization can throw an NPE. Following is the 
> exception trace:
> {code}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.getColStatistics(StatsUtils.java:712)
> at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.convertColStats(StatsUtils.java:764)
> at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.getTableColumnStats(StatsUtils.java:750)
> at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:197)
> at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:143)
> at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:131)
> at 
> org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$TableScanStatsRule.process(StatsRulesProcFactory.java:114)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
> at 
> org.apache.hadoop.hive.ql.lib.LevelOrderWalker.walk(LevelOrderWalker.java:143)
> at 
> org.apache.hadoop.hive.ql.lib.LevelOrderWalker.startWalking(LevelOrderWalker.java:122)
> at 
> org.apache.hadoop.hive.ql.optimizer.stats.annotation.AnnotateWithStatistics.transform(AnnotateWithStatistics.java:78)
> at 
> org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:228)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10156)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:225)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:237)
> at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:237)
> {code}
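The failure and the likely shape of the guard can be sketched as follows (a hypothetical model, not the actual patch): when every value in the decimal column is NULL, the stored min/max stats are absent, and converting them without a null check fails in the same way as the `StatsUtils.getColStatistics` frame above.

```python
# Hypothetical model of the NPE and its guard (not the actual Hive patch):
# an all-NULL decimal column has no low/high values in its stats, so
# dereferencing them unconditionally fails.

def get_col_statistics_buggy(decimal_stats):
    # Blindly converts low/high, like the pre-fix code path would.
    return {
        "min": float(decimal_stats["lowValue"]),   # raises when None
        "max": float(decimal_stats["highValue"]),
        "numNulls": decimal_stats["numNulls"],
    }

def get_col_statistics_fixed(decimal_stats):
    # Null-guard the optional min/max before converting them.
    low, high = decimal_stats["lowValue"], decimal_stats["highValue"]
    return {
        "min": float(low) if low is not None else None,
        "max": float(high) if high is not None else None,
        "numNulls": decimal_stats["numNulls"],
    }

all_null_stats = {"lowValue": None, "highValue": None, "numNulls": 100}
try:
    get_col_statistics_buggy(all_null_stats)
    buggy_raised = False
except TypeError:  # Python's analogue of the NullPointerException
    buggy_raised = True
safe = get_col_statistics_fixed(all_null_stats)
```

The guarded version still reports the null count while leaving min/max undefined instead of crashing the optimizer.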



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12667) Proper fix for HIVE-12473

2015-12-16 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061422#comment-15061422
 ] 

Hive QA commented on HIVE-12667:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12778100/HIVE-12667.1.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6376/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6376/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6376/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-6376/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 0f1c112 HIVE-12610: Hybrid Grace Hash Join should fail task 
faster if processing first batch fails, instead of continuing processing the 
rest (Wei Zheng via Vikram Dixit K)
+ git clean -f -d
+ git checkout master
Already on 'master'
+ git reset --hard origin/master
HEAD is now at 0f1c112 HIVE-12610: Hybrid Grace Hash Join should fail task 
faster if processing first batch fails, instead of continuing processing the 
rest (Wei Zheng via Vikram Dixit K)
+ git merge --ff-only origin/master
Already up-to-date.
+ git gc
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12778100 - PreCommit-HIVE-TRUNK-Build

> Proper fix for HIVE-12473
> -
>
> Key: HIVE-12667
> URL: https://issues.apache.org/jira/browse/HIVE-12667
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
> Attachments: HIVE-12667.1.patch, HIVE-12667.1.patch
>
>
> HIVE-12473 has added an incorrect comment and also lacks a test case.
> Benefits of this fix:
>* Does not say: "Probably doesn't work"
>* Does not use grammar like "subquery columns and such"
>* Adds test cases that let you verify the fix
>* Doesn't rely on certain structure of key expr, just takes the type at 
> compile time
>* Doesn't require an additional walk of each key expression
>* Shows the type used in explain





[jira] [Commented] (HIVE-12695) LLAP: use somebody else's cluster

2015-12-16 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061419#comment-15061419
 ] 

Hive QA commented on HIVE-12695:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12778096/HIVE-12695.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 16 failed/errored test(s), 9964 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_order2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union9
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse
org.apache.hive.jdbc.TestSSL.testSSLVersion
org.apache.hive.spark.client.TestSparkClient.testAddJarsAndFiles
org.apache.hive.spark.client.TestSparkClient.testCounters
org.apache.hive.spark.client.TestSparkClient.testErrorJob
org.apache.hive.spark.client.TestSparkClient.testJobSubmission
org.apache.hive.spark.client.TestSparkClient.testMetricsCollection
org.apache.hive.spark.client.TestSparkClient.testRemoteClient
org.apache.hive.spark.client.TestSparkClient.testSimpleSparkJob
org.apache.hive.spark.client.TestSparkClient.testSyncRpc
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6375/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6375/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6375/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 16 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12778096 - PreCommit-HIVE-TRUNK-Build

> LLAP: use somebody else's cluster
> -
>
> Key: HIVE-12695
> URL: https://issues.apache.org/jira/browse/HIVE-12695
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12695.patch
>
>
> Cluster sharing for the non-HS2 case.





[jira] [Updated] (HIVE-12701) select on table with boolean as partition column shows wrong result

2015-12-16 Thread Sudipto Nandan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sudipto Nandan updated HIVE-12701:
--
Component/s: SQL
 Database/Schema

> select on table with boolean as partition column shows wrong result
> ---
>
> Key: HIVE-12701
> URL: https://issues.apache.org/jira/browse/HIVE-12701
> Project: Hive
>  Issue Type: Bug
>  Components: Database/Schema, SQL
>Affects Versions: 1.1.0
>Reporter: Sudipto Nandan
>
> create table hive_aprm02ht7(a int, b int, c int) partitioned by (p boolean) 
> row format delimited fields terminated by ',' stored as textfile;
> load data local inpath 'hive_data8.txt' into table hive_aprm02ht7 partition 
> (p=true);
> load data local inpath 'hive_data8.txt' into table hive_aprm02ht7 partition 
> (p=false);
> describe hive_aprm02ht7;
> col_namedata_type   comment
> a   int
> b   int
> c   int
> p   boolean
> # Partition Information
> # col_name  data_type   comment
> p   boolean
> show partitions hive_aprm02ht7;
> OK
> p=false
> p=true
> Time taken: 0.057 seconds, Fetched: 2 row(s)
> -- everything is shown as true, but the first three rows should be true and 
> the last three rows should be false
> hive>  select * from hive_aprm02ht7 where p in (true,false);
> OK
> 1   2   3   true
> 4   5   6   true
> 7   8   9   true
> 1   2   3   true
> 4   5   6   true
> 7   8   9   true
> Time taken: 0.068 seconds, Fetched: 6 row(s)





[jira] [Updated] (HIVE-8494) Hive partitioned table with smallint datatype

2015-12-16 Thread Sudipto Nandan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sudipto Nandan updated HIVE-8494:
-
Affects Version/s: 1.1.0

> Hive partitioned table with smallint datatype
> -
>
> Key: HIVE-8494
> URL: https://issues.apache.org/jira/browse/HIVE-8494
> Project: Hive
>  Issue Type: Bug
>  Components: CLI, Query Processor
>Affects Versions: 0.12.0, 0.13.0, 1.1.0
>Reporter: Sudipto Nandan
>
> create a hive partitioned table with partitioning column datatype smallint
> col_namedata_type   comment
> a   int None
> b   int None
> c   int None
> p   smallint None
> Partition Information
> col_name  data_type   comment
> psmallintNone
> Load the following data; note that the partition value is 32768, which 
> exceeds the smallint limit by 1
> select * from t;
> a   b   c   p
> 1   2   3   32768
> 4   5   6   32768
> 7   8   9   32768
> hive> select sum(p) from t;
> also works
> but
> hive> select min(p) from t;
> fails
> Hive should disallow creating a partition with value 32768, as it exceeds the 
> smallint limit (SMALLINT is a 2-byte signed integer, from -32,768 to 32,767).
> The same issue occurs with int when the partition column value is 
> 2,147,483,648, which exceeds the int limit (INT is a 4-byte signed integer, 
> from -2,147,483,648 to 2,147,483,647).
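The range check the reporter asks for can be sketched in Python (a hypothetical validation helper for illustration, not Hive's actual code):

```python
# Hedged sketch: validate a partition value against the declared column
# type's range before creating the partition. Hypothetical helper, not Hive.
TYPE_RANGES = {
    "tinyint": (-2**7, 2**7 - 1),
    "smallint": (-2**15, 2**15 - 1),
    "int": (-2**31, 2**31 - 1),
    "bigint": (-2**63, 2**63 - 1),
}

def validate_partition_value(col_type, value):
    lo, hi = TYPE_RANGES[col_type]
    if not lo <= value <= hi:
        raise ValueError(
            "partition value %d out of range for %s [%d, %d]"
            % (value, col_type, lo, hi))

validate_partition_value("smallint", 32767)  # within range, no error
# validate_partition_value("smallint", 32768) would raise ValueError
```

With such a check in place, the partition value 32768 from the report would be rejected at creation time instead of silently overflowing later in min()/max() aggregations.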





[jira] [Updated] (HIVE-11797) Alter table change columnname doesn't work on avro serde hive table

2015-12-16 Thread Sudipto Nandan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sudipto Nandan updated HIVE-11797:
--
Affects Version/s: 1.1.0

> Alter table change columnname doesn't work on avro serde hive table
> ---
>
> Key: HIVE-11797
> URL: https://issues.apache.org/jira/browse/HIVE-11797
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Sudipto Nandan
>Assignee: Chaoyu Tang
>
> We create a table using the Avro serde, with Hive table name hive_t1, and 
> then try to change a column name.
> The command ends successfully but the name of the column is not modified.
> create table if not exists hive_t1
> partitioned by (p1 int)
> row format SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
> STORED AS
> INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
> OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
> TBLPROPERTIES ('avro.schema.literal'='{
>   "namespace": "testing.hive.avro.serde",
>   "name": "avro_table",
>   "type": "record",
>   "fields": [
> {
>   "name":"number",
>   "type":"int",
>   "doc":"Order of playing the role"
> },
> {
>   "name":"first_name",
>   "type":"string",
>   "doc":"first name of actor playing role"
> },
> {
>   "name":"last_name",
>   "type":"string",
>   "doc":"last name of actor playing role"
> },
> {
>   "name":"extra_field",
>   "type":"string",
>   "doc:":"an extra field not in the original file",
>   "default":"fishfingers and custard"
> }
>   ]
> }');
> hive> alter table hive_t1 change column number number1 int;
> OK
> Time taken: 0.12 seconds
> hive> select * from hive_t1 limit 5;
> OK
> hive_t1.number   hive_t1.first_name   hive_t1.last_name
> hive_t1.extra_field  hive_t1.p1
> 6   Colin   Baker   fishfingers and custard 100
> 3   Jon Pertwee fishfingers and custard 100
> 4   Tom Baker   fishfingers and custard 100
> 5   Peter   Davison fishfingers and custard 100
> 11  MattSmith   fishfingers and custard 100
> Time taken: 0.05 seconds, Fetched: 5 row(s)
> hive> describe hive_t1;
> OK
> col_namedata_type   comment
> number  int from deserializer   
> first_name  string  from deserializer   
> last_name   string  from deserializer   
> extra_field string  from deserializer   
> p1  int 
>  
> # Partition Information  
> # col_name  data_type   comment 
>  
> p1  int 
> Time taken: 0.051 seconds, Fetched: 10 row(s)
> -- With the command below, the column name is still not changed from "number" 
> to "number1"
> hive> alter table hive_t1 change number number1 int;
> OK
> Time taken: 0.081 seconds
> hive>  describe hive_t1;
> OK
> col_namedata_type   comment
> number  int from deserializer   
> first_name  string  from deserializer   
> last_name   string  from deserializer   
> extra_field string  from deserializer   
> p1  int 
>  
> # Partition Information  
> # col_name  data_type   comment 
>  
> p1  int 
> Time taken: 0.054 seconds, Fetched: 10 row(s)





[jira] [Commented] (HIVE-12470) Allow splits to provide custom consistent locations, instead of being tied to data locality

2015-12-16 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061390#comment-15061390
 ] 

Siddharth Seth commented on HIVE-12470:
---

An RB entry already exists.
On the sorting: the list can change on each refresh, and it isn't known whether 
the list actually changed; that could be tracked. However, since this path is 
not accessed very frequently, I did not try to optimize away the sort.
Registries are cached by name for a single client that may communicate with 
different LLAP instances, e.g. a single HiveServer instance that can submit to 
different LLAP daemons.

> Allow splits to provide custom consistent locations, instead of being tied to 
> data locality
> ---
>
> Key: HIVE-12470
> URL: https://issues.apache.org/jira/browse/HIVE-12470
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-12470.1.txt, HIVE-12470.1.wip.txt
>
>
> LLAP instances may not run on the same nodes as HDFS, or may run on a subset 
> of the cluster.
> Using split locations based on FileSystem locality is not very useful in such 
> cases - since that guarantees not getting any locality.
> Allow a split to map to a specific location - so that there's a chance of 
> getting cache locality across different queries.
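The "custom consistent location" idea can be illustrated with a deterministic hash from split path to a fixed daemon list (an illustrative sketch of the concept, with hypothetical names, not the patch's implementation):

```python
import hashlib

def consistent_location(split_path, daemons):
    """Map a split to the same daemon across queries, so repeated reads of
    the same split can hit that daemon's cache. Illustrative sketch only."""
    h = int(hashlib.md5(split_path.encode("utf-8")).hexdigest(), 16)
    return daemons[h % len(daemons)]

daemons = sorted(["llap-node-2", "llap-node-1", "llap-node-3"])
# The same split always maps to the same daemon:
assert consistent_location("/warehouse/t1/part-0000", daemons) == \
       consistent_location("/warehouse/t1/part-0000", daemons)
```

Note that the mapping is only consistent if every client sees the daemon list in the same order, which is one reason to sort the list on each refresh.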





[jira] [Commented] (HIVE-12676) [hive+impala] Alter table Rename to + Set location in a single step

2015-12-16 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061378#comment-15061378
 ] 

Lefty Leverenz commented on HIVE-12676:
---

[~egmont@c], yes you should file this request separately for Impala.  Thanks.

> [hive+impala] Alter table Rename to + Set location in a single step
> ---
>
> Key: HIVE-12676
> URL: https://issues.apache.org/jira/browse/HIVE-12676
> Project: Hive
>  Issue Type: Improvement
>  Components: hpl/sql
>Reporter: Egmont Koblinger
>Assignee: Dmitry Tolpeko
>Priority: Minor
>
> Assume a nonstandard table location, let's say /foo/bar/table1. You might 
> want to rename from table1 to table2 and move the underlying data accordingly 
> to /foo/bar/table2.
> The "alter table ... rename to ..." clause alters the table name, but in the 
> same step moves the data into the standard location 
> /user/hive/warehouse/table2. Then a subsequent "alter table ... set location 
> ..." can move it back to the desired location /foo/bar/table2.
> This is problematic if there is any permission issue involved, e.g. not being 
> able to write to /user/hive/warehouse. So it should be possible to move the 
> underlying data to its desired final location without intermediate places in 
> between.
> A workaround, though probably hard to discover, is to set the table to 
> external, then rename it, then set it back to internal, and then change its 
> location.
> It would be great to be able to do an "alter table ... rename to ... set 
> location ..." operation in a single step.





[jira] [Commented] (HIVE-12667) Proper fix for HIVE-12473

2015-12-16 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061376#comment-15061376
 ] 

Sergey Shelukhin commented on HIVE-12667:
-

+1 pending tests. I guess we don't have to check for string.

> Proper fix for HIVE-12473
> -
>
> Key: HIVE-12667
> URL: https://issues.apache.org/jira/browse/HIVE-12667
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
> Attachments: HIVE-12667.1.patch, HIVE-12667.1.patch
>
>
> HIVE-12473 has added an incorrect comment and also lacks a test case.
> Benefits of this fix:
>* Does not say: "Probably doesn't work"
>* Does not use grammar like "subquery columns and such"
>* Adds test cases that let you verify the fix
>* Doesn't rely on certain structure of key expr, just takes the type at 
> compile time
>* Doesn't require an additional walk of each key expression
>* Shows the type used in explain





[jira] [Commented] (HIVE-12699) LLAP: hive.llap.daemon.work.dirs setting backward compat name doesn't work

2015-12-16 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061369#comment-15061369
 ] 

Gopal V commented on HIVE-12699:


LGTM - +1.



> LLAP: hive.llap.daemon.work.dirs setting backward compat name doesn't work 
> ---
>
> Key: HIVE-12699
> URL: https://issues.apache.org/jira/browse/HIVE-12699
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Trivial
> Attachments: HIVE-12699.patch
>
>






[jira] [Updated] (HIVE-12352) CompactionTxnHandler.markCleaned() may delete too much

2015-12-16 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-12352:
--
Description: 
   Worker will start with the DB in state X (w.r.t. this partition). While it's 
working, more txns will happen against the partition it's compacting; then this 
will delete state up to X and everything since. There may be new delta files 
created between compaction starting and cleaning; these will not be compacted 
until more transactions happen. So this ideally should only delete up to the 
TXN_ID that was compacted (i.e. the HWM in Worker?). Then this can also run at 
READ_COMMITTED, which means we'd want to store the HWM in COMPACTION_QUEUE when 
the Worker picks up the job.

Actually the problem is even worse (but also solved using HWM as above):
Suppose some transactions (against same partition) have started and aborted 
since the time Worker ran compaction job.
That means there are never-compacted delta files with data that belongs to 
these aborted txns.

Following will pick up these aborted txns.
s = "select txn_id from TXNS, TXN_COMPONENTS where txn_id = tc_txnid and txn_state = '" +
    TXN_ABORTED + "' and tc_database = '" + info.dbname + "' and tc_table = '" +
    info.tableName + "'";
if (info.partName != null) s += " and tc_partition = '" + info.partName + "'";

The logic after that will delete relevant data from TXN_COMPONENTS and if one 
of these txns becomes empty, it will be picked up by cleanEmptyAbortedTxns().  
At that point any metadata about an Aborted txn is gone and the system will 
think it's committed.

HWM in this case would be (in ValidCompactorTxnList)
if (minOpenTxn > 0)
    min(highWaterMark, minOpenTxn)
else
    highWaterMark
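The high-water-mark rule quoted above can be written out as a small function (a sketch of the described logic, not ValidCompactorTxnList itself):

```python
def compactor_high_water_mark(high_water_mark, min_open_txn):
    """Highest txn id the cleaner may delete up to: never past the lowest
    still-open transaction. Sketch of the logic described above."""
    if min_open_txn > 0:
        return min(high_water_mark, min_open_txn)
    return high_water_mark

assert compactor_high_water_mark(100, 42) == 42   # an open txn caps the HWM
assert compactor_high_water_mark(100, 0) == 100   # no open txns: use HWM as-is
```

Capping at the lowest open transaction is what prevents the cleaner from deleting delta files that a still-running (or later aborted) transaction may yet need.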


  was:
   Worker will start with DB in state X (wrt this partition).
   while it's working more txns will happen, against partition it's compacting.
   then this will delete state up to X and since then.  There may be new delta 
files created
   between compaction starting and cleaning.  These will not be compacted until 
more
   transactions happen.  So this ideally should only delete
   up to TXN_ID that was compacted (i.e. HWM in Worker?)  Then this can also run
   at READ_COMMITTED.  So this means we'd want to store HWM in COMPACTION_QUEUE 
when
   Worker picks up the job.

Actually the problem is even worse (but also solved using HWM as above):
Suppose some transactions (against same partition) have started and aborted 
since the time Worker ran compaction job.
That means there are never-compacted delta files with data that belongs to 
these aborted txns.

Following will pick up these aborted txns.
s = "select txn_id from TXNS, TXN_COMPONENTS where txn_id = tc_txnid and txn_state = '" +
    TXN_ABORTED + "' and tc_database = '" + info.dbname + "' and tc_table = '" +
    info.tableName + "'";
if (info.partName != null) s += " and tc_partition = '" + info.partName + "'";

The logic after that will delete relevant data from TXN_COMPONENTS and if one 
of these txns becomes empty, it will be picked up by cleanEmptyAbortedTxns().  
At that point any metadata about an Aborted txn is gone and the system will 
think it's committed.




> CompactionTxnHandler.markCleaned() may delete too much
> --
>
> Key: HIVE-12352
> URL: https://issues.apache.org/jira/browse/HIVE-12352
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
>
>Worker will start with DB in state X (wrt this partition).
>while it's working more txns will happen, against partition it's 
> compacting.
>then this will delete state up to X and since then.  There may be new 
> delta files created
>between compaction starting and cleaning.  These will not be compacted 
> until more
>transactions happen.  So this ideally should only delete
>up to TXN_ID that was compacted (i.e. HWM in Worker?)  Then this can also 
> run
>at READ_COMMITTED.  So this means we'd want to store HWM in 
> COMPACTION_QUEUE when
>Worker picks up the job.
> Actually the problem is even worse (but also solved using HWM as above):
> Suppose some transactions (against same partition) have started and aborted 
> since the time Worker ran compaction job.
> That means there are never-compacted delta files with data that belongs to 
> these aborted txns.
> Following will pick up these aborted txns.
> s = "select txn_id from TXNS, TXN_COMPONENTS where txn_id = tc_txnid and 
> txn_state = '" +
>   TXN_ABORTED + "' and tc_database = '" + info.dbname + "' and 
> tc_table = '" +
>   info.tableName + "'";
> if (info.partName != null) s += " and tc_partition = '" + 
> info.partName + "'";

[jira] [Updated] (HIVE-12698) Remove exposure to internal privilege and principal classes in HiveAuthorizer

2015-12-16 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-12698:
-
Attachment: HIVE-12698.1.patch

> Remove exposure to internal privilege and principal classes in HiveAuthorizer
> -
>
> Key: HIVE-12698
> URL: https://issues.apache.org/jira/browse/HIVE-12698
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-12698.1.patch
>
>
> The changes in HIVE-11179 expose several internal classes to 
> HiveAuthorization implementations. These include PrivilegeObjectDesc, 
> PrivilegeDesc, PrincipalDesc and AuthorizationUtils.
> We should avoid exposing that to all Authorization implementations, but also 
> make the ability to customize the mapping of internal classes to the public 
> api classes possible for Apache Sentry (incubating).





[jira] [Commented] (HIVE-12698) Remove exposure to internal privilege and principal classes in HiveAuthorizer

2015-12-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061356#comment-15061356
 ] 

ASF GitHub Bot commented on HIVE-12698:
---

GitHub user thejasmn opened a pull request:

https://github.com/apache/hive/pull/58

HIVE-12698 : introduce HiveAuthorizationTranslator interface for isolating 
authorization impls from hive internal classes

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/thejasmn/hive HIVE-12698

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/58.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #58


commit 27e73f2a45aa3e5a158c14fb0693567158cef0d7
Author: Thejas Nair 
Date:   2015-12-17T02:36:52Z

introduce HiveAuthorizationTranslator interface for isolating authorization 
impls from hive internal classes




> Remove exposure to internal privilege and principal classes in HiveAuthorizer
> -
>
> Key: HIVE-12698
> URL: https://issues.apache.org/jira/browse/HIVE-12698
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 1.3.0, 2.0.0
>
>
> The changes in HIVE-11179 expose several internal classes to 
> HiveAuthorization implementations. These include PrivilegeObjectDesc, 
> PrivilegeDesc, PrincipalDesc and AuthorizationUtils.
> We should avoid exposing that to all Authorization implementations, but also 
> make the ability to customize the mapping of internal classes to the public 
> api classes possible for Apache Sentry (incubating).





[jira] [Commented] (HIVE-12570) Incorrect error message Expression not in GROUP BY key thrown instead of Invalid function

2015-12-16 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061350#comment-15061350
 ] 

Lefty Leverenz commented on HIVE-12570:
---

[~hsubramaniyan], Fix Version/s should include 1.3.0.

> Incorrect error message Expression not in GROUP BY key thrown instead of 
> Invalid function
> -
>
> Key: HIVE-12570
> URL: https://issues.apache.org/jira/browse/HIVE-12570
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Fix For: 2.1.0
>
> Attachments: HIVE-12570.1.patch, HIVE-12570.2.patch, 
> HIVE-12570.3.patch, HIVE-12570.4.patch, HIVE-12570.5.patch
>
>
> {code}
> explain create table avg_salary_by_supervisor3 as select average(key) as 
> key_avg from src group by value;
> {code}
> We get the following stack trace :
> {code}
> FAILED: SemanticException [Error 10025]: Line 1:57 Expression not in GROUP BY 
> key 'key'
> ERROR ql.Driver: FAILED: SemanticException [Error 10025]: Line 1:57 
> Expression not in GROUP BY key 'key'
> org.apache.hadoop.hive.ql.parse.SemanticException: Line 1:57 Expression not 
> in GROUP BY key 'key'
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:10484)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:10432)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:3824)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:3603)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:8862)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8817)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9668)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9561)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:10053)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:345)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10064)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:222)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:237)
>   at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:237)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:462)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:317)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1227)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1276)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1152)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1140)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:400)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:778)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:717)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:645)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> {code}
> Instead of the above error message, it be more appropriate to throw the below 
> error :
> ERROR ql.Driver: FAILED: SemanticException [Error 10011]: Line 1:58 Invalid 
> function 'average'





[jira] [Commented] (HIVE-11179) HIVE should allow custom converting from HivePrivilegeObjectDesc to privilegeObject for different authorizers

2015-12-16 Thread Dapeng Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061343#comment-15061343
 ] 

Dapeng Sun commented on HIVE-11179:
---

Thanks, [~thejas], for the follow-up. I will also think about how to minimize 
the API change.

> HIVE should allow custom converting from HivePrivilegeObjectDesc to 
> privilegeObject for different authorizers
> -
>
> Key: HIVE-11179
> URL: https://issues.apache.org/jira/browse/HIVE-11179
> Project: Hive
>  Issue Type: Improvement
>Reporter: Dapeng Sun
>Assignee: Dapeng Sun
>  Labels: Authorization
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11179.001.patch, HIVE-11179.001.patch
>
>
> HIVE should allow custom converting from HivePrivilegeObjectDesc to 
> privilegeObject for different authorizers:
> There is a case in Apache Sentry: Sentry supports URI- and server-level 
> privileges, but on the Hive side it uses 
> {{AuthorizationUtils.getHivePrivilegeObject(privSubjectDesc)}} to do the 
> conversion, and the code in {{getHivePrivilegeObject()}} only handles the 
> cases for table and database 
> {noformat}
> privSubjectDesc.getTable() ? HivePrivilegeObjectType.TABLE_OR_VIEW :
> HivePrivilegeObjectType.DATABASE;
> {noformat}
> A solution is to move this method to {{HiveAuthorizer}}, so that a custom 
> Authorizer can enhance it.
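The proposed solution, moving the conversion into an overridable method so a custom authorizer can extend it, is essentially the template-method pattern. A hedged Python sketch with hypothetical names (not Hive's actual API):

```python
class HiveAuthorizer:
    # Default mapping: only TABLE_OR_VIEW and DATABASE, mirroring the
    # behavior the reporter quotes. Hypothetical sketch, not Hive code.
    def get_privilege_object_type(self, desc):
        return "TABLE_OR_VIEW" if desc.get("table") else "DATABASE"

class SentryAuthorizer(HiveAuthorizer):
    # A custom authorizer can now extend the mapping, e.g. with URI/SERVER
    # levels, by overriding the hook instead of patching a static utility.
    def get_privilege_object_type(self, desc):
        if desc.get("uri"):
            return "URI"
        if desc.get("server"):
            return "SERVER"
        return super().get_privilege_object_type(desc)

assert SentryAuthorizer().get_privilege_object_type({"uri": "hdfs:///x"}) == "URI"
assert SentryAuthorizer().get_privilege_object_type({"table": "t"}) == "TABLE_OR_VIEW"
```

The design point is that the dispatch lives on the authorizer instance rather than in a static utility, so each implementation controls its own mapping.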





[jira] [Updated] (HIVE-12700) complex join keys cannot be recognized in Hive 0.13

2015-12-16 Thread Xiaoyong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyong Zhu updated HIVE-12700:

Attachment: job explain plan.txt
Implicit Joins.hql
explicit join key.hql

> complex join keys cannot be recognized in Hive 0.13
> ---
>
> Key: HIVE-12700
> URL: https://issues.apache.org/jira/browse/HIVE-12700
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 0.13.1
>Reporter: Xiaoyong Zhu
>Priority: Critical
> Attachments: Implicit Joins.hql, explicit join key.hql, job explain 
> plan.txt
>
>
> Hi Experts
> I am using Hive 0.13 and have found a potential bug. The attached "Implicit 
> Joins.hql" has several join keys (for example store_sales.ss_addr_sk = 
> customer_address.ca_address_sk) that cannot be recognized by Hive. In such 
> cases Hive won't be able to optimize and can only do a cross join first, which 
> makes the job run really long. If I change the query to explicit join keys, 
> then it works well.
> For the simple query below, Hive can recognize the join keys, and I think 
> Hive should be able to handle complex situations such as my example, 
> right?
>  
> SELECT * 
> FROM table1 t1, table2 t2, table3 t3 
> WHERE t1.id = t2.id AND t2.id = t3.id AND t1.zipcode = '02535';



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11865) Disable Hive PPD optimizer when CBO has optimized the plan

2015-12-16 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061329#comment-15061329
 ] 

Hive QA commented on HIVE-11865:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12778074/HIVE-11865.03.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 82 failed/errored test(s), 9964 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_order2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_boolexpr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_between_columns
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_unencrypted_tbl
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_join_part_col_char
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_between_columns
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_join_part_col_char
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_outer_join5
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query13
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query15
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query17
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query18
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query19
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query20
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query21
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query22
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query25
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query26
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query27
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query28
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query29
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query3
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query31
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query32
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query34
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query39
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query40
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query42
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query43
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query45
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query46
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query48
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query50
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query51
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query52
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query54
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query55
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query58
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query64
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query65
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query66
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query67
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query68
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query7
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query70
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query71
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query72
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query73
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query75
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query76
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query79
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query80
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query82
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query84
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query85
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query87
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCli

[jira] [Updated] (HIVE-12699) LLAP: hive.llap.daemon.work.dirs setting backward compat name doesn't work

2015-12-16 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12699:

Attachment: HIVE-12699.patch

[~gopalv] can you take a look? Trivial patch.

> LLAP: hive.llap.daemon.work.dirs setting backward compat name doesn't work 
> ---
>
> Key: HIVE-12699
> URL: https://issues.apache.org/jira/browse/HIVE-12699
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Priority: Trivial
> Attachments: HIVE-12699.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11179) HIVE should allow custom converting from HivePrivilegeObjectDesc to privilegeObject for different authorizers

2015-12-16 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061317#comment-15061317
 ] 

Thejas M Nair commented on HIVE-11179:
--

Created HIVE-12698 to track the changes to reduce the exposure of Hive internal 
classes to general authorization implementations.

The changes should also help reduce the chances of newer changes breaking other 
authorization implementations.


> HIVE should allow custom converting from HivePrivilegeObjectDesc to 
> privilegeObject for different authorizers
> -
>
> Key: HIVE-11179
> URL: https://issues.apache.org/jira/browse/HIVE-11179
> Project: Hive
>  Issue Type: Improvement
>Reporter: Dapeng Sun
>Assignee: Dapeng Sun
>  Labels: Authorization
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11179.001.patch, HIVE-11179.001.patch
>
>
> HIVE should allow custom converting from HivePrivilegeObjectDesc to 
> privilegeObject for different authorizers:
> There is a case in Apache Sentry: Sentry supports URI- and server-level 
> privileges, but on the Hive side, Hive uses 
> {{AuthorizationUtils.getHivePrivilegeObject(privSubjectDesc)}} to do the 
> conversion, and the code in {{getHivePrivilegeObject()}} only handles the 
> cases for table and database: 
> {noformat}
> privSubjectDesc.getTable() ? HivePrivilegeObjectType.TABLE_OR_VIEW :
> HivePrivilegeObjectType.DATABASE;
> {noformat}
> A solution is to move this method to {{HiveAuthorizer}}, so that a custom 
> authorizer could enhance it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12470) Allow splits to provide custom consistent locations, instead of being tied to data locality

2015-12-16 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061291#comment-15061291
 ] 

Sergey Shelukhin commented on HIVE-12470:
-

Can you post a RB?
Why not store the list pre-sorted instead of sorting every time?
Also, what is the need for the cache of registries by name?

> Allow splits to provide custom consistent locations, instead of being tied to 
> data locality
> ---
>
> Key: HIVE-12470
> URL: https://issues.apache.org/jira/browse/HIVE-12470
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-12470.1.txt, HIVE-12470.1.wip.txt
>
>
> LLAP instances may not run on the same nodes as HDFS, or may run on a subset 
> of the cluster.
> Using split locations based on FileSystem locality is not very useful in such 
> cases - since that guarantees not getting any locality.
> Allow a split to map to a specific location - so that there's a chance of 
> getting cache locality across different queries.
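The consistent-location idea can be sketched as follows. This is a hypothetical Python model, not the actual patch (function and node names are assumptions): hash each split's path onto a stable, sorted list of LLAP node names, so the same split maps to the same node across queries regardless of where its HDFS blocks live.

```python
import hashlib

def consistent_location(split_path, llap_nodes):
    # Sort so the mapping is stable regardless of registry iteration order,
    # then hash the split path onto the node list deterministically.
    nodes = sorted(llap_nodes)
    h = int(hashlib.md5(split_path.encode("utf-8")).hexdigest(), 16)
    return nodes[h % len(nodes)]

nodes = ["llap-node-1", "llap-node-2", "llap-node-3"]
loc = consistent_location("/warehouse/t/part-00000", nodes)
# Same split -> same node, even if the node list arrives in another order,
# which is what gives a chance of LLAP cache hits across queries.
assert loc == consistent_location("/warehouse/t/part-00000", reversed(nodes))
```

Note this simple modulo scheme reshuffles most splits when a node joins or leaves; a production design might use consistent hashing to limit that churn.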



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12570) Incorrect error message Expression not in GROUP BY key thrown instead of Invalid function

2015-12-16 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061283#comment-15061283
 ] 

Laljo John Pullokkaran commented on HIVE-12570:
---

https://issues.apache.org/jira/browse/HIVE-12570 is not about this bug.

> Incorrect error message Expression not in GROUP BY key thrown instead of 
> Invalid function
> -
>
> Key: HIVE-12570
> URL: https://issues.apache.org/jira/browse/HIVE-12570
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Fix For: 2.1.0
>
> Attachments: HIVE-12570.1.patch, HIVE-12570.2.patch, 
> HIVE-12570.3.patch, HIVE-12570.4.patch, HIVE-12570.5.patch
>
>
> {code}
> explain create table avg_salary_by_supervisor3 as select average(key) as 
> key_avg from src group by value;
> {code}
> We get the following stack trace :
> {code}
> FAILED: SemanticException [Error 10025]: Line 1:57 Expression not in GROUP BY 
> key 'key'
> ERROR ql.Driver: FAILED: SemanticException [Error 10025]: Line 1:57 
> Expression not in GROUP BY key 'key'
> org.apache.hadoop.hive.ql.parse.SemanticException: Line 1:57 Expression not 
> in GROUP BY key 'key'
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:10484)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:10432)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:3824)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:3603)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:8862)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8817)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9668)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9561)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:10053)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:345)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10064)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:222)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:237)
>   at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:237)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:462)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:317)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1227)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1276)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1152)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1140)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:400)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:778)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:717)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:645)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> {code}
> Instead of the above error message, it would be more appropriate to throw the 
> error below:
> ERROR ql.Driver: FAILED: SemanticException [Error 10011]: Line 1:58 Invalid 
> function 'average'
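A minimal model of the desired error precedence (illustrative only; Hive's real semantic analyzer is far more involved, and the registry contents here are assumptions): resolve the function name against the UDF registry before applying the GROUP BY membership check, so an unknown UDF reports "Invalid function" instead of the misleading group-by error.

```python
# Toy UDF registry; real Hive resolves names via FunctionRegistry.
KNOWN_UDFS = {"avg", "max", "min", "count", "sum"}

def check_select_expr(func_name, column, group_by_keys):
    # Check function existence first...
    if func_name not in KNOWN_UDFS:
        return "Invalid function '%s'" % func_name
    # ...and only then the GROUP BY membership rule.
    if column not in group_by_keys:
        return "Expression not in GROUP BY key '%s'" % column
    return "OK"

# 'average' is not a UDF (the real one is 'avg'), so the function error
# should win over the group-by complaint.
assert check_select_expr("average", "key", {"value"}) == "Invalid function 'average'"
assert check_select_expr("avg", "key", {"value"}).startswith("Expression not in GROUP BY")
```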



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-12534) Date functions with vectorization is returning wrong results

2015-12-16 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V resolved HIVE-12534.

Resolution: Duplicate

This is duplicated by HIVE-12479 and HIVE-12535

> Date functions with vectorization is returning wrong results
> 
>
> Key: HIVE-12534
> URL: https://issues.apache.org/jira/browse/HIVE-12534
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Critical
> Attachments: p26_explain.txt, plan.txt
>
>
> {noformat}
> select c.effective_date, year(c.effective_date), month(c.effective_date) from 
> customers c where c.customer_id = 146028;
> hive> set hive.vectorized.execution.enabled=true;
> hive> select c.effective_date, year(c.effective_date), 
> month(c.effective_date) from customers c where c.customer_id = 146028;
> 2015-11-19  0   0
> hive> set hive.vectorized.execution.enabled=false;
> hive> select c.effective_date, year(c.effective_date), 
> month(c.effective_date) from customers c where c.customer_id = 146028;
> 2015-11-19  201511
> {noformat}
> \cc [~gopalv], [~sseth], [~sershe]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11927) Implement/Enable constant related optimization rules in Calcite: enable HiveReduceExpressionsRule to fold constants

2015-12-16 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-11927:
---
Attachment: HIVE-11927.14.patch

> Implement/Enable constant related optimization rules in Calcite: enable 
> HiveReduceExpressionsRule to fold constants
> ---
>
> Key: HIVE-11927
> URL: https://issues.apache.org/jira/browse/HIVE-11927
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11927.01.patch, HIVE-11927.02.patch, 
> HIVE-11927.03.patch, HIVE-11927.04.patch, HIVE-11927.05.patch, 
> HIVE-11927.06.patch, HIVE-11927.07.patch, HIVE-11927.08.patch, 
> HIVE-11927.09.patch, HIVE-11927.10.patch, HIVE-11927.11.patch, 
> HIVE-11927.12.patch, HIVE-11927.13.patch, HIVE-11927.14.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12518) CBO: Calcite Operator To Hive Operator (Calcite Return Path): fix test failure for groupby_resolution.q

2015-12-16 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061248#comment-15061248
 ] 

Pengcheng Xiong commented on HIVE-12518:


cc'ing [~jpullokkaran]. This is the issue for the current return path. Thanks.

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): fix test 
> failure for groupby_resolution.q
> ---
>
> Key: HIVE-12518
> URL: https://issues.apache.org/jira/browse/HIVE-12518
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>
> The problem can be reproduced on the return path when there is no map-side 
> group by and the data is skewed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12610) Hybrid Grace Hash Join should fail task faster if processing first batch fails, instead of continuing processing the rest

2015-12-16 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-12610:
--
Fix Version/s: 2.0.0

> Hybrid Grace Hash Join should fail task faster if processing first batch 
> fails, instead of continuing processing the rest
> -
>
> Key: HIVE-12610
> URL: https://issues.apache.org/jira/browse/HIVE-12610
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.1
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Fix For: 1.3.0, 2.0.0, 1.2.2, 2.1.0
>
> Attachments: HIVE-12610.1.patch, HIVE-12610.2.patch, 
> HIVE-12610.branch-1.patch
>
>
> While processing the spilled partitions, if there is any fatal error, such as 
> a Kryo exception, we should exit early instead of moving on to process the 
> rest of the spilled partitions.
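The fail-fast behavior can be sketched like this. It is a hedged Python model of the control flow only, not the actual hybrid grace hash join code (exception and function names are invented for illustration): fatal errors abort the task immediately, rather than being swallowed while the loop continues.

```python
class FatalSerdeError(RuntimeError):
    """Stand-in for a fatal deserialization failure (e.g. a Kryo exception)."""

def process_spilled_partitions(partitions, process):
    processed = []
    for p in partitions:
        try:
            process(p)
        except FatalSerdeError:
            # Fatal: abort the whole task now instead of continuing with
            # the remaining spilled partitions (the fix in this JIRA).
            raise
        except Exception as e:
            # Non-fatal per-partition issues could be logged and skipped.
            print("skipping %s: %s" % (p, e))
            continue
        processed.append(p)
    return processed

def process(p):
    if p == "part-2":
        raise FatalSerdeError("corrupt spill file")

try:
    process_spilled_partitions(["part-1", "part-2", "part-3"], process)
    assert False, "should have failed fast"
except FatalSerdeError:
    pass  # part-3 was never attempted
```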



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11179) HIVE should allow custom converting from HivePrivilegeObjectDesc to privilegeObject for different authorizers

2015-12-16 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061252#comment-15061252
 ] 

Thejas M Nair commented on HIVE-11179:
--

This patch exposes many Hive internal classes in the authorization plugin 
interface. Classes exposed through the interface would be considered public 
API by users.
But I also understand that Sentry is quite intertwined with Hive internals and 
needs the ability to do this custom conversion.
I think we can minimize the exposure to other API users, as well as provide 
Sentry with the ability it needs, by tweaking this change some more. I will 
create a follow-up JIRA.


> HIVE should allow custom converting from HivePrivilegeObjectDesc to 
> privilegeObject for different authorizers
> -
>
> Key: HIVE-11179
> URL: https://issues.apache.org/jira/browse/HIVE-11179
> Project: Hive
>  Issue Type: Improvement
>Reporter: Dapeng Sun
>Assignee: Dapeng Sun
>  Labels: Authorization
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11179.001.patch, HIVE-11179.001.patch
>
>
> HIVE should allow custom converting from HivePrivilegeObjectDesc to 
> privilegeObject for different authorizers:
> There is a case in Apache Sentry: Sentry supports URI- and server-level 
> privileges, but on the Hive side, Hive uses 
> {{AuthorizationUtils.getHivePrivilegeObject(privSubjectDesc)}} to do the 
> conversion, and the code in {{getHivePrivilegeObject()}} only handles the 
> cases for table and database: 
> {noformat}
> privSubjectDesc.getTable() ? HivePrivilegeObjectType.TABLE_OR_VIEW :
> HivePrivilegeObjectType.DATABASE;
> {noformat}
> A solution is to move this method to {{HiveAuthorizer}}, so that a custom 
> authorizer could enhance it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12470) Allow splits to provide custom consistent locations, instead of being tied to data locality

2015-12-16 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-12470:
--
Attachment: HIVE-12470.1.txt

Patch for review. cc [~gopalv], [~sershe]

> Allow splits to provide custom consistent locations, instead of being tied to 
> data locality
> ---
>
> Key: HIVE-12470
> URL: https://issues.apache.org/jira/browse/HIVE-12470
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-12470.1.txt, HIVE-12470.1.wip.txt
>
>
> LLAP instances may not run on the same nodes as HDFS, or may run on a subset 
> of the cluster.
> Using split locations based on FileSystem locality is not very useful in such 
> cases - since that guarantees not getting any locality.
> Allow a split to map to a specific location - so that there's a chance of 
> getting cache locality across different queries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12610) Hybrid Grace Hash Join should fail task faster if processing first batch fails, instead of continuing processing the rest

2015-12-16 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061242#comment-15061242
 ] 

Vikram Dixit K commented on HIVE-12610:
---

Thanks Wei!

> Hybrid Grace Hash Join should fail task faster if processing first batch 
> fails, instead of continuing processing the rest
> -
>
> Key: HIVE-12610
> URL: https://issues.apache.org/jira/browse/HIVE-12610
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.1
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Fix For: 1.3.0, 1.2.2, 2.1.0
>
> Attachments: HIVE-12610.1.patch, HIVE-12610.2.patch, 
> HIVE-12610.branch-1.patch
>
>
> While processing the spilled partitions, if there is any fatal error, such as 
> a Kryo exception, we should exit early instead of moving on to process the 
> rest of the spilled partitions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11775) Implement limit push down through union all in CBO

2015-12-16 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061229#comment-15061229
 ] 

Pengcheng Xiong commented on HIVE-11775:


We have a clean QA run. [~jpullokkaran], could you please take a look? Thanks.

> Implement limit push down through union all in CBO
> --
>
> Key: HIVE-11775
> URL: https://issues.apache.org/jira/browse/HIVE-11775
> Project: Hive
>  Issue Type: New Feature
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11775.01.patch, HIVE-11775.02.patch, 
> HIVE-11775.03.patch, HIVE-11775.04.patch, HIVE-11775.05.patch, 
> HIVE-11775.06.patch, HIVE-11775.07.patch, HIVE-11775.08.patch, 
> HIVE-11775.09.patch, HIVE-11775.10.patch
>
>
> Inspired by HIVE-11684 (kudos to [~jcamachorodriguez]), we can actually 
> push the limit down through union all, which reduces the number of 
> intermediate rows in the union branches. 
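Why the push-down is safe when no ordering is requested can be shown with a toy model (illustrative Python, not the actual Calcite rule): taking n rows from each branch before the union still leaves enough rows for the final LIMIT n, while shrinking the intermediate result that has to be moved between operators.

```python
def union_all_limit(branches, n):
    # Push LIMIT n into each branch first (the optimization), ...
    pushed = [row for branch in branches for row in branch[:n]]
    # ... then apply the final LIMIT n on the union of the trimmed branches.
    return pushed[:n]

b1, b2 = list(range(1000)), list(range(1000, 2000))
result = union_all_limit([b1, b2], 5)
# The answer still has exactly n rows drawn from the branches, but only
# 2*n rows (not 2000) ever flowed through the union.
assert len(result) == 5
assert all(r in b1 or r in b2 for r in result)
```

Without an ORDER BY, any n rows satisfy the query, which is what makes trimming each branch to n rows correct.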



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11775) Implement limit push down through union all in CBO

2015-12-16 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061224#comment-15061224
 ] 

Hive QA commented on HIVE-11775:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12778068/HIVE-11775.10.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 17 failed/errored test(s), 9965 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_order2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union9
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarDataNucleusUnCaching
org.apache.hive.jdbc.TestSSL.testSSLVersion
org.apache.hive.spark.client.TestSparkClient.testAddJarsAndFiles
org.apache.hive.spark.client.TestSparkClient.testCounters
org.apache.hive.spark.client.TestSparkClient.testErrorJob
org.apache.hive.spark.client.TestSparkClient.testJobSubmission
org.apache.hive.spark.client.TestSparkClient.testMetricsCollection
org.apache.hive.spark.client.TestSparkClient.testRemoteClient
org.apache.hive.spark.client.TestSparkClient.testSimpleSparkJob
org.apache.hive.spark.client.TestSparkClient.testSyncRpc
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6373/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6373/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6373/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 17 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12778068 - PreCommit-HIVE-TRUNK-Build

> Implement limit push down through union all in CBO
> --
>
> Key: HIVE-11775
> URL: https://issues.apache.org/jira/browse/HIVE-11775
> Project: Hive
>  Issue Type: New Feature
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11775.01.patch, HIVE-11775.02.patch, 
> HIVE-11775.03.patch, HIVE-11775.04.patch, HIVE-11775.05.patch, 
> HIVE-11775.06.patch, HIVE-11775.07.patch, HIVE-11775.08.patch, 
> HIVE-11775.09.patch, HIVE-11775.10.patch
>
>
> Inspired by HIVE-11684 (kudos to [~jcamachorodriguez]), we can actually 
> push the limit down through union all, which reduces the number of 
> intermediate rows in the union branches. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12661) StatsSetupConst.COLUMN_STATS_ACCURATE is not used correctly

2015-12-16 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-12661:
---
Attachment: HIVE-12661.04.patch

> StatsSetupConst.COLUMN_STATS_ACCURATE is not used correctly
> ---
>
> Key: HIVE-12661
> URL: https://issues.apache.org/jira/browse/HIVE-12661
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-12661.01.patch, HIVE-12661.02.patch, 
> HIVE-12661.03.patch, HIVE-12661.04.patch
>
>
> PROBLEM:
> Hive stats are auto-gathered properly until an 'analyze table [tablename] 
> compute statistics for columns' is run. After that, the stats are not 
> auto-updated until the command is run again. Repro:
> {code}
> set hive.stats.autogather=true; 
> set hive.stats.atomic=false ; 
> set hive.stats.collect.rawdatasize=true ; 
> set hive.stats.collect.scancols=false ; 
> set hive.stats.collect.tablekeys=false ; 
> set hive.stats.fetch.column.stats=true; 
> set hive.stats.fetch.partition.stats=true ; 
> set hive.stats.reliable=false ; 
> set hive.compute.query.using.stats=true; 
> CREATE TABLE `default`.`calendar` (`year` int) ROW FORMAT SERDE 
> 'org.apache.hadoop.hive.ql.io.orc.OrcSerde' STORED AS INPUTFORMAT 
> 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' OUTPUTFORMAT 
> 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' TBLPROPERTIES ( 
> 'orc.compress'='NONE') ; 
> insert into calendar values (2010), (2011), (2012); 
> select * from calendar; 
> +---------------+ 
> | calendar.year | 
> +---------------+ 
> | 2010          | 
> | 2011          | 
> | 2012          | 
> +---------------+ 
> select max(year) from calendar; 
> | 2012 | 
> insert into calendar values (2013); 
> select * from calendar; 
> +---------------+ 
> | calendar.year | 
> +---------------+ 
> | 2010          | 
> | 2011          | 
> | 2012          | 
> | 2013          | 
> +---------------+ 
> select max(year) from calendar; 
> | 2013 | 
> insert into calendar values (2014); 
> select max(year) from calendar; 
> | 2014 |
> analyze table calendar compute statistics for columns;
> insert into calendar values (2015);
> select max(year) from calendar;
> | 2014 |
> insert into calendar values (2016), (2017), (2018);
> select max(year) from calendar;
> | 2014  |
> analyze table calendar compute statistics for columns;
> select max(year) from calendar;
> | 2018  |
> {code}
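The intended interaction between writes and COLUMN_STATS_ACCURATE can be modeled in a few lines. This is a toy sketch, not the metastore code (the class and flag names only mirror the JIRA's vocabulary): an insert must clear the stats-accurate flag so stats-backed answers such as max() are never served stale, which is exactly what the repro above shows going wrong.

```python
class Table:
    def __init__(self):
        self.rows = []
        self.column_stats_accurate = False
        self.cached_max = None

    def analyze(self):
        # 'analyze table ... compute statistics for columns'
        self.cached_max = max(self.rows)
        self.column_stats_accurate = True

    def insert(self, *values):
        self.rows.extend(values)
        # The fix: any write invalidates the column stats.
        self.column_stats_accurate = False

    def query_max(self):
        # hive.compute.query.using.stats=true: answer from stats only
        # while they are marked accurate; otherwise fall back to a scan.
        if self.column_stats_accurate:
            return self.cached_max
        return max(self.rows)

t = Table()
t.insert(2010, 2011, 2012, 2013, 2014)
t.analyze()
t.insert(2015)
assert t.query_max() == 2015  # fresh value, not the stale 2014 from the bug
```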



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11355) Hive on tez: memory manager for sort buffers (input/output) and operators

2015-12-16 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-11355:
--
Attachment: HIVE-11355.7.patch

Fix for a couple of failing tests.

> Hive on tez: memory manager for sort buffers (input/output) and operators
> -
>
> Key: HIVE-11355
> URL: https://issues.apache.org/jira/browse/HIVE-11355
> Project: Hive
>  Issue Type: Improvement
>  Components: Tez
>Affects Versions: 2.0.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Attachments: HIVE-11355.1.patch, HIVE-11355.2.patch, 
> HIVE-11355.3.patch, HIVE-11355.4.patch, HIVE-11355.5.patch, 
> HIVE-11355.6.patch, HIVE-11355.7.patch
>
>
> We need to better manage the sort buffer allocations to ensure better 
> performance. Also, we need to provide configurations to certain operators to 
> stay within memory limits.
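A rough sketch of one way to budget memory across consumers (purely illustrative; the numbers, names, and policy are assumptions, not the actual Tez or Hive memory manager): grant requests in full when they fit, otherwise scale every consumer's grant proportionally so the task stays within its limit.

```python
def allocate(total_mb, requests):
    # requests maps consumer name (sort buffer, hash table, ...) -> MB asked.
    requested = sum(requests.values())
    if requested <= total_mb:
        return dict(requests)  # everything fits; grant in full
    # Over-subscribed: scale all grants down proportionally.
    scale = total_mb / requested
    return {name: int(req * scale) for name, req in requests.items()}

grants = allocate(1024, {"sort-output": 512, "sort-input": 512, "hash-join": 512})
assert sum(grants.values()) <= 1024          # never exceed the task budget
assert all(g <= 512 for g in grants.values())  # never exceed a request
```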



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12685) Remove invalid property in common/src/test/resources/hive-site.xml

2015-12-16 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061138#comment-15061138
 ] 

Wei Zheng commented on HIVE-12685:
--

[~ashutoshc] Can you take a look?

> Remove invalid property in common/src/test/resources/hive-site.xml
> --
>
> Key: HIVE-12685
> URL: https://issues.apache.org/jira/browse/HIVE-12685
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-12685.1.patch, HIVE-12685.2.patch, 
> HIVE-12685.3.patch
>
>
> Currently there is a property like the one below, which is obviously wrong:
> {code}
> <property>
>   <name>javax.jdo.option.ConnectionDriverName</name>
>   <value>hive-site.xml</value>
>   <description>Override ConfVar defined in HiveConf</description>
> </property>
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12685) Remove invalid property in common/src/test/resources/hive-site.xml

2015-12-16 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-12685:
-
Attachment: HIVE-12685.3.patch

Patch 3, which removes the unnecessary common/src/test/resources/hive-site.xml

> Remove invalid property in common/src/test/resources/hive-site.xml
> --
>
> Key: HIVE-12685
> URL: https://issues.apache.org/jira/browse/HIVE-12685
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-12685.1.patch, HIVE-12685.2.patch, 
> HIVE-12685.3.patch
>
>
> Currently there is a property like the one below, which is obviously wrong:
> {code}
> <property>
>   <name>javax.jdo.option.ConnectionDriverName</name>
>   <value>hive-site.xml</value>
>   <description>Override ConfVar defined in HiveConf</description>
> </property>
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12685) Remove invalid property in common/src/test/resources/hive-site.xml

2015-12-16 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061131#comment-15061131
 ] 

Wei Zheng commented on HIVE-12685:
--

Currently we have different versions of hive-site.xml located all over the tree
{code}
wzheng /tmp/hive $ find . -name hive-site.xml
./beeline/src/test/resources/hive-site.xml
./common/src/test/resources/hive-site.xml
./conf/hive-site.xml
./data/conf/hive-site.xml
./data/conf/llap/hive-site.xml
./data/conf/perf-reg/hive-site.xml
./data/conf/spark/standalone/hive-site.xml
./data/conf/spark/yarn-client/hive-site.xml
./data/conf/tez/hive-site.xml
./hcatalog/src/test/e2e/templeton/deployers/config/hive/hive-site.xml
{code}
The one causing the problem in this JIRA is common/src/test/resources/hive-site.xml.

Instead of maintaining multiple copies, we should get rid of unnecessary 
hive-site.xml files as much as possible. So we should remove 
common/src/test/resources/hive-site.xml and just use the default one from 
data/conf (for TestHiveConf).

> Remove invalid property in common/src/test/resources/hive-site.xml
> --
>
> Key: HIVE-12685
> URL: https://issues.apache.org/jira/browse/HIVE-12685
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-12685.1.patch, HIVE-12685.2.patch
>
>
> Currently there is a property like the one below, which is obviously wrong:
> {code}
> <property>
>   <name>javax.jdo.option.ConnectionDriverName</name>
>   <value>hive-site.xml</value>
>   <description>Override ConfVar defined in HiveConf</description>
> </property>
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12570) Incorrect error message Expression not in GROUP BY key thrown instead of Invalid function

2015-12-16 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061112#comment-15061112
 ] 

Matt McCline commented on HIVE-12570:
-

[~hsubramaniyan] [~jpullokkaran]

I'm getting "Expression not in GROUP BY key 'wr_return_quantity' from 
(TOK_TABLE_OR_COL wr)" on TPCDS-49 on master but not on branch-1.

It occurs both before and after the HIVE-12570 fix, so perhaps the fix is 
incomplete?

See https://hortonworks.jira.com/browse/BUG-48057 for the query text.

> Incorrect error message Expression not in GROUP BY key thrown instead of 
> Invalid function
> -
>
> Key: HIVE-12570
> URL: https://issues.apache.org/jira/browse/HIVE-12570
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Fix For: 2.1.0
>
> Attachments: HIVE-12570.1.patch, HIVE-12570.2.patch, 
> HIVE-12570.3.patch, HIVE-12570.4.patch, HIVE-12570.5.patch
>
>
> {code}
> explain create table avg_salary_by_supervisor3 as select average(key) as 
> key_avg from src group by value;
> {code}
> We get the following stack trace :
> {code}
> FAILED: SemanticException [Error 10025]: Line 1:57 Expression not in GROUP BY 
> key 'key'
> ERROR ql.Driver: FAILED: SemanticException [Error 10025]: Line 1:57 
> Expression not in GROUP BY key 'key'
> org.apache.hadoop.hive.ql.parse.SemanticException: Line 1:57 Expression not 
> in GROUP BY key 'key'
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:10484)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:10432)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:3824)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:3603)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:8862)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8817)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9668)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9561)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:10053)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:345)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10064)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:222)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:237)
>   at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:237)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:462)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:317)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1227)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1276)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1152)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1140)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:400)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:778)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:717)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:645)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> {code}
> Instead of the above error message, it would be more appropriate to throw the 
> error below:
> ERROR ql.Driver: FAILED: SemanticException [Error 10011]: Line 1:58 Invalid 
> function 'average'



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11927) Implement/Enable constant related optimization rules in Calcite: enable HiveReduceExpressionsRule to fold constants

2015-12-16 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061074#comment-15061074
 ] 

Hive QA commented on HIVE-11927:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12778057/HIVE-11927.13.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 35 failed/errored test(s), 9966 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap_auto_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_order2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_constant_expr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_hour
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_minute
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_parse_url
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_second
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_elt
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucketizedhiveinputformat
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_elt
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketizedhiveinputformat
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query31
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query39
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query42
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query52
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query64
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query66
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query75
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vector_elt
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse
org.apache.hive.jdbc.TestSSL.testSSLVersion
org.apache.hive.spark.client.TestSparkClient.testAddJarsAndFiles
org.apache.hive.spark.client.TestSparkClient.testCounters
org.apache.hive.spark.client.TestSparkClient.testErrorJob
org.apache.hive.spark.client.TestSparkClient.testJobSubmission
org.apache.hive.spark.client.TestSparkClient.testMetricsCollection
org.apache.hive.spark.client.TestSparkClient.testRemoteClient
org.apache.hive.spark.client.TestSparkClient.testSimpleSparkJob
org.apache.hive.spark.client.TestSparkClient.testSyncRpc
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6372/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6372/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6372/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 35 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12778057 - PreCommit-HIVE-TRUNK-Build

> Implement/Enable constant related optimization rules in Calcite: enable 
> HiveReduceExpressionsRule to fold constants
> ---
>
> Key: HIVE-11927
> URL: https://issues.apache.org/jira/browse/HIVE-11927
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11927.01.patch, HIVE-11927.02.patch, 
> HIVE-11927.03.patch, HIVE-11927.04.patch, HIVE-11927.05.patch, 
> HIVE-11927.06.patch, HIVE-11927.07.patch, HIVE-11927.08.patch, 
> HIVE-11927.09.patch, HIVE-11927.10.patch, HIVE-11927.11.patch, 
> HIVE-11927.12.patch, HIVE-11927.13.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12697) Remove deprecated post option from webhcat test files

2015-12-16 Thread Aswathy Chellammal Sreekumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aswathy Chellammal Sreekumar updated HIVE-12697:

Attachment: HIVE-12697.1.patch

[~eugene.koifman] Please review the patch

> Remove deprecated post option from webhcat test files
> -
>
> Key: HIVE-12697
> URL: https://issues.apache.org/jira/browse/HIVE-12697
> Project: Hive
>  Issue Type: Test
>  Components: WebHCat
>Affects Versions: 2.0.0
>Reporter: Aswathy Chellammal Sreekumar
>Assignee: Aswathy Chellammal Sreekumar
>  Labels: test
> Attachments: HIVE-12697.1.patch
>
>
> Tests still use the deprecated POST option user.name. It needs to be removed 
> and added to the query string instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11388) there should only be 1 Initiator for compactions per Hive installation

2015-12-16 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061006#comment-15061006
 ] 

Eugene Koifman commented on HIVE-11388:
---

Here is one general-purpose mechanism:
create table MUTEX_TABLE(keyname varchar(512) PRIMARY KEY)

Any process that requires a mutex inserts a row into this table (as long as 
everyone agrees on the key) and then does a "select for update" on that row.  
If the process dies, the "select for update" lock is automatically released.

For example, if 2 Initiator instances want to schedule a compaction, each could:
1. select * from MUTEX_TABLE where keyname="initiator" for update.
If the "initiator" row is already there, only 1 will succeed.  The other one, 
once it unblocks, will already see "this" compaction scheduled.
2. If the select in step 1 finds no row, the Initiator can insert the 
"initiator" row and then go to step 1.  Because of the PK, only 1 insert will 
succeed.

Since the keyname is arbitrary, it can be "db/table/partition" to coordinate 
Workers if necessary.

A little primitive, but workable: it avoids ZooKeeper and allows all parts of 
Compaction/HouseKeeping to run on multiple MS nodes.
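The insert side of the mechanism described above can be sketched outside Hive. The snippet below is a minimal illustration using SQLite (which has no SELECT FOR UPDATE, so only the primary-key step is shown); the table and key names follow the comment, everything else is an assumption for demonstration:

```python
import sqlite3

# Hypothetical mutex table, mirroring the DDL in the comment above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MUTEX_TABLE (keyname VARCHAR(512) PRIMARY KEY)")

def try_acquire(conn, key):
    """Return True if this caller created the row, i.e. 'won' the key."""
    try:
        conn.execute("INSERT INTO MUTEX_TABLE (keyname) VALUES (?)", (key,))
        return True
    except sqlite3.IntegrityError:
        # PRIMARY KEY violation: some other instance already holds the key.
        return False

# Two "Initiator" instances race for the same key; the PK constraint
# guarantees that exactly one insert succeeds.
first = try_acquire(conn, "initiator")
second = try_acquire(conn, "initiator")
print(first, second)
```

In the real scheme the winner would then take the "select for update" row lock, so a crashed holder releases the mutex automatically with its transaction.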

> there should only be 1 Initiator for compactions per Hive installation
> --
>
> Key: HIVE-11388
> URL: https://issues.apache.org/jira/browse/HIVE-11388
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
>
> org.apache.hadoop.hive.ql.txn.compactor.Initiator is a thread that runs 
> inside the metastore service to manage compactions of ACID tables.  There 
> should be exactly 1 instance of this thread (even with multiple Thrift 
> services).
> This is documented in 
> https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-Configuration
>  but not enforced.
> Should add enforcement, since more than 1 Initiator could cause concurrent 
> attempts to compact the same table/partition - which will not work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12688) HIVE-11826 makes hive unusable in properly secured cluster

2015-12-16 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060973#comment-15060973
 ] 

Ashutosh Chauhan commented on HIVE-12688:
-

makes sense +1 for revert. Lets explore alternatives in follow-up.

> HIVE-11826 makes hive unusable in properly secured cluster
> --
>
> Key: HIVE-12688
> URL: https://issues.apache.org/jira/browse/HIVE-12688
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
>Priority: Blocker
> Attachments: HIVE-12688.1.patch
>
>
> HIVE-11826 makes a change to restrict connections to metastore to users who 
> belong to groups under 'hadoop.proxyuser.hive.groups'.
> That property was only meant to be a hadoop property, which controls what 
> users the hive user can impersonate. What this change is doing is to enable 
> use of that to also restrict who can connect to metastore server. This is new 
> functionality, not a bug fix. There is value to this functionality.
> However, this change makes hive unusable in a properly secured cluster. If 
> 'hadoop.proxyuser.hive.hosts' is set to the proper set of hosts that run 
> Metastore and Hiveserver2 (instead of a very open "*"), then users will be 
> able to connect to metastore only from those hosts.
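For reference, the two Hadoop properties discussed here are configured in core-site.xml; a sketch with illustrative (assumed) host and group values:

```xml
<!-- core-site.xml sketch; host and group values are illustrative assumptions -->
<property>
  <name>hadoop.proxyuser.hive.hosts</name>
  <value>metastore1.example.com,hs2-1.example.com</value>
</property>
<property>
  <name>hadoop.proxyuser.hive.groups</name>
  <value>hadoop,users</value>
</property>
```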



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12688) HIVE-11826 makes hive unusable in properly secured cluster

2015-12-16 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060939#comment-15060939
 ] 

Aihua Xu commented on HIVE-12688:
-

I'm not able to work on this until next week, so please revert it and I will 
try to provide a better approach afterwards.

I agree with your approach. 





> HIVE-11826 makes hive unusable in properly secured cluster
> --
>
> Key: HIVE-12688
> URL: https://issues.apache.org/jira/browse/HIVE-12688
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
>Priority: Blocker
> Attachments: HIVE-12688.1.patch
>
>
> HIVE-11826 makes a change to restrict connections to metastore to users who 
> belong to groups under 'hadoop.proxyuser.hive.groups'.
> That property was only meant to be a hadoop property, which controls what 
> users the hive user can impersonate. What this change is doing is to enable 
> use of that to also restrict who can connect to metastore server. This is new 
> functionality, not a bug fix. There is value to this functionality.
> However, this change makes hive unusable in a properly secured cluster. If 
> 'hadoop.proxyuser.hive.hosts' is set to the proper set of hosts that run 
> Metastore and Hiveserver2 (instead of a very open "*"), then users will be 
> able to connect to metastore only from those hosts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12548) Hive metastore goes down in Kerberos,sentry enabled CDH5.5 cluster

2015-12-16 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-12548:

Assignee: (was: Vaibhav Gumashta)

> Hive metastore goes down in Kerberos,sentry enabled CDH5.5 cluster
> --
>
> Key: HIVE-12548
> URL: https://issues.apache.org/jira/browse/HIVE-12548
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2
> Environment: RHEL 6.5 CLOUDERA CDH 5.5
>Reporter: narendra reddy ganesana
>
> [pool-3-thread-10]: Error occurred during processing of message.
> java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: 
> Invalid status -128
>   at 
> org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
>   at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:739)
>   at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:736)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:356)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1651)
>   at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory.getTransport(HadoopThriftAuthBridge.java:736)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:268)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.thrift.transport.TTransportException: Invalid status 
> -128
>   at 
> org.apache.thrift.transport.TSaslTransport.sendAndThrowMessage(TSaslTransport.java:232)
>   at 
> org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:184)
>   at 
> org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125)
>   at 
> org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271)
>   at 
> org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
>   at 
> org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
>   ... 10 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12075) add analyze command to explictly cache file metadata in HBase metastore

2015-12-16 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060930#comment-15060930
 ] 

Alan Gates commented on HIVE-12075:
---

+1, looks good.

> add analyze command to explictly cache file metadata in HBase metastore
> ---
>
> Key: HIVE-12075
> URL: https://issues.apache.org/jira/browse/HIVE-12075
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12075.01.nogen.patch, HIVE-12075.01.patch, 
> HIVE-12075.02.patch, HIVE-12075.03.patch, HIVE-12075.04.patch, 
> HIVE-12075.nogen.patch, HIVE-12075.patch
>
>
> ANALYZE TABLE (spec as usual) CACHE METADATA



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12646) beeline and HIVE CLI do not parse ; in quote properly

2015-12-16 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-12646:

Assignee: (was: Vaibhav Gumashta)

> beeline and HIVE CLI do not parse ; in quote properly
> -
>
> Key: HIVE-12646
> URL: https://issues.apache.org/jira/browse/HIVE-12646
> Project: Hive
>  Issue Type: Bug
>  Components: CLI, Clients
>Reporter: Yongzhi Chen
>
> Beeline and Cli have to escape ; in the quote while most other shell scripts 
> need not. For example:
> in Beeline:
> {noformat}
> 0: jdbc:hive2://localhost:1> select ';' from tlb1;
> select ';' from tlb1;
> 15/12/10 10:45:26 DEBUG TSaslTransport: writing data length: 115
> 15/12/10 10:45:26 DEBUG TSaslTransport: CLIENT: reading data length: 3403
> Error: Error while compiling statement: FAILED: ParseException line 1:8 
> cannot recognize input near '' '
> {noformat}
> while in mysql shell:
> {noformat}
> mysql> SELECT CONCAT(';', 'foo') FROM test limit 3;
> +--------------------+
> | CONCAT(';', 'foo') |
> +--------------------+
> | ;foo               |
> | ;foo               |
> | ;foo               |
> +--------------------+
> 3 rows in set (0.00 sec)
> {noformat}
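The root cause can be illustrated with a toy splitter (a sketch, not Beeline's actual parser): splitting on every ';' breaks the quoted literal, while a quote-aware scan keeps the statement intact.

```python
def split_statements_naive(s):
    # Split on every semicolon, ignoring quotes -- the buggy behavior.
    return [p for p in s.split(";") if p.strip()]

def split_statements_quote_aware(s):
    # Track single-quote state so a ';' inside a literal is not a terminator.
    parts, buf, in_quote = [], [], False
    for ch in s:
        if ch == "'":
            in_quote = not in_quote
            buf.append(ch)
        elif ch == ";" and not in_quote:
            parts.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
    if "".join(buf).strip():
        parts.append("".join(buf))
    return parts

line = "select ';' from tlb1;"
print(split_statements_naive(line))        # literal torn into two fragments
print(split_statements_quote_aware(line))  # one intact statement
```

The naive version tears the query into `"select '"` and `"' from tlb1"`, which is exactly the kind of fragment that produces the ParseException shown above.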



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12688) HIVE-11826 makes hive unusable in properly secured cluster

2015-12-16 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060922#comment-15060922
 ] 

Thejas M Nair commented on HIVE-12688:
--

None of the failures are related; they are failing in previous test runs as 
well. Looks like we really need some cleanup!
Can someone please review it?

I think it's simpler to commit this small patch and follow up in another jira 
to implement this feature without the regression.


> HIVE-11826 makes hive unusable in properly secured cluster
> --
>
> Key: HIVE-12688
> URL: https://issues.apache.org/jira/browse/HIVE-12688
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
>Priority: Blocker
> Attachments: HIVE-12688.1.patch
>
>
> HIVE-11826 makes a change to restrict connections to metastore to users who 
> belong to groups under 'hadoop.proxyuser.hive.groups'.
> That property was only meant to be a hadoop property, which controls what 
> users the hive user can impersonate. What this change is doing is to enable 
> use of that to also restrict who can connect to metastore server. This is new 
> functionality, not a bug fix. There is value to this functionality.
> However, this change makes hive unusable in a properly secured cluster. If 
> 'hadoop.proxyuser.hive.hosts' is set to the proper set of hosts that run 
> Metastore and Hiveserver2 (instead of a very open "*"), then users will be 
> able to connect to metastore only from those hosts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11355) Hive on tez: memory manager for sort buffers (input/output) and operators

2015-12-16 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-11355:
--
Attachment: HIVE-11355.6.patch

> Hive on tez: memory manager for sort buffers (input/output) and operators
> -
>
> Key: HIVE-11355
> URL: https://issues.apache.org/jira/browse/HIVE-11355
> Project: Hive
>  Issue Type: Improvement
>  Components: Tez
>Affects Versions: 2.0.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Attachments: HIVE-11355.1.patch, HIVE-11355.2.patch, 
> HIVE-11355.3.patch, HIVE-11355.4.patch, HIVE-11355.5.patch, HIVE-11355.6.patch
>
>
> We need to better manage the sort buffer allocations to ensure better 
> performance. Also, we need to provide configurations to certain operators to 
> stay within memory limits.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12688) HIVE-11826 makes hive unusable in properly secured cluster

2015-12-16 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060896#comment-15060896
 ] 

Hive QA commented on HIVE-12688:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12777933/HIVE-12688.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 18 failed/errored test(s), 9947 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
TestSparkCliDriver-timestamp_lazy.q-bucketsortoptimize_insert_4.q-date_udf.q-and-12-more
 - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_order2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union9
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse
org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler
org.apache.hive.jdbc.TestSSL.testSSLVersion
org.apache.hive.spark.client.TestSparkClient.testAddJarsAndFiles
org.apache.hive.spark.client.TestSparkClient.testCounters
org.apache.hive.spark.client.TestSparkClient.testErrorJob
org.apache.hive.spark.client.TestSparkClient.testJobSubmission
org.apache.hive.spark.client.TestSparkClient.testMetricsCollection
org.apache.hive.spark.client.TestSparkClient.testRemoteClient
org.apache.hive.spark.client.TestSparkClient.testSimpleSparkJob
org.apache.hive.spark.client.TestSparkClient.testSyncRpc
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6371/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6371/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6371/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 18 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12777933 - PreCommit-HIVE-TRUNK-Build

> HIVE-11826 makes hive unusable in properly secured cluster
> --
>
> Key: HIVE-12688
> URL: https://issues.apache.org/jira/browse/HIVE-12688
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
>Priority: Blocker
> Attachments: HIVE-12688.1.patch
>
>
> HIVE-11826 makes a change to restrict connections to metastore to users who 
> belong to groups under 'hadoop.proxyuser.hive.groups'.
> That property was only meant to be a hadoop property, which controls what 
> users the hive user can impersonate. What this change is doing is to enable 
> use of that to also restrict who can connect to metastore server. This is new 
> functionality, not a bug fix. There is value to this functionality.
> However, this change makes hive unusable in a properly secured cluster. If 
> 'hadoop.proxyuser.hive.hosts' is set to the proper set of hosts that run 
> Metastore and Hiveserver2 (instead of a very open "*"), then users will be 
> able to connect to metastore only from those hosts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12675) PerfLogger should log performance metrics at debug level

2015-12-16 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060886#comment-15060886
 ] 

Laljo John Pullokkaran commented on HIVE-12675:
---

+1 conditional on clean QA run & adding documentation on perf logger log level.

> PerfLogger should log performance metrics at debug level
> 
>
> Key: HIVE-12675
> URL: https://issues.apache.org/jira/browse/HIVE-12675
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-12675.1.patch
>
>
> As more and more subcomponents of Hive (Tez, Optimizer) etc are using 
> PerfLogger to track the performance metrics, it will be more meaningful to 
> set the PerfLogger logging level to DEBUG. Otherwise, we will print the 
> performance metrics unnecessarily for each and every query if the underlying 
> subcomponent does not control the PerfLogging via a parameter on its own.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12667) Proper fix for HIVE-12473

2015-12-16 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-12667:
--
Attachment: HIVE-12667.1.patch

Re-upload to trigger a test run.

> Proper fix for HIVE-12473
> -
>
> Key: HIVE-12667
> URL: https://issues.apache.org/jira/browse/HIVE-12667
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
> Attachments: HIVE-12667.1.patch, HIVE-12667.1.patch
>
>
> HIVE-12473 has added an incorrect comment and also lacks a test case.
> Benefits of this fix:
>* Does not say: "Probably doesn't work"
>* Does not use grammar like "subquery columns and such"
>* Adds test cases that let you verify the fix
>* Doesn't rely on certain structure of key expr, just takes the type at 
> compile time
>* Doesn't require an additional walk of each key expression
>* Shows the type used in explain



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-12695) LLAP: use somebody else's cluster

2015-12-16 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-12695:
---

Assignee: Sergey Shelukhin

> LLAP: use somebody else's cluster
> -
>
> Key: HIVE-12695
> URL: https://issues.apache.org/jira/browse/HIVE-12695
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12695.patch
>
>
> For non-HS2 case cluster sharing.





[jira] [Commented] (HIVE-12667) Proper fix for HIVE-12473

2015-12-16 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060807#comment-15060807
 ] 

Vikram Dixit K commented on HIVE-12667:
---

I guess the string-string case could be special-cased to avoid some 
unnecessary calls. Otherwise the code looks good to me. +1

> Proper fix for HIVE-12473
> -
>
> Key: HIVE-12667
> URL: https://issues.apache.org/jira/browse/HIVE-12667
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
> Attachments: HIVE-12667.1.patch
>
>
> HIVE-12473 has added an incorrect comment and also lacks a test case.
> Benefits of this fix:
>* Does not say: "Probably doesn't work"
>* Does not use grammar like "subquery columns and such"
>* Adds test cases that let you verify the fix
>* Doesn't rely on a certain structure of the key expr; just takes the type at 
> compile time
>* Doesn't require an additional walk of each key expression
>* Shows the type used in explain





[jira] [Updated] (HIVE-12695) LLAP: use somebody else's cluster

2015-12-16 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12695:

Attachment: HIVE-12695.patch

[~gopalv] does this make sense?
@user:instance uses that user's instance.

> LLAP: use somebody else's cluster
> -
>
> Key: HIVE-12695
> URL: https://issues.apache.org/jira/browse/HIVE-12695
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
> Attachments: HIVE-12695.patch
>
>
> For non-HS2 case cluster sharing.





[jira] [Commented] (HIVE-12694) LLAP: Slider destroy semantics require force

2015-12-16 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060761#comment-15060761
 ] 

Vikram Dixit K commented on HIVE-12694:
---

+1 LGTM.

> LLAP: Slider destroy semantics require force
> 
>
> Key: HIVE-12694
> URL: https://issues.apache.org/jira/browse/HIVE-12694
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-12694.1.patch
>
>
> {code}
> 2015-12-16 20:10:55,118 [main] ERROR main.ServiceLauncher - Destroy will 
> permanently delete directories and registries. Reissue this command with the 
> --force option if you want to proceed.
> {code}
> NO PRECOMMIT TESTS





[jira] [Updated] (HIVE-12694) LLAP: Slider destroy semantics require force

2015-12-16 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-12694:
---
Description: 
{code}
2015-12-16 20:10:55,118 [main] ERROR main.ServiceLauncher - Destroy will 
permanently delete directories and registries. Reissue this command with the 
--force option if you want to proceed.
{code}

NO PRECOMMIT TESTS

  was:
{code}
2015-12-16 20:10:55,118 [main] ERROR main.ServiceLauncher - Destroy will 
permanently delete directories and registries. Reissue this command with the 
--force option if you want to proceed.
{code}


> LLAP: Slider destroy semantics require force
> 
>
> Key: HIVE-12694
> URL: https://issues.apache.org/jira/browse/HIVE-12694
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-12694.1.patch
>
>
> {code}
> 2015-12-16 20:10:55,118 [main] ERROR main.ServiceLauncher - Destroy will 
> permanently delete directories and registries. Reissue this command with the 
> --force option if you want to proceed.
> {code}
> NO PRECOMMIT TESTS





[jira] [Updated] (HIVE-12694) LLAP: Slider destroy semantics require force

2015-12-16 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-12694:
---
Attachment: HIVE-12694.1.patch

> LLAP: Slider destroy semantics require force
> 
>
> Key: HIVE-12694
> URL: https://issues.apache.org/jira/browse/HIVE-12694
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-12694.1.patch
>
>
> {code}
> 2015-12-16 20:10:55,118 [main] ERROR main.ServiceLauncher - Destroy will 
> permanently delete directories and registries. Reissue this command with the 
> --force option if you want to proceed.
> {code}





[jira] [Updated] (HIVE-12353) When Compactor fails it calls CompactionTxnHandler.markedCleaned(). it should not.

2015-12-16 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-12353:
--
Description: 
One of the things this method does is delete entries from TXN_COMPONENTS 
for the partition that it was trying to compact.
This causes aborted transactions in TXNS to become empty according to
CompactionTxnHandler.cleanEmptyAbortedTxns(), which means they can now be 
deleted.  
Once they are deleted, data that belongs to these txns is deemed committed...

We should extend the COMPACTION_QUEUE state with 'f' and 's' (failed, 
succeeded) states, and markedCleaned() should no longer delete the entry.
A separate process will clean 'f' and 's' records after X minutes (or once 
more than N records exist for a given partition).
This lets SHOW COMPACTIONS display some history, including how many times 
compaction failed on a given partition (subject to the retention interval), 
so that we don't have to call markCleaned() on Compactor failures, while 
still preventing the Compactor from repeatedly getting stuck on the same bad 
partition/table.

Ideally we'd also want to include an END_TIME field.
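A hypothetical sketch of the schema side of this proposal (column and state 
names are assumed for illustration, not the committed design):

```sql
-- Hypothetical: keep terminal compaction records instead of deleting them
-- in markedCleaned(); 'f' = failed, 's' = succeeded.
ALTER TABLE COMPACTION_QUEUE ADD CQ_END_TIME BIGINT;
-- A periodic cleaner would then purge terminal rows past retention:
DELETE FROM COMPACTION_QUEUE
 WHERE CQ_STATE IN ('f', 's')
   AND CQ_END_TIME < (:now_millis - :retention_millis);
```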


  was:
One of the things that this method does is delete entries from TXN_COMPONENTS 
for partition that it was trying to compact.
This causes Aborted transactions in TXNS to become empty according to
CompactionTxnHandler.cleanEmptyAbortedTxns() which means they can now be 
delete.  

We should extend COMPACTION_QUEUE state with 'f' and 's' (failed, success) 
states.  We should also not delete then entry from markedCleaned()
We'll have separate process that cleans 'f' and 's' records after X minutes (or 
after > N records for a given partition exist).
This allows SHOW COMPACTIONS to show some history info and how many times 
compaction failed on a given partition (subject to retention interval) so that 
we don't have to call markCleaned() on Compactor failures at the same time 
preventing Compactor to constantly getting stuck on the same bad 
partition/table.

Ideally we'd want to include END_TIME field.



> When Compactor fails it calls CompactionTxnHandler.markedCleaned().  it 
> should not.
> ---
>
> Key: HIVE-12353
> URL: https://issues.apache.org/jira/browse/HIVE-12353
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
>
> One of the things this method does is delete entries from TXN_COMPONENTS 
> for the partition that it was trying to compact.
> This causes aborted transactions in TXNS to become empty according to
> CompactionTxnHandler.cleanEmptyAbortedTxns(), which means they can now be 
> deleted.  
> Once they are deleted, data that belongs to these txns is deemed committed...
> We should extend the COMPACTION_QUEUE state with 'f' and 's' (failed, 
> succeeded) states, and markedCleaned() should no longer delete the entry.
> A separate process will clean 'f' and 's' records after X minutes (or once 
> more than N records exist for a given partition).
> This lets SHOW COMPACTIONS display some history, including how many times 
> compaction failed on a given partition (subject to the retention interval), 
> so that we don't have to call markCleaned() on Compactor failures, while 
> still preventing the Compactor from repeatedly getting stuck on the same 
> bad partition/table.
> Ideally we'd also want to include an END_TIME field.





[jira] [Commented] (HIVE-12688) HIVE-11826 makes hive unusable in properly secured cluster

2015-12-16 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060683#comment-15060683
 ] 

Thejas M Nair commented on HIVE-12688:
--

I think it's better to roll it out and put in a proper fix when it's ready. As 
it affects only a small number of lines, adding it back with a proper fix 
should be straightforward. Do you agree [~aihuaxu]?


> HIVE-11826 makes hive unusable in properly secured cluster
> --
>
> Key: HIVE-12688
> URL: https://issues.apache.org/jira/browse/HIVE-12688
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
>Priority: Blocker
> Attachments: HIVE-12688.1.patch
>
>
> HIVE-11826 makes a change to restrict connections to the metastore to users 
> who belong to the groups under 'hadoop.proxyuser.hive.groups'.
> That property was only meant to be a hadoop property, which controls which 
> users the hive user can impersonate. What this change does is also use that 
> property to restrict who can connect to the metastore server. This is new 
> functionality, not a bug fix. There is value to this functionality.
> However, this change makes hive unusable in a properly secured cluster. If 
> 'hadoop.proxyuser.hive.hosts' is set to the proper set of hosts that run 
> the Metastore and Hiveserver2 (instead of a very open "*"), then users will 
> be able to connect to the metastore only from those hosts.
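For context, the two properties under discussion live in Hadoop's 
core-site.xml; a locked-down configuration looks roughly like this (host and 
group values are placeholders, not recommendations):

```xml
<!-- core-site.xml fragment; host/group values are illustrative only -->
<property>
  <name>hadoop.proxyuser.hive.hosts</name>
  <value>metastore01.example.com,hs2-01.example.com</value>
</property>
<property>
  <name>hadoop.proxyuser.hive.groups</name>
  <value>etl-users,analysts</value>
</property>
```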





[jira] [Commented] (HIVE-12663) Support quoted table names/columns when ACID is on

2015-12-16 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060603#comment-15060603
 ] 

Pengcheng Xiong commented on HIVE-12663:


[~ekoifman], pushed to branch-2.0 just now. Thanks.

> Support quoted table names/columns when ACID is on
> --
>
> Key: HIVE-12663
> URL: https://issues.apache.org/jira/browse/HIVE-12663
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Affects Versions: 1.2.1
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.1.0
>
> Attachments: HIVE-12663.01.patch, HIVE-12663.02.patch, 
> HIVE-12663.03.patch
>
>
> Right now the rewrite part in UpdateDeleteSemanticAnalyzer does not support 
> quoted names.





[jira] [Updated] (HIVE-12663) Support quoted table names/columns when ACID is on

2015-12-16 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-12663:
---
Fix Version/s: 2.0.0

> Support quoted table names/columns when ACID is on
> --
>
> Key: HIVE-12663
> URL: https://issues.apache.org/jira/browse/HIVE-12663
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Affects Versions: 1.2.1
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.0.0, 2.1.0
>
> Attachments: HIVE-12663.01.patch, HIVE-12663.02.patch, 
> HIVE-12663.03.patch
>
>
> Right now the rewrite part in UpdateDeleteSemanticAnalyzer does not support 
> quoted names.





[jira] [Commented] (HIVE-12688) HIVE-11826 makes hive unusable in properly secured cluster

2015-12-16 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060580#comment-15060580
 ] 

Thejas M Nair commented on HIVE-12688:
--

[~sershe] I was thinking it is better to keep the release clear of blockers to 
avoid issues, but we can give it a couple of days for a better fix if you are 
OK with that (as the release manager for 2.0.0).
It depends on who has cycles to fix the feature to prevent this regression. If 
we make the change to roll back this feature, there is not too much pressure 
on anyone working on this. 
[~aihuaxu] What do you prefer? Would you have cycles to fix the regression 
soon? Or would you prefer adding this feature back again after this patch 
rolls it out (that gives you more time)?




> HIVE-11826 makes hive unusable in properly secured cluster
> --
>
> Key: HIVE-12688
> URL: https://issues.apache.org/jira/browse/HIVE-12688
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
>Priority: Blocker
> Attachments: HIVE-12688.1.patch
>
>
> HIVE-11826 makes a change to restrict connections to the metastore to users 
> who belong to the groups under 'hadoop.proxyuser.hive.groups'.
> That property was only meant to be a hadoop property, which controls which 
> users the hive user can impersonate. What this change does is also use that 
> property to restrict who can connect to the metastore server. This is new 
> functionality, not a bug fix. There is value to this functionality.
> However, this change makes hive unusable in a properly secured cluster. If 
> 'hadoop.proxyuser.hive.hosts' is set to the proper set of hosts that run 
> the Metastore and Hiveserver2 (instead of a very open "*"), then users will 
> be able to connect to the metastore only from those hosts.





[jira] [Commented] (HIVE-12663) Support quoted table names/columns when ACID is on

2015-12-16 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060581#comment-15060581
 ] 

Eugene Koifman commented on HIVE-12663:
---

[~pxiong] could you commit to 2.0 as well please to maintain parity

> Support quoted table names/columns when ACID is on
> --
>
> Key: HIVE-12663
> URL: https://issues.apache.org/jira/browse/HIVE-12663
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Affects Versions: 1.2.1
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.1.0
>
> Attachments: HIVE-12663.01.patch, HIVE-12663.02.patch, 
> HIVE-12663.03.patch
>
>
> Right now the rewrite part in UpdateDeleteSemanticAnalyzer does not support 
> quoted names.





[jira] [Updated] (HIVE-12692) Make use of the Tez HadoopShim in TaskRunner usage

2015-12-16 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-12692:
--
Attachment: HIVE-12692.1.txt

> Make use of the Tez HadoopShim in TaskRunner usage
> --
>
> Key: HIVE-12692
> URL: https://issues.apache.org/jira/browse/HIVE-12692
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Affects Versions: 2.0.0
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-12692.1.txt
>
>
> TEZ-2910 adds shims for Hadoop to make use of caller context and other 
> changing hadoop APIs. Hive usage of TezTaskRunner needs to work with this.





[jira] [Updated] (HIVE-11865) Disable Hive PPD optimizer when CBO has optimized the plan

2015-12-16 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-11865:
---
Attachment: HIVE-11865.03.patch

New patch contains the following parts:
- Disabling Hive PPD. It was just necessary to keep a small part of the code 
that is responsible for pushing Filter predicates to TableScan operators 
(SimplePredicatePushDown).
- Disabling Hive inference for _isnotnull_ predicates on equi-join inputs. This 
was done in SemanticAnalyzer, and it is not necessary anymore when we run 
purely through Calcite.
- It introduces a new rule in Calcite that pushes Filter through Sort operator. 
This was present in Hive, but it was missing on the Calcite side.
- It includes logic related to pushing Filter down when return path was on. 
This should have been added when HIVE-0 went in, but it was difficult to 
detect as Hive PPD was doing the work for us.

I already went through the changes in the q files: they are either changes in 
the order of Filter predicate factors, or removal of redundant _isnotnull_ 
factors. I will post the patch to RB for review.

[~jpullokkaran], [~ashutoshc], could you take a look? Thanks

> Disable Hive PPD optimizer when CBO has optimized the plan
> --
>
> Key: HIVE-11865
> URL: https://issues.apache.org/jira/browse/HIVE-11865
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Logical Optimizer
>Affects Versions: 2.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-11865.01.patch, HIVE-11865.02.patch, 
> HIVE-11865.02.patch, HIVE-11865.03.patch, HIVE-11865.patch
>
>






[jira] [Commented] (HIVE-12688) HIVE-11826 makes hive unusable in properly secured cluster

2015-12-16 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060551#comment-15060551
 ] 

Sergey Shelukhin commented on HIVE-12688:
-

I think it makes sense to keep this as a blocker. If the proper fix is not in 
sight in a while, we can roll back the regression and postpone a better fix to 
a future version. [~aihuaxu] how hard would it be to make the better fix? 
[~thejas] do you think we should rather roll back now?

> HIVE-11826 makes hive unusable in properly secured cluster
> --
>
> Key: HIVE-12688
> URL: https://issues.apache.org/jira/browse/HIVE-12688
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
>Priority: Blocker
> Attachments: HIVE-12688.1.patch
>
>
> HIVE-11826 makes a change to restrict connections to the metastore to users 
> who belong to the groups under 'hadoop.proxyuser.hive.groups'.
> That property was only meant to be a hadoop property, which controls which 
> users the hive user can impersonate. What this change does is also use that 
> property to restrict who can connect to the metastore server. This is new 
> functionality, not a bug fix. There is value to this functionality.
> However, this change makes hive unusable in a properly secured cluster. If 
> 'hadoop.proxyuser.hive.hosts' is set to the proper set of hosts that run 
> the Metastore and Hiveserver2 (instead of a very open "*"), then users will 
> be able to connect to the metastore only from those hosts.





[jira] [Commented] (HIVE-12683) Does Tez run slower than hive on larger dataset (~2.5 TB)?

2015-12-16 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060525#comment-15060525
 ] 

Gopal V commented on HIVE-12683:


The known 0.4.x OOMs were during split-generation for uncompressed text files 
(HIVE-10746) or when combine inputformat is used on S3 (HADOOP-11584).

> Does Tez run slower than hive on larger dataset (~2.5 TB)?
> --
>
> Key: HIVE-12683
> URL: https://issues.apache.org/jira/browse/HIVE-12683
> Project: Hive
>  Issue Type: Bug
>Reporter: rohit garg
>
> We have started to look into testing the tez query engine. From initial 
> results, we are getting a 30% performance boost over Hive on smaller data 
> sets (1-10 GB), but Hive starts to perform better than Tez as data size 
> increases. For example, when we run a hive query with Tez on about 2.3 TB 
> worth of data, it performs worse than hive alone (~20% less performance). 
> Details are in the post below.
> On a cluster with 1.3 TB RAM, I set the following properties:
> set tez.task.resource.memory.mb=1; set tez.am.resource.memory.mb=59205; 
> set tez.am.launch.cmd-opts=-Xmx47364m; set hive.tez.container.size=59205; 
> set hive.tez.java.opts=-Xmx47364m; set tez.am.grouping.max-size=3670016;
> Is this normal, or am I missing some property / not configuring some 
> property properly? Also, I am using an older version of Tez as of now. 
> Could that be the issue too? I still have to bootstrap the latest version 
> of Tez on EMR and test whether it could do any better.
> Thought of asking here too:
> http://www.jwplayer.com/blog/hive-with-tez-on-emr/





[jira] [Updated] (HIVE-11775) Implement limit push down through union all in CBO

2015-12-16 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-11775:
---
Attachment: HIVE-11775.10.patch

> Implement limit push down through union all in CBO
> --
>
> Key: HIVE-11775
> URL: https://issues.apache.org/jira/browse/HIVE-11775
> Project: Hive
>  Issue Type: New Feature
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11775.01.patch, HIVE-11775.02.patch, 
> HIVE-11775.03.patch, HIVE-11775.04.patch, HIVE-11775.05.patch, 
> HIVE-11775.06.patch, HIVE-11775.07.patch, HIVE-11775.08.patch, 
> HIVE-11775.09.patch, HIVE-11775.10.patch
>
>
> Enlightened by HIVE-11684 (Kudos to [~jcamachorodriguez]), we can actually 
> push limit down through union all, which reduces the intermediate number of 
> rows in union branches. 





[jira] [Commented] (HIVE-12683) Does Tez run slower than hive on larger dataset (~2.5 TB)?

2015-12-16 Thread rohit garg (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060484#comment-15060484
 ] 

rohit garg commented on HIVE-12683:
---

Thanks Hitesh for the inputs. I am going to try the recommended settings and 
update the results here. The AM size was 1024MB. I was just using the bootstrap 
script provided by Amazon for initial testing. They had an older version of Tez 
(I think 0.4 or 0.5).


> Does Tez run slower than hive on larger dataset (~2.5 TB)?
> --
>
> Key: HIVE-12683
> URL: https://issues.apache.org/jira/browse/HIVE-12683
> Project: Hive
>  Issue Type: Bug
>Reporter: rohit garg
>
> We have started to look into testing the tez query engine. From initial 
> results, we are getting a 30% performance boost over Hive on smaller data 
> sets (1-10 GB), but Hive starts to perform better than Tez as data size 
> increases. For example, when we run a hive query with Tez on about 2.3 TB 
> worth of data, it performs worse than hive alone (~20% less performance). 
> Details are in the post below.
> On a cluster with 1.3 TB RAM, I set the following properties:
> set tez.task.resource.memory.mb=1; set tez.am.resource.memory.mb=59205; 
> set tez.am.launch.cmd-opts=-Xmx47364m; set hive.tez.container.size=59205; 
> set hive.tez.java.opts=-Xmx47364m; set tez.am.grouping.max-size=3670016;
> Is this normal, or am I missing some property / not configuring some 
> property properly? Also, I am using an older version of Tez as of now. 
> Could that be the issue too? I still have to bootstrap the latest version 
> of Tez on EMR and test whether it could do any better.
> Thought of asking here too:
> http://www.jwplayer.com/blog/hive-with-tez-on-emr/





[jira] [Commented] (HIVE-12683) Does Tez run slower than hive on larger dataset (~2.5 TB)?

2015-12-16 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060465#comment-15060465
 ] 

Hitesh Shah commented on HIVE-12683:


The Tez AM resource sizing has no relation to the task container sizing. That 
said, for the various benchmarks done in the past, I don't believe anyone has 
needed to go beyond 16 GB for the Tez AM, even for very large DAGs.

[~rohitgarg1989] What was the AM size configured to when the OOM happened? If 
you are running a version older than Tez 0.7.0, there were some memory issues 
that require a large AM size (large being, say, 16 GB), but for 0.7.0 and 
higher even 4 GB should be sufficient for a decent-sized DAG. You can set it 
to 8 GB to be safe for now, with an Xmx of say 6.4 GB, and that should be 
sufficient. If you still hit an OOM with 8 GB, a jira against Tez with the 
heap dump would be helpful. 

[~gopalv] anything to add? any configs that need to be tuned / turned off for 
Hive that ends up using more memory in the AM? Any implicit caching of splits, 
etc?  
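For illustration only (values are the rule-of-thumb numbers from the comment 
above, not tested recommendations), the suggested AM sizing would look like:

```sql
set tez.am.resource.memory.mb=8192;    -- 8 GB AM container, per the advice above
set tez.am.launch.cmd-opts=-Xmx6553m;  -- ~6.4 GB heap inside that container
```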

> Does Tez run slower than hive on larger dataset (~2.5 TB)?
> --
>
> Key: HIVE-12683
> URL: https://issues.apache.org/jira/browse/HIVE-12683
> Project: Hive
>  Issue Type: Bug
>Reporter: rohit garg
>
> We have started to look into testing the tez query engine. From initial 
> results, we are getting a 30% performance boost over Hive on smaller data 
> sets (1-10 GB), but Hive starts to perform better than Tez as data size 
> increases. For example, when we run a hive query with Tez on about 2.3 TB 
> worth of data, it performs worse than hive alone (~20% less performance). 
> Details are in the post below.
> On a cluster with 1.3 TB RAM, I set the following properties:
> set tez.task.resource.memory.mb=1; set tez.am.resource.memory.mb=59205; 
> set tez.am.launch.cmd-opts=-Xmx47364m; set hive.tez.container.size=59205; 
> set hive.tez.java.opts=-Xmx47364m; set tez.am.grouping.max-size=3670016;
> Is this normal, or am I missing some property / not configuring some 
> property properly? Also, I am using an older version of Tez as of now. 
> Could that be the issue too? I still have to bootstrap the latest version 
> of Tez on EMR and test whether it could do any better.
> Thought of asking here too:
> http://www.jwplayer.com/blog/hive-with-tez-on-emr/





[jira] [Commented] (HIVE-12688) HIVE-11826 makes hive unusable in properly secured cluster

2015-12-16 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060409#comment-15060409
 ] 

Thejas M Nair commented on HIVE-12688:
--

Yeah, it looks like the CDH version of Hive has been using this property to 
restrict access. This is not old behavior of Apache Hive; it is a new feature, 
and not a pattern commonly seen in the hadoop ecosystem. In the case of HDFS, 
for example, access is restricted based on file permissions, not on a 
user-group setting. To secure metastore access, you can already use storage 
based authorization.


I am fine with this feature being added. However, the way it is implemented 
right now breaks hive if hadoop.proxyuser.hive.hosts is properly set. 
I am not sure why CDH users didn't face this issue; I assume Cloudera Manager 
might not be securing this for the clusters.
I don't think we can ship Hive 2.0.0 in this form, as it is a major 
regression. If you can change the implementation to fix this issue, please 
create a follow-up jira with a patch. I created this patch to roll back the 
change so that we don't block the 2.0.0 release.





> HIVE-11826 makes hive unusable in properly secured cluster
> --
>
> Key: HIVE-12688
> URL: https://issues.apache.org/jira/browse/HIVE-12688
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
>Priority: Blocker
> Attachments: HIVE-12688.1.patch
>
>
> HIVE-11826 makes a change to restrict connections to the metastore to users 
> who belong to the groups under 'hadoop.proxyuser.hive.groups'.
> That property was only meant to be a hadoop property, which controls which 
> users the hive user can impersonate. What this change does is also use that 
> property to restrict who can connect to the metastore server. This is new 
> functionality, not a bug fix. There is value to this functionality.
> However, this change makes hive unusable in a properly secured cluster. If 
> 'hadoop.proxyuser.hive.hosts' is set to the proper set of hosts that run 
> the Metastore and Hiveserver2 (instead of a very open "*"), then users will 
> be able to connect to the metastore only from those hosts.





[jira] [Updated] (HIVE-11927) Implement/Enable constant related optimization rules in Calcite: enable HiveReduceExpressionsRule to fold constants

2015-12-16 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-11927:
---
Attachment: (was: HIVE-11927.13.patch)

> Implement/Enable constant related optimization rules in Calcite: enable 
> HiveReduceExpressionsRule to fold constants
> ---
>
> Key: HIVE-11927
> URL: https://issues.apache.org/jira/browse/HIVE-11927
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11927.01.patch, HIVE-11927.02.patch, 
> HIVE-11927.03.patch, HIVE-11927.04.patch, HIVE-11927.05.patch, 
> HIVE-11927.06.patch, HIVE-11927.07.patch, HIVE-11927.08.patch, 
> HIVE-11927.09.patch, HIVE-11927.10.patch, HIVE-11927.11.patch, 
> HIVE-11927.12.patch, HIVE-11927.13.patch
>
>






[jira] [Updated] (HIVE-11927) Implement/Enable constant related optimization rules in Calcite: enable HiveReduceExpressionsRule to fold constants

2015-12-16 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-11927:
---
Attachment: HIVE-11927.13.patch

> Implement/Enable constant related optimization rules in Calcite: enable 
> HiveReduceExpressionsRule to fold constants
> ---
>
> Key: HIVE-11927
> URL: https://issues.apache.org/jira/browse/HIVE-11927
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11927.01.patch, HIVE-11927.02.patch, 
> HIVE-11927.03.patch, HIVE-11927.04.patch, HIVE-11927.05.patch, 
> HIVE-11927.06.patch, HIVE-11927.07.patch, HIVE-11927.08.patch, 
> HIVE-11927.09.patch, HIVE-11927.10.patch, HIVE-11927.11.patch, 
> HIVE-11927.12.patch, HIVE-11927.13.patch
>
>






[jira] [Commented] (HIVE-12683) Does Tez run slower than hive on larger dataset (~2.5 TB)?

2015-12-16 Thread rohit garg (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060337#comment-15060337
 ] 

rohit garg commented on HIVE-12683:
---

Yeah, it was the application master. We read somewhere that the Tez AM memory 
and Xmx settings should be the same as the Tez container's. So, in one of our 
tests that ran (as mentioned in the blog), we did:

set tez.am.resource.memory.mb=59205;
set tez.am.launch.cmd-opts=-Xmx47364m;

We were tweaking the following properties mainly :

set tez.task.resource.memory.mb
set tez.am.resource.memory.mb
set tez.am.launch.cmd-opts 
set hive.tez.container.size
set hive.tez.java.opts
set tez.am.grouping.max-size
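The 59205/47364 pairing above follows the common rule of thumb of giving the 
JVM heap roughly 80% of the container, leaving headroom for off-heap memory. A 
small sketch of that derivation (the 0.8 ratio is an assumption, not an 
official Tez default):

```java
public class TezHeapSizing {
    // Derive a -Xmx value (in MB) from a container size at a given heap
    // fraction; 0.8 is a common rule of thumb, not a Tez-mandated value.
    static int xmxForContainerMb(int containerMb, double heapFraction) {
        return (int) (containerMb * heapFraction);
    }

    public static void main(String[] args) {
        int containerMb = 59205;                 // tez.am.resource.memory.mb above
        int xmx = xmxForContainerMb(containerMb, 0.8);
        System.out.println("-Xmx" + xmx + "m");  // matches the -Xmx47364m setting above
    }
}
```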

> Does Tez run slower than hive on larger dataset (~2.5 TB)?
> --
>
> Key: HIVE-12683
> URL: https://issues.apache.org/jira/browse/HIVE-12683
> Project: Hive
>  Issue Type: Bug
>Reporter: rohit garg
>
> We have started to look into testing the Tez query engine. From initial 
> results, we are getting a 30% performance boost over Hive on smaller data 
> sets (1-10 GB), but Hive starts to perform better than Tez as the data size 
> increases. For example, when we run a Hive query with Tez on about 2.3 TB 
> worth of data, it performs worse than Hive alone (~20% less performance). 
> Details are in the post below.
> On a cluster with 1.3 TB RAM, I set the following properties:
> set tez.task.resource.memory.mb=1; set tez.am.resource.memory.mb=59205; 
> set tez.am.launch.cmd-opts=-Xmx47364m; set hive.tez.container.size=59205; 
> set hive.tez.java.opts=-Xmx47364m; set tez.am.grouping.max-size=3670016;
> Is this normal, or am I missing some property / not configuring some property 
> properly? Also, I am using an older version of Tez as of now. Could that be 
> the issue too? I still have to bootstrap the latest version of Tez on EMR and 
> test it to see if that does any better.
> Thought of asking here too
> http://www.jwplayer.com/blog/hive-with-tez-on-emr/





[jira] [Commented] (HIVE-12683) Does Tez run slower than hive on larger dataset (~2.5 TB)?

2015-12-16 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060302#comment-15060302
 ] 

Hitesh Shah commented on HIVE-12683:


It seems like the application master is running out of memory. What is the Tez 
AM being configured for in terms of memory and Xmx?

> Does Tez run slower than hive on larger dataset (~2.5 TB)?
> --
>
> Key: HIVE-12683
> URL: https://issues.apache.org/jira/browse/HIVE-12683
> Project: Hive
>  Issue Type: Bug
>Reporter: rohit garg
>
> We have started to look into testing the Tez query engine. From initial 
> results, we are getting a 30% performance boost over Hive on smaller data 
> sets (1-10 GB), but Hive starts to perform better than Tez as the data size 
> increases. For example, when we run a Hive query with Tez on about 2.3 TB 
> worth of data, it performs worse than Hive alone (~20% less performance). 
> Details are in the post below.
> On a cluster with 1.3 TB RAM, I set the following properties:
> set tez.task.resource.memory.mb=1; set tez.am.resource.memory.mb=59205; 
> set tez.am.launch.cmd-opts=-Xmx47364m; set hive.tez.container.size=59205; 
> set hive.tez.java.opts=-Xmx47364m; set tez.am.grouping.max-size=3670016;
> Is this normal, or am I missing some property / not configuring some property 
> properly? Also, I am using an older version of Tez as of now. Could that be 
> the issue too? I still have to bootstrap the latest version of Tez on EMR and 
> test it to see if that does any better.
> Thought of asking here too
> http://www.jwplayer.com/blog/hive-with-tez-on-emr/





[jira] [Commented] (HIVE-12691) Compute stats on hbase tables causes Zookeeper connection leaks.

2015-12-16 Thread Naveen Gangam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060270#comment-15060270
 ] 

Naveen Gangam commented on HIVE-12691:
--


I was referring to these classes in the above comment:
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsPublisher.java
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsAggregator.java

> Compute stats on hbase tables causes Zookeeper connection leaks.
> 
>
> Key: HIVE-12691
> URL: https://issues.apache.org/jira/browse/HIVE-12691
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 1.1.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
> Fix For: 1.3.0
>
>
> hive.stats.autogather defaults to true in newer hive releases which causes 
> stats to be collected on hbase-backed hive tables.
> Using HTable APIs causes a new zookeeper connections to be created. So if 
> HTable.close() is not called, the underlying ZK connection remains open as in 
> HIVE-12250.
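The leak described above boils down to HTable instances being opened without a 
matching close(). A self-contained sketch with a stand-in Closeable in place of 
HBase's HTable (so it compiles without HBase on the classpath), showing why a 
try-with-resources block prevents the leak:

```java
import java.io.Closeable;

public class ConnectionLeakSketch {
    static int openConnections = 0;

    // Stand-in for HTable: constructing it models opening the underlying
    // ZooKeeper connection; close() models releasing it.
    static class FakeTable implements Closeable {
        FakeTable() { openConnections++; }
        void put(String row) { /* write a stats row */ }
        @Override public void close() { openConnections--; }
    }

    public static void main(String[] args) {
        // Leaky pattern: the connection stays open because close() is never called.
        FakeTable leaky = new FakeTable();
        leaky.put("stats-row");

        // Safe pattern: try-with-resources always closes, even on exceptions.
        try (FakeTable safe = new FakeTable()) {
            safe.put("stats-row");
        }

        System.out.println("open connections: " + openConnections);
    }
}
```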





[jira] [Resolved] (HIVE-12691) Compute stats on hbase tables causes Zookeeper connection leaks.

2015-12-16 Thread Naveen Gangam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam resolved HIVE-12691.
--
   Resolution: Not A Problem
Fix Version/s: 1.3.0

These classes have been deleted from Hive 1.3 and Hive 2.0 via HIVE-12005. So 
this is no longer an issue in the development releases; only the older releases 
are affected. Closing the jira as no fixes are needed.

> Compute stats on hbase tables causes Zookeeper connection leaks.
> 
>
> Key: HIVE-12691
> URL: https://issues.apache.org/jira/browse/HIVE-12691
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 1.1.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
> Fix For: 1.3.0
>
>
> hive.stats.autogather defaults to true in newer hive releases which causes 
> stats to be collected on hbase-backed hive tables.
> Using HTable APIs causes a new zookeeper connections to be created. So if 
> HTable.close() is not called, the underlying ZK connection remains open as in 
> HIVE-12250.





[jira] [Commented] (HIVE-12541) SymbolicTextInputFormat should supports the path with regex

2015-12-16 Thread Xiaowei Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060190#comment-15060190
 ] 

Xiaowei Wang commented on HIVE-12541:
-

The symlink_text_input_format test case has been updated in the 2.1.0 version. 
There are still other test case failures, but they seem unrelated to my patch.

> SymbolicTextInputFormat should supports the path with regex
> ---
>
> Key: HIVE-12541
> URL: https://issues.apache.org/jira/browse/HIVE-12541
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 0.14.0, 1.2.0, 1.2.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
> Fix For: 1.2.1
>
> Attachments: HIVE-12541.1.patch, HIVE-12541.2.patch, 
> HIVE-12541.3.patch, HIVE-12541.4.patch
>
>
> 1. In fact, SymbolicTextInputFormat supports paths with a regex; I add some 
> test SQL.
> 2. But when using CombineHiveInputFormat to combine input files, it cannot 
> resolve a path with a regex, so it gets a wrong result. I give an example 
> and fix the problem.
> Table desc :
> {noformat}
> CREATE External TABLE `symlink_text_input_format`(
>   `key` string,
>   `value` string)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'viewfs://nsX/user/hive/warehouse/symlink_text_input_format'  
> {noformat}
> There is a link file in the dir 
> '/user/hive/warehouse/symlink_text_input_format' ,   the content of the link 
> file is 
> {noformat}
>  viewfs://nsx/tmp/symlink* 
> {noformat}
> It contains one path, and the path contains a regex!
> Execute the sql : 
> {noformat}
> set hive.rework.mapredwork = true ;
> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
> set mapred.min.split.size.per.rack= 0 ;
> set mapred.min.split.size.per.node= 0 ;
> set mapred.max.split.size= 0 ;
> select count(*) from  symlink_text_input_format ;
> {noformat}
> It will get a wrong result: 0
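For the count to come out right, the glob in the symlink target must be 
expanded into concrete paths before splits are computed (in Hadoop, 
FileSystem.globStatus plays this role). A minimal, self-contained illustration 
of that expansion using java.nio, as a sketch rather than the actual patch:

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.ArrayList;
import java.util.List;

public class GlobExpansionSketch {
    // Expand a glob pattern against a directory, the way a symlink target
    // like viewfs://nsx/tmp/symlink* needs to be expanded before input
    // splits are computed.
    static List<Path> expand(Path dir, String glob) throws IOException {
        List<Path> matches = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path p : stream) matches.add(p);
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("tmp");
        Files.createFile(dir.resolve("symlink_a"));
        Files.createFile(dir.resolve("symlink_b"));
        Files.createFile(dir.resolve("other"));
        // Only the two files matching the glob are returned.
        System.out.println(expand(dir, "symlink*").size());
    }
}
```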





[jira] [Commented] (HIVE-11828) beeline -f fails on scripts with tabs between column type and comment

2015-12-16 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060183#comment-15060183
 ] 

Andrés Cordero commented on HIVE-11828:
---

Same problem between column name and column type.

> beeline -f fails on scripts with tabs between column type and comment
> -
>
> Key: HIVE-11828
> URL: https://issues.apache.org/jira/browse/HIVE-11828
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 1.2.0
>Reporter: Krzysztof Adamski
>Priority: Minor
>
> This issue was supposed to be resolved by 
> https://issues.apache.org/jira/browse/HIVE-6359
> However when invoking
>create table test (id int<tab>COMMENT 'test');
> the following error appears
>  beeline -f test.sql 
> -u"jdbc:hive2://localhost:1/default;principal=hive/FQDN@US-WEST-2.COMPUTE.INTERNAL"
> scan complete in 4ms
> Connecting to 
> jdbc:hive2://localhost:1/default;principal=hiveFQDN@US-WEST-2.COMPUTE.INTERNAL
> Connected to: Apache Hive (version 1.1.0-cdh5.4.4)
> Driver: Hive JDBC (version 1.1.0-cdh5.4.4)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> 0: jdbc:hive2://localhost:1/default> create table test (id int<tab>COMMENT 
> 'test');
> Error: Error while compiling statement: FAILED: ParseException line 1:22 
> cannot recognize input near 'intCOMMENT' ''test'' ')' in column type 
> (state=42000,code=4)
> There is no problem when the <tab> is between the columns, e.g. 
>   create table test (id int COMMENT 'test',<tab>id2 string COMMENT 
> 'test2');
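The failure mode above is a tab character being swallowed between two tokens so 
that they fuse into one ("intCOMMENT"). One possible mitigation, sketched below 
with a hypothetical sanitize helper (this is an illustration, not beeline's 
actual implementation), is to replace tabs with spaces before a script line 
reaches the SQL parser:

```java
public class TabSanitizer {
    // Hypothetical helper: replace tab characters in a script line with
    // single spaces so adjacent tokens are not fused when the tab is lost.
    static String sanitize(String line) {
        return line.replace('\t', ' ');
    }

    public static void main(String[] args) {
        String stmt = "create table test (id int\tCOMMENT 'test');";
        System.out.println(sanitize(stmt));
    }
}
```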





[jira] [Commented] (HIVE-12683) Does Tez run slower than hive on larger dataset (~2.5 TB)?

2015-12-16 Thread rohit garg (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060179#comment-15060179
 ] 

rohit garg commented on HIVE-12683:
---

I will update here with the results.

> Does Tez run slower than hive on larger dataset (~2.5 TB)?
> --
>
> Key: HIVE-12683
> URL: https://issues.apache.org/jira/browse/HIVE-12683
> Project: Hive
>  Issue Type: Bug
>Reporter: rohit garg
>
> We have started to look into testing the Tez query engine. From initial 
> results, we are getting a 30% performance boost over Hive on smaller data 
> sets (1-10 GB), but Hive starts to perform better than Tez as the data size 
> increases. For example, when we run a Hive query with Tez on about 2.3 TB 
> worth of data, it performs worse than Hive alone (~20% less performance). 
> Details are in the post below.
> On a cluster with 1.3 TB RAM, I set the following properties:
> set tez.task.resource.memory.mb=1; set tez.am.resource.memory.mb=59205; 
> set tez.am.launch.cmd-opts=-Xmx47364m; set hive.tez.container.size=59205; 
> set hive.tez.java.opts=-Xmx47364m; set tez.am.grouping.max-size=3670016;
> Is this normal, or am I missing some property / not configuring some property 
> properly? Also, I am using an older version of Tez as of now. Could that be 
> the issue too? I still have to bootstrap the latest version of Tez on EMR and 
> test it to see if that does any better.
> Thought of asking here too
> http://www.jwplayer.com/blog/hive-with-tez-on-emr/





[jira] [Commented] (HIVE-12683) Does Tez run slower than hive on larger dataset (~2.5 TB)?

2015-12-16 Thread rohit garg (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060175#comment-15060175
 ] 

rohit garg commented on HIVE-12683:
---

Thanks for your input. I will try these changes and see whether they give me 
any performance boost over the Hive query engine.

This was the OOM error I was getting before I tweaked memory settings :

0 FATAL [Socket Reader #1 for port 55739] 
org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[Socket 
Reader #1 for port 55739,5,main] threw an Error.  Shutting down now...
java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
at 
org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1510)
at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:750)
at 
org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:624)
at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:595)
2015-12-07 20:31:32,859 FATAL [AsyncDispatcher event handler] 
org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
java.lang.OutOfMemoryError: GC overhead limit exceeded
2015-12-07 20:31:30,590 WARN [IPC Server handler 0 on 55739] 
org.apache.hadoop.ipc.Server: IPC Server handler 0 on 55739, call heartbeat({  
containerId=container_1449516549171_0001_01_000100, requestId=10184, 
startIndex=0, maxEventsToGet=0, taskAttemptId=null, eventCount=0 }), rpc 
version=2, client version=19, methodsFingerPrint=557389974 from 
10.10.30.35:47028 Call#11165 Retry#0: error: java.lang.OutOfMemoryError: GC 
overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
at 
javax.security.auth.SubjectDomainCombiner.optimize(SubjectDomainCombiner.java:464)
at 
javax.security.auth.SubjectDomainCombiner.combine(SubjectDomainCombiner.java:267)
at 
java.security.AccessControlContext.goCombiner(AccessControlContext.java:499)
at 
java.security.AccessControlContext.optimize(AccessControlContext.java:407)
at java.security.AccessController.getContext(AccessController.java:501)
at javax.security.auth.Subject.doAs(Subject.java:412)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
2015-12-07 20:32:53,495 INFO [Thread-60] amazon.emr.metrics.MetricsSaver: Saved 
4:3 records to /mnt/var/em/raw/i-782f08c8_20151207_7921_07921_raw.bin
2015-12-07 20:32:53,495 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye..
2015-12-07 20:32:50,435 INFO [IPC Server handler 20 on 55739] 
org.apache.hadoop.ipc.Server: IPC Server handler 20 on 55739, call 
getTask(org.apache.tez.common.ContainerContext@409a6aa9), rpc version=2, client 
version=19, methodsFingerPrint=557389974 from 10.10.30.33:33644 Call#11094 
Retry#0: error: java.io.IOException: java.lang.OutOfMemoryError: GC overhead 
limit exceeded
java.io.IOException: java.lang.OutOfMemoryError: GC overhead limit exceeded
2015-12-07 20:32:29,117 WARN [IPC Server handler 23 on 55739] 
org.apache.hadoop.ipc.Server: IPC Server handler 23 on 55739, call 
getTask(org.apache.tez.common.ContainerContext@7c7e6992), rpc version=2, client 
version=19, methodsFingerPrint=557389974 from 10.10.30.38:44218 Call#11260 
Retry#0: error: java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
2015-12-07 20:32:53,497 INFO [Thread-60] amazon.emr.metrics.MetricsSaver: Saved 
1:1 records to /mnt/var/em/raw/i-782f08c8_20151207_7921_07921_raw.bin
2015-12-07 20:32:53,498 INFO [Thread-61] amazon.emr.metrics.MetricsSaver: Saved 
1:1 records to /mnt/var/em/raw/i-782f08c8_20151207_7921_07921_raw.bin
2015-12-07 20:32:53,498 INFO [Thread-2] org.apache.tez.dag.app.DAGAppMaster: 
DAGAppMaster received a signal. Signaling TaskScheduler
2015-12-07 20:32:53,498 INFO [Thread-2] 
org.apache.tez.dag.app.rm.TaskSchedulerEventHandler: TaskScheduler notified 
that iSignalled was : true
2015-12-07 20:32:53,499 INFO [Thread-2] 
org.apache.tez.dag.history.HistoryEventHandler: Stopping HistoryEventHandler
2015-12-07 20:32:53,499 INFO [Thread-2] 
org.apache.tez.dag.history.recovery.RecoveryService: Stopping RecoveryService
2015-12-07 20:32:53,499 INFO [Thread-2] 
org.apache.tez.dag.history.recovery.RecoveryService: Closing Summary Stream
2015-12-07 20:32:53,499 INFO [LeaseRenewer:hadoop@10.10.30.148:9000] 
org.apache.hadoop.util.ExitUtil: Halt with status -1 Message: HaltException

> Does Tez run slower than hive on larger dataset (~2.5 TB)?
> --
>
> Key: HIVE-12683
> URL: https://issues.apache.org/jira/browse/HIVE-12683
> Project: Hive
>  Issue Type: Bug
>Reporter: rohit garg
>
> We h

[jira] [Commented] (HIVE-11487) Add getNumPartitionsByFilter api in metastore api

2015-12-16 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060121#comment-15060121
 ] 

Hive QA commented on HIVE-11487:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12777985/HIVE-11487.03.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6370/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6370/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6370/

Messages:
{noformat}
 This message was trimmed, see log for full details 
[INFO] --- maven-surefire-plugin:2.16:test (default-test) @ hive-common ---
[INFO] Tests are skipped.
[INFO] 
[INFO] --- maven-jar-plugin:2.2:jar (default-jar) @ hive-common ---
[INFO] Building jar: 
/data/hive-ptest/working/apache-github-source-source/common/target/hive-common-2.1.0-SNAPSHOT.jar
[INFO] 
[INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ 
hive-common ---
[INFO] 
[INFO] --- maven-jar-plugin:2.2:test-jar (default) @ hive-common ---
[INFO] Building jar: 
/data/hive-ptest/working/apache-github-source-source/common/target/hive-common-2.1.0-SNAPSHOT-tests.jar
[INFO] 
[INFO] --- maven-install-plugin:2.4:install (default-install) @ hive-common ---
[INFO] Installing 
/data/hive-ptest/working/apache-github-source-source/common/target/hive-common-2.1.0-SNAPSHOT.jar
 to 
/data/hive-ptest/working/maven/org/apache/hive/hive-common/2.1.0-SNAPSHOT/hive-common-2.1.0-SNAPSHOT.jar
[INFO] Installing 
/data/hive-ptest/working/apache-github-source-source/common/pom.xml to 
/data/hive-ptest/working/maven/org/apache/hive/hive-common/2.1.0-SNAPSHOT/hive-common-2.1.0-SNAPSHOT.pom
[INFO] Installing 
/data/hive-ptest/working/apache-github-source-source/common/target/hive-common-2.1.0-SNAPSHOT-tests.jar
 to 
/data/hive-ptest/working/maven/org/apache/hive/hive-common/2.1.0-SNAPSHOT/hive-common-2.1.0-SNAPSHOT-tests.jar
[INFO] 
[INFO] 
[INFO] Building Hive Serde 2.1.0-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ hive-serde ---
[INFO] Deleting 
/data/hive-ptest/working/apache-github-source-source/serde/target
[INFO] Deleting /data/hive-ptest/working/apache-github-source-source/serde 
(includes = [datanucleus.log, derby.log], excludes = [])
[INFO] 
[INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-no-snapshots) @ 
hive-serde ---
[INFO] 
[INFO] --- build-helper-maven-plugin:1.8:add-source (add-source) @ hive-serde 
---
[INFO] Source directory: 
/data/hive-ptest/working/apache-github-source-source/serde/src/gen/protobuf/gen-java
 added.
[INFO] Source directory: 
/data/hive-ptest/working/apache-github-source-source/serde/src/gen/thrift/gen-javabean
 added.
[INFO] 
[INFO] --- maven-remote-resources-plugin:1.5:process (default) @ hive-serde ---
[INFO] 
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ 
hive-serde ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory 
/data/hive-ptest/working/apache-github-source-source/serde/src/main/resources
[INFO] Copying 3 resources
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (define-classpath) @ hive-serde ---
[INFO] Executing tasks

main:
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ hive-serde ---
[INFO] Compiling 406 source files to 
/data/hive-ptest/working/apache-github-source-source/serde/target/classes
[WARNING] 
/data/hive-ptest/working/apache-github-source-source/serde/src/java/org/apache/hadoop/hive/serde2/SerDe.java:
 Some input files use or override a deprecated API.
[WARNING] 
/data/hive-ptest/working/apache-github-source-source/serde/src/java/org/apache/hadoop/hive/serde2/SerDe.java:
 Recompile with -Xlint:deprecation for details.
[WARNING] 
/data/hive-ptest/working/apache-github-source-source/serde/src/java/org/apache/hadoop/hive/serde2/lazy/objectinspector/primitive/AbstractPrimitiveLazyObjectInspector.java:
 Some input files use unchecked or unsafe operations.
[WARNING] 
/data/hive-ptest/working/apache-github-source-source/serde/src/java/org/apache/hadoop/hive/serde2/lazy/objectinspector/primitive/AbstractPrimitiveLazyObjectInspector.java:
 Recompile with -Xlint:unchecked for details.
[INFO] 
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ 
hive-serde ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 2 resources
[INFO] Copying 3 resources
[INFO] 
[INFO] --- maven-antrun-pl

[jira] [Commented] (HIVE-12541) SymbolicTextInputFormat should supports the path with regex

2015-12-16 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060086#comment-15060086
 ] 

Hive QA commented on HIVE-12541:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12777960/HIVE-12541.4.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 17 failed/errored test(s), 9948 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
TestSparkCliDriver-timestamp_lazy.q-bucketsortoptimize_insert_4.q-date_udf.q-and-12-more
 - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_order2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union9
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse
org.apache.hive.jdbc.TestSSL.testSSLVersion
org.apache.hive.spark.client.TestSparkClient.testAddJarsAndFiles
org.apache.hive.spark.client.TestSparkClient.testCounters
org.apache.hive.spark.client.TestSparkClient.testErrorJob
org.apache.hive.spark.client.TestSparkClient.testJobSubmission
org.apache.hive.spark.client.TestSparkClient.testMetricsCollection
org.apache.hive.spark.client.TestSparkClient.testRemoteClient
org.apache.hive.spark.client.TestSparkClient.testSimpleSparkJob
org.apache.hive.spark.client.TestSparkClient.testSyncRpc
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6368/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6368/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6368/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 17 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12777960 - PreCommit-HIVE-TRUNK-Build

> SymbolicTextInputFormat should supports the path with regex
> ---
>
> Key: HIVE-12541
> URL: https://issues.apache.org/jira/browse/HIVE-12541
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 0.14.0, 1.2.0, 1.2.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
> Fix For: 1.2.1
>
> Attachments: HIVE-12541.1.patch, HIVE-12541.2.patch, 
> HIVE-12541.3.patch, HIVE-12541.4.patch
>
>
> 1. In fact, SymbolicTextInputFormat supports paths with a regex; I add some 
> test SQL.
> 2. But when using CombineHiveInputFormat to combine input files, it cannot 
> resolve a path with a regex, so it gets a wrong result. I give an example 
> and fix the problem.
> Table desc :
> {noformat}
> CREATE External TABLE `symlink_text_input_format`(
>   `key` string,
>   `value` string)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'viewfs://nsX/user/hive/warehouse/symlink_text_input_format'  
> {noformat}
> There is a link file in the dir 
> '/user/hive/warehouse/symlink_text_input_format' ,   the content of the link 
> file is 
> {noformat}
>  viewfs://nsx/tmp/symlink* 
> {noformat}
> It contains one path, and the path contains a regex!
> Execute the sql : 
> {noformat}
> set hive.rework.mapredwork = true ;
> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
> set mapred.min.split.size.per.rack= 0 ;
> set mapred.min.split.size.per.node= 0 ;
> set mapred.max.split.size= 0 ;
> select count(*) from  symlink_text_input_format ;
> {noformat}
> It will get a wrong result: 0





[jira] [Commented] (HIVE-12688) HIVE-11826 makes hive unusable in properly secured cluster

2015-12-16 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15059945#comment-15059945
 ] 

Aihua Xu commented on HIVE-12688:
-

You are right. This is a Hadoop property. It seems that this property should 
not limit access to the metastore server, and we were historically using it to 
limit access to Hive.

In our own version, somehow we documented it as such. Is this the old behavior? 
Instead of changing it back, since that would reintroduce the issue of not 
blocking unauthorized access to Hive, can we keep this behavior and rework it 
in a later version so that Hive access is blocked somewhere else, in a correct 
way?


> HIVE-11826 makes hive unusable in properly secured cluster
> --
>
> Key: HIVE-12688
> URL: https://issues.apache.org/jira/browse/HIVE-12688
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
>Priority: Blocker
> Attachments: HIVE-12688.1.patch
>
>
> HIVE-11826 makes a change to restrict connections to metastore to users who 
> belong to groups under 'hadoop.proxyuser.hive.groups'.
> That property was only meant to be a Hadoop property, which controls which 
> users the hive user can impersonate. What this change does is enable the use 
> of that property to also restrict who can connect to the metastore server. 
> This is new functionality, not a bug fix. There is value in this functionality.
> However, this change makes hive unusable in a properly secured cluster. If 
> 'hadoop.proxyuser.hive.hosts' is set to the proper set of hosts that run 
> Metastore and Hiveserver2 (instead of a very open "*"), then users will be 
> able to connect to metastore only from those hosts.
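Conceptually, the HIVE-11826 check accepts a connecting user only if one of 
their groups appears in the comma-separated hadoop.proxyuser.hive.groups value 
(with "*" meaning allow all). A hedged illustration of that membership test, 
not Hadoop's actual ProxyUsers implementation:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class ProxyUserCheckSketch {
    // Illustrative check: does any of the user's groups appear in the
    // comma-separated groups property? "*" allows everyone.
    static boolean allowed(String groupsProp, Set<String> userGroups) {
        if ("*".equals(groupsProp)) return true;
        for (String g : groupsProp.split(",")) {
            if (userGroups.contains(g.trim())) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> groups = new HashSet<>(Arrays.asList("etl", "analysts"));
        System.out.println(allowed("hive,etl", groups)); // member of "etl" -> allowed
        System.out.println(allowed("hive", groups));     // no overlap -> denied
        System.out.println(allowed("*", groups));        // wildcard -> allowed
    }
}
```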




