[jira] [Commented] (HIVE-19943) Header values keep showing up in result sets

2018-06-26 Thread Zoltan Haindrich (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523316#comment-16523316
 ] 

Zoltan Haindrich commented on HIVE-19943:
-

It seems to me that the reader somehow picks up the header. I guess that some 
optimization is replacing the reader with a more specialized one, but that one 
doesn't handle the header skip.
Disabling LLAP or vectorization restores good behaviour for me:
{{set hive.vectorized.execution.enabled=false}}
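
To illustrate the guess above, here is a minimal sketch of the suspected behaviour in plain Java. It is not Hive code: the class and method names (HeaderSkipSketch, readSkippingHeader, readIgnoringHeader) are made up for illustration, and the sample lines follow the file structure from the description. One reader honours a skip.header.line.count-style setting; the other stands in for the hypothetical specialized reader and ignores it.

```java
import java.util.Arrays;
import java.util.List;

class HeaderSkipSketch {
    // Reader that honours a skip.header.line.count-style setting.
    static List<String> readSkippingHeader(List<String> lines, int headerCount) {
        int start = Math.min(headerCount, lines.size());
        return lines.subList(start, lines.size());
    }

    // Hypothetical specialized reader that ignores the setting entirely.
    static List<String> readIgnoringHeader(List<String> lines, int headerCount) {
        return lines;
    }

    public static void main(String[] args) {
        // Sample file: one header line followed by three data rows.
        List<String> file = Arrays.asList("TESTING_TYPE", "adf", "hyg", "abc");
        System.out.println(readSkippingHeader(file, 1).size()); // rows seen when the header is skipped
        System.out.println(readIgnoringHeader(file, 1).size()); // rows seen when the setting is ignored
    }
}
```

With the sample file the first reader yields 3 rows and the second yields 4, which would match the reported symptom of the header being counted.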

> Header values keep showing up in result sets
> 
>
> Key: HIVE-19943
> URL: https://issues.apache.org/jira/browse/HIVE-19943
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.1.0
> Environment: HDInsight Hive Interactive Query
> [Components|https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-component-versioning#hadoop-components-available-with-different-hdinsight-versions]
>Reporter: Liam De Lee
>Priority: Major
>
> We are using the tblproperties ("skip.header.line.count"="1") when creating 
> an external table.
> When we do a select * from table we get it back as expected without the 
> header present in the result set.
> However when we do for instance a count(1) we get the header back in this 
> count (tested with a select * from table and paste it in notepad to find the 
> amount of rows)
> If we also do this with a select distinct(column) from table we also get the 
> header as a distinct value.
> file structure:
> ||_TESTING_TYPE||
> |adf|
> |hyg|
> |abc|
>  
> *Update: 26/06/2018*
> Create statement:
> {code:java}
> ---
> --sap_0bill_typea--
> ---
> CREATE EXTERNAL TABLE IF NOT EXISTS ext.test_type_in
>   (
> test_type  string
> )
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\073'
> STORED AS TEXTFILE
> LOCATION 'adl://{adlslocation}data/data2/test'
> tblproperties ("skip.header.line.count"="1")
> {code}
>  Select statement:
> {code:java}
> select * from test_type_in;
> {code}
> Distinct statement:
> {code:java}
> select distinct test_type from test_type_in ORDER BY test_type;
> {code}
> I cannot show the exact statement because of an NDA, so I changed those 
> values to test.
>  
> I can also tell you it is not just at our HDInsight but also at another 
> company we are working for. It does not matter what is in the data either, 
> so for testing purposes:
> {code:java}
> test_type,abcg,gjeiza,aze,grriajj,gd,rrjri,vdju{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-19943) Header values keep showing up in result sets

2018-06-26 Thread Liam De Lee (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523317#comment-16523317
 ] 

Liam De Lee commented on HIVE-19943:


Yep, once you do some sort of sorting, or change the natural order, it does not 
work as intended and you get the header back. Also, if you do a count(1), the 
header is included in the count.

To me it looks like Hive first runs the query and only then checks whether a 
header is present. The funny part is that when the header is in the data set, 
the top row is not removed, so we do not lose a random row, but we do see the 
header, which is totally crazy.



[jira] [Issue Comment Deleted] (HIVE-19943) Header values keep showing up in result sets

2018-06-26 Thread Liam De Lee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liam De Lee updated HIVE-19943:
---
Comment: was deleted

(was: Yep, once you do some sort of sorting, or change the natural order it is 
not working as intended and you get the header back. Also if you do a count(1) 
you also get the header in the count.

For me it looks like that he first runs the query and then looks if a header is 
present or not. BUT the funny part is that if the header is in the data set he 
does not remove the top row so we do not lose a random row but we do see the 
header which is totally crazy.)



[jira] [Commented] (HIVE-19943) Header values keep showing up in result sets

2018-06-26 Thread Liam De Lee (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523320#comment-16523320
 ] 

Liam De Lee commented on HIVE-19943:


Okay, can I ask what this does exactly? I would like to know what impact it can 
have on the project once we start working with big datasets.



[jira] [Updated] (HIVE-19943) Header values keep showing up in result sets

2018-06-26 Thread Liam De Lee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liam De Lee updated HIVE-19943:
---
Description: 
We are using the tblproperties ("skip.header.line.count"="1") when creating an 
external table.

When we do a select * from table we get it back as expected without the header 
present in the result set.

However when we do for instance a count(1) we get the header back in this count 
(tested with a select * from table and paste it in notepad to find the amount 
of rows)

If we also do this with a select distinct(column) from table we also get the 
header as a distinct value.

file structure:
||_TESTING_TYPE||
|adf|
|hyg|
|abc|

 

*Update: 26/06/2018*

Create statement:
{code:java}
---
--test_type--
---
CREATE EXTERNAL TABLE IF NOT EXISTS ext.test_type_in
  (
test_type  string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\073'
STORED AS TEXTFILE
LOCATION 'adl://{adlslocation}data/data2/test'
tblproperties ("skip.header.line.count"="1")
{code}
 Select statement:
{code:java}
select * from test_type_in;
{code}
Distinct statement:
{code:java}
select distinct test_type from test_type_in ORDER BY test_type;
{code}
I cannot show the exact statement because of an NDA, so I changed those values 
to test.

 

I can also tell you it is not just at our HDInsight but also at another company 
we are working for. It does not matter what is in the data either, so for 
testing purposes:
{code:java}
test_type,abcg,gjeiza,aze,grriajj,gd,rrjri,vdju{code}

  was:
We are using the tblproperties ("skip.header.line.count"="1") when creating an 
external table.

When we do a select * from table we get it back as expected without the header 
present in the result set.

However when we do for instance a count(1) we get the header back in this count 
(tested with a select * from table and paste it in notepad to find the amount 
of rows)

If we also do this with a select distinct(column) from table we also get the 
header as a distinct value.

file structure:
||_TESTING_TYPE||
|adf|
|hyg|
|abc|

 

*Update: 26/06/2018*

Create statement:
{code:java}
---
--sap_0bill_typea--
---
CREATE EXTERNAL TABLE IF NOT EXISTS ext.test_type_in
  (
test_type  string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\073'
STORED AS TEXTFILE
LOCATION 'adl://{adlslocation}data/data2/test'
tblproperties ("skip.header.line.count"="1")
{code}
 Select statement:
{code:java}
select * from test_type_in;
{code}
Distinct statement:
{code:java}
select distinct test_type from test_type_in ORDER BY test_type;
{code}
I cannot show the exact statement because of NDA so i changed those values to 
test.

 

I can also tell you it is not just at our HDInsight but also at another company 
we are working for. It does not Mather what is in the data as well. so for 
testing purposes:
{code:java}
test_type,abcg,gjeiza,aze,grriajj,gd,rrjri,vdju{code}



[jira] [Commented] (HIVE-19943) Header values keep showing up in result sets

2018-06-26 Thread Liam De Lee (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523335#comment-16523335
 ] 

Liam De Lee commented on HIVE-19943:


I just tested your solution and it seems to work.

However, we need this enabled for Interactive Query and to run faster queries on 
bigger data sets, so this is not a real solution for us at the moment.

For now we are going to remove the headers, but this hurts readability when 
somebody gets the file and looks into it without a header. Is there maybe 
another solution as well?

Thanks in advance



[jira] [Commented] (HIVE-19943) Header values keep showing up in result sets

2018-06-26 Thread Zoltan Haindrich (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523340#comment-16523340
 ] 

Zoltan Haindrich commented on HIVE-19943:
-

It will be slower; I'm not sure how much...
I think it would be good to:

* disable vectorization
* create a new ORC table from the input
* enable vectorization

and use the ORC table version after that...
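
The steps above could be scripted roughly as follows. This is only a sketch: it builds the HiveQL statements as strings and prints them, and it does not talk to a cluster. The target table name ext.test_type_orc and the helper names are assumptions for illustration; only hive.vectorized.execution.enabled and ext.test_type_in come from this thread.

```java
import java.util.Arrays;
import java.util.List;

class OrcWorkaroundSketch {
    // Returns the statements in the order suggested in the comment:
    // vectorization off, copy the text table into an ORC table, vectorization back on.
    static List<String> workaroundStatements(String sourceTable, String orcTable) {
        return Arrays.asList(
            "SET hive.vectorized.execution.enabled=false",
            "CREATE TABLE " + orcTable + " STORED AS ORC AS SELECT * FROM " + sourceTable,
            "SET hive.vectorized.execution.enabled=true");
    }

    public static void main(String[] args) {
        // ext.test_type_orc is a hypothetical name for the ORC copy.
        for (String stmt : workaroundStatements("ext.test_type_in", "ext.test_type_orc")) {
            System.out.println(stmt + ";");
        }
    }
}
```

The copy is made while vectorization is off so that, if the guess about the reader is right, the header is skipped when the ORC table is populated; later queries would then go against the ORC table only.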




[jira] [Commented] (HIVE-19888) Misleading "METASTORE_FILTER_HOOK will be ignored" warning from SessionState

2018-06-26 Thread Zoltan Haindrich (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523347#comment-16523347
 ] 

Zoltan Haindrich commented on HIVE-19888:
-

[~vanzin] not really...I wanted to commit it; but if you don't mind I would 
like to have your email address (#$% GDPR) to attribute the contribution to you.

> Misleading "METASTORE_FILTER_HOOK will be ignored" warning from SessionState
> 
>
> Key: HIVE-19888
> URL: https://issues.apache.org/jira/browse/HIVE-19888
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
>Priority: Minor
> Attachments: HIVE-19888.1.patch
>
>
> When I run things on my test cluster I see things like this in my logs:
> {noformat}
> 18/03/14 13:35:20 WARN session.SessionState: METASTORE_FILTER_HOOK will be 
> ignored, since hive.security.authorization.manager is set to instance of 
> HiveAuthorizerFactory.
> 18/03/14 13:35:21 WARN session.SessionState: METASTORE_FILTER_HOOK will be 
> ignored, since hive.security.authorization.manager is set to instance of 
> HiveAuthorizerFactory.
> {noformat}
> That's because the code in SessionState.java is wrong:
> {code}
> String metastoreHook = sessionConf.get(ConfVars.METASTORE_FILTER_HOOK.name());
> if (!ConfVars.METASTORE_FILTER_HOOK.getDefaultValue().equals(metastoreHook) &&
>     !AuthorizationMetaStoreFilterHook.class.getName().equals(metastoreHook)) {
>   LOG.warn(ConfVars.METASTORE_FILTER_HOOK.name() +
>       " will be ignored, since hive.security.authorization.manager" +
>       " is set to instance of HiveAuthorizerFactory.");
> }
> {code}
> It's using {{.name()}} which is the enum name, not the actual config key.
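
A small self-contained sketch of why looking up by the enum name goes wrong. ToyConfVars is a toy stand-in, not the real HiveConf class; the key string "hive.metastore.filter.hook" and the default value are assumptions for illustration, and the check is simplified to a single condition.

```java
import java.util.HashMap;
import java.util.Map;

class ConfKeySketch {
    // Toy stand-in for a ConfVars-style enum; not the real HiveConf class.
    enum ToyConfVars {
        METASTORE_FILTER_HOOK("hive.metastore.filter.hook", "example.DefaultFilterHook");

        final String varname;       // the actual config key (assumed value)
        final String defaultValue;  // placeholder default, not Hive's real one

        ToyConfVars(String varname, String defaultValue) {
            this.varname = varname;
            this.defaultValue = defaultValue;
        }
    }

    // Simplified form of the check in the description:
    // warn when the looked-up value differs from the default.
    static boolean wouldWarn(Map<String, String> conf, String lookupKey) {
        String hook = conf.get(lookupKey);
        return !ToyConfVars.METASTORE_FILTER_HOOK.defaultValue.equals(hook);
    }

    public static void main(String[] args) {
        ToyConfVars v = ToyConfVars.METASTORE_FILTER_HOOK;
        Map<String, String> conf = new HashMap<>();
        conf.put(v.varname, v.defaultValue); // hook left at its default

        System.out.println(wouldWarn(conf, v.name()));  // lookup by enum name misses the entry
        System.out.println(wouldWarn(conf, v.varname)); // lookup by config key finds the default
    }
}
```

Because the map is keyed by the config key, the lookup by name() returns null, so the comparison with the default fails and the warning fires even though the hook was never changed.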





[jira] [Commented] (HIVE-19886) Logs may be directed to 2 files if --hiveconf hive.log.file is used

2018-06-26 Thread Zoltan Haindrich (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523354#comment-16523354
 ] 

Zoltan Haindrich commented on HIVE-19886:
-

I don't think printing a log message will "solve" this problem... I think this 
is a misconfiguration issue that could be detected during startup.
Logging to 2 places can be very confusing; if it can't be avoided 
programmatically, I think a hard exception should abort the startup and let 
the user fix the setup...

> Logs may be directed to 2 files if --hiveconf hive.log.file is used
> ---
>
> Key: HIVE-19886
> URL: https://issues.apache.org/jira/browse/HIVE-19886
> Project: Hive
>  Issue Type: Bug
>  Components: Logging
>Affects Versions: 3.1.0, 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Jaume M
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-19886.2.patch, HIVE-19886.2.patch, HIVE-19886.patch
>
>
> The hive launch script explicitly specifies the log4j2 configuration file to 
> use. The main() methods in HiveServer2 and HiveMetastore reconfigure the 
> logger based on user input via --hiveconf hive.log.file. This may cause logs 
> to end up in 2 different files. Initial logs go to the file specified in 
> hive-log4j2.properties and, after logger reconfiguration, the rest of the 
> logs go to the file specified via --hiveconf hive.log.file. 





[jira] [Commented] (HIVE-19649) Clean up inputs in JDBC PreparedStatement. Add unit tests.

2018-06-26 Thread Zoltan Haindrich (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523366#comment-16523366
 ] 

Zoltan Haindrich commented on HIVE-19649:
-

Seems like the reformatted one has bumped into infra issues :D
Since it contains only whitespace changes, I wanted to commit it; [~mkysliuk], 
may I ask for your email address, to attribute this patch to you? (INFRA-16628) 

> Clean up inputs in JDBC PreparedStatement. Add unit tests.
> --
>
> Key: HIVE-19649
> URL: https://issues.apache.org/jira/browse/HIVE-19649
> Project: Hive
>  Issue Type: Test
>Reporter: Mykhailo Kysliuk
>Assignee: Mykhailo Kysliuk
>Priority: Minor
> Attachments: HIVE-19649.01.patch, HIVE-19649.02.patch, 
> HIVE-19649.03.patch
>
>
> Add unit tests for feature that was implemented in 
> [HIVE-18788|https://issues.apache.org/jira/browse/HIVE-18788].
> The integration tests are present, but it will be useful to catch errors 
> during module build.





[jira] [Updated] (HIVE-19938) Upgrade scripts for information schema

2018-06-26 Thread Daniel Dai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-19938:
--
Attachment: HIVE-19938.4.patch

> Upgrade scripts for information schema
> --
>
> Key: HIVE-19938
> URL: https://issues.apache.org/jira/browse/HIVE-19938
> Project: Hive
>  Issue Type: Bug
>Reporter: Daniel Dai
>Assignee: Daniel Dai
>Priority: Major
> Attachments: HIVE-19938.1.patch, HIVE-19938.2.patch, 
> HIVE-19938.3.patch, HIVE-19938.4.patch
>
>
> To make schematool -upgradeSchema work for information schema.





[jira] [Commented] (HIVE-19938) Upgrade scripts for information schema

2018-06-26 Thread Daniel Dai (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523367#comment-16523367
 ] 

Daniel Dai commented on HIVE-19938:
---

Thejas pointed out that we need to manage the Hive schema separately. Otherwise 
the DB upgrade will update the version, and thus the Hive schema upgrade will 
not run. So I changed the script to load data into version to avoid a Tez job. 
I also improved the embedded HS2 in the patch to turn off ACID and the 
metastore cache.



[jira] [Commented] (HIVE-19990) Query with interval literal in join condition fails

2018-06-26 Thread Zoltan Haindrich (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523371#comment-16523371
 ] 

Zoltan Haindrich commented on HIVE-19990:
-

+1 pending tests

> Query with interval literal in join condition fails
> ---
>
> Key: HIVE-19990
> URL: https://issues.apache.org/jira/browse/HIVE-19990
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-19990.1.patch
>
>
> *Reproducer*
> {code:sql}
> > create table date_dim_d1(
>   d_week_seq int,
>   d_date string);
> > SELECT 
>d1.d_week_seq
> FROM   
>date_dim_d1 d1 
>JOIN date_dim_d1 d3 
> WHERE  
>Cast(d3.d_date AS date) > Cast(d1.d_date AS date) + INTERVAL '5' day ;
> {code}
> *Exception*
> {code}
> org.apache.hadoop.hive.ql.parse.SemanticException: '5 00:00:00.0' encountered with 0 children
>   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.parseJoinCondPopulateAlias(SemanticAnalyzer.java:2780)
>   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.parseJoinCondPopulateAlias(SemanticAnalyzer.java:2775)
>   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.parseJoinCondition(SemanticAnalyzer.java:3060)
>   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.parseJoinCondition(SemanticAnalyzer.java:2959)
>   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genJoinTree(SemanticAnalyzer.java:9633)
>   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11380)
>   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11285)
>   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:12071)
>   at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:593)
>   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12150)
>   at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:330)
>   at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:288)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:658)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1829)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1776)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1771)
>   at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126)
>   at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:214)
>   at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402)
>   at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:832)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:770)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:694)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
> {code}





[jira] [Commented] (HIVE-19951) Vectorization: Need to disable encoded LLAP I/O for ORC when there is data type conversion (Schema Evolution)

2018-06-26 Thread Matt McCline (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523375#comment-16523375
 ] 

Matt McCline commented on HIVE-19951:
-

Patch #6 failed in Hive QA in less than 1 minute.  Infrastructure issue I 
assume.

> Vectorization: Need to disable encoded LLAP I/O for ORC when there is data 
> type conversion  (Schema Evolution)
> --
>
> Key: HIVE-19951
> URL: https://issues.apache.org/jira/browse/HIVE-19951
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-19951.01.patch, HIVE-19951.02.patch, 
> HIVE-19951.03.patch, HIVE-19951.04.patch, HIVE-19951.05.patch, 
> HIVE-19951.06.patch
>
>
> Currently, reading encoded ORC data does not support data type conversion.  
> So, encoded reading and cache populating need to be disabled.





[jira] [Commented] (HIVE-19943) Header values keep showing up in result sets

2018-06-26 Thread Liam De Lee (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523378#comment-16523378
 ] 

Liam De Lee commented on HIVE-19943:


We are already working with ORC tables with gzip-compressed data.

I tried what you suggested, but I still get the header back.

Could this be a problem on the HDInsight side that they need to fix? If so, I 
can raise a ticket with them to look into it, because in my opinion this is a 
fairly basic option.

> Header values keep showing up in result sets
> 
>
> Key: HIVE-19943
> URL: https://issues.apache.org/jira/browse/HIVE-19943
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.1.0
> Environment: Hdinsight Hive interactivequerry
> [Components|https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-component-versioning#hadoop-components-available-with-different-hdinsight-versions]
>Reporter: Liam De Lee
>Priority: Major
>
> We are using tblproperties ("skip.header.line.count"="1") when creating 
> an external table.
> When we run a select * from the table, the result comes back as expected, 
> without the header in the result set.
> However, when we run, for instance, a count(1), the header is included in 
> the count (checked by pasting the output of select * from table into Notepad 
> and counting the rows).
> A select distinct(column) from table likewise returns the header as a 
> distinct value.
> file structure:
> ||_TESTING_TYPE||
> |adf|
> |hyg|
> |abc|
>  
> *Update: 26/06/2018*
> Create statement:
> {code:java}
> ---
> --test_type--
> ---
> CREATE EXTERNAL TABLE IF NOT EXISTS ext.test_type_in
>   (
> test_type  string
> )
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\073'
> STORED AS TEXTFILE
> LOCATION 'adl://{adlslocation}data/data2/test'
> tblproperties ("skip.header.line.count"="1")
> {code}
>  Select statement:
> {code:java}
> select * from test_type_in;
> {code}
> Distinct statement:
> {code:java}
> select distinct test_type from test_type_in ORDER BY test_type;
> {code}
> I cannot show the exact statement because of an NDA, so I changed those 
> values to test.
>  
> I can also tell you it happens not just on our HDInsight cluster but also at 
> another company we are working for. It does not matter what is in the data either. 
> So, for testing purposes:
> {code:java}
> test_type,abcg,gjeiza,aze,grriajj,gd,rrjri,vdju{code}





[jira] [Comment Edited] (HIVE-19943) Header values keep showing up in result sets

2018-06-26 Thread Liam De Lee (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523378#comment-16523378
 ] 

Liam De Lee edited comment on HIVE-19943 at 6/26/18 8:11 AM:
-

We are already working with ORC tables with gzip-compressed data.

I tried what you suggested, but I still get the header back. We also use 
external tables, which cannot be stored as ORC because the data arrives as 
CSV files, and we see the same problem there.

Could this be a problem on the HDInsight side that they need to fix? If so, I 
can raise a ticket with them to look into it, because in my opinion this is a 
fairly basic option.





[jira] [Updated] (HIVE-19946) VectorizedRowBatchCtx.recordIdColumnVector cannot be shared between different JVMs

2018-06-26 Thread Zoltan Haindrich (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-19946:

   Resolution: Fixed
Fix Version/s: 4.0.0
   Status: Resolved  (was: Patch Available)

pushed to master. Thank you Teddy for fixing this!

> VectorizedRowBatchCtx.recordIdColumnVector cannot be shared between different 
> JVMs
> --
>
> Key: HIVE-19946
> URL: https://issues.apache.org/jira/browse/HIVE-19946
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Teddy Choi
>Assignee: Teddy Choi
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-19946.1.patch, HIVE-19946.2.patch
>
>
> VectorizedRowBatchCtx.recordIdColumnVector was used temporarily to pass the 
> record id column, which is virtual, between a reducer and a mapper. However, 
> when the reducer and the mapper are not in the same JVM, this produces 
> incorrect results.





[jira] [Commented] (HIVE-19046) Refactor the common parts of the HiveMetastore add_partition_core and add_partitions_pspec_core methods

2018-06-26 Thread Peter Vary (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523394#comment-16523394
 ] 

Peter Vary commented on HIVE-19046:
---

+1

> Refactor the common parts of the HiveMetastore add_partition_core and 
> add_partitions_pspec_core methods
> ---
>
> Key: HIVE-19046
> URL: https://issues.apache.org/jira/browse/HIVE-19046
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Minor
> Attachments: HIVE-19046.1.patch, HIVE-19046.2.patch, 
> HIVE-19046.3.patch, HIVE-19046.4.patch, HIVE-19046.5.patch, 
> HIVE-19046.6.patch, HIVE-19046.7.patch
>
>
> This is a follow-up Jira of the 
> [HIVE-18696|https://issues.apache.org/jira/browse/HIVE-18696] 
> [review|https://reviews.apache.org/r/65716/].
> Most of the code in these two methods is identical, so it would make sense to 
> move it into a common method.
> This code is almost the same in the two methods:
> {code}
> List<Future<Partition>> partFutures = Lists.newArrayList();
> final Table table = tbl;
> for (final Partition part : parts) {
>   if (!part.getTableName().equals(tblName) || !part.getDbName().equals(dbName)) {
>     throw new MetaException("Partition does not belong to target table "
>         + dbName + "." + tblName + ": " + part);
>   }
>   boolean shouldAdd = startAddPartition(ms, part, ifNotExists);
>   if (!shouldAdd) {
>     existingParts.add(part);
>     LOG.info("Not adding partition " + part + " as it already exists");
>     continue;
>   }
>   final UserGroupInformation ugi;
>   try {
>     ugi = UserGroupInformation.getCurrentUser();
>   } catch (IOException e) {
>     throw new RuntimeException(e);
>   }
>   partFutures.add(threadPool.submit(new Callable<Partition>() {
>     @Override
>     public Partition call() throws Exception {
>       ugi.doAs(new PrivilegedExceptionAction<Object>() {
>         @Override
>         public Object run() throws Exception {
>           try {
>             boolean madeDir = createLocationForAddedPartition(table, part);
>             if (addedPartitions.put(new PartValEqWrapper(part), madeDir) != null) {
>               // Technically, for ifNotExists case, we could insert one and discard the other
>               // because the first one now "exists", but it seems better to report the problem
>               // upstream as such a command doesn't make sense.
>               throw new MetaException("Duplicate partitions in the list: " + part);
>             }
>             initializeAddedPartition(table, part, madeDir);
>           } catch (MetaException e) {
>             throw new IOException(e.getMessage(), e);
>           }
>           return null;
>         }
>       });
>       return part;
>     }
>   }));
> }
> try {
>   for (Future<Partition> partFuture : partFutures) {
>     Partition part = partFuture.get();
>     if (part != null) {
>       newParts.add(part);
>     }
>   }
> } catch (InterruptedException | ExecutionException e) {
>   // cancel other tasks
>   for (Future<Partition> partFuture : partFutures) {
>     partFuture.cancel(true);
>   }
>   throw new MetaException(e.getMessage());
> }
> {code}
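
The tail of the quoted snippet (wait for every future, keep the non-null results, cancel everything on failure) is one piece that could plausibly move into a shared helper. The following is only a minimal sketch of that idea, not the refactoring from the attached patches: the helper name collectOrCancel is hypothetical, and an unchecked IllegalStateException stands in for MetaException so the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class CollectFuturesSketch {

  // Hypothetical shared helper: waits for each task and keeps the non-null results.
  // If any task fails or the wait is interrupted, all tasks are cancelled.
  static <T> List<T> collectOrCancel(List<Future<T>> futures) {
    List<T> results = new ArrayList<>();
    try {
      for (Future<T> future : futures) {
        T value = future.get();
        if (value != null) {
          results.add(value);
        }
      }
    } catch (InterruptedException | ExecutionException e) {
      // cancel other tasks
      for (Future<T> future : futures) {
        future.cancel(true);
      }
      // Stand-in for the MetaException thrown in the quoted snippet.
      throw new IllegalStateException(e.getMessage(), e);
    }
    return results;
  }

  public static void main(String[] args) {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      List<Future<String>> futures = new ArrayList<>();
      futures.add(pool.submit(() -> "part-a"));
      futures.add(pool.submit(() -> "part-b"));
      System.out.println(collectOrCancel(futures));
    } finally {
      pool.shutdown();
    }
  }
}
```

Both callers would then only differ in how they build the list of futures, which is the part the two methods do not share.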





[jira] [Resolved] (HIVE-19673) qtest: parquet_ctas.q breaks sample6.q

2018-06-26 Thread Zoltan Haindrich (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-19673.
-
Resolution: Fixed

fixed by HIVE-19882

> qtest: parquet_ctas.q breaks sample6.q
> --
>
> Key: HIVE-19673
> URL: https://issues.apache.org/jira/browse/HIVE-19673
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> A strange diff currently shows up from time to time: sample6.q fails. 
> Running it on its own is fine; however, when parquet_ctas.q runs before it, 
> the *result set* is different.
> {code}
> time mvn install -pl itests/qtest -DskipSparkTests -Pitests 
> -Dtest=TestCliDriver -Dqfile=parquet_ctas.q,sample6.q  -Dtest.output.overwrite
> {code}
> Note: sample6.q is also run via the Spark driver, and interestingly the 
> "fluctuating" new result set matches the Spark driver's output:
> {code}
> diff -Naur ./ql/src/test/results/clientpositive/sample6.q.out 
> ./ql/src/test/results/clientpositive/spark/sample6.q.out | grep val_
> {code}





[jira] [Updated] (HIVE-19888) Misleading "METASTORE_FILTER_HOOK will be ignored" warning from SessionState

2018-06-26 Thread Zoltan Haindrich (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-19888:

   Resolution: Fixed
Fix Version/s: 4.0.0
   Status: Resolved  (was: Patch Available)

found it... :D
pushed to master. Thank you [~vanzin]!

> Misleading "METASTORE_FILTER_HOOK will be ignored" warning from SessionState
> 
>
> Key: HIVE-19888
> URL: https://issues.apache.org/jira/browse/HIVE-19888
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
>Priority: Minor
> Fix For: 4.0.0
>
> Attachments: HIVE-19888.1.patch
>
>
> When I run things on my test cluster I see things like this in my logs:
> {noformat}
> 18/03/14 13:35:20 WARN session.SessionState: METASTORE_FILTER_HOOK will be 
> ignored, since hive.security.authorization.manager is set to instance of 
> HiveAuthorizerFactory.
> 18/03/14 13:35:21 WARN session.SessionState: METASTORE_FILTER_HOOK will be 
> ignored, since hive.security.authorization.manager is set to instance of 
> HiveAuthorizerFactory.
> {noformat}
> That's because the code in SessionState.java is wrong:
> {code}
> String metastoreHook = sessionConf.get(ConfVars.METASTORE_FILTER_HOOK.name());
> if (!ConfVars.METASTORE_FILTER_HOOK.getDefaultValue().equals(metastoreHook) &&
>     !AuthorizationMetaStoreFilterHook.class.getName().equals(metastoreHook)) {
>   LOG.warn(ConfVars.METASTORE_FILTER_HOOK.name() +
>       " will be ignored, since hive.security.authorization.manager" +
>       " is set to instance of HiveAuthorizerFactory.");
> }
> {code}
> It's using {{.name()}} which is the enum name, not the actual config key.
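
To illustrate the point made in the description: an enum constant's name() is its Java identifier, while the configuration key is a separate string carried by the constant. Looking a value up by name() therefore misses, and a "not the default" check on the resulting null is always true. The sketch below uses a made-up enum (DemoConfVars, its key and its default value are assumptions for illustration, not Hive's ConfVars) and simplifies the check to a single condition.

```java
import java.util.HashMap;
import java.util.Map;

// Made-up stand-in for a configuration enum; key and default are illustrative.
enum DemoConfVars {
  METASTORE_FILTER_HOOK("hive.metastore.filter.hook", "DefaultHook");

  final String varname;      // the actual configuration key
  final String defaultValue; // value used when nothing is configured

  DemoConfVars(String varname, String defaultValue) {
    this.varname = varname;
    this.defaultValue = defaultValue;
  }
}

final class EnumNameVsKeySketch {

  // Same shape as the reported check: the lookup uses the enum name, so it returns null.
  static boolean warnsWhenLookedUpByName(Map<String, String> conf) {
    String hook = conf.get(DemoConfVars.METASTORE_FILTER_HOOK.name());
    return !DemoConfVars.METASTORE_FILTER_HOOK.defaultValue.equals(hook);
  }

  // Lookup by the configuration key the constant carries.
  static boolean warnsWhenLookedUpByKey(Map<String, String> conf) {
    String hook = conf.get(DemoConfVars.METASTORE_FILTER_HOOK.varname);
    return !DemoConfVars.METASTORE_FILTER_HOOK.defaultValue.equals(hook);
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put("hive.metastore.filter.hook", "DefaultHook");
    System.out.println(warnsWhenLookedUpByName(conf)); // true: spurious warning
    System.out.println(warnsWhenLookedUpByKey(conf));  // false: value is the default
  }
}
```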





[jira] [Assigned] (HIVE-19554) Enable TestDanglingQOuts#checkDanglingQOut

2018-06-26 Thread Zoltan Haindrich (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-19554:
---

Assignee: Zoltan Haindrich

> Enable TestDanglingQOuts#checkDanglingQOut
> --
>
> Key: HIVE-19554
> URL: https://issues.apache.org/jira/browse/HIVE-19554
> Project: Hive
>  Issue Type: Test
>  Components: Test
>Affects Versions: 3.1.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Zoltan Haindrich
>Priority: Critical
>






[jira] [Commented] (HIVE-19943) Header values keep showing up in result sets

2018-06-26 Thread Zoltan Haindrich (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523415#comment-16523415
 ] 

Zoltan Haindrich commented on HIVE-19943:
-

I think your input table is in text format: a table declared 
{{STORED AS TEXTFILE}} is not an ORC table.

Although this is not specific to HDI, I think reporting it to HDI would probably 
also help, since you will need the fix in the HDI Hive build before you can use it.




[jira] [Commented] (HIVE-19943) Header values keep showing up in result sets

2018-06-26 Thread Liam De Lee (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523425#comment-16523425
 ] 

Liam De Lee commented on HIVE-19943:


That is correct, but it applies to the external table only. We cannot use ORC 
there because we import data from SAP to ADLS (Azure Data Lake Store), and it 
arrives as a CSV file.

Okay, I will update my question there with your remarks, if that is okay with 
you (I will link to this ticket).

Is it then a problem in Hive itself that needs to be fixed? I guess this is 
not the expected behavior, right?

 



[jira] [Commented] (HIVE-19943) Header values keep showing up in result sets

2018-06-26 Thread Zoltan Haindrich (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523441#comment-16523441
 ] 

Zoltan Haindrich commented on HIVE-19943:
-

I might have misunderstood...but in that case I think you may only need to 
disable vectorization while you are loading the data from that external table 
into your orc tables :)

I think referencing this ticket might help.



[jira] [Updated] (HIVE-19968) UDF exception is not throw out

2018-06-26 Thread Zoltan Haindrich (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-19968:

Description: 
UDF initialization fails and throws an exception, but Hive catches it and does 
nothing, so the application succeeds while no data is generated.

{code}
GenericUDFReflect.java#evaluate()

try {
  o = null;
  o = ReflectionUtils.newInstance(c, null);
} catch (Exception e) {
  // ignored
}
{code}

  was:
udf init failed, and throw a exception, but hive catch it and do nothing, 
leading to app succ, but no data is generated.

GenericUDFReflect.java#evaluate()

try {  

   o = null;  

   o = ReflectionUtils.newInstance(c, null);

}   catch (Exception e) {  

// ignored

}


> UDF exception is not throw out
> --
>
> Key: HIVE-19968
> URL: https://issues.apache.org/jira/browse/HIVE-19968
> Project: Hive
>  Issue Type: Bug
>Reporter: sandflee
>Priority: Major
> Attachments: hive-udf.png
>
>
> UDF initialization fails and throws an exception, but Hive catches it and does 
> nothing, so the application succeeds while no data is generated.
> {code}
> GenericUDFReflect.java#evaluate()
> try {
>   o = null;
>   o = ReflectionUtils.newInstance(c, null);
> } catch (Exception e) {
>   // ignored
> }
> {code}
> {code}
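
For readers unfamiliar with the pattern being reported, here is a minimal, self-contained sketch contrasting an empty catch block with one that surfaces the failure. It uses plain java.lang.reflect rather than Hadoop's ReflectionUtils, the class and method names are made up, and IllegalStateException is only a stand-in for whatever exception type a real fix would choose.

```java
final class SwallowedExceptionSketch {

  // A class whose constructor always fails, to simulate a failing UDF init.
  static final class Broken {
    Broken() {
      throw new UnsupportedOperationException("init failed");
    }
  }

  // Mirrors the reported pattern: the failure is ignored and null comes back silently.
  static Object newInstanceSwallowing(Class<?> c) {
    Object o = null;
    try {
      o = c.getDeclaredConstructor().newInstance();
    } catch (Exception e) {
      // ignored
    }
    return o;
  }

  // Alternative: wrap and rethrow so the caller sees that instantiation failed.
  static Object newInstanceOrThrow(Class<?> c) {
    try {
      return c.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Could not instantiate " + c.getName(), e);
    }
  }

  public static void main(String[] args) {
    System.out.println(newInstanceSwallowing(Broken.class)); // prints "null"
    try {
      newInstanceOrThrow(Broken.class);
    } catch (IllegalStateException e) {
      System.out.println("failed: " + e.getMessage());
    }
  }
}
```

With the first variant the job carries on with a null object, which matches the "app succeeds but no data" symptom; with the second the failure reaches the caller.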





[jira] [Commented] (HIVE-19943) Header values keep showing up in result sets

2018-06-26 Thread Liam De Lee (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523456#comment-16523456
 ] 

Liam De Lee commented on HIVE-19943:


No problem. However, we will run this job (moving data from the external table 
to the Hive table) every day, and we now have around 218 tables (and counting), 
so for us this does not seem like a viable solution.

Is there any way this could be reported to Hive to get fixed? To me it 
looks like basic functionality.

I linked the ticket and explained what the problem is. I hope to hear from them 
soon.



[jira] [Commented] (HIVE-19763) Prevent execution of very large queries

2018-06-26 Thread Peter Vary (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523458#comment-16523458
 ] 

Peter Vary commented on HIVE-19763:
---

[~sershe]: The above mentioned WM is the one under this umbrella jira: 
HIVE-17481? Am I right, when I think that WM helps when we have an already 
compiled query and we want them to gracefully handle cluster resources when 
executing the given query (and other queries in parallel)?
 I think [~lmarti...@cloudera.com]'s problem is more related to HS2 memory 
management, where compiling and optimizing the query causes OOM on HS2.

> Prevent execution of very large queries
> ---
>
> Key: HIVE-19763
> URL: https://issues.apache.org/jira/browse/HIVE-19763
> Project: Hive
>  Issue Type: New Feature
>  Components: HiveServer2
>Affects Versions: 2.3.2
>Reporter: Luis E Martinez-Poblete
>Priority: Minor
>
> Synopsis:
> =
> Prevent execution of very large queries.
>  
> Feature Request:
> 
> Please enhance Hive with a parameter to restrict the execution of very large 
> queries.
> Use case: User is trying to create a view with a size of 8 MB. Creation of 
> this view was possible after increasing heap memory in several components 
> (HMS, HS2, Zookeeper). However, this view caused major issues when it was 
> used in a CTE query which resulted in GC pauses and eventually OOM of the HS2 
> process.
>  
> Although, it is possible to create the view, it may cause other issues when 
> used in queries. From the Hadoop administrator point of view, it would be 
> good to restrict this type of queries.
>  





[jira] [Commented] (HIVE-19649) Clean up inputs in JDBC PreparedStatement. Add unit tests.

2018-06-26 Thread Mykhailo Kysliuk (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523473#comment-16523473
 ] 

Mykhailo Kysliuk commented on HIVE-19649:
-

misha.kysl...@gmail.com
The latest patch has the code formatted correctly.
Thanks for reviewing :)

> Clean up inputs in JDBC PreparedStatement. Add unit tests.
> --
>
> Key: HIVE-19649
> URL: https://issues.apache.org/jira/browse/HIVE-19649
> Project: Hive
>  Issue Type: Test
>Reporter: Mykhailo Kysliuk
>Assignee: Mykhailo Kysliuk
>Priority: Minor
> Attachments: HIVE-19649.01.patch, HIVE-19649.02.patch, 
> HIVE-19649.03.patch
>
>
> Add unit tests for feature that was implemented in 
> [HIVE-18788|https://issues.apache.org/jira/browse/HIVE-18788].
> The integration tests are present, but it will be useful to catch errors 
> during module build.





[jira] [Commented] (HIVE-19943) Header values keep showing up in result sets

2018-06-26 Thread Zoltan Haindrich (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523488#comment-16523488
 ] 

Zoltan Haindrich commented on HIVE-19943:
-

{quote}Is there any way this could be reported to Hive to get fixed?{quote}

This ticket is exactly that :P 

I think, if you can live with it, the best option right now is to 
disable vectorization. Once the problem is better understood, there might be 
other ways to work around the issue, or there might be a fix.
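
The symptom in this ticket (select * comes back without the header, while count(1) and distinct include it) is what one would expect if two read paths exist and only one of them applies the header skip. The following is purely an illustrative sketch of that situation in plain Java; the class and method names are made up, and it does not reproduce Hive's readers or confirm the root cause.

```java
import java.util.Arrays;
import java.util.List;

final class HeaderSkipSketch {

  // Read path that honours the header setting (analogous to skip.header.line.count).
  static long countRowsSkippingHeader(List<String> lines, int headerLineCount) {
    return lines.stream().skip(headerLineCount).count();
  }

  // Hypothetical alternative read path that never looks at the header setting.
  static long countRowsIgnoringHeaderSetting(List<String> lines) {
    return lines.size();
  }

  public static void main(String[] args) {
    // Same shape as the sample file in the ticket: one header line, three data lines.
    List<String> file = Arrays.asList("test_type", "adf", "hyg", "abc");
    System.out.println(countRowsSkippingHeader(file, 1));     // 3
    System.out.println(countRowsIgnoringHeaderSetting(file)); // 4
  }
}
```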



[jira] [Commented] (HIVE-19943) Header values keep showing up in result sets

2018-06-26 Thread Liam De Lee (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523490#comment-16523490
 ] 

Liam De Lee commented on HIVE-19943:


Okay, I guess we will wait then ;)

We chose to remove the headers altogether because we really want to make use 
of Interactive Query for the large amounts of data we have. Once there is a fix 
or a better workaround, we can add the headers again.

Thanks for your time in investigating this with me!



[jira] [Updated] (HIVE-19649) Clean up inputs in JDBC PreparedStatement. Add unit tests.

2018-06-26 Thread Zoltan Haindrich (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-19649:

   Resolution: Fixed
Fix Version/s: 4.0.0
   Status: Resolved  (was: Patch Available)

pushed to master. Thank you [~mkysliuk] for adding more tests for this! :)

> Clean up inputs in JDBC PreparedStatement. Add unit tests.
> --
>
> Key: HIVE-19649
> URL: https://issues.apache.org/jira/browse/HIVE-19649
> Project: Hive
>  Issue Type: Test
>Reporter: Mykhailo Kysliuk
>Assignee: Mykhailo Kysliuk
>Priority: Minor
> Fix For: 4.0.0
>
> Attachments: HIVE-19649.01.patch, HIVE-19649.02.patch, 
> HIVE-19649.03.patch
>
>
> Add unit tests for feature that was implemented in 
> [HIVE-18788|https://issues.apache.org/jira/browse/HIVE-18788].
> The integration tests are present, but it will be useful to catch errors 
> during module build.





[jira] [Commented] (HIVE-17379) Null Pointer Exception in WHERE clause when using aggregate function as a filter

2018-06-26 Thread Zoltan Haindrich (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523502#comment-16523502
 ] 

Zoltan Haindrich commented on HIVE-17379:
-

Current master does not seem to be affected by this issue.

> Null Pointer Exception in WHERE clause when using aggregate function as a 
> filter  
> --
>
> Key: HIVE-17379
> URL: https://issues.apache.org/jira/browse/HIVE-17379
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.1.1
>Reporter: Sharanya Santhanam
>Priority: Major
>
> Sample Query : 
> with tableAAlias as (
>select a, count(z)  as acount
>from tableA
>groupBy a 
> )
> select a.a, b.b 
> from tableB as b JOIN 
> tableAAlias a
> on a.a=b.a
> where a.acount > 10 
> FAILED: NullPointerException null
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.optimizer.ColumnPrunerProcFactory$ColumnPrunerFilterProc.process(ColumnPrunerProcFactory.java:103)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
> at 
> org.apache.hadoop.hive.ql.optimizer.ColumnPruner$ColumnPrunerWalker.walk(ColumnPruner.java:176)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
> at 
> org.apache.hadoop.hive.ql.optimizer.ColumnPruner.transform(ColumnPruner.java:136)
> at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:246)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11149)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:246)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:264)
> at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:80)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:264)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:490)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1270)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1412)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1199)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1189)
> at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:265)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:210)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:444)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
> at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:474)
> at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:514)
> at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:882)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:836)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:732)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:223)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> The above Query Succeeds if it is modified as : 
> select a.a, b.b , *a.acount*
> from tableB as b JOIN 
> tableAAlias a
> on a.a=b.a
> where a.acount > 10 
> Please Note the original query worked on hive1.2 & breaks on Hive2.1.1 





[jira] [Commented] (HIVE-17379) Null Pointer Exception in WHERE clause when using aggregate function as a filter

2018-06-26 Thread Zoltan Haindrich (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523503#comment-16523503
 ] 

Zoltan Haindrich commented on HIVE-17379:
-

[~george.pachitariu] have you encountered this problem in a different scenario 
(one which might still occur on master)?



[jira] [Updated] (HIVE-18140) Partitioned tables statistics can go wrong in basic stats mixed case

2018-06-26 Thread Zoltan Haindrich (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-18140:

Attachment: HIVE-18140.04.patch

> Partitioned tables statistics can go wrong in basic stats mixed case
> 
>
> Key: HIVE-18140
> URL: https://issues.apache.org/jira/browse/HIVE-18140
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-18140.01.patch, HIVE-18140.01wip01.patch, 
> HIVE-18140.01wip03.patch, HIVE-18140.01wip04.patch, HIVE-18140.02.patch, 
> HIVE-18140.02wip01.patch, HIVE-18140.03.patch, HIVE-18140.04.patch, 
> HIVE-19140.02wip02.patch, HIVE-19727.02wip03.patch
>
>
> suppose the following scenario:
> * part1 has basic stats {{RC=10,DS=1K}}
> * all other partitions have no basic stats (and a bunch of rows)
> then 
> [this|https://github.com/apache/hive/blob/d9924ab3e285536f7e2cc15ecbea36a78c59c66d/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java#L378]
>  condition would be false; which in turn produces estimations for the whole 
> partitioned table: {{RC=10,DS=1K}}
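The scenario above can be illustrated with a minimal sketch. It assumes a naive aggregation that simply skips partitions without basic stats; the class, the method names and the `NO_STATS` marker are hypothetical and are not taken from StatsUtils.

```java
import java.util.List;

/** Hypothetical illustration of the mixed-stats case; not the StatsUtils logic. */
class PartitionStatsSketch {
    // Marker for a partition that has no basic stats (assumption of this sketch).
    static final long NO_STATS = -1L;

    /** Naive sum that silently ignores partitions without basic stats. */
    static long naiveRowCount(List<Long> rowCounts) {
        long total = 0;
        for (long rc : rowCounts) {
            if (rc != NO_STATS) {
                total += rc;
            }
        }
        return total;
    }

    /** True only when every partition carries basic stats. */
    static boolean allPartitionsHaveStats(List<Long> rowCounts) {
        for (long rc : rowCounts) {
            if (rc == NO_STATS) {
                return false;
            }
        }
        return true;
    }
}
```

With one partition reporting 10 rows and two partitions reporting nothing, the naive sum is 10 for the whole table, which matches the underestimate described in the issue. A check such as `allPartitionsHaveStats` is one possible way to detect that the sum should not be trusted.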





[jira] [Commented] (HIVE-18140) Partitioned tables statistics can go wrong in basic stats mixed case

2018-06-26 Thread Zoltan Haindrich (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-18140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523536#comment-16523536
 ] 

Zoltan Haindrich commented on HIVE-18140:
-

rebased patch
[~ashutoshc] Could you please take a look?



[jira] [Commented] (HIVE-19994) Impala "drop table" fails with Hive Metastore exception

2018-06-26 Thread Peter Vary (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523542#comment-16523542
 ] 

Peter Vary commented on HIVE-19994:
---

My guess would be that a new partition was somehow added to the table 
concurrently, or something like that happened.

This part of the code is responsible for setting the CD to null before dropping a 
table:

[https://github.com/apache/hive/blob/master/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L1426]

The method code:
{code:java}
/**
 * Called right before an action that would drop a storage descriptor.
 * This function makes the SD's reference to a CD null, and then deletes the CD
 * if it no longer is referenced in the table.
 * @param msd the storage descriptor to drop
 */
private void preDropStorageDescriptor(MStorageDescriptor msd) {
  if (msd == null || msd.getCD() == null) {
    return;
  }
  MColumnDescriptor mcd = msd.getCD();
  // Because there is a 1-N relationship between CDs and SDs,
  // we must set the SD's CD to null first before dropping the storage descriptor
  // to satisfy foreign key constraints.
  msd.setCD(null);
  removeUnusedColumnDescriptor(mcd);
}{code}

> Impala "drop table" fails with Hive Metastore exception
> ---
>
> Key: HIVE-19994
> URL: https://issues.apache.org/jira/browse/HIVE-19994
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.1.0
> Environment: Hadoop distribution: CHD 5.14.2
> Hive version:  1.1.0-cdh5.14.2
> Impala version: 2.11.0
> Kudu version: 1.6.0
>  
>Reporter: Rodion Myronov
>Priority: Major
> Attachments: metastore_exception.txt
>
>
> "drop table" statement in Impala shell fails with the following exception:
> {{ImpalaRuntimeException: Error making 'dropTable' RPC to Hive Metastore: 
> CAUSED BY: MetaException: One or more instances could not be deleted}}
>  
> Metastore log file shows that "DELETE FROM `PARTITION_KEYS` WHERE `TBL_ID`=?" 
> statement fails because of foreign key violation (full stacktrace will be 
> added):
> {{Caused by: java.sql.BatchUpdateException: Cannot delete or update a parent 
> row: a foreign key constraint fails 
> ("hivemetastore_emtig3vtq7qp1tiooo07sb70ud"."COLUMNS_V2", CONSTRAINT 
> "COLUMNS_V2_FK1" FOREIGN KEY ("CD_ID") REFERENCES "CDS" ("CD_ID"))}}
>  
> The table is created and then dropped as a part of ETL process executed every 
> hour. Most of the time it works fine, the issue is not reproducible at will.
> Table creation script is:
> {{CREATE TABLE IF NOT EXISTS price_advisor_ouput.t_switching_coef_source}}
> {{( }}
> {{...fields here...}}
> {{PRIMARY KEY (...PK field here...)}}
> {{)}}
> {{PARTITION BY HASH(matrix_pcd) PARTITIONS 3}}
> {{STORED AS KUDU;}}
>  
> Not sure how to approach diagnostics and fix, so any input will be really 
> appreciated. 
> Thanks in advance, 
> Rodion Myronov





[jira] [Commented] (HIVE-19944) Investigate and fix version mismatch of GCP

2018-06-26 Thread Adam Szita (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523658#comment-16523658
 ] 

Adam Szita commented on HIVE-19944:
---

The test run finished successfully with a new image added to the 
project. [~vihangk1], [~stakiar] can you please take a look? It's just a 
one-line change, and it will enable us to move forward with our docker image plans.

> Investigate and fix version mismatch of GCP
> ---
>
> Key: HIVE-19944
> URL: https://issues.apache.org/jira/browse/HIVE-19944
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Adam Szita
>Assignee: Adam Szita
>Priority: Major
> Attachments: HIVE-19944.0.patch
>
>
> We've observed that adding a new image to the ptest GCP project breaks our 
> currently working infrastructure when we try to restart the hive ptest server.
> This is because upon initialization the project's images are queried and we 
> immediately get an exception for newly added images - they don't have a field 
> that our client thinks should be mandatory to have. I believe there's an 
> upgrade needed on our side for the GCP libs we depend on.





[jira] [Commented] (HIVE-19969) Dependency order (dirlist) assessment fails in yetus run

2018-06-26 Thread Adam Szita (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523670#comment-16523670
 ] 

Adam Szita commented on HIVE-19969:
---

[~pvary] there's no change in that file (checked manually), and I don't think 
it should even be related to this.

> Dependency order (dirlist) assessment fails in yetus run
> 
>
> Key: HIVE-19969
> URL: https://issues.apache.org/jira/browse/HIVE-19969
> Project: Hive
>  Issue Type: Bug
>Reporter: Adam Szita
>Assignee: Adam Szita
>Priority: Major
> Attachments: HIVE-19969.0.patch, HIVE-19969.1.patch, 
> HIVE-19969.2.patch
>
>
> As seen here, the dirlist step of yetus fails to determine order of modules 
> to be built. It silently falls back to alphabetical order which may or may 
> not work depending on the patch.
> {code:java}
> Thu Jun 21 02:43:04 UTC 2018
> cd /data/hiveptest/working/yetus_PreCommit-HIVE-Build-11958
> mvn -q exec:exec -Dexec.executable=pwd -Dexec.args=''
> /data/hiveptest/working/yetus_PreCommit-HIVE-Build-11958/storage-api
> /data/hiveptest/working/yetus_PreCommit-HIVE-Build-11958/upgrade-acid
> /data/hiveptest/working/yetus_PreCommit-HIVE-Build-11958
> /data/hiveptest/working/yetus_PreCommit-HIVE-Build-11958/classification
> /data/hiveptest/working/yetus_PreCommit-HIVE-Build-11958/shims/common
> /data/hiveptest/working/yetus_PreCommit-HIVE-Build-11958/shims/0.23
> /data/hiveptest/working/yetus_PreCommit-HIVE-Build-11958/shims/scheduler
> /data/hiveptest/working/yetus_PreCommit-HIVE-Build-11958/shims/aggregator
> /data/hiveptest/working/yetus_PreCommit-HIVE-Build-11958/common
> /data/hiveptest/working/yetus_PreCommit-HIVE-Build-11958/service-rpc
> /data/hiveptest/working/yetus_PreCommit-HIVE-Build-11958/serde
> Usage: java [-options] class [args...]
>(to execute a class)
>or  java [-options] -jar jarfile [args...]
>(to execute a jar file)
> where options include:{code}
> The problem is in standalone-metastore module: maven plugin 'exec' has a 
> global config set {{executable=java}} disregarding the dirlist task's 
> {{-Dexec.executable=pwd}} and causing the above error.





[jira] [Commented] (HIVE-19969) Dependency order (dirlist) assessment fails in yetus run

2018-06-26 Thread Peter Vary (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523697#comment-16523697
 ] 

Peter Vary commented on HIVE-19969:
---

Thanks [~szita]!

+1



[jira] [Assigned] (HIVE-19995) Aggregate row traffic for acid tables

2018-06-26 Thread Zoltan Haindrich (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-19995:
---


> Aggregate row traffic for acid tables
> -
>
> Key: HIVE-19995
> URL: https://issues.apache.org/jira/browse/HIVE-19995
> Project: Hive
>  Issue Type: Sub-task
>  Components: Statistics
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> For transactional tables we store basic stats in case of an explicit 
> analyze/rewrite, but we don't do anything in other cases, which may even 
> lead to plans which OOM.
> It would be better to aggregate the total row traffic, because that is 
> already available, so that operator tree estimations could work with a real 
> upper bound of the row numbers.





[jira] [Updated] (HIVE-19996) Beeline performance poor with drivers having slow DatabaseMetaData.getPrimaryKeys impl

2018-06-26 Thread Kevin Minder (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Minder updated HIVE-19996:

Attachment: HIVE-19996.1.patch

> Beeline performance poor with drivers having slow 
> DatabaseMetaData.getPrimaryKeys impl
> --
>
> Key: HIVE-19996
> URL: https://issues.apache.org/jira/browse/HIVE-19996
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline
>Affects Versions: 1.2.1
> Environment: Issue detected using Beeline with HBase Phoenix thin 
> driver and a result set with many columns.
>Reporter: Kevin Minder
>Assignee: Kevin Minder
>Priority: Major
> Attachments: HIVE-19996.1.patch
>
>
> Beeline performance is rather poor for table output format when two 
> conditions occur for the same result set.
>  # The result set has a large number of columns.
>  # The driver being used has a slow implementation of 
> DatabaseMetaData.getPrimaryKeys.
> For example testing has shown that for a query with ~100 columns using the 
> HBase Phoenix thin driver the execution time can be cut from ~30 seconds to 
> ~2 seconds by using CSV output format vs table output format. For example: 
> {{select * from system.catalog;}}
> This is due to how primary keys are detected. Currently the Rows 
> implementation makes a metadata call for every column to determine whether it is 
> a primary key for display purposes. I propose optimizing this such that a 
> metadata call is only made once for each unique table in the result set's columns.





[jira] [Assigned] (HIVE-19996) Beeline performance poor with drivers having slow DatabaseMetaData.getPrimaryKeys impl

2018-06-26 Thread Kevin Minder (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Minder reassigned HIVE-19996:
---

Assignee: Kevin Minder



[jira] [Updated] (HIVE-19995) Aggregate row traffic for acid tables

2018-06-26 Thread Eugene Koifman (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-19995:
--
Component/s: Transactions

> Aggregate row traffic for acid tables
> -
>
> Key: HIVE-19995
> URL: https://issues.apache.org/jira/browse/HIVE-19995
> Project: Hive
>  Issue Type: Sub-task
>  Components: Statistics, Transactions
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> For transactional tables we store basic stats in case of an explicit 
> analyze/rewrite, but we don't do anything in other cases, which may even 
> lead to plans which OOM.
> It would be better to aggregate the total row traffic, because that is 
> already available, so that operator tree estimations could work with a real 
> upper bound of the row numbers.





[jira] [Updated] (HIVE-19996) Beeline performance poor with drivers having slow DatabaseMetaData.getPrimaryKeys impl

2018-06-26 Thread Kevin Minder (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Minder updated HIVE-19996:

Status: Patch Available  (was: Open)

This change modifies the Rows implementation to cache the primary key set for 
each table as it is requested while the columns of the result set are processed 
for output.
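The caching idea can be sketched roughly as follows. This is not the code from the attached patch: `PrimaryKeyCache`, its `loader` function and `lookupCount` are hypothetical names, and the loader stands in for whatever wraps the driver's `DatabaseMetaData.getPrimaryKeys` call.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/**
 * Illustrative per-table cache of primary key column names.
 * Hypothetical sketch; not Beeline's actual Rows implementation.
 */
class PrimaryKeyCache {
    // Performs the expensive metadata lookup for one table, for example
    // by wrapping java.sql.DatabaseMetaData.getPrimaryKeys (assumption).
    private final Function<String, Set<String>> loader;
    private final Map<String, Set<String>> keysByTable = new HashMap<>();
    private int lookups = 0;

    PrimaryKeyCache(Function<String, Set<String>> loader) {
        this.loader = loader;
    }

    /** Returns true if the column is a primary key of the given table. */
    boolean isPrimaryKey(String table, String column) {
        Set<String> keys = keysByTable.get(table);
        if (keys == null) {
            // First request for this table: do the metadata call once and keep it.
            lookups++;
            keys = loader.apply(table);
            keysByTable.put(table, keys);
        }
        return keys.contains(column);
    }

    /** Number of metadata lookups performed so far. */
    int lookupCount() {
        return lookups;
    }
}
```

Under these assumptions, a result set with about 100 columns from a single table would trigger one metadata lookup instead of about 100.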



[jira] [Commented] (HIVE-19995) Aggregate row traffic for acid tables

2018-06-26 Thread Eugene Koifman (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523822#comment-16523822
 ] 

Eugene Koifman commented on HIVE-19995:
---

There is logic in the compactor to recompute column-level stats, but that 
doesn't run very often - currently only for major compaction. Perhaps this is 
worth considering.

 



[jira] [Resolved] (HIVE-18762) Support ALTER TABLE SET OWNER command

2018-06-26 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/HIVE-18762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña resolved HIVE-18762.

   Resolution: Fixed
Fix Version/s: 3.0.0

> Support ALTER TABLE SET OWNER command
> -
>
> Key: HIVE-18762
> URL: https://issues.apache.org/jira/browse/HIVE-18762
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: kalyan kumar kalvagadda
>Assignee: Sergio Peña
>Priority: Major
> Fix For: 3.0.0
>
>
> Currently only a user can be the owner of a Hive table. This should be extended so 
> that either a user or a role can be set as the owner of a table.
> With this support, ownership of a table can be transferred to either a user or 
> a role.
> It should be possible to run the commands below to change the ownership:
> {noformat}
> alter table tb1 set owner user user1;
> alter table tb1 set owner role role1;{noformat}
>  





[jira] [Commented] (HIVE-19995) Aggregate row traffic for acid tables

2018-06-26 Thread Zoltan Haindrich (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523835#comment-16523835
 ] 

Zoltan Haindrich commented on HIVE-19995:
-

This is only about basic stats, like the row count.
It seems OrcRecordUpdater already provides rowCountDelta correctly (and it 
can be negative), so it will keep track of deletes as well - for free :D
I think this will probably work very reliably in general for acid tables.
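
As a rough illustration of the idea (a minimal sketch, not Hive code): if every write reports a signed row-count delta, the current row count can be kept up to date by summing those deltas onto a known base count. The function name and its arguments are hypothetical and exist only for this example.

```python
def aggregate_row_count(base_rows, deltas):
    """Return the row count after applying signed per-write deltas.

    base_rows: row count known from the last explicit analyze (assumed input)
    deltas:    one signed delta per write; inserts are positive, deletes negative
    """
    return base_rows + sum(deltas)


# Example: start from 100 rows, insert 10, delete 30, insert 5.
current = aggregate_row_count(100, [10, -30, 5])
print(current)
```

Because deletes arrive as negative deltas, no separate bookkeeping for them is needed in this sketch.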


> Aggregate row traffic for acid tables
> -
>
> Key: HIVE-19995
> URL: https://issues.apache.org/jira/browse/HIVE-19995
> Project: Hive
>  Issue Type: Sub-task
>  Components: Statistics, Transactions
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> For transactional tables we store basic stats in case of an explicit 
> analyze/rewrite, but we don't do anything in other cases, which may even 
> lead to plans which OOM...
> It would be better to aggregate the total row traffic...because that is 
> already available; so that operator tree estimations could work with a real 
> upper bound of the row numbers.





[jira] [Commented] (HIVE-18140) Partitioned tables statistics can go wrong in basic stats mixed case

2018-06-26 Thread Ashutosh Chauhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-18140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523837#comment-16523837
 ] 

Ashutosh Chauhan commented on HIVE-18140:
-

+1

> Partitioned tables statistics can go wrong in basic stats mixed case
> 
>
> Key: HIVE-18140
> URL: https://issues.apache.org/jira/browse/HIVE-18140
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-18140.01.patch, HIVE-18140.01wip01.patch, 
> HIVE-18140.01wip03.patch, HIVE-18140.01wip04.patch, HIVE-18140.02.patch, 
> HIVE-18140.02wip01.patch, HIVE-18140.03.patch, HIVE-18140.04.patch, 
> HIVE-19140.02wip02.patch, HIVE-19727.02wip03.patch
>
>
> suppose the following scenario:
> * part1 has basic stats {{RC=10,DS=1K}}
> * all other partitions have no basic stats (and a bunch of rows)
> then 
> [this|https://github.com/apache/hive/blob/d9924ab3e285536f7e2cc15ecbea36a78c59c66d/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java#L378]
>  condition would be false; which in turn produces estimations for the whole 
> partitioned table: {{RC=10,DS=1K}}
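
The scenario above can be sketched as follows. This is only an illustration of the described outcome, not the logic in StatsUtils; the function names and the dictionary layout are assumptions made for the example.

```python
def naive_row_count(partitions):
    """Sum row counts, silently skipping partitions without basic stats."""
    return sum(p["RC"] for p in partitions if p["RC"] is not None)


def has_complete_basic_stats(partitions):
    """True only if every partition reports a row count."""
    return all(p["RC"] is not None for p in partitions)


# part1 has stats; the other partitions hold rows but report nothing.
partitions = [
    {"name": "part1", "RC": 10},
    {"name": "part2", "RC": None},
    {"name": "part3", "RC": None},
]

estimate = naive_row_count(partitions)          # only part1 contributes
trusted = has_complete_basic_stats(partitions)  # the estimate is not reliable
print(estimate, trusted)
```

In this sketch the table-level estimate equals the single known partition's row count, which is the kind of underestimate the description reports; checking that all partitions have stats before trusting the sum is one possible guard.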





[jira] [Updated] (HIVE-19995) Aggregate row traffic for acid tables

2018-06-26 Thread Zoltan Haindrich (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-19995:

Attachment: HIVE-19995.01wip01.patch

> Aggregate row traffic for acid tables
> -
>
> Key: HIVE-19995
> URL: https://issues.apache.org/jira/browse/HIVE-19995
> Project: Hive
>  Issue Type: Sub-task
>  Components: Statistics, Transactions
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-19995.01wip01.patch
>
>
> For transactional tables we store basic stats in case of an explicit 
> analyze/rewrite, but we don't do anything in other cases, which may even 
> lead to plans which OOM...
> It would be better to aggregate the total row traffic...because that is 
> already available; so that operator tree estimations could work with a real 
> upper bound of the row numbers.





[jira] [Updated] (HIVE-19995) Aggregate row traffic for acid tables

2018-06-26 Thread Zoltan Haindrich (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-19995:

Status: Patch Available  (was: Open)

> Aggregate row traffic for acid tables
> -
>
> Key: HIVE-19995
> URL: https://issues.apache.org/jira/browse/HIVE-19995
> Project: Hive
>  Issue Type: Sub-task
>  Components: Statistics, Transactions
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-19995.01wip01.patch
>
>
> For transactional tables we store basic stats in case of an explicit 
> analyze/rewrite, but we don't do anything in other cases, which may even 
> lead to plans which OOM...
> It would be better to aggregate the total row traffic...because that is 
> already available; so that operator tree estimations could work with a real 
> upper bound of the row numbers.





[jira] [Commented] (HIVE-19326) stats auto gather: incorrect aggregation during UNION queries (may lead to incorrect results)

2018-06-26 Thread Ashutosh Chauhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523844#comment-16523844
 ] 

Ashutosh Chauhan commented on HIVE-19326:
-

Is it the result set or the stats which keep changing? If stats, we can mask them. 
If this doesn't repro locally at all, we should disable this test as flaky and 
deal with it in a separate jira to unblock this patch.

> stats auto gather: incorrect aggregation during UNION queries (may lead to 
> incorrect results)
> -
>
> Key: HIVE-19326
> URL: https://issues.apache.org/jira/browse/HIVE-19326
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Sergey Shelukhin
>Assignee: Zoltan Haindrich
>Priority: Critical
> Attachments: HIVE-19326.01wip01.patch, HIVE-19326.02.patch, 
> HIVE-19326.03.patch, HIVE-19326.04.patch, HIVE-19326.05.patch, 
> HIVE-19326.06.patch, HIVE-19326.06wip01.patch, HIVE-19326.06wip02.patch, 
> HIVE-19326.06wip03.patch, HIVE-19326.06wip04.patch, HIVE-19326.06wip05.patch, 
> HIVE-19326.07.patch, HIVE-19326.08.patch, HIVE-19326.09.patch
>
>
> Found when investigating the results change after converting tables to MM, 
> turns out the MM result is correct but the current one is not.
> The test ends like so:
> {noformat}
> desc formatted small_alltypesorc_a;
> ANALYZE TABLE small_alltypesorc_a COMPUTE STATISTICS;
> desc formatted small_alltypesorc_a;
> insert into table small_alltypesorc_a select * from small_alltypesorc1a;
> desc formatted small_alltypesorc_a;
> {noformat}
> The results from the descs in the golden file are:
> {noformat}
>   COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\"}
>   numFiles                1
>   numRows                 5
> ...
>   COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\"}
>   numFiles                1
>   numRows                 15
> ...
>   COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\"}
>   numFiles                2
>   numRows                 20
> {noformat}
> Note the result change after analyze - the original numRows is inaccurate, 
> but BASIC_STATS is set to true.
> I am assuming with metadata only optimization this can produce incorrect 
> results.





[jira] [Assigned] (HIVE-19997) Batches for TestMiniDruidCliDriver

2018-06-26 Thread Jesus Camacho Rodriguez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez reassigned HIVE-19997:
--


> Batches for TestMiniDruidCliDriver
> --
>
> Key: HIVE-19997
> URL: https://issues.apache.org/jira/browse/HIVE-19997
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>
> I have observed {{TestMiniDruidCliDriver}} takes a long time to execute. I 
> verified that execution is not batched. We could batch tests as we do with 
> {{TestHBaseCliDriver}}, i.e., 5 q files per batch.





[jira] [Updated] (HIVE-19951) Vectorization: Need to disable encoded LLAP I/O for ORC when there is data type conversion (Schema Evolution)

2018-06-26 Thread Matt McCline (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-19951:

Status: In Progress  (was: Patch Available)

> Vectorization: Need to disable encoded LLAP I/O for ORC when there is data 
> type conversion  (Schema Evolution)
> --
>
> Key: HIVE-19951
> URL: https://issues.apache.org/jira/browse/HIVE-19951
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-19951.01.patch, HIVE-19951.02.patch, 
> HIVE-19951.03.patch, HIVE-19951.04.patch, HIVE-19951.05.patch, 
> HIVE-19951.06.patch
>
>
> Currently, reading encoded ORC data does not support data type conversion.  
> So, encoded reading and cache populating need to be disabled.





[jira] [Work started] (HIVE-19997) Batches for TestMiniDruidCliDriver

2018-06-26 Thread Jesus Camacho Rodriguez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-19997 started by Jesus Camacho Rodriguez.
--
> Batches for TestMiniDruidCliDriver
> --
>
> Key: HIVE-19997
> URL: https://issues.apache.org/jira/browse/HIVE-19997
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-19997.patch
>
>
> I have observed {{TestMiniDruidCliDriver}} takes a long time to execute. I 
> verified that execution is not batched. We could batch tests as we do with 
> {{TestHBaseCliDriver}}, i.e., 5 q files per batch.





[jira] [Updated] (HIVE-19997) Batches for TestMiniDruidCliDriver

2018-06-26 Thread Jesus Camacho Rodriguez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-19997:
---
Attachment: HIVE-19997.patch

> Batches for TestMiniDruidCliDriver
> --
>
> Key: HIVE-19997
> URL: https://issues.apache.org/jira/browse/HIVE-19997
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-19997.patch
>
>
> I have observed {{TestMiniDruidCliDriver}} takes a long time to execute. I 
> verified that execution is not batched. We could batch tests as we do with 
> {{TestHBaseCliDriver}}, i.e., 5 q files per batch.





[jira] [Updated] (HIVE-19997) Batches for TestMiniDruidCliDriver

2018-06-26 Thread Jesus Camacho Rodriguez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-19997:
---
Description: I have observed {{TestMiniDruidCliDriver}} takes a long time 
to execute. I verified that execution is not batched. We could batch tests as 
we do with {{TestHBaseCliDriver}}, i.e., 5 q files per batch, as there is only 
a small number of tests.  (was: I have observed {{TestMiniDruidCliDriver}} 
takes a long time to execute. I verified that execution is not batched. We 
could batch tests as we do with {{TestHBaseCliDriver}}, i.e., 5 q files per 
batch.)

> Batches for TestMiniDruidCliDriver
> --
>
> Key: HIVE-19997
> URL: https://issues.apache.org/jira/browse/HIVE-19997
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-19997.patch
>
>
> I have observed {{TestMiniDruidCliDriver}} takes a long time to execute. I 
> verified that execution is not batched. We could batch tests as we do with 
> {{TestHBaseCliDriver}}, i.e., 5 q files per batch, as there is only a small 
> number of tests.





[jira] [Commented] (HIVE-19997) Batches for TestMiniDruidCliDriver

2018-06-26 Thread Jesus Camacho Rodriguez (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523872#comment-16523872
 ] 

Jesus Camacho Rodriguez commented on HIVE-19997:


[~vihangk1], could you take a look and let me know what you think? Now that 
ptests infra is down, it is probably a good time to deploy it. Thanks

> Batches for TestMiniDruidCliDriver
> --
>
> Key: HIVE-19997
> URL: https://issues.apache.org/jira/browse/HIVE-19997
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-19997.patch
>
>
> I have observed {{TestMiniDruidCliDriver}} takes a long time to execute. I 
> verified that execution is not batched. We could batch tests as we do with 
> {{TestHBaseCliDriver}}, i.e., 5 q files per batch, as there is only a small 
> number of tests.





[jira] [Updated] (HIVE-19951) Vectorization: Need to disable encoded LLAP I/O for ORC when there is data type conversion (Schema Evolution)

2018-06-26 Thread Matt McCline (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-19951:

Status: Patch Available  (was: In Progress)

> Vectorization: Need to disable encoded LLAP I/O for ORC when there is data 
> type conversion  (Schema Evolution)
> --
>
> Key: HIVE-19951
> URL: https://issues.apache.org/jira/browse/HIVE-19951
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-19951.01.patch, HIVE-19951.02.patch, 
> HIVE-19951.03.patch, HIVE-19951.04.patch, HIVE-19951.05.patch, 
> HIVE-19951.06.patch
>
>
> Currently, reading encoded ORC data does not support data type conversion.  
> So, encoded reading and cache populating need to be disabled.





[jira] [Commented] (HIVE-19763) Prevent execution of very large queries

2018-06-26 Thread Luis E Martinez-Poblete (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523894#comment-16523894
 ] 

Luis E Martinez-Poblete commented on HIVE-19763:


[~sershe], I'm aware of the strict checks; however, they do not help when the 
query is still in the compilation phase, as pointed out by [~pvary].

Hive should have a protection mechanism to avoid situations in which a query 
can potentially exhaust memory or other resources during the compilation phase.

 

> Prevent execution of very large queries
> ---
>
> Key: HIVE-19763
> URL: https://issues.apache.org/jira/browse/HIVE-19763
> Project: Hive
>  Issue Type: New Feature
>  Components: HiveServer2
>Affects Versions: 2.3.2
>Reporter: Luis E Martinez-Poblete
>Priority: Minor
>
> Synopsis:
> =
> Prevent execution of very large queries.
>  
> Feature Request:
> 
> Please enhance Hive with a parameter to restrict the execution of very large 
> queries.
> Use case: User is trying to create a view with a size of 8 MB. Creation of 
> this view was possible after increasing heap memory in several components 
> (HMS, HS2, Zookeeper). However, this view caused major issues when it was 
> used in a CTE query which resulted in GC pauses and eventually OOM of the HS2 
> process.
>  
> Although it is possible to create the view, it may cause other issues when 
> used in queries. From the Hadoop administrator's point of view, it would be 
> good to restrict this type of query.
>  





[jira] [Commented] (HIVE-19997) Batches for TestMiniDruidCliDriver

2018-06-26 Thread Vihang Karajgaonkar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523917#comment-16523917
 ] 

Vihang Karajgaonkar commented on HIVE-19997:


This file does not affect the batch sizes. Unfortunately, the file which 
controls the batch sizes is a different one and is loaded when 
ptest starts. I will be able to port these changes to that file from your 
change. I will create a separate JIRA to make it easy for everyone to change 
the batch sizes.

These two lines don't make sense to me. What are you intending to do with 
them? I don't think they are needed.
qFileTest.miniDruid.include = normal
qFileTest.miniDruid.isolate = long

> Batches for TestMiniDruidCliDriver
> --
>
> Key: HIVE-19997
> URL: https://issues.apache.org/jira/browse/HIVE-19997
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-19997.patch
>
>
> I have observed {{TestMiniDruidCliDriver}} takes a long time to execute. I 
> verified that execution is not batched. We could batch tests as we do with 
> {{TestHBaseCliDriver}}, i.e., 5 q files per batch, as there is only a small 
> number of tests.





[jira] [Commented] (HIVE-19944) Investigate and fix version mismatch of GCP

2018-06-26 Thread Vihang Karajgaonkar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523923#comment-16523923
 ] 

Vihang Karajgaonkar commented on HIVE-19944:


How does it help to have a Docker installation available on the worker nodes? I 
have made some progress with using containers to run test batches. But right 
now I am manually installing Docker on the nodes to get around the issue with 
having an image which has Docker pre-installed.

> Investigate and fix version mismatch of GCP
> ---
>
> Key: HIVE-19944
> URL: https://issues.apache.org/jira/browse/HIVE-19944
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Adam Szita
>Assignee: Adam Szita
>Priority: Major
> Attachments: HIVE-19944.0.patch
>
>
> We've observed that adding a new image to the ptest GCP project breaks our 
> currently working infrastructure when we try to restart the hive ptest server.
> This is because upon initialization the project's images are queried and we 
> immediately get an exception for newly added images - they don't have a field 
> that our client thinks should be mandatory to have. I believe there's an 
> upgrade needed on our side for the GCP libs we depend on.





[jira] [Commented] (HIVE-19671) Distribute by rand() can lead to data inconsistency

2018-06-26 Thread Xuefu Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523928#comment-16523928
 ] 

Xuefu Zhang commented on HIVE-19671:


Yeah. I think it makes sense. Thanks.

> Distribute by rand() can lead to data inconsistency
> ---
>
> Key: HIVE-19671
> URL: https://issues.apache.org/jira/browse/HIVE-19671
> Project: Hive
>  Issue Type: Bug
>Reporter: Rui Li
>Assignee: Rui Li
>Priority: Major
>
> Noticed the following queries can give different results:
> {code}
> select count(*) from tbl;
> select count(*) from (select * from tbl distribute by rand()) a;
> {code}





[jira] [Commented] (HIVE-19416) Create single version transactional table metastore statistics for aggregation queries

2018-06-26 Thread Eugene Koifman (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523931#comment-16523931
 ] 

Eugene Koifman commented on HIVE-19416:
---

RE: lock acquisition.  The lock manager doesn't have any deadlock detection 
logic in it.  It works by acquiring all locks for a given txn in 1 atomic 
operation.  So if locks are acquired piecemeal, it can lead to deadlock.
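
A toy model may make the point concrete. This is a minimal sketch under stated assumptions, not the Hive lock manager: `ToyLockManager` and its methods are hypothetical, locks are exclusive only, and "waiting" is modelled as a failed request rather than a blocked thread.

```python
class ToyLockManager:
    """Exclusive locks only; a request is granted in full or not at all."""

    def __init__(self):
        self.owner = {}  # resource name -> id of the txn holding it

    def acquire_all(self, txn, resources):
        """Grant every requested lock in one step, or none of them."""
        if any(self.owner.get(r, txn) != txn for r in resources):
            return False  # some resource is held by another txn
        for r in resources:
            self.owner[r] = txn
        return True

    def release_all(self, txn):
        """Drop every lock held by the given txn."""
        self.owner = {r: t for r, t in self.owner.items() if t != txn}


# All-at-once: txn 2 is refused and ends up holding nothing, so it can retry
# once txn 1 releases its locks.
lm = ToyLockManager()
lm.acquire_all(1, ["a", "b"])
lm.acquire_all(2, ["b", "a"])

# Piecemeal: each txn grabs one lock first, then asks for the other one.
# Both follow-up requests are refused while each txn still holds a lock the
# other needs; with no deadlock detection, neither can ever proceed.
pm = ToyLockManager()
pm.acquire_all(1, ["a"])
pm.acquire_all(2, ["b"])
stuck_1 = not pm.acquire_all(1, ["b"])
stuck_2 = not pm.acquire_all(2, ["a"])
```

The design point is that a txn which is refused in the all-at-once scheme holds no locks, so it cannot be part of a wait cycle.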

> Create single version transactional table metastore statistics for 
> aggregation queries
> --
>
> Key: HIVE-19416
> URL: https://issues.apache.org/jira/browse/HIVE-19416
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Steve Yeom
>Assignee: Steve Yeom
>Priority: Major
>
> The system should use only statistics for aggregation queries like count on 
> transactional tables.





[jira] [Commented] (HIVE-19997) Batches for TestMiniDruidCliDriver

2018-06-26 Thread Jesus Camacho Rodriguez (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523938#comment-16523938
 ] 

Jesus Camacho Rodriguez commented on HIVE-19997:


Thanks [~vihangk1]. I copied them from the configuration properties above them 
in the file; if you think they are not needed, I can delete them (to be clear, 
I have not explored how they are used and it is the first time I have modified 
this file).

> Batches for TestMiniDruidCliDriver
> --
>
> Key: HIVE-19997
> URL: https://issues.apache.org/jira/browse/HIVE-19997
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-19997.patch
>
>
> I have observed {{TestMiniDruidCliDriver}} takes a long time to execute. I 
> verified that execution is not batched. We could batch tests as we do with 
> {{TestHBaseCliDriver}}, i.e., 5 q files per batch, as there is only a small 
> number of tests.





[jira] [Commented] (HIVE-19820) add ACID stats support to background stats updater

2018-06-26 Thread Steve Yeom (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523945#comment-16523945
 ] 

Steve Yeom commented on HIVE-19820:
---

[~sershe] I have not checked the details much yet, but I think the current 
testTxnTable and testTxnPartitions in the test module 
use 0 as txnId, whereas txnId starts from 1. 
I also noticed that the way they check and get transactional stats may not be 
the right way to do it. I will try to come back with more specifics.

> add ACID stats support to background stats updater
> --
>
> Key: HIVE-19820
> URL: https://issues.apache.org/jira/browse/HIVE-19820
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-19820.01-master-txnstats.patch
>
>
> Follow-up from HIVE-19418.
> Right now it checks whether stats are valid in an old-fashioned way... and 
> also gets ACID state, and discards it without using.
> When ACID stats are implemented, ACID state needs to be used to do 
> version-aware valid stats checks.





[jira] [Updated] (HIVE-19997) Batches for TestMiniDruidCliDriver

2018-06-26 Thread Jesus Camacho Rodriguez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-19997:
---
Attachment: HIVE-19997.01.patch

> Batches for TestMiniDruidCliDriver
> --
>
> Key: HIVE-19997
> URL: https://issues.apache.org/jira/browse/HIVE-19997
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-19997.01.patch, HIVE-19997.patch
>
>
> I have observed {{TestMiniDruidCliDriver}} takes a long time to execute. I 
> verified that execution is not batched. We could batch tests as we do with 
> {{TestHBaseCliDriver}}, i.e., 5 q files per batch, as there is only a small 
> number of tests.





[jira] [Commented] (HIVE-19997) Batches for TestMiniDruidCliDriver

2018-06-26 Thread Jesus Camacho Rodriguez (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523951#comment-16523951
 ] 

Jesus Camacho Rodriguez commented on HIVE-19997:


OK, got it, those are the groups... Uploaded a second patch where I removed the 
'isolate' part.

> Batches for TestMiniDruidCliDriver
> --
>
> Key: HIVE-19997
> URL: https://issues.apache.org/jira/browse/HIVE-19997
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-19997.01.patch
>
>
> I have observed {{TestMiniDruidCliDriver}} takes a long time to execute. I 
> verified that execution is not batched. We could batch tests as we do with 
> {{TestHBaseCliDriver}}, i.e., 5 q files per batch, as there is only a small 
> number of tests.





[jira] [Updated] (HIVE-19997) Batches for TestMiniDruidCliDriver

2018-06-26 Thread Jesus Camacho Rodriguez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-19997:
---
Attachment: (was: HIVE-19997.patch)

> Batches for TestMiniDruidCliDriver
> --
>
> Key: HIVE-19997
> URL: https://issues.apache.org/jira/browse/HIVE-19997
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-19997.01.patch
>
>
> I have observed {{TestMiniDruidCliDriver}} takes a long time to execute. I 
> verified that execution is not batched. We could batch tests as we do with 
> {{TestHBaseCliDriver}}, i.e., 5 q files per batch, as there is only a small 
> number of tests.





[jira] [Updated] (HIVE-19027) Make materializations invalidation cache work with multiple active remote metastores

2018-06-26 Thread Jesus Camacho Rodriguez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-19027:
---
Attachment: HIVE-19027.03.patch

> Make materializations invalidation cache work with multiple active remote 
> metastores
> 
>
> Key: HIVE-19027
> URL: https://issues.apache.org/jira/browse/HIVE-19027
> Project: Hive
>  Issue Type: Improvement
>  Components: Materialized views
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Critical
> Attachments: HIVE-19027.01.patch, HIVE-19027.02.patch, 
> HIVE-19027.03.patch
>
>
> The main points:
>  - Only MVs stored in transactional tables can have a time window value of 0. 
> Those are the only MVs that can be guaranteed to not be outdated when a query 
> is executed, if we use custom storage handlers to store the materialized 
> view, we cannot make any promises.
>  - For MVs that +cannot be outdated+, we do not check the metastore. Instead, 
> comparison is based on valid write id lists.
>  - For MVs that +can be outdated+, we still rely on the invalidation cache.
>  ** The window for valid outdated MVs can be specified in intervals of 1 
> minute (less than that, it is difficult to have any guarantees about whether 
> the MV is actually outdated by less than a minute or not).
>  ** The async loading is done every interval / 2 (or probably better, we can 
> make it configurable).
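
The points above can be sketched as a small decision helper. This is only an illustration of the listed rules, not the patch itself; the function names and the returned labels are hypothetical.

```python
def staleness_check_for(stored_in_transactional_table, window_minutes):
    """Pick how an MV's freshness is checked, following the points above."""
    if window_minutes < 0:
        raise ValueError("the time window cannot be negative")
    if window_minutes == 0:
        # A window of 0 means the MV must never be outdated, which can only
        # be guaranteed for MVs stored in transactional tables.
        if not stored_in_transactional_table:
            raise ValueError("a window of 0 requires a transactional table")
        return "compare-valid-write-id-lists"
    # MVs that are allowed to be outdated still rely on the invalidation cache.
    return "invalidation-cache"


def cache_reload_period_seconds(window_minutes):
    """Async loading runs every interval / 2; the window is given in minutes."""
    return window_minutes * 60 / 2


print(staleness_check_for(True, 0))
print(staleness_check_for(False, 5), cache_reload_period_seconds(5))
```

Keeping the window in whole minutes mirrors the description; making the reload period configurable, as suggested above, would replace the fixed division by 2.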





[jira] [Commented] (HIVE-19997) Batches for TestMiniDruidCliDriver

2018-06-26 Thread Vihang Karajgaonkar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16524033#comment-16524033
 ] 

Vihang Karajgaonkar commented on HIVE-19997:


+1 I merged that patch manually into master-mr2.properties on the ptest server. 
It should start using a batch size of 5 qfiles for TestMiniDruidCliDriver from 
the next run.
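
For readers unfamiliar with the term, batching here simply means grouping q files so that each test run handles several of them. A minimal sketch, assuming a plain list of file names; the helper and the file names are made up for the example and are not part of ptest:

```python
def make_batches(qfiles, batch_size=5):
    """Split a list of q files into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [qfiles[i:i + batch_size] for i in range(0, len(qfiles), batch_size)]


# Hypothetical file names, used only to show the grouping.
qfiles = ["druid_test_%02d.q" % n for n in range(12)]
batches = make_batches(qfiles)
for batch in batches:
    print(len(batch), batch[0], "...", batch[-1])
```

With 12 files and a batch size of 5, this yields two full batches and one final batch holding the remainder.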

> Batches for TestMiniDruidCliDriver
> --
>
> Key: HIVE-19997
> URL: https://issues.apache.org/jira/browse/HIVE-19997
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-19997.01.patch
>
>
> I have observed {{TestMiniDruidCliDriver}} takes a long time to execute. I 
> verified that execution is not batched. We could batch tests as we do with 
> {{TestHBaseCliDriver}}, i.e., 5 q files per batch, as there is only a small 
> number of tests.





[jira] [Resolved] (HIVE-19997) Batches for TestMiniDruidCliDriver

2018-06-26 Thread Vihang Karajgaonkar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar resolved HIVE-19997.

Resolution: Fixed

> Batches for TestMiniDruidCliDriver
> --
>
> Key: HIVE-19997
> URL: https://issues.apache.org/jira/browse/HIVE-19997
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-19997.01.patch
>
>
> I have observed {{TestMiniDruidCliDriver}} takes a long time to execute. I 
> verified that execution is not batched. We could batch tests as we do with 
> {{TestHBaseCliDriver}}, i.e., 5 q files per batch, as there is only a small 
> number of tests.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-19997) Batches for TestMiniDruidCliDriver

2018-06-26 Thread Jesus Camacho Rodriguez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-19997:
---
Fix Version/s: 4.0.0
   3.1.0

> Batches for TestMiniDruidCliDriver
> --
>
> Key: HIVE-19997
> URL: https://issues.apache.org/jira/browse/HIVE-19997
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Fix For: 3.1.0, 4.0.0
>
> Attachments: HIVE-19997.01.patch
>
>
> I have observed {{TestMiniDruidCliDriver}} takes a long time to execute. I 
> verified that execution is not batched. We could batch tests as we do with 
> {{TestHBaseCliDriver}}, i.e., 5 q files per batch, as there is only a small 
> number of tests.





[jira] [Assigned] (HIVE-19998) Changing batch sizes and test driver specific configuration should be open to everyone

2018-06-26 Thread Vihang Karajgaonkar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar reassigned HIVE-19998:
--


> Changing batch sizes and test driver specific configuration should be open to 
> everyone
> --
>
> Key: HIVE-19998
> URL: https://issues.apache.org/jira/browse/HIVE-19998
> Project: Hive
>  Issue Type: Sub-task
>  Components: Testing Infrastructure
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>
> Currently, to change batch sizes we need to manually update profiles 
> on the Ptest server. We should expose this configuration file to all 
> users so that a simple commit to the branch and an update of the ptest server 
> suffice. We should move the sensitive information from the profiles to a 
> separate file and get the batch size information from the source code 
> directly.





[jira] [Commented] (HIVE-19997) Batches for TestMiniDruidCliDriver

2018-06-26 Thread Jesus Camacho Rodriguez (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16524039#comment-16524039
 ] 

Jesus Camacho Rodriguez commented on HIVE-19997:


Thanks [~vihangk1]!

> Batches for TestMiniDruidCliDriver
> --
>
> Key: HIVE-19997
> URL: https://issues.apache.org/jira/browse/HIVE-19997
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Fix For: 3.1.0, 4.0.0
>
> Attachments: HIVE-19997.01.patch
>
>
> I have observed {{TestMiniDruidCliDriver}} takes a long time to execute. I 
> verified that execution is not batched. We could batch tests as we do with 
> {{TestHBaseCliDriver}}, i.e., 5 q files per batch, as there is only a small 
> number of tests.





[jira] [Commented] (HIVE-19997) Batches for TestMiniDruidCliDriver

2018-06-26 Thread Vihang Karajgaonkar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16524040#comment-16524040
 ] 

Vihang Karajgaonkar commented on HIVE-19997:


Created HIVE-19998 to expose this information to everyone.

> Batches for TestMiniDruidCliDriver
> --
>
> Key: HIVE-19997
> URL: https://issues.apache.org/jira/browse/HIVE-19997
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Fix For: 3.1.0, 4.0.0
>
> Attachments: HIVE-19997.01.patch
>
>
> I have observed {{TestMiniDruidCliDriver}} takes a long time to execute. I 
> verified that execution is not batched. We could batch tests as we do with 
> {{TestHBaseCliDriver}}, i.e., 5 q files per batch, as there is only a small 
> number of tests.





[jira] [Updated] (HIVE-18118) Explain Extended should indicate if a file being read is an EC file

2018-06-26 Thread Andrew Sherman (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Sherman updated HIVE-18118:
--
Attachment: HIVE-18118.8.patch

> Explain Extended should indicate if a file being read is an EC file
> ---
>
> Key: HIVE-18118
> URL: https://issues.apache.org/jira/browse/HIVE-18118
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Andrew Sherman
>Priority: Major
> Attachments: HIVE-18118.1.patch, HIVE-18118.2.patch, 
> HIVE-18118.3.patch, HIVE-18118.4.patch, HIVE-18118.5.patch, 
> HIVE-18118.6.patch, HIVE-18118.7.patch, HIVE-18118.8.patch
>
>
> We already print out the files Hive will read in the explain extended 
> command; we just have to modify it to say whether or not it's an EC file.





[jira] [Resolved] (HIVE-19966) Compile error when using open-jdk

2018-06-26 Thread Vihang Karajgaonkar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar resolved HIVE-19966.

Resolution: Not A Problem

> Compile error when using open-jdk
> -
>
> Key: HIVE-19966
> URL: https://issues.apache.org/jira/browse/HIVE-19966
> Project: Hive
>  Issue Type: Bug
>Reporter: Vihang Karajgaonkar
>Priority: Major
>
> When you compile Hive using open-jdk-8 you see this error
> {noformat}
> hive/ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedReaderImpl.java:[72,16]
>  sun.misc.Cleaner is internal proprietary API and may be removed in a future 
> release
> [ERROR] -> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions, please 
> read the following articles:
> [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
> [ERROR]
> [ERROR] After correcting the problems, you can resume the build with the 
> command
> [ERROR]   mvn <goals> -rf :hive-exec
> {noformat}





[jira] [Commented] (HIVE-19966) Compile error when using open-jdk

2018-06-26 Thread Vihang Karajgaonkar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16524073#comment-16524073
 ] 

Vihang Karajgaonkar commented on HIVE-19966:


I think you are right, [~sershe]. It was an OOM error caused by insufficient 
memory for Maven. Incidentally, this error was the one reported right after it, 
so it looked like the build had failed because of this error. Will close this 
as not an issue.

> Compile error when using open-jdk
> -
>
> Key: HIVE-19966
> URL: https://issues.apache.org/jira/browse/HIVE-19966
> Project: Hive
>  Issue Type: Bug
>Reporter: Vihang Karajgaonkar
>Priority: Major
>
> When you compile Hive using open-jdk-8 you see this error
> {noformat}
> hive/ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedReaderImpl.java:[72,16]
>  sun.misc.Cleaner is internal proprietary API and may be removed in a future 
> release
> [ERROR] -> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions, please 
> read the following articles:
> [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
> [ERROR]
> [ERROR] After correcting the problems, you can resume the build with the 
> command
> [ERROR]   mvn <goals> -rf :hive-exec
> {noformat}





[jira] [Updated] (HIVE-18118) Explain Extended should indicate if a file being read is an EC file

2018-06-26 Thread Andrew Sherman (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Sherman updated HIVE-18118:
--
Attachment: HIVE-18118.9.patch

> Explain Extended should indicate if a file being read is an EC file
> ---
>
> Key: HIVE-18118
> URL: https://issues.apache.org/jira/browse/HIVE-18118
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Andrew Sherman
>Priority: Major
> Attachments: HIVE-18118.1.patch, HIVE-18118.2.patch, 
> HIVE-18118.3.patch, HIVE-18118.4.patch, HIVE-18118.5.patch, 
> HIVE-18118.6.patch, HIVE-18118.7.patch, HIVE-18118.8.patch, HIVE-18118.9.patch
>
>
> We already print out the files Hive will read in the explain extended 
> command; we just have to modify it to say whether or not it's an EC file.





[jira] [Updated] (HIVE-18118) Explain Extended should indicate if a file being read is an EC file

2018-06-26 Thread Andrew Sherman (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Sherman updated HIVE-18118:
--
Attachment: HIVE-18118.10.patch

> Explain Extended should indicate if a file being read is an EC file
> ---
>
> Key: HIVE-18118
> URL: https://issues.apache.org/jira/browse/HIVE-18118
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Andrew Sherman
>Priority: Major
> Attachments: HIVE-18118.1.patch, HIVE-18118.10.patch, 
> HIVE-18118.2.patch, HIVE-18118.3.patch, HIVE-18118.4.patch, 
> HIVE-18118.5.patch, HIVE-18118.6.patch, HIVE-18118.7.patch, 
> HIVE-18118.8.patch, HIVE-18118.9.patch
>
>
> We already print out the files Hive will read in the explain extended 
> command; we just have to modify it to say whether or not it's an EC file.





[jira] [Commented] (HIVE-19938) Upgrade scripts for information schema

2018-06-26 Thread Thejas M Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16524092#comment-16524092
 ] 

Thejas M Nair commented on HIVE-19938:
--

Discussed offline with Daniel. Setting the version using python had to work 
around many limitations in shell command support in beeline. As a result, 
there was no way to use variables like $HOME to create a more specific file 
location for the version information file. Though a syntax that should work 
with multiple versions of python has been chosen, there is still some risk with 
the python command usage (python would be a new requirement AFAIK).

The longer-term approach would be to consolidate the versions of 
information_schema and db_schema into a table in the DB, with 
information_schema just being a view on top of it (like the earlier patch). But 
that would require a lot more refactoring, which we can address separately.
For now, a safer approach would be to use "alter table" to set a table property 
containing the version information instead of "insert into".

For background: the issue with the earlier usage of "insert into version" with 
information_schema is that it would require a tez/spark task to run in order to 
execute it. That means there would be a new dependency on yarn being up and 
running, and the install could end up taking minutes in some cases.




> Upgrade scripts for information schema
> --
>
> Key: HIVE-19938
> URL: https://issues.apache.org/jira/browse/HIVE-19938
> Project: Hive
>  Issue Type: Bug
>Reporter: Daniel Dai
>Assignee: Daniel Dai
>Priority: Major
> Attachments: HIVE-19938.1.patch, HIVE-19938.2.patch, 
> HIVE-19938.3.patch, HIVE-19938.4.patch
>
>
> To make schematool -upgradeSchema work for information schema.





[jira] [Commented] (HIVE-19763) Prevent execution of very large queries

2018-06-26 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16524102#comment-16524102
 ] 

Sergey Shelukhin commented on HIVE-19763:
-

Sounds like we need more strict checks :)

> Prevent execution of very large queries
> ---
>
> Key: HIVE-19763
> URL: https://issues.apache.org/jira/browse/HIVE-19763
> Project: Hive
>  Issue Type: New Feature
>  Components: HiveServer2
>Affects Versions: 2.3.2
>Reporter: Luis E Martinez-Poblete
>Priority: Minor
>
> Synopsis:
> =
> Prevent execution of very large queries.
>  
> Feature Request:
> 
> Please enhance Hive with a parameter to restrict the execution of very large 
> queries.
> Use case: User is trying to create a view with a size of 8 MB. Creation of 
> this view was possible after increasing heap memory in several components 
> (HMS, HS2, Zookeeper). However, this view caused major issues when it was 
> used in a CTE query which resulted in GC pauses and eventually OOM of the HS2 
> process.
>  
> Although it is possible to create the view, it may cause other issues when 
> used in queries. From the Hadoop administrator's point of view, it would be 
> good to restrict this type of query.
>  
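
As a rough illustration of the kind of check being requested, a size limit on the query text could be sketched as below. Everything here is hypothetical: the class QuerySizeGuard, its constructor parameter and the method isAllowed are made up for this sketch and do not correspond to any existing Hive class or configuration parameter.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch (not a Hive API): rejects query text whose UTF-8
// encoded size exceeds a configured limit.
final class QuerySizeGuard {
    private final long maxQueryBytes;

    QuerySizeGuard(long maxQueryBytes) {
        if (maxQueryBytes <= 0) {
            throw new IllegalArgumentException("maxQueryBytes must be positive");
        }
        this.maxQueryBytes = maxQueryBytes;
    }

    // Returns true when the query text fits within the configured limit.
    boolean isAllowed(String queryText) {
        return queryText.getBytes(StandardCharsets.UTF_8).length <= maxQueryBytes;
    }
}
```

A check like this on the raw text would not catch a small query that expands into a very large one through a view, as in the use case above, so a real implementation would presumably also need to look at the expanded view text.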





[jira] [Updated] (HIVE-19581) view do not support unicode characters well

2018-06-26 Thread Andrew Sherman (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Sherman updated HIVE-19581:
--
Attachment: HIVE-19581.4.patch

> view do not support unicode characters well
> ---
>
> Key: HIVE-19581
> URL: https://issues.apache.org/jira/browse/HIVE-19581
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: kai
>Assignee: Andrew Sherman
>Priority: Major
> Attachments: HIVE-19581.1.patch, HIVE-19581.2.patch, 
> HIVE-19581.3.patch, HIVE-19581.4.patch, explain.png, metastore.png
>
>
> create table t_test (name string);
>  insert into table t_test VALUES ('李四');
>  create view t_view_test as select * from t_test where name='李四';
> When running select * from t_view_test, no records are returned.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-19532) fix tests for master-txnstats branch

2018-06-26 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16524105#comment-16524105
 ] 

Sergey Shelukhin commented on HIVE-19532:
-

The code changes are committed to master-txnstats branch

> fix tests for master-txnstats branch
> 
>
> Key: HIVE-19532
> URL: https://issues.apache.org/jira/browse/HIVE-19532
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Steve Yeom
>Assignee: Steve Yeom
>Priority: Major
> Fix For: 3.1.0
>
> Attachments: HIVE-19532.01.patch, HIVE-19532.01.prepatch, 
> HIVE-19532.02.patch, HIVE-19532.02.prepatch, HIVE-19532.03.patch, 
> HIVE-19532.04.patch, HIVE-19532.05.patch, HIVE-19532.06.patch, 
> HIVE-19532.07.patch, HIVE-19532.08.patch
>
>






[jira] [Assigned] (HIVE-19532) fix tests for master-txnstats branch

2018-06-26 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-19532:
---

Assignee: Sergey Shelukhin  (was: Steve Yeom)

> fix tests for master-txnstats branch
> 
>
> Key: HIVE-19532
> URL: https://issues.apache.org/jira/browse/HIVE-19532
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Steve Yeom
>Assignee: Sergey Shelukhin
>Priority: Major
> Fix For: 3.1.0
>
> Attachments: HIVE-19532.01.patch, HIVE-19532.01.prepatch, 
> HIVE-19532.02.patch, HIVE-19532.02.prepatch, HIVE-19532.03.patch, 
> HIVE-19532.04.patch, HIVE-19532.05.patch, HIVE-19532.06.patch, 
> HIVE-19532.07.patch, HIVE-19532.08.patch
>
>






[jira] [Commented] (HIVE-19532) fix tests for master-txnstats branch

2018-06-26 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16524110#comment-16524110
 ] 

Sergey Shelukhin commented on HIVE-19532:
-

Same patch again; HiveQA failed.

> fix tests for master-txnstats branch
> 
>
> Key: HIVE-19532
> URL: https://issues.apache.org/jira/browse/HIVE-19532
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Steve Yeom
>Assignee: Sergey Shelukhin
>Priority: Major
> Fix For: 3.1.0
>
> Attachments: HIVE-19532.01.patch, HIVE-19532.01.prepatch, 
> HIVE-19532.02.patch, HIVE-19532.02.prepatch, HIVE-19532.03.patch, 
> HIVE-19532.04.patch, HIVE-19532.05.patch, HIVE-19532.06.patch, 
> HIVE-19532.07.patch, HIVE-19532.08.patch, HIVE-19532.09.patch
>
>






[jira] [Updated] (HIVE-19532) fix tests for master-txnstats branch

2018-06-26 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-19532:

Attachment: HIVE-19532.09.patch

> fix tests for master-txnstats branch
> 
>
> Key: HIVE-19532
> URL: https://issues.apache.org/jira/browse/HIVE-19532
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Steve Yeom
>Assignee: Sergey Shelukhin
>Priority: Major
> Fix For: 3.1.0
>
> Attachments: HIVE-19532.01.patch, HIVE-19532.01.prepatch, 
> HIVE-19532.02.patch, HIVE-19532.02.prepatch, HIVE-19532.03.patch, 
> HIVE-19532.04.patch, HIVE-19532.05.patch, HIVE-19532.06.patch, 
> HIVE-19532.07.patch, HIVE-19532.08.patch, HIVE-19532.09.patch
>
>






[jira] [Updated] (HIVE-18118) Explain Extended should indicate if a file being read is an EC file

2018-06-26 Thread Andrew Sherman (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Sherman updated HIVE-18118:
--
Attachment: HIVE-18118.10.patch

> Explain Extended should indicate if a file being read is an EC file
> ---
>
> Key: HIVE-18118
> URL: https://issues.apache.org/jira/browse/HIVE-18118
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Andrew Sherman
>Priority: Major
> Attachments: HIVE-18118.1.patch, HIVE-18118.10.patch, 
> HIVE-18118.10.patch, HIVE-18118.2.patch, HIVE-18118.3.patch, 
> HIVE-18118.4.patch, HIVE-18118.5.patch, HIVE-18118.6.patch, 
> HIVE-18118.7.patch, HIVE-18118.8.patch, HIVE-18118.9.patch
>
>
> We already print out the files Hive will read in the explain extended 
> command; we just have to modify it to say whether or not it's an EC file.





[jira] [Updated] (HIVE-18118) Explain Extended should indicate if a file being read is an EC file

2018-06-26 Thread Andrew Sherman (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Sherman updated HIVE-18118:
--
Attachment: HIVE-18118.11.patch

> Explain Extended should indicate if a file being read is an EC file
> ---
>
> Key: HIVE-18118
> URL: https://issues.apache.org/jira/browse/HIVE-18118
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Andrew Sherman
>Priority: Major
> Attachments: HIVE-18118.1.patch, HIVE-18118.10.patch, 
> HIVE-18118.10.patch, HIVE-18118.11.patch, HIVE-18118.2.patch, 
> HIVE-18118.3.patch, HIVE-18118.4.patch, HIVE-18118.5.patch, 
> HIVE-18118.6.patch, HIVE-18118.7.patch, HIVE-18118.8.patch, HIVE-18118.9.patch
>
>
> We already print out the files Hive will read in the explain extended 
> command; we just have to modify it to say whether or not it's an EC file.





[jira] [Assigned] (HIVE-19999) Move precommit jobs to jdk 8

2018-06-26 Thread Vihang Karajgaonkar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar reassigned HIVE-19999:
--


> Move precommit jobs to jdk 8
> 
>
> Key: HIVE-19999
> URL: https://issues.apache.org/jira/browse/HIVE-19999
> Project: Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>






[jira] [Updated] (HIVE-19999) Move precommit jobs to jdk 8

2018-06-26 Thread Vihang Karajgaonkar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-19999:
---
Attachment: HIVE-19999.01.patch

> Move precommit jobs to jdk 8
> 
>
> Key: HIVE-19999
> URL: https://issues.apache.org/jira/browse/HIVE-19999
> Project: Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
> Attachments: HIVE-19999.01.patch
>
>






[jira] [Updated] (HIVE-18118) Explain Extended should indicate if a file being read is an EC file

2018-06-26 Thread Andrew Sherman (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Sherman updated HIVE-18118:
--
Attachment: HIVE-18118.11.patch

> Explain Extended should indicate if a file being read is an EC file
> ---
>
> Key: HIVE-18118
> URL: https://issues.apache.org/jira/browse/HIVE-18118
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Andrew Sherman
>Priority: Major
> Attachments: HIVE-18118.1.patch, HIVE-18118.10.patch, 
> HIVE-18118.10.patch, HIVE-18118.11.patch, HIVE-18118.11.patch, 
> HIVE-18118.2.patch, HIVE-18118.3.patch, HIVE-18118.4.patch, 
> HIVE-18118.5.patch, HIVE-18118.6.patch, HIVE-18118.7.patch, 
> HIVE-18118.8.patch, HIVE-18118.9.patch
>
>
> We already print out the files Hive will read in the explain extended 
> command; we just have to modify it to say whether or not it's an EC file.





[jira] [Commented] (HIVE-19999) Move precommit jobs to jdk 8

2018-06-26 Thread Vineet Garg (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16524138#comment-16524138
 ] 

Vineet Garg commented on HIVE-19999:


[~vihangk1] Is this patch being tested?

> Move precommit jobs to jdk 8
> 
>
> Key: HIVE-19999
> URL: https://issues.apache.org/jira/browse/HIVE-19999
> Project: Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
> Attachments: HIVE-19999.01.patch
>
>






[jira] [Commented] (HIVE-20000) woooohoo20000ooooooo

2018-06-26 Thread Vineet Garg (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16524145#comment-16524145
 ] 

Vineet Garg commented on HIVE-20000:


Eagerly waiting for this feature.

> woooohoo20000ooooooo
> 
>
> Key: HIVE-20000
> URL: https://issues.apache.org/jira/browse/HIVE-20000
> Project: Hive
>  Issue Type: New Feature
>  Components: Hive
>Affects Versions: All Versions
>Reporter: Prasanth Jayachandran
>Priority: Blocker
> Fix For: All Versions
>
>






[jira] [Updated] (HIVE-20000) woooohoo20000ooooooo

2018-06-26 Thread Prasanth Jayachandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-20000:
-
Description: 
{code:java}
   :::  :::  :::  ::: 
:+::+::+:   :+::+:   :+::+:   :+::+:   :+:
  +:+ +:+  :+:++:+  :+:++:+  :+:++:+  :+:+
+#+   +#+ + +:++#+ + +:++#+ + +:++#+ + +:+
  +#+ +#+#  +#++#+#  +#++#+#  +#++#+#  +#+
 #+#  #+#   #+##+#   #+##+#   #+##+#   #+#
## ###  ###  ###  ### 
{code}

> woooohoo20000ooooooo
> 
>
> Key: HIVE-20000
> URL: https://issues.apache.org/jira/browse/HIVE-20000
> Project: Hive
>  Issue Type: New Feature
>  Components: Hive
>Affects Versions: All Versions
>Reporter: Prasanth Jayachandran
>Priority: Blocker
> Fix For: All Versions
>
>
> {code:java}
>    :::  :::  :::  ::: 
> :+::+::+:   :+::+:   :+::+:   :+::+:   :+:
>   +:+ +:+  :+:++:+  :+:++:+  :+:++:+  :+:+
> +#+   +#+ + +:++#+ + +:++#+ + +:++#+ + +:+
>   +#+ +#+#  +#++#+#  +#++#+#  +#++#+#  +#+
>  #+#  #+#   #+##+#   #+##+#   #+##+#   #+#
> ## ###  ###  ###  ### 
> {code}





[jira] [Updated] (HIVE-18118) Explain Extended should indicate if a file being read is an EC file

2018-06-26 Thread Andrew Sherman (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Sherman updated HIVE-18118:
--
Attachment: HIVE-18118.12.patch

> Explain Extended should indicate if a file being read is an EC file
> ---
>
> Key: HIVE-18118
> URL: https://issues.apache.org/jira/browse/HIVE-18118
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Andrew Sherman
>Priority: Major
> Attachments: HIVE-18118.1.patch, HIVE-18118.10.patch, 
> HIVE-18118.10.patch, HIVE-18118.11.patch, HIVE-18118.11.patch, 
> HIVE-18118.12.patch, HIVE-18118.2.patch, HIVE-18118.3.patch, 
> HIVE-18118.4.patch, HIVE-18118.5.patch, HIVE-18118.6.patch, 
> HIVE-18118.7.patch, HIVE-18118.8.patch, HIVE-18118.9.patch
>
>
> We already print out the files Hive will read in the explain extended 
> command; we just have to modify it to say whether or not it's an EC file.





[jira] [Updated] (HIVE-19581) view do not support unicode characters well

2018-06-26 Thread Andrew Sherman (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Sherman updated HIVE-19581:
--
Attachment: HIVE-19581.5.patch

> view do not support unicode characters well
> ---
>
> Key: HIVE-19581
> URL: https://issues.apache.org/jira/browse/HIVE-19581
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: kai
>Assignee: Andrew Sherman
>Priority: Major
> Attachments: HIVE-19581.1.patch, HIVE-19581.2.patch, 
> HIVE-19581.3.patch, HIVE-19581.4.patch, HIVE-19581.5.patch, explain.png, 
> metastore.png
>
>
> create table t_test (name string);
>  insert into table t_test VALUES ('李四');
>  create view t_view_test as select * from t_test where name='李四';
> When running select * from t_view_test, no records are returned.





[jira] [Commented] (HIVE-19999) Move precommit jobs to jdk 8

2018-06-26 Thread Vihang Karajgaonkar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16524160#comment-16524160
 ] 

Vihang Karajgaonkar commented on HIVE-19999:


I merged it already. This was needed to upgrade the jenkins job to java 8. Now 
it looks like HiveQA is back. There were other issues as well caused by 
HIVE-19944

> Move precommit jobs to jdk 8
> 
>
> Key: HIVE-19999
> URL: https://issues.apache.org/jira/browse/HIVE-19999
> Project: Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
> Attachments: HIVE-19999.01.patch
>
>






[jira] [Updated] (HIVE-12192) Hive should carry out timestamp computations in UTC

2018-06-26 Thread Jesus Camacho Rodriguez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-12192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-12192:
---
Attachment: HIVE-12192.27.patch

> Hive should carry out timestamp computations in UTC
> ---
>
> Key: HIVE-12192
> URL: https://issues.apache.org/jira/browse/HIVE-12192
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Ryan Blue
>Assignee: Jesus Camacho Rodriguez
>Priority: Blocker
>  Labels: timestamp
> Fix For: 3.1.0
>
> Attachments: HIVE-12192.01.patch, HIVE-12192.02.patch, 
> HIVE-12192.03.patch, HIVE-12192.04.patch, HIVE-12192.05.patch, 
> HIVE-12192.06.patch, HIVE-12192.07.patch, HIVE-12192.08.patch, 
> HIVE-12192.09.patch, HIVE-12192.10.patch, HIVE-12192.11.patch, 
> HIVE-12192.12.patch, HIVE-12192.13.patch, HIVE-12192.14.patch, 
> HIVE-12192.15.patch, HIVE-12192.16.patch, HIVE-12192.17.patch, 
> HIVE-12192.18.patch, HIVE-12192.19.patch, HIVE-12192.20.patch, 
> HIVE-12192.21.patch, HIVE-12192.22.patch, HIVE-12192.23.patch, 
> HIVE-12192.24.patch, HIVE-12192.25.patch, HIVE-12192.26.patch, 
> HIVE-12192.27.patch, HIVE-12192.patch
>
>
> Hive currently uses the "local" time of a java.sql.Timestamp to represent the 
> SQL data type TIMESTAMP WITHOUT TIME ZONE. The purpose is to be able to use 
> {{Timestamp#getYear()}} and similar methods to implement SQL functions like 
> {{year}}.
> When the SQL session's time zone is a DST zone, such as America/Los_Angeles 
> that alternates between PST and PDT, there are times that cannot be 
> represented because the effective zone skips them.
> {code}
> hive> select TIMESTAMP '2015-03-08 02:10:00.101';
> 2015-03-08 03:10:00.101
> {code}
> Using UTC instead of the SQL session time zone as the underlying zone for a 
> java.sql.Timestamp avoids this bug, while still returning correct values for 
> {{getYear}} etc. Using UTC as the convenience representation (timestamp 
> without time zone has no real zone) would make timestamp calculations more 
> consistent and avoid similar problems in the future.
> Notably, this would break the {{unix_timestamp}} UDF that specifies the 
> result is with respect to ["the default timezone and default 
> locale"|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions].
>  That function would need to be updated to use the 
> {{System.getProperty("user.timezone")}} zone.
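
The skipped local time from the description can be reproduced with a small standalone sketch. Note the assumptions: it uses java.time rather than the java.sql.Timestamp representation Hive uses, and the class DstGapDemo and method resolve are hypothetical names chosen for this illustration. It only shows why a DST zone cannot represent 02:10 on 2015-03-08 while UTC can.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical demo class (not Hive code): interprets a zone-less local
// date-time in a given zone and reads the local fields back.
final class DstGapDemo {
    static LocalDateTime resolve(LocalDateTime local, ZoneId zone) {
        // atZone() shifts a local time that falls into a DST gap forward
        // by the length of the gap.
        return local.atZone(zone).toLocalDateTime();
    }

    public static void main(String[] args) {
        // 2015-03-08 02:10:00.101, the value from the issue description.
        LocalDateTime ts = LocalDateTime.of(2015, 3, 8, 2, 10, 0, 101_000_000);
        // In America/Los_Angeles the hour 02:00-03:00 does not exist that day.
        System.out.println(resolve(ts, ZoneId.of("America/Los_Angeles"))); // 2015-03-08T03:10:00.101
        // UTC has no DST transitions, so the value round-trips unchanged.
        System.out.println(resolve(ts, ZoneOffset.UTC)); // 2015-03-08T02:10:00.101
    }
}
```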





[jira] [Commented] (HIVE-19944) Investigate and fix version mismatch of GCP

2018-06-26 Thread Vihang Karajgaonkar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16524163#comment-16524163
 ] 

Vihang Karajgaonkar commented on HIVE-19944:


Hi [~szita], it looks like this broke the ptest server. When I restarted the 
ptest server to fix HIVE-19999, it didn't come up because of this exception.

{code}
java.lang.NullPointerException: Null archiveSizeBytes
at 
org.jclouds.googlecomputeengine.domain.AutoValue_Image.<init>(AutoValue_Image.java:67)
 ~[google-compute-engine-2.0.0.jar:2.0.0]
at org.jclouds.googlecomputeengine.domain.Image.create(Image.java:100) 
~[google-compute-engine-2.0.0.jar:2.0.0]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
~[?:1.7.0_181]
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
~[?:1.7.0_181]
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 ~[?:1.7.0_181]
at java.lang.reflect.Method.invoke(Method.java:606) ~[?:1.7.0_181]
at 
com.google.common.reflect.Invokable$MethodInvokable.invokeInternal(Invokable.java:197)
 ~[guava-18.0.jar:?]
at com.google.common.reflect.Invokable.invoke(Invokable.java:102) 
~[guava-18.0.jar:?]
at 
org.jclouds.json.internal.DeserializationConstructorAndReflectiveTypeAdapterFactory$DeserializeIntoParameterizedConstructor.newInstance(DeserializationConstructorAndReflectiveTypeAdapterFactory.java:224)
 ~[jclouds-core-2.0.0.jar:2.0.0]
at 
org.jclouds.json.internal.DeserializationConstructorAndReflectiveTypeAdapterFactory$DeserializeIntoParameterizedConstructor.read(DeserializationConstructorAndReflectiveTypeAdapterFactory.java:204)
 ~[jclouds-core-2.0.0.jar:2.0.0]
at 
org.jclouds.googlecloud.config.ListPageAdapterFactory$ListPageAdapter.readItems(ListPageAdapterFactory.java:73)
 ~[googlecloud-2.0.0.jar:2.0.0]
at 
org.jclouds.googlecloud.config.ListPageAdapterFactory$ListPageAdapter.read(ListPageAdapterFactory.java:56)
 ~[googlecloud-2.0.0.jar:2.0.0]
at 
org.jclouds.googlecloud.config.ListPageAdapterFactory$ListPageAdapter.read(ListPageAdapterFactory.java:36)
 ~[googlecloud-2.0.0.jar:2.0.0]
at com.google.gson.Gson.fromJson(Gson.java:861) ~[gson-2.5.jar:?]
at com.google.gson.Gson.fromJson(Gson.java:826) ~[gson-2.5.jar:?]
at org.jclouds.json.internal.GsonWrapper.fromJson(GsonWrapper.java:55) 
~[jclouds-core-2.0.0.jar:2.0.0]
at org.jclouds.http.functions.ParseJson.apply(ParseJson.java:82) 
~[jclouds-core-2.0.0.jar:2.0.0]
at org.jclouds.http.functions.ParseJson.apply(ParseJson.java:76) 
~[jclouds-core-2.0.0.jar:2.0.0]
at org.jclouds.http.functions.ParseJson.apply(ParseJson.java:61) 
[jclouds-core-2.0.0.jar:2.0.0]
at org.jclouds.http.functions.ParseJson.apply(ParseJson.java:41) 
[jclouds-core-2.0.0.jar:2.0.0]
at 
com.google.common.base.Functions$FunctionComposition.apply(Functions.java:216) 
[guava-18.0.jar:?]
at 
org.jclouds.rest.internal.InvokeHttpMethod.invoke(InvokeHttpMethod.java:90) 
[jclouds-core-2.0.0.jar:2.0.0]
at 
org.jclouds.rest.internal.InvokeHttpMethod.apply(InvokeHttpMethod.java:73) 
[jclouds-core-2.0.0.jar:2.0.0]
at 
org.jclouds.rest.internal.InvokeHttpMethod.apply(InvokeHttpMethod.java:44) 
[jclouds-core-2.0.0.jar:2.0.0]
at 
org.jclouds.reflect.FunctionalReflection$FunctionalInvocationHandler.handleInvocation(FunctionalReflection.java:117)
 [jclouds-core-2.0.0.jar:2.0.0]
at 
com.google.common.reflect.AbstractInvocationHandler.invoke(AbstractInvocationHandler.java:87)
 [guava-18.0.jar:?]
at com.sun.proxy.$Proxy77.list(Unknown Source) [?:?]
at 
org.jclouds.googlecomputeengine.compute.GoogleComputeEngineServiceAdapter.listImages(GoogleComputeEngineServiceAdapter.java:209)
 [google-compute-engine-2.0.0.jar:2.0.0]
at 
org.jclouds.compute.config.ComputeServiceAdapterContextModule$2.get(ComputeServiceAdapterContextModule.java:121)
 [jclouds-compute-2.0.0.jar:2.0.0]
at 
org.jclouds.compute.config.ComputeServiceAdapterContextModule$2.get(ComputeServiceAdapterContextModule.java:118)
 [jclouds-compute-2.0.0.jar:2.0.0]
at 
org.jclouds.rest.suppliers.MemoizedRetryOnTimeOutButNotOnAuthorizationExceptionSupplier$SetAndThrowAuthorizationExceptionSupplierBackedLoader.load(MemoizedRetryOnTimeOutButNotOnAuthorizationExceptionSupplier.java:75)
 [jclouds-core-2.0.0.jar:2.0.0]
at 
org.jclouds.rest.suppliers.MemoizedRetryOnTimeOutButNotOnAuthorizationExceptionSupplier$SetAndThrowAuthorizationExceptionSupplierBackedLoader.load(MemoizedRetryOnTimeOutButNotOnAuthorizationExceptionSupplier.java:57)
 [jclouds-core-2.0.0.jar:2.0.0]
at 
com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3527)
 [guava-18.0.jar:?]
at 
com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:23
