[jira] [Commented] (HIVE-18689) restore inheritPerms functionality and extend it to ACID

2018-02-13 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363271#comment-16363271
 ] 

Sergey Shelukhin commented on HIVE-18689:
-

Discussed with [~ashutoshc], who doesn't want to restore this feature to master. 
Our current plan is to rely on a new HDFS feature (ACLs) for inherit-permissions 
functionality in 3.0, since Hive 3 will only work with Hadoop 3.X anyway.
I don't think this is wise, but I don't care enough to argue more about it.

We are basically making a conscious decision to rely on a brand-new feature, 
never exercised in production (and, as far as I know, never even included in a 
released version as of now), for this functionality that is crucial to anyone 
using storage-based auth, with no possibility of a fallback in case there's 
some issue with it (the kind of issue we have or had for every significant new 
feature in Hive, from major ones, e.g. CBO, Tez, LLAP, vectorization, down to 
small optimizations).

I will just leave this patch here for the record (and in case it's needed 
for forward ports) and handle ACID separately on the 2.X branch in HIVE-18710.

> restore inheritPerms functionality and extend it to ACID
> 
>
> Key: HIVE-18689
> URL: https://issues.apache.org/jira/browse/HIVE-18689
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18689.patch
>
>
> This functionality was removed for no clear reason (if it doesn't apply to 
> some use case it can just be disabled).
> It's still in use; in fact, it should be extended to ACID table 
> subdirectories.
> This patch restores the functionality with some cleanup (to not access config 
> everywhere, mostly), disables it by default, and extends it to ACID tables.
> There's a coming HDFS feature that will automatically inherit permissions. 
> When that is shipped in a non-beta version and stabilized a bit, we can 
> remove this functionality... however I dunno if that is good for other 
> potential use cases, like non-HDFS file systems that do have a concept of a 
> directory (Isilon?)
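To make the behavior under discussion concrete, here is a minimal sketch of what "inherit permissions" amounts to. This is a hypothetical stand-in using the local file system's POSIX permissions via `java.nio.file`; the real Hive code goes through Hadoop's `FileSystem.setPermission` and also handles group ownership, so treat this as an illustration only.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

// Hypothetical sketch (not Hive's implementation): a newly created child
// path is made to carry the same permission bits as its parent directory.
class InheritPermsSketch {
    static Set<PosixFilePermission> inheritPerms(Path child) throws IOException {
        Set<PosixFilePermission> parentPerms =
                Files.getPosixFilePermissions(child.getParent());
        Files.setPosixFilePermissions(child, parentPerms);
        return parentPerms;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("warehouse");
        Files.setPosixFilePermissions(dir, PosixFilePermissions.fromString("rwxr-x---"));
        // A file created under the directory gets umask-derived permissions...
        Path part = Files.createFile(dir.resolve("part-00000"));
        // ...until we explicitly mirror the parent's mode bits onto it.
        inheritPerms(part);
    }
}
```

The point of doing this in Hive rather than relying on HDFS ACLs is exactly the fallback argument made in the comment: the copy-from-parent step is explicit and can be toggled off, and it also works on non-HDFS file systems that have real directories.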



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HIVE-18689) restore inheritPerms functionality and extend it to ACID

2018-02-13 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363271#comment-16363271
 ] 

Sergey Shelukhin edited comment on HIVE-18689 at 2/14/18 12:19 AM:
---

Discussed with [~ashutoshc] who doesn't want to restore this feature to master. 
Our current plan is to rely on new HDFS feature (ACLs) for inherit permissions 
functionality in 3.0, since Hive 3 will anyway only work with Hadoop 3.X.
I don't think it is wise but I don't care enough about this functionality to 
argue more about this.

We are basically making a conscious decision to rely on a brand new, never 
executed in production (and never even included in a released version as of 
now, as far as I know) feature for this crucial (for anyone using storage based 
auth) functionality with no possibility of fallback in case there's some issue 
with it; the kind of fall-back we had, and often used and still use, for every 
significant new feature in Hive from major e.g. CBO, Tez, LLAP, Vectorization, 
etc., to small optimizations).

I will just leave this patch here for the record (and in case if it's needed 
for forward ports) and handle ACID separately on 2.X branch in HIVE-18710.


was (Author: sershe):
Discussed with [~ashutoshc] who doesn't want to restore this feature to master. 
Our current plan is to rely on new HDFS feature (ACLs) for inherit permissions 
functionality in 3.0, since Hive 3 will anyway only work with Hadoop 3.X.
I don't think it is wise but I don't care enough about this functionality to 
argue more about this.

We are basically making a conscious decision to rely on a brand new, never 
executed in production (and never even included in a released version as of 
now, as far as I know) feature for this crucial (to anyone using storage based 
auth) functionality with no possibility of fallback in case there's some issue 
with it (which we have or had for every significant new feature in Hive from 
major e.g. CBO, Tez, LLAP, Vectorization, etc., to small optimizations).

I will just leave this patch here for the record (and in case if it's needed 
for forward ports) and handle ACID separately on 2.X branch in HIVE-18710.






[jira] [Comment Edited] (HIVE-18689) restore inheritPerms functionality and extend it to ACID

2018-02-13 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363271#comment-16363271
 ] 

Sergey Shelukhin edited comment on HIVE-18689 at 2/14/18 12:20 AM:
---

Discussed with [~ashutoshc] who doesn't want to restore this feature to master. 
Our current plan is to rely on new HDFS feature (ACLs) for inherit permissions 
functionality in 3.0, since Hive 3 will anyway only work with Hadoop 3.X.
I don't think it is wise but I don't care enough about this functionality to 
argue more about this.

We are basically making a conscious decision to rely on a brand new, never 
executed in production (and never even included in a released version as of 
now, as far as I know) feature for this crucial (for anyone using storage based 
auth) functionality with no possibility of fallback in case there's some issue 
with it; the kind of fall-back we had, and often used and still use, for every 
significant new feature in Hive from major e.g. CBO, Tez, LLAP, Vectorization, 
etc., to small optimizations.

I will just leave this patch here for the record (and in case if it's needed 
for forward ports) and handle ACID separately on 2.X branch in HIVE-18710.


was (Author: sershe):
Discussed with [~ashutoshc] who doesn't want to restore this feature to master. 
Our current plan is to rely on new HDFS feature (ACLs) for inherit permissions 
functionality in 3.0, since Hive 3 will anyway only work with Hadoop 3.X.
I don't think it is wise but I don't care enough about this functionality to 
argue more about this.

We are basically making a conscious decision to rely on a brand new, never 
executed in production (and never even included in a released version as of 
now, as far as I know) feature for this crucial (for anyone using storage based 
auth) functionality with no possibility of fallback in case there's some issue 
with it; the kind of fall-back we had, and often used and still use, for every 
significant new feature in Hive from major e.g. CBO, Tez, LLAP, Vectorization, 
etc., to small optimizations).

I will just leave this patch here for the record (and in case if it's needed 
for forward ports) and handle ACID separately on 2.X branch in HIVE-18710.






[jira] [Updated] (HIVE-18710) extend inheritPerms to ACID in Hive 2.X

2018-02-13 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18710:

Attachment: HIVE-18710-branch-2.patch

> extend inheritPerms to ACID in Hive 2.X
> ---
>
> Key: HIVE-18710
> URL: https://issues.apache.org/jira/browse/HIVE-18710
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18710-branch-2.patch
>
>






[jira] [Updated] (HIVE-18710) extend inheritPerms to ACID in Hive 2.X

2018-02-13 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18710:

Assignee: Sergey Shelukhin
  Status: Patch Available  (was: Open)







[jira] [Updated] (HIVE-18689) restore inheritPerms functionality and extend it to ACID

2018-02-13 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18689:

Status: Open  (was: Patch Available)






[jira] [Commented] (HIVE-18710) extend inheritPerms to ACID in Hive 2.X

2018-02-13 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363365#comment-16363365
 ] 

Sergey Shelukhin commented on HIVE-18710:
-

[~ashutoshc] branch-2 only patch. Can you take a look?







[jira] [Commented] (HIVE-18622) Vectorization: IF Statements, Comparisons, and more do not handle NULLs correctly

2018-02-13 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363446#comment-16363446
 ] 

Sergey Shelukhin commented on HIVE-18622:
-

I looked at most of the iteration 7 on RB (pages 10-13 remaining to go thru), 
and 7-8 diff.
There's one correctness issue that I found, with 0/i mixup.
Some removals of setting isNull are not clear to me.
My main concern is that either I'm missing the big picture, or there is no 
unified semantic approach to noNulls, and to the pattern of setting and 
unsetting it.
The idea of noNulls is to avoid looking at isNull, so it's not clear why 
certain places in the code that fill the meaningful parts of the batch with 
non-nulls still don't set noNulls (I commented on one or two, there are many). 
Seems like noNulls should be set every time there are no nulls, and the isNull 
array should be set correctly by whoever sets noNulls to false.
So, the approach could be:
1) set isNull to false every time we set a non-null value and noNulls is not 
true.
2) when flipping noNulls to false, make sure that isNull is correct; when 
flipping it as part of larger loop that always sets isNull unconditionally 
(e.g. in TreeReader::nextVector in ORC) it's not necessary; when looping thru a 
bunch of non-nulls and finding a null it may be necessary to backfill the 
false-s in the preceding values and rely on (1) to fill the following values 
once noNulls is flipped; when setting individual elements without context it 
may be necessary to fill the array entirely (done only once when actually 
flipping noNulls).

Right now it seems like some places are too conservative and fill isNull even 
when noNulls is true; and some actually remove isNull-setting when setting 
values to non-nulls, without checking for noNulls (so, presumably, if noNulls is 
true, isNull could be incorrect, and it's not clear that the next set that 
happens to be a null doesn't flip noNulls and render the previous value invalid).
Or at least the pattern in these approaches is not clear to me - could be 
because the patch is so large; perhaps it should be described somewhere in one 
place.
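The two rules above can be sketched with a hypothetical mini column class. This is NOT Hive's actual `org.apache.hadoop.hive.ql.exec.vector.ColumnVector` (whose fields and helpers differ); it only illustrates the proposed protocol:

```java
import java.util.Arrays;

// Hypothetical stand-in for a vectorized column's null bookkeeping,
// illustrating the two rules from the comment above.
final class MiniVector {
    final long[] values;
    final boolean[] isNull;
    boolean noNulls = true;  // "no entry is null; readers ignore isNull"

    MiniVector(int size) {
        values = new long[size];
        isNull = new boolean[size];
    }

    // Rule (1): when writing a non-null value while noNulls is false,
    // the isNull entry must be maintained explicitly.
    void setValue(int row, long v) {
        values[row] = v;
        if (!noNulls) {
            isNull[row] = false;
        }
    }

    // Rule (2): when flipping noNulls to false upon hitting the first
    // null, backfill isNull = false for the preceding rows -- they were
    // written while isNull was being ignored and may hold stale flags.
    void setNull(int row) {
        if (noNulls) {
            Arrays.fill(isNull, 0, row, false);
            noNulls = false;
        }
        isNull[row] = true;
    }
}
```

Under this protocol a reader may trust isNull whenever noNulls is false, and skip it entirely otherwise, which is the whole point of having the flag.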



> Vectorization: IF Statements, Comparisons, and more do not handle NULLs 
> correctly
> -
>
> Key: HIVE-18622
> URL: https://issues.apache.org/jira/browse/HIVE-18622
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HIVE-18622.03.patch, HIVE-18622.04.patch, 
> HIVE-18622.05.patch, HIVE-18622.06.patch, HIVE-18622.07.patch, 
> HIVE-18622.08.patch, HIVE-18622.09.patch, HIVE-18622.091.patch, 
> HIVE-18622.092.patch, HIVE-18622.093.patch, HIVE-18622.094.patch, 
> HIVE-18622.095.patch, HIVE-18622.096.patch
>
>
>  
>  Many vector expression classes are setting noNulls to true which does not 
> work if the VRB is a scratch column being reused. The previous use may have 
> set noNulls to false and the isNull array will have some rows marked as NULL. 
> The result is wrong query results and sometimes NPEs (for BytesColumnVector).
> So, many vector expressions need this:
> {code:java}
>   // Carefully handle NULLs...
>   /*
>* For better performance on LONG/DOUBLE we don't want the conditional
>* statements inside the for loop.
>*/
>   outputColVector.noNulls = false;
>  {code}
> And, vector expressions need to make sure the isNull array entry is set when 
> outputColVector.noNulls is false.
> And, all place that assign column value need to set noNulls to false when the 
> value is NULL.
> Almost all cases where noNulls is set to true are incorrect.
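The scratch-column hazard the description points at can be shown with a hypothetical mini column (a stand-in, not Hive's real ColumnVector/BytesColumnVector): setting noNulls = true without clearing isNull leaves a stale flag behind, and it resurfaces as a wrong NULL once a later expression flips noNulls back to false.

```java
// Hypothetical mini column illustrating the reuse bug; not Hive's real class.
final class ScratchColumn {
    boolean noNulls = true;
    final boolean[] isNull;
    final long[] values;

    ScratchColumn(int n) {
        isNull = new boolean[n];
        values = new long[n];
    }

    // How readers interpret a row: isNull only matters when noNulls is false.
    boolean isRowNull(int row) {
        return !noNulls && isNull[row];
    }
}
```

Setting `noNulls = false` up front (as the description's `{code}` snippet does) and then maintaining every isNull entry avoids the stale-flag problem at the cost of per-row bookkeeping.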





[jira] [Comment Edited] (HIVE-18622) Vectorization: IF Statements, Comparisons, and more do not handle NULLs correctly

2018-02-13 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363446#comment-16363446
 ] 

Sergey Shelukhin edited comment on HIVE-18622 at 2/14/18 3:45 AM:
--

I looked at most of the iteration 7 on RB (pages 10-13 remaining to go thru), 
and 7-8 diff.
There's one correctness issue that I found, with 0/i mixup.
Some removals of setting isNull are not clear to me.
My main concern is that either I'm missing the big picture, or there is no 
unified semantic approach to noNulls, and to the pattern of setting and 
unsetting it.
The idea of noNulls is to avoid looking at isNull, so it's not clear why 
certain places in the code that fill the meaningful parts of the batch with 
non-nulls still don't set noNulls (I commented on one or two, there are many). 
Seems like noNulls should be set every time there are no nulls, and isNull 
array should be set correctly by whoever sets noNulls to false.
So, an example uniform approach could be:
0) The only parts of the batch that matter are those included in batch size.
1) set isNull to false every time we set a non-null value and noNulls is not 
true.
2) when flipping noNulls to false, make sure that isNull is correct; when 
flipping it as part of larger loop that always sets isNull unconditionally 
(e.g. in TreeReader::nextVector in ORC) no additional action necessary; when 
looping thru a bunch of non-nulls and finding a null it may be necessary to 
backfill the false-s in the preceding values and rely on (1) to fill the 
following values once noNulls is flipped; when setting individual elements 
without context it may be necessary to fill the array entirely (done only once 
when actually flipping noNulls).

Right now it seems like some places are too conservative and fill isNull even 
when noNulls is true; and some actually remove isNull-setting when setting 
values to non-nulls, without checking for noNulls (so, presumably, if noNulls is 
true, isNull could be incorrect, and it's not clear that the next set that 
happens to be a null doesn't flip noNulls and render the previous value invalid).
Or at least the pattern in these approaches is not clear to me - could be 
because the patch is so large; perhaps it should be described somewhere in one 
place.




was (Author: sershe):
I looked at most of the iteration 7 on RB (pages 10-13 remaining to go thru), 
and 7-8 diff.
There's one correctness issue that I found, with 0/i mixup.
Some removals of setting isNull are not clear to me.
My main concern is that either I'm missing the big picture, or there is no 
unified semantic approach to noNulls, and to the pattern of setting and 
unsetting it.
The idea of noNulls is to avoid looking at isNull, so it's not clear why 
certain places in the code that fill the meaningful parts of the batch with 
non-nulls still don't set noNulls (I commented on one or two, there are many). 
Seems like noNulls should be set every time there are noNulls, and isNull array 
should be set correctly by whoever sets noNulls to false.
So, the approach could be:
1) set isNull to false every time we set a non-null value and noNulls is not 
true.
2) when flipping noNulls to false, make sure that isNull is correct; when 
flipping it as part of larger loop that always sets isNull unconditionally 
(e.g. in TreeReader::nextVector in ORC) it's not necessary; when looping thru a 
bunch of non-nulls and finding a null it may be necessary to backfill the 
false-s in the preceding values and rely on (1) to fill the following values 
once noNulls is flipped; when setting individual elements without context it 
may be necessary to fill the array entirely (done only once when actually 
flipping noNulls).

Right now it seems like some places are too conservative and fill isNull even 
when noNulls is true; and some actually remove isNull-setting when setting 
values to non-nulls, without checking for noNulls (so, presumably if noNulls is 
true isNull could be incorrect and it's not clear the next set that happens to 
be a null doesn't flip noNulls and renders the previous value invalid.
Or at least the pattern in these approaches is not clear to me - could be 
because the patch is so large; perhaps it should be described somewhere in one 
place.




[jira] [Commented] (HIVE-18622) Vectorization: IF Statements, Comparisons, and more do not handle NULLs correctly

2018-02-14 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16364774#comment-16364774
 ] 

Sergey Shelukhin commented on HIVE-18622:
-

I finished reviewing iteration 7 and 7-to-9 diff and now my brain hurts, left 
some more comments.
I hope this is covered by tests.

> Vectorization: IF Statements, Comparisons, and more do not handle NULLs 
> correctly
> -
>
> Key: HIVE-18622
> URL: https://issues.apache.org/jira/browse/HIVE-18622
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HIVE-18622.03.patch, HIVE-18622.04.patch, 
> HIVE-18622.05.patch, HIVE-18622.06.patch, HIVE-18622.07.patch, 
> HIVE-18622.08.patch, HIVE-18622.09.patch, HIVE-18622.091.patch, 
> HIVE-18622.092.patch, HIVE-18622.093.patch, HIVE-18622.094.patch, 
> HIVE-18622.095.patch, HIVE-18622.096.patch, HIVE-18622.097.patch
>
>
>  
>  Many vector expression classes are setting noNulls to true which does not 
> work if the VRB is a scratch column being reused. The previous use may have 
> set noNulls to false and the isNull array will have some rows marked as NULL. 
> The result is wrong query results and sometimes NPEs (for BytesColumnVector).
> So, many vector expressions need this:
> {code:java}
>   // Carefully handle NULLs...
>   /*
>* For better performance on LONG/DOUBLE we don't want the conditional
>* statements inside the for loop.
>*/
>   outputColVector.noNulls = false;
>  {code}
> And, vector expressions need to make sure the isNull array entry is set when 
> outputColVector.noNulls is false.
> And, all place that assign column value need to set noNulls to false when the 
> value is NULL.
> Almost all cases where noNulls is set to true are incorrect.





[jira] [Updated] (HIVE-18658) WM: allow not specifying scheduling policy when creating a pool

2018-02-14 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18658:

Attachment: HIVE-18658.01.patch

> WM: allow not specifying scheduling policy when creating a pool
> ---
>
> Key: HIVE-18658
> URL: https://issues.apache.org/jira/browse/HIVE-18658
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18658.01.patch, HIVE-18658.patch
>
>






[jira] [Commented] (HIVE-18658) WM: allow not specifying scheduling policy when creating a pool

2018-02-14 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16364968#comment-16364968
 ] 

Sergey Shelukhin commented on HIVE-18658:
-

[~prasanth_j] can you take a look? [~harishjp] already reviewed the first 
iteration on RB. The update makes an unrelated test fix.







[jira] [Commented] (HIVE-18710) extend inheritPerms to ACID in Hive 2.X

2018-02-14 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16365029#comment-16365029
 ] 

Sergey Shelukhin commented on HIVE-18710:
-

[~ashutoshc] done... this also fixes some places where I think inheritPerms was 
missing but should be present. Some of these, like list bucketing/skew join 
(which nobody cares about anyway), could be un-included, I guess.







[jira] [Commented] (HIVE-18638) Triggers for multi-pool move, failing to initiate the move event

2018-02-14 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16365033#comment-16365033
 ] 

Sergey Shelukhin commented on HIVE-18638:
-

+1 pending tests

> Triggers for multi-pool move, failing to initiate the move event
> 
>
> Key: HIVE-18638
> URL: https://issues.apache.org/jira/browse/HIVE-18638
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: Aswathy Chellammal Sreekumar
>Assignee: Prasanth Jayachandran
>Priority: Major
> Attachments: HIVE-18638.1.patch
>
>
> Resource plan with multiple pools and trigger set to move job across those 
> pools seems to be failing to do so
> Resource plan:
> {noformat}
> 1: jdbc:hive2://ctr-e137-1514896590304-51538-> show resource plan plan_2; 
> INFO : Compiling 
> command(queryId=hive_20180202220823_2fb8bca7-5b7a-48cf-8ff9-8d5f3548d334): 
> show resource plan plan_2 INFO : Semantic Analysis Completed INFO : Returning 
> Hive schema: Schema(fieldSchemas:[FieldSchema(name:line, type:string, 
> comment:from deserializer)], properties:null) INFO : Completed compiling 
> command(queryId=hive_20180202220823_2fb8bca7-5b7a-48cf-8ff9-8d5f3548d334); 
> Time taken: 0.008 seconds INFO : Executing 
> command(queryId=hive_20180202220823_2fb8bca7-5b7a-48cf-8ff9-8d5f3548d334): 
> show resource plan plan_2 INFO : Starting task [Stage-0:DDL] in serial mode 
> INFO : Completed executing 
> command(queryId=hive_20180202220823_2fb8bca7-5b7a-48cf-8ff9-8d5f3548d334); 
> Time taken: 0.196 seconds INFO : OK 
> plan_2[status=ACTIVE,parallelism=null,defaultPool=pool2]
> +  pool2[allocFraction=0.5,schedulingPolicy=default,parallelism=3]
> |  trigger too_large_write_triger: if (HDFS_BYTES_WRITTEN > 10kb) { MOVE TO pool1 }
> |  mapped for default
> +  pool1[allocFraction=0.3,schedulingPolicy=default,parallelism=5]
> |  trigger slow_pool_trigger: if (ELAPSED_TIME > 3) { MOVE TO pool3 }
> +  pool3[allocFraction=0.2,schedulingPolicy=default,parallelism=3]
> +  default[allocFraction=0.0,schedulingPolicy=null,parallelism=4]
> 8 rows selected (0.25 seconds)
> {noformat}
> Workload Manager Events Summary from query run:
> {noformat}
> INFO  : {
>   "queryId" : "hive_20180202213425_9633d7af-4242-4e95-a391-2cd3823e3eac",
>   "queryStartTime" : 1517607265395,
>   "queryEndTime" : 1517607321648,
>   "queryCompleted" : true,
>   "queryWmEvents" : [ {
> "wmTezSessionInfo" : {
>   "sessionId" : "21f8a4ab-511e-4828-a2dd-1d5f2932c492",
>   "poolName" : "pool2",
>   "clusterPercent" : 50.0
> },
> "eventStartTimestamp" : 1517607269660,
> "eventEndTimestamp" : 1517607269661,
> "eventType" : "GET",
> "elapsedTime" : 1
>   }, {
> "wmTezSessionInfo" : {
>   "sessionId" : "21f8a4ab-511e-4828-a2dd-1d5f2932c492",
>   "poolName" : null,
>   "clusterPercent" : 0.0
> },
> "eventStartTimestamp" : 1517607321663,
> "eventEndTimestamp" : 1517607321663,
> "eventType" : "RETURN",
> "elapsedTime" : 0
>   } ],
>   "appliedTriggers" : [ {
> "name" : "too_large_write_triger",
> "expression" : {
>   "counterLimit" : {
> "limit" : 10240,
> "name" : "HDFS_BYTES_WRITTEN"
>   },
>   "predicate" : "GREATER_THAN"
> },
> "action" : {
>   "type" : "MOVE_TO_POOL",
>   "poolName" : "pool1"
> },
> "violationMsg" : null
>   } ],
>   "subscribedCounters" : [ "HDFS_BYTES_WRITTEN" ],
>   "currentCounters" : {
> "HDFS_BYTES_WRITTEN" : 33306829
>   },
>   "elapsedTime" : 56284
> }
> {noformat}
> From the Workload Manager Event Summary it can be seen that the 'MOVE' event 
> didn't happen even though the limit (10240) for the HDFS_BYTES_WRITTEN 
> counter was exceeded
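With the numbers from the event summary (limit 10240, observed counter 33306829), the GREATER_THAN predicate clearly should have fired. A toy version of the check being described (a hypothetical class, not Hive's actual Trigger/ExecutionTrigger API) makes the expected outcome explicit:

```java
import java.util.Map;

// Hypothetical, heavily simplified model of the counter trigger described
// in the event summary above; Hive's real trigger classes differ.
final class CounterTrigger {
    final String counterName;
    final long limit;        // e.g. 10240 for "10kb"
    final String movePool;   // destination for the MOVE action

    CounterTrigger(String counterName, long limit, String movePool) {
        this.counterName = counterName;
        this.limit = limit;
        this.movePool = movePool;
    }

    // GREATER_THAN predicate: fire once the observed counter exceeds the limit.
    boolean isViolated(Map<String, Long> currentCounters) {
        Long v = currentCounters.get(counterName);
        return v != null && v > limit;
    }
}
```

Since the predicate evaluates to true for the reported counters, the bug is in delivering/acting on the violation (the MOVE event), not in the predicate itself.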





[jira] [Commented] (HIVE-18622) Vectorization: IF Statements, Comparisons, and more do not handle NULLs correctly

2018-02-15 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16366160#comment-16366160
 ] 

Sergey Shelukhin commented on HIVE-18622:
-

+1 pending build/tests... not sure about the test removal, I guess it's ok to 
do in followup jira. Why was it removed?
It may be worth committing this patch because it's better than prior state and 
we can fix/optimize more stuff later.

> Vectorization: IF Statements, Comparisons, and more do not handle NULLs 
> correctly
> -
>
> Key: HIVE-18622
> URL: https://issues.apache.org/jira/browse/HIVE-18622
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HIVE-18622.03.patch, HIVE-18622.04.patch, 
> HIVE-18622.05.patch, HIVE-18622.06.patch, HIVE-18622.07.patch, 
> HIVE-18622.08.patch, HIVE-18622.09.patch, HIVE-18622.091.patch, 
> HIVE-18622.092.patch, HIVE-18622.093.patch, HIVE-18622.094.patch, 
> HIVE-18622.095.patch, HIVE-18622.096.patch, HIVE-18622.097.patch, 
> HIVE-18622.098.patch, HIVE-18622.099.patch, HIVE-18622.0991.patch
>
>
>  
>  Many vector expression classes are setting noNulls to true which does not 
> work if the VRB is a scratch column being reused. The previous use may have 
> set noNulls to false and the isNull array will have some rows marked as NULL. 
> The result is wrong query results and sometimes NPEs (for BytesColumnVector).
> So, many vector expressions need this:
> {code:java}
>   // Carefully handle NULLs...
>   /*
>* For better performance on LONG/DOUBLE we don't want the conditional
>* statements inside the for loop.
>*/
>   outputColVector.noNulls = false;
>  {code}
> And, vector expressions need to make sure the isNull array entry is set when 
> outputColVector.noNulls is false.
> And, all place that assign column value need to set noNulls to false when the 
> value is NULL.
> Almost all cases where noNulls is set to true are incorrect.





[jira] [Assigned] (HIVE-18737) add an option to disable LLAP IO ACID for non-original files

2018-02-16 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-18737:
---


> add an option to disable LLAP IO ACID for non-original files
> 
>
> Key: HIVE-18737
> URL: https://issues.apache.org/jira/browse/HIVE-18737
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
>






[jira] [Updated] (HIVE-18738) LLAP IO ACID - includes handling is broken

2018-02-16 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18738:

Reporter: Deepesh Khandelwal  (was: Sergey Shelukhin)

> LLAP IO ACID - includes handling is broken
> --
>
> Key: HIVE-18738
> URL: https://issues.apache.org/jira/browse/HIVE-18738
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepesh Khandelwal
>Assignee: Sergey Shelukhin
>Priority: Major
>






[jira] [Assigned] (HIVE-18738) LLAP IO ACID - includes handling is broken

2018-02-16 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-18738:
---

Assignee: Sergey Shelukhin

> LLAP IO ACID - includes handling is broken
> --
>
> Key: HIVE-18738
> URL: https://issues.apache.org/jira/browse/HIVE-18738
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
>






[jira] [Updated] (HIVE-18737) add an option to disable LLAP IO ACID for non-original files

2018-02-16 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18737:

Attachment: HIVE-18737.patch

> add an option to disable LLAP IO ACID for non-original files
> 
>
> Key: HIVE-18737
> URL: https://issues.apache.org/jira/browse/HIVE-18737
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18737.patch
>
>






[jira] [Updated] (HIVE-18737) add an option to disable LLAP IO ACID for non-original files

2018-02-16 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18737:

Status: Patch Available  (was: Open)

[~teddy.choi] [~prasanth_j] can you take a look? small patch

> add an option to disable LLAP IO ACID for non-original files
> 
>
> Key: HIVE-18737
> URL: https://issues.apache.org/jira/browse/HIVE-18737
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18737.patch
>
>






[jira] [Commented] (HIVE-18742) Vectorization acid/inputformat check should allow NullRowsInputFormat/OneNullRowInputFormat

2018-02-16 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16368070#comment-16368070
 ] 

Sergey Shelukhin commented on HIVE-18742:
-

+1. Adding an explain to the test would be nice, to see the nullscan.

> Vectorization acid/inputformat check should allow 
> NullRowsInputFormat/OneNullRowInputFormat
> ---
>
> Key: HIVE-18742
> URL: https://issues.apache.org/jira/browse/HIVE-18742
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Jason Dere
>Assignee: Jason Dere
>Priority: Major
> Attachments: HIVE-18742.1.patch
>
>
> Vectorizer.verifyAndSetVectorPartDesc() has a Preconditions check to ensure 
> the InputFormat is ORC only. However, there can be metadata-only or 
> empty-result optimizations on ACID tables, which change the input format to 
> NullRowsInputFormat/OneNullRowInputFormat and trip this check.
> This relaxes the check to allow the null-rows and one-null-row input formats.
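The relaxed check can be sketched as follows; the helper method and the class-name comparison are illustrative assumptions, not the actual Vectorizer code:

```java
public class InputFormatCheckDemo {
    // Input format class names as mentioned in the issue.
    static final String ORC = "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat";
    static final String NULL_ROWS = "org.apache.hadoop.hive.ql.io.NullRowsInputFormat";
    static final String ONE_NULL_ROW = "org.apache.hadoop.hive.ql.io.OneNullRowInputFormat";

    // Before: only ORC passed the Preconditions check. After: metadata-only /
    // empty-result plans that swap in the null-rows formats are also allowed.
    static boolean isAllowedForAcidVectorization(String inputFormatClassName) {
        return ORC.equals(inputFormatClassName)
            || NULL_ROWS.equals(inputFormatClassName)
            || ONE_NULL_ROW.equals(inputFormatClassName);
    }

    public static void main(String[] args) {
        System.out.println(isAllowedForAcidVectorization(NULL_ROWS)); // true
    }
}
```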





[jira] [Updated] (HIVE-18738) LLAP IO ACID - includes handling is broken

2018-02-16 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18738:

Attachment: HIVE-18738.patch

> LLAP IO ACID - includes handling is broken
> --
>
> Key: HIVE-18738
> URL: https://issues.apache.org/jira/browse/HIVE-18738
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepesh Khandelwal
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18738.patch
>
>






[jira] [Commented] (HIVE-18738) LLAP IO ACID - includes handling is broken

2018-02-16 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16368092#comment-16368092
 ] 

Sergey Shelukhin commented on HIVE-18738:
-

A WIP patch that clarifies the semantics of the various include variables and 
fixes the double-ACID-ifying and other incorrect treatment of them in the IO 
elevator.

It already works if one changes it to not exclude the ACID ROW column from the 
includes and removes the added nested-column magic from ORC genIncludedColumns; 
however, it then reads all ACID struct nested columns, even though it might 
only need a subset.
Excluding the ROW column and including its subset via the nested-column magic 
makes it read only the subset of the ROW struct that it needs, but everything 
after the reader (e.g. the decoder) is still set up incorrectly; after 
TreeReaderFactory is fixed, I suspect the next thing to break will be the ACID 
wrapper that blindly puts CVs from the elevator into the VRB that is passed to 
the thing that applies deletes. That might not work so well without always 
having the whole ROW struct (the payload will be null).

Also, all the debug logging needs to be undone later.


> LLAP IO ACID - includes handling is broken
> --
>
> Key: HIVE-18738
> URL: https://issues.apache.org/jira/browse/HIVE-18738
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepesh Khandelwal
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18738.patch
>
>






[jira] [Commented] (HIVE-18629) copyValues in BytesColumnVector may be missing null checks

2018-02-20 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16370476#comment-16370476
 ] 

Sergey Shelukhin commented on HIVE-18629:
-

[~mmccline] ping?

> copyValues in BytesColumnVector may be missing null checks
> --
>
> Key: HIVE-18629
> URL: https://issues.apache.org/jira/browse/HIVE-18629
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18629.01.patch, HIVE-18629.02.patch, 
> HIVE-18629.03.patch, HIVE-18629.patch
>
>
> {noformat}
> Caused by: java.lang.NullPointerException
>   at java.lang.System.arraycopy(Native Method)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setVal(BytesColumnVector.java:173)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.copySelected(BytesColumnVector.java:333)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.IfExprStringGroupColumnStringGroupColumn.evaluate(IfExprStringGroupColumnStringGroupColumn.java:83)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:133)
> {noformat}
> IfExprStringGroupColumnStringGroupColumn code below the v1.isRepeating case 
> has isNull checks for v2/v3 buffers that copySelected is missing. 
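The missing guard can be sketched as follows; the array-based layout and names below are hypothetical simplifications of BytesColumnVector's actual fields, not the real API:

```java
public class CopySelectedDemo {
    // Copy the selected rows, guarding NULL entries so we never hand a null
    // byte[] buffer to an arraycopy-style call (the NPE in the stack trace).
    static byte[][] copySelected(byte[][] src, boolean[] srcIsNull,
                                 int[] selected, int size,
                                 boolean[] outIsNull) {
        byte[][] out = new byte[srcIsNull.length][];
        for (int j = 0; j < size; j++) {
            int i = selected[j];
            if (srcIsNull[i]) {
                outIsNull[i] = true;      // mark NULL instead of copying
            } else {
                outIsNull[i] = false;
                out[i] = java.util.Arrays.copyOf(src[i], src[i].length);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        byte[][] src = { "a".getBytes(), null, "c".getBytes() };
        boolean[] srcIsNull = { false, true, false };
        boolean[] outIsNull = new boolean[3];
        byte[][] out = copySelected(src, srcIsNull, new int[]{0, 1, 2}, 3, outIsNull);
        System.out.println(outIsNull[1]); // true, and no NPE on the null buffer
    }
}
```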





[jira] [Commented] (HIVE-17481) LLAP workload management

2018-02-20 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16370544#comment-16370544
 ] 

Sergey Shelukhin commented on HIVE-17481:
-

Hmm... I wouldn't be opposed to backporting it, but I suspect it's quite an 
undertaking.
It might actually be easier to create shims for Hadoop 2 support in Hive 3 :) 
I suspect that might be a more popular request, too.
cc [~ashutoshc]

> LLAP workload management
> 
>
> Key: HIVE-17481
> URL: https://issues.apache.org/jira/browse/HIVE-17481
> Project: Hive
>  Issue Type: New Feature
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: Workload management design doc.pdf
>
>
> This effort is intended to improve various aspects of cluster sharing for 
> LLAP. Some of these are applicable to non-LLAP queries and may later be 
> extended to all queries. Administrators will be able to specify and apply 
> policies for workload management ("resource plans") that apply to the entire 
> cluster, with only one resource plan being active at a time. The policies 
> will be created and modified using new Hive DDL statements. 
> The policies will cover:
> * Dividing the cluster into a set of (optionally, nested) query pools that 
> are each allocated a fraction of the cluster, a set query parallelism, 
> resource sharing policy between queries, and potentially others like 
> priority, etc.
> * Mapping the incoming queries into pools based on the query user, groups, 
> explicit configuration, etc.
> * Specifying rules that perform actions on queries based on counter values 
> (e.g. killing or moving queries).
> One would also be able to switch policies on a live cluster without (usually) 
> affecting running queries, e.g. to change policies between daytime and 
> nighttime usage patterns and other similar scenarios. The switches would be 
> safe and atomic; versioning may eventually be supported.
> Some implementation details:
> * WM will only be supported in HS2 (for obvious reasons).
> * All LLAP query AMs will run in "interactive" YARN queue and will be 
> fungible between Hive pools.
> * We will use the concept of "guaranteed tasks" (also known as ducks) to 
> enforce cluster allocation without a central scheduler and without 
> compromising throughput. Guaranteed tasks preempt other (speculative) tasks 
> and are distributed from HS2 to AMs, and from AMs to tasks, in accordance 
> with percentage allocations in the policy. Each "duck" corresponds to a CPU 
> resource on the cluster. The implementation will be isolated so as to allow 
> different ones later.
> * In future, we may consider improved task placement and late binding, 
> similar to the ones described in Sparrow paper, to work around potential 
> hotspots/etc. that are not avoided with the decentralized scheme.
> * Only one HS2 will initially be supported to avoid split-brain workload 
> management. We will also implement (in a tangential set of work items) 
> active-passive HS2 recovery. Eventually, we intend to switch to full 
> active-active HS2 configuration with shared WM and Tez session pool (unlike 
> the current case with 2 separate session pools). 
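The percentage-based distribution of guaranteed tasks ("ducks") described above can be sketched as a proportional split of a fixed number of slots; this is an illustrative largest-remainder allocation, not the actual WM scheduler code, and it assumes the pool fractions sum to 1:

```java
import java.util.*;

public class DuckAllocationDemo {
    // Split totalDucks across pools proportionally to their cluster fractions,
    // giving leftover ducks to the pools with the largest remainders so the
    // total is always fully assigned.
    static int[] allocate(double[] fractions, int totalDucks) {
        int n = fractions.length;
        int[] ducks = new int[n];
        double[] remainders = new double[n];
        int assigned = 0;
        for (int i = 0; i < n; i++) {
            double exact = fractions[i] * totalDucks;
            ducks[i] = (int) Math.floor(exact);
            remainders[i] = exact - ducks[i];
            assigned += ducks[i];
        }
        // Hand out the remaining ducks by descending fractional remainder.
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> Double.compare(remainders[b], remainders[a]));
        for (int k = 0; k < totalDucks - assigned; k++) ducks[order[k]]++;
        return ducks;
    }

    public static void main(String[] args) {
        // Two pools at 60%/40% of a 10-slot cluster.
        System.out.println(Arrays.toString(allocate(new double[]{0.6, 0.4}, 10)));
    }
}
```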





[jira] [Commented] (HIVE-18742) Vectorization acid/inputformat check should allow NullRowsInputFormat/OneNullRowInputFormat

2018-02-20 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16370549#comment-16370549
 ] 

Sergey Shelukhin commented on HIVE-18742:
-

+1

> Vectorization acid/inputformat check should allow 
> NullRowsInputFormat/OneNullRowInputFormat
> ---
>
> Key: HIVE-18742
> URL: https://issues.apache.org/jira/browse/HIVE-18742
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions, Vectorization
>Reporter: Jason Dere
>Assignee: Jason Dere
>Priority: Major
> Attachments: HIVE-18742.1.patch, HIVE-18742.2.patch
>
>
> Vectorizer.verifyAndSetVectorPartDesc() has a Preconditions check to ensure 
> the InputFormat is ORC only. However, there can be metadata-only or 
> empty-result optimizations on ACID tables, which change the input format to 
> NullRowsInputFormat/OneNullRowInputFormat and trip this check.
> This relaxes the check to allow the null-rows and one-null-row input formats.





[jira] [Commented] (HIVE-18710) extend inheritPerms to ACID in Hive 2.X

2018-02-20 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16370577#comment-16370577
 ] 

Sergey Shelukhin commented on HIVE-18710:
-

Addressed CR feedback.

> extend inheritPerms to ACID in Hive 2.X
> ---
>
> Key: HIVE-18710
> URL: https://issues.apache.org/jira/browse/HIVE-18710
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18710-branch-2.patch, HIVE-18710.01-branch-2.patch
>
>






[jira] [Updated] (HIVE-18710) extend inheritPerms to ACID in Hive 2.X

2018-02-20 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18710:

Attachment: HIVE-18710.01-branch-2.patch

> extend inheritPerms to ACID in Hive 2.X
> ---
>
> Key: HIVE-18710
> URL: https://issues.apache.org/jira/browse/HIVE-18710
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18710-branch-2.patch, HIVE-18710.01-branch-2.patch
>
>






[jira] [Assigned] (HIVE-18757) LLAP IO for text fails for empty files

2018-02-20 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-18757:
---


> LLAP IO for text fails for empty files
> --
>
> Key: HIVE-18757
> URL: https://issues.apache.org/jira/browse/HIVE-18757
> Project: Hive
>  Issue Type: Bug
>Reporter: Aswathy Chellammal Sreekumar
>Assignee: Sergey Shelukhin
>Priority: Major
>






[jira] [Updated] (HIVE-18757) LLAP IO for text fails for empty files

2018-02-20 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18757:

Attachment: HIVE-18757.patch

> LLAP IO for text fails for empty files
> --
>
> Key: HIVE-18757
> URL: https://issues.apache.org/jira/browse/HIVE-18757
> Project: Hive
>  Issue Type: Bug
>Reporter: Aswathy Chellammal Sreekumar
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18757.patch
>
>






[jira] [Commented] (HIVE-18757) LLAP IO for text fails for empty files

2018-02-20 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16370791#comment-16370791
 ] 

Sergey Shelukhin commented on HIVE-18757:
-

[~prasanth_j] Tiny patch: mostly cosmetic changes, with one line of actual fix. 
Can you take a look?

> LLAP IO for text fails for empty files
> --
>
> Key: HIVE-18757
> URL: https://issues.apache.org/jira/browse/HIVE-18757
> Project: Hive
>  Issue Type: Bug
>Reporter: Aswathy Chellammal Sreekumar
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18757.patch
>
>






[jira] [Updated] (HIVE-18757) LLAP IO for text fails for empty files

2018-02-20 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18757:

Status: Patch Available  (was: Open)

> LLAP IO for text fails for empty files
> --
>
> Key: HIVE-18757
> URL: https://issues.apache.org/jira/browse/HIVE-18757
> Project: Hive
>  Issue Type: Bug
>Reporter: Aswathy Chellammal Sreekumar
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18757.patch
>
>






[jira] [Commented] (HIVE-18756) Vectorization: VectorUDAFVarFinal produces Wrong Results

2018-02-20 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16370815#comment-16370815
 ] 

Sergey Shelukhin commented on HIVE-18756:
-

+1 pending tests. Can you file a follow-up JIRA to fix it?

> Vectorization: VectorUDAFVarFinal produces Wrong Results
> 
>
> Key: HIVE-18756
> URL: https://issues.apache.org/jira/browse/HIVE-18756
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.0.0
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-18756.01.patch
>
>
> For a large query.  Disabling vectorization for now.





[jira] [Updated] (HIVE-18737) add an option to disable LLAP IO ACID for non-original files

2018-02-20 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18737:

   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed to master. Thanks for the review!

> add an option to disable LLAP IO ACID for non-original files
> 
>
> Key: HIVE-18737
> URL: https://issues.apache.org/jira/browse/HIVE-18737
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HIVE-18737.patch
>
>






[jira] [Updated] (HIVE-18658) WM: allow not specifying scheduling policy when creating a pool

2018-02-20 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18658:

Fix Version/s: 3.0.0

> WM: allow not specifying scheduling policy when creating a pool
> ---
>
> Key: HIVE-18658
> URL: https://issues.apache.org/jira/browse/HIVE-18658
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HIVE-18658.01.patch, HIVE-18658.patch
>
>






[jira] [Updated] (HIVE-18658) WM: allow not specifying scheduling policy when creating a pool

2018-02-20 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18658:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to master. Thanks for the review!

> WM: allow not specifying scheduling policy when creating a pool
> ---
>
> Key: HIVE-18658
> URL: https://issues.apache.org/jira/browse/HIVE-18658
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18658.01.patch, HIVE-18658.patch
>
>






[jira] [Assigned] (HIVE-18763) LLAP IO for text should take table serde into account

2018-02-21 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-18763:
---


> LLAP IO for text should take table serde into account
> -
>
> Key: HIVE-18763
> URL: https://issues.apache.org/jira/browse/HIVE-18763
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
>
> It's only using partition SerDe right now. We should reconcile both for when 
> there are changes.





[jira] [Updated] (HIVE-18757) LLAP IO for text fails for empty files

2018-02-21 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18757:

   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed to master. Thanks for the review!

> LLAP IO for text fails for empty files
> --
>
> Key: HIVE-18757
> URL: https://issues.apache.org/jira/browse/HIVE-18757
> Project: Hive
>  Issue Type: Bug
>Reporter: Aswathy Chellammal Sreekumar
>Assignee: Sergey Shelukhin
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HIVE-18757.patch
>
>






[jira] [Updated] (HIVE-18763) VectorMapOperator should take into account partition->table serde conversion for all cases

2018-02-21 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18763:

Summary: VectorMapOperator should take into account partition->table serde 
conversion for all cases  (was: LLAP IO for text should take table serde into 
account)

> VectorMapOperator should take into account partition->table serde conversion 
> for all cases
> --
>
> Key: HIVE-18763
> URL: https://issues.apache.org/jira/browse/HIVE-18763
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
>
> It's only using partition SerDe right now. We should reconcile both for when 
> there are changes.





[jira] [Updated] (HIVE-18763) VectorMapOperator should take into account partition->table serde conversion for all cases

2018-02-21 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18763:

Description: 
When the table and partition schemas differ, the non-vectorized MapOperator 
does row-by-row conversion from whatever is read to the table schema.
VectorMapOperator is less consistent: it does the conversion as part of 
populating VRBs in row/serde modes (used e.g. to read text row-by-row or 
natively and make VRBs); see the VectorDeserializeRow convert... methods for an 
example. However, the native VRB mode relies on the ORC ConvertTreeReader... 
machinery, which lives in ORC, and never converts anything itself.

So anything running in native VRB mode that is not the vanilla ORC reader will 
produce data with an incorrect schema. There are two such cases right now: LLAP 
IO with ORC or text data, and Parquet.
It's possible to extend the ConvertTreeReader... machinery to LLAP IO ORC, 
which already uses TreeReader-s for everything; LLAP IO text and Parquet, 
however, will have to invent their own conversion.
Therefore, I think the best fix for this is to treat all inputs in VMO the same 
and convert them by default, like the regular MapOperator does, and make the 
ORC special mode an exception that is allowed to bypass the conversion.
cc [~mmccline]

Test case - varchar column length should be limited after alter table but it 
isn't.
{noformat}
CREATE TABLE schema_evolution_data(insert_num int, boolean1 boolean, tinyint1 
tinyint, smallint1 smallint, int1 int, bigint1 bigint, decimal1 decimal(38,18), 
float1 float, double1 double, string1 varchar(50), string2 varchar(50), date1 
date, timestamp1 timestamp, boolean_str string, tinyint_str string, 
smallint_str string, int_str string, bigint_str string, decimal_str string, 
float_str string, double_str string, date_str string, timestamp_str string, 
filler string)
row format delimited fields terminated by '|' stored as textfile;
load data local inpath 
'../../data/files/schema_evolution/schema_evolution_data.txt' overwrite into 
table schema_evolution_data;


drop table if exists vsp;
create table vsp(vs varchar(50)) partitioned by(s varchar(50)) stored as 
textfile;
insert into table vsp partition(s='positive') select string1 from 
schema_evolution_data;
alter table vsp change column vs vs varchar(3);

drop table if exists vsp_orc;
create table vsp_orc(vs varchar(50)) partitioned by(s varchar(50)) stored as 
orc;
insert into table vsp_orc partition(s='positive') select string1 from 
schema_evolution_data;
alter table vsp_orc change column vs vs varchar(3);

drop table if exists vsp_parquet;
create table vsp_parquet(vs varchar(50)) partitioned by(s varchar(50)) stored 
as parquet;
insert into table vsp_parquet partition(s='positive') select string1 from 
schema_evolution_data;
alter table vsp_parquet change column vs vs varchar(3);

SET hive.llap.io.enabled=true;
-- BAD results from all queries; parquet affected regardless of IO.
select length(vs) from vsp; 
select length(vs) from vsp_orc;
select length(vs) from vsp_parquet;

SET hive.llap.io.enabled=false;
select length(vs) from vsp; -- ok
select length(vs) from vsp_orc; -- ok
select length(vs) from vsp_parquet; -- still bad
{noformat}

  was:It's only using partition SerDe right now. We should reconcile both for 
when there are changes.


> VectorMapOperator should take into account partition->table serde conversion 
> for all cases
> --
>
> Key: HIVE-18763
> URL: https://issues.apache.org/jira/browse/HIVE-18763
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
>
> When the table and partition schemas differ, the non-vectorized MapOperator 
> does row-by-row conversion from whatever is read to the table schema.
> VectorMapOperator is less consistent: it does the conversion as part of 
> populating VRBs in row/serde modes (used e.g. to read text row-by-row or 
> natively and make VRBs); see the VectorDeserializeRow convert... methods for 
> an example. However, the native VRB mode relies on the ORC 
> ConvertTreeReader... machinery, which lives in ORC, and never converts 
> anything itself.
> So anything running in native VRB mode that is not the vanilla ORC reader 
> will produce data with an incorrect schema. There are two such cases right 
> now: LLAP IO with ORC or text data, and Parquet.
> It's possible to extend the ConvertTreeReader... machinery to LLAP IO ORC, 
> which already uses TreeReader-s for everything; LLAP IO text and Parquet, 
> however, will have to invent their own conversion.
> Therefore, I think the best fix for this is to treat all inputs in VMO the 
> same and convert them by default, like the regular MapOperator does, and make 
> the ORC special mode an exception that is allowed to bypass the conversion.
> cc [~mmccline]

[jira] [Assigned] (HIVE-18763) VectorMapOperator should take into account partition->table serde conversion for all cases

2018-02-21 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-18763:
---

Assignee: (was: Sergey Shelukhin)

> VectorMapOperator should take into account partition->table serde conversion 
> for all cases
> --
>
> Key: HIVE-18763
> URL: https://issues.apache.org/jira/browse/HIVE-18763
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Priority: Major
>
> When the table and partition schemas differ, the non-vectorized MapOperator 
> does row-by-row conversion from whatever is read to the table schema.
> VectorMapOperator is less consistent: it does the conversion as part of 
> populating VRBs in row/serde modes (used e.g. to read text row-by-row or 
> natively and make VRBs); see the VectorDeserializeRow convert... methods for 
> an example. However, the native VRB mode relies on the ORC 
> ConvertTreeReader... machinery, which lives in ORC, and never converts 
> anything itself.
> So anything running in native VRB mode that is not the vanilla ORC reader 
> will produce data with an incorrect schema. There are two such cases right 
> now: LLAP IO with ORC or text data, and Parquet.
> It's possible to extend the ConvertTreeReader... machinery to LLAP IO ORC, 
> which already uses TreeReader-s for everything; LLAP IO text and Parquet, 
> however, will have to invent their own conversion.
> Therefore, I think the best fix for this is to treat all inputs in VMO the 
> same and convert them by default, like the regular MapOperator does, and make 
> the ORC special mode an exception that is allowed to bypass the conversion.
> cc [~mmccline]
> Test case - varchar column length should be limited after alter table but it 
> isn't.
> {noformat}
> CREATE TABLE schema_evolution_data(insert_num int, boolean1 boolean, tinyint1 
> tinyint, smallint1 smallint, int1 int, bigint1 bigint, decimal1 
> decimal(38,18), float1 float, double1 double, string1 varchar(50), string2 
> varchar(50), date1 date, timestamp1 timestamp, boolean_str string, 
> tinyint_str string, smallint_str string, int_str string, bigint_str string, 
> decimal_str string, float_str string, double_str string, date_str string, 
> timestamp_str string, filler string)
> row format delimited fields terminated by '|' stored as textfile;
> load data local inpath 
> '../../data/files/schema_evolution/schema_evolution_data.txt' overwrite into 
> table schema_evolution_data;
> drop table if exists vsp;
> create table vsp(vs varchar(50)) partitioned by(s varchar(50)) stored as 
> textfile;
> insert into table vsp partition(s='positive') select string1 from 
> schema_evolution_data;
> alter table vsp change column vs vs varchar(3);
> drop table if exists vsp_orc;
> create table vsp_orc(vs varchar(50)) partitioned by(s varchar(50)) stored as 
> orc;
> insert into table vsp_orc partition(s='positive') select string1 from 
> schema_evolution_data;
> alter table vsp_orc change column vs vs varchar(3);
> drop table if exists vsp_parquet;
> create table vsp_parquet(vs varchar(50)) partitioned by(s varchar(50)) stored 
> as parquet;
> insert into table vsp_parquet partition(s='positive') select string1 from 
> schema_evolution_data;
> alter table vsp_parquet change column vs vs varchar(3);
> SET hive.llap.io.enabled=true;
> -- BAD results from all queries; parquet affected regardless of IO.
> select length(vs) from vsp; 
> select length(vs) from vsp_orc;
> select length(vs) from vsp_parquet;
> SET hive.llap.io.enabled=false;
> select length(vs) from vsp; -- ok
> select length(vs) from vsp_orc; -- ok
> select length(vs) from vsp_parquet; -- still bad
> {noformat}





[jira] [Commented] (HIVE-18763) VectorMapOperator should take into account partition->table serde conversion for all cases

2018-02-21 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16372233#comment-16372233
 ] 

Sergey Shelukhin commented on HIVE-18763:
-

cc [~vihangk1] fyi, this also affects Parquet

> VectorMapOperator should take into account partition->table serde conversion 
> for all cases
> --
>
> Key: HIVE-18763
> URL: https://issues.apache.org/jira/browse/HIVE-18763
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Priority: Major
>
> When table and partition schema differ, non-vectorized MapOperator does row 
> by row conversion from whatever is read to the table schema.
> VectorMapOperator is less consistent... it does the conversion as part of 
> populating VRBs in row/serde modes (used to read e.g. text row-by-row or 
> natively, and make VRBs); see  VectorDeserializeRow class convert... methods 
> for an example. However, the native VRB mode relies on ORC 
> ConvertTreeReader... stuff that lives in ORC and never converts anything.
> So, anything running in native VRB mode that is not the vanilla ORC reader 
> will produce data with incorrect schema - there are two such cases right now, 
> LLAP IO with ORC or text data, and Parquet. 
> It's possible to extend ConvertTreeReader... stuff to LLAP IO ORC that 
> already uses TreeReader-s for everything; LLAP IO text and Parquet however 
> will have to invent their own conversion.
> Therefore, I think the best fix for this is to treat all inputs in VMO the 
> same and convert them by default, like the regular MapOperator; and make ORC 
> special mode an exception that allows it to bypass the conversion. 
> cc [~mmccline]
> Test case - varchar column length should be limited after alter table but it 
> isn't.
> {noformat}
> CREATE TABLE schema_evolution_data(insert_num int, boolean1 boolean, tinyint1 
> tinyint, smallint1 smallint, int1 int, bigint1 bigint, decimal1 
> decimal(38,18), float1 float, double1 double, string1 varchar(50), string2 
> varchar(50), date1 date, timestamp1 timestamp, boolean_str string, 
> tinyint_str string, smallint_str string, int_str string, bigint_str string, 
> decimal_str string, float_str string, double_str string, date_str string, 
> timestamp_str string, filler string)
> row format delimited fields terminated by '|' stored as textfile;
> load data local inpath 
> '../../data/files/schema_evolution/schema_evolution_data.txt' overwrite into 
> table schema_evolution_data;
> drop table if exists vsp;
> create table vsp(vs varchar(50)) partitioned by(s varchar(50)) stored as 
> textfile;
> insert into table vsp partition(s='positive') select string1 from 
> schema_evolution_data;
> alter table vsp change column vs vs varchar(3);
> drop table if exists vsp_orc;
> create table vsp_orc(vs varchar(50)) partitioned by(s varchar(50)) stored as 
> orc;
> insert into table vsp_orc partition(s='positive') select string1 from 
> schema_evolution_data;
> alter table vsp_orc change column vs vs varchar(3);
> drop table if exists vsp_parquet;
> create table vsp_parquet(vs varchar(50)) partitioned by(s varchar(50)) stored 
> as parquet;
> insert into table vsp_parquet partition(s='positive') select string1 from 
> schema_evolution_data;
> alter table vsp_parquet change column vs vs varchar(3);
> SET hive.llap.io.enabled=true;
> -- BAD results from all queries; parquet affected regardless of IO.
> select length(vs) from vsp; 
> select length(vs) from vsp_orc;
> select length(vs) from vsp_parquet;
> SET hive.llap.io.enabled=false;
> select length(vs) from vsp; -- ok
> select length(vs) from vsp_orc; -- ok
> select length(vs) from vsp_parquet; -- still bad
> {noformat}





[jira] [Updated] (HIVE-18763) VectorMapOperator should take into account partition->table serde conversion for all cases

2018-02-21 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18763:

Description: 
When table and partition schema differ, non-vectorized MapOperator does row by 
row conversion from whatever is read to the table schema.
VectorMapOperator is less consistent... it does the conversion as part of 
populating VRBs in row/serde modes (used to read e.g. text row-by-row or 
natively, and make VRBs); see  VectorDeserializeRow class convert... methods 
for an example. However, the native VRB mode relies on ORC ConvertTreeReader... 
stuff that lives in ORC, and so never converts anything inside VMO.

So, anything running in native VRB mode that is not the vanilla ORC reader will 
produce data with incorrect schema - there are two such cases right now, LLAP 
IO with ORC or text data, and Parquet. 
It's possible to extend ConvertTreeReader... stuff to LLAP IO ORC that already 
uses TreeReader-s for everything; LLAP IO text and Parquet however will have to 
invent their own conversion.
Therefore, I think the best fix for this is to treat all inputs in VMO the same 
and convert them by default, like the regular MapOperator; and make ORC special 
mode an exception that allows it to bypass the conversion. 
cc [~mmccline]

Test case - varchar column length should be limited after alter table but it 
isn't.
{noformat}
CREATE TABLE schema_evolution_data(insert_num int, boolean1 boolean, tinyint1 
tinyint, smallint1 smallint, int1 int, bigint1 bigint, decimal1 decimal(38,18), 
float1 float, double1 double, string1 varchar(50), string2 varchar(50), date1 
date, timestamp1 timestamp, boolean_str string, tinyint_str string, 
smallint_str string, int_str string, bigint_str string, decimal_str string, 
float_str string, double_str string, date_str string, timestamp_str string, 
filler string)
row format delimited fields terminated by '|' stored as textfile;
load data local inpath 
'../../data/files/schema_evolution/schema_evolution_data.txt' overwrite into 
table schema_evolution_data;


drop table if exists vsp;
create table vsp(vs varchar(50)) partitioned by(s varchar(50)) stored as 
textfile;
insert into table vsp partition(s='positive') select string1 from 
schema_evolution_data;
alter table vsp change column vs vs varchar(3);

drop table if exists vsp_orc;
create table vsp_orc(vs varchar(50)) partitioned by(s varchar(50)) stored as 
orc;
insert into table vsp_orc partition(s='positive') select string1 from 
schema_evolution_data;
alter table vsp_orc change column vs vs varchar(3);

drop table if exists vsp_parquet;
create table vsp_parquet(vs varchar(50)) partitioned by(s varchar(50)) stored 
as parquet;
insert into table vsp_parquet partition(s='positive') select string1 from 
schema_evolution_data;
alter table vsp_parquet change column vs vs varchar(3);

SET hive.llap.io.enabled=true;
-- BAD results from all queries; parquet affected regardless of IO.
select length(vs) from vsp; 
select length(vs) from vsp_orc;
select length(vs) from vsp_parquet;

SET hive.llap.io.enabled=false;
select length(vs) from vsp; -- ok
select length(vs) from vsp_orc; -- ok
select length(vs) from vsp_parquet; -- still bad
{noformat}

  was:
When table and partition schema differ, non-vectorized MapOperator does row by 
row conversion from whatever is read to the table schema.
VectorMapOperator is less consistent... it does the conversion as part of 
populating VRBs in row/serde modes (used to read e.g. text row-by-row or 
natively, and make VRBs); see  VectorDeserializeRow class convert... methods 
for an example. However, the native VRB mode relies on ORC ConvertTreeReader... 
stuff that lives in ORC and never converts anything.

So, anything running in native VRB mode that is not the vanilla ORC reader will 
produce data with incorrect schema - there are two such cases right now, LLAP 
IO with ORC or text data, and Parquet. 
It's possible to extend ConvertTreeReader... stuff to LLAP IO ORC that already 
uses TreeReader-s for everything; LLAP IO text and Parquet however will have to 
invent their own conversion.
Therefore, I think the best fix for this is to treat all inputs in VMO the same 
and convert them by default, like the regular MapOperator; and make ORC special 
mode an exception that allows it to bypass the conversion. 
cc [~mmccline]

Test case - varchar column length should be limited after alter table but it 
isn't.
{noformat}
CREATE TABLE schema_evolution_data(insert_num int, boolean1 boolean, tinyint1 
tinyint, smallint1 smallint, int1 int, bigint1 bigint, decimal1 decimal(38,18), 
float1 float, double1 double, string1 varchar(50), string2 varchar(50), date1 
date, timestamp1 timestamp, boolean_str string, tinyint_str string, 
smallint_str string, int_str string, bigint_str string, decimal_str string, 
float_str string, double_str string, date_str string, timestamp_str string, 
filler string)
row fo

[jira] [Updated] (HIVE-18763) VectorMapOperator should take into account partition->table serde conversion for all cases

2018-02-21 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18763:

Description: 
When table and partition schema differ, non-vectorized MapOperator does row by 
row conversion from whatever is read to the table schema.
VectorMapOperator is less consistent... it does the conversion as part of 
populating VRBs in row/serde modes (used to read e.g. text row-by-row or 
natively, and make VRBs); see  VectorDeserializeRow class convert... methods 
for an example. However, the native VRB mode relies on ORC ConvertTreeReader... 
stuff that lives in ORC, and so never converts anything inside VMO.

So, anything running in native VRB mode that is not the vanilla ORC reader will 
produce data with incorrect schema if there were schema changes and partitions 
are present  - there are two such cases right now, LLAP IO with ORC or text 
data, and Parquet. 
It's possible to extend ConvertTreeReader... stuff to LLAP IO ORC that already 
uses TreeReader-s for everything; LLAP IO text and Parquet, as well as any 
future users, however, will have to invent their own conversion.
Therefore, I think the best fix for this is to treat all inputs in VMO the same 
and convert them by default, like the regular MapOperator; and make ORC special 
mode an exception that allows it to bypass the conversion. 
cc [~mmccline]

Test case - varchar column length should be limited after alter table but it 
isn't.
{noformat}
CREATE TABLE schema_evolution_data(insert_num int, boolean1 boolean, tinyint1 
tinyint, smallint1 smallint, int1 int, bigint1 bigint, decimal1 decimal(38,18), 
float1 float, double1 double, string1 varchar(50), string2 varchar(50), date1 
date, timestamp1 timestamp, boolean_str string, tinyint_str string, 
smallint_str string, int_str string, bigint_str string, decimal_str string, 
float_str string, double_str string, date_str string, timestamp_str string, 
filler string)
row format delimited fields terminated by '|' stored as textfile;
load data local inpath 
'../../data/files/schema_evolution/schema_evolution_data.txt' overwrite into 
table schema_evolution_data;


drop table if exists vsp;
create table vsp(vs varchar(50)) partitioned by(s varchar(50)) stored as 
textfile;
insert into table vsp partition(s='positive') select string1 from 
schema_evolution_data;
alter table vsp change column vs vs varchar(3);

drop table if exists vsp_orc;
create table vsp_orc(vs varchar(50)) partitioned by(s varchar(50)) stored as 
orc;
insert into table vsp_orc partition(s='positive') select string1 from 
schema_evolution_data;
alter table vsp_orc change column vs vs varchar(3);

drop table if exists vsp_parquet;
create table vsp_parquet(vs varchar(50)) partitioned by(s varchar(50)) stored 
as parquet;
insert into table vsp_parquet partition(s='positive') select string1 from 
schema_evolution_data;
alter table vsp_parquet change column vs vs varchar(3);

SET hive.llap.io.enabled=true;
-- BAD results from all queries; parquet affected regardless of IO.
select length(vs) from vsp; 
select length(vs) from vsp_orc;
select length(vs) from vsp_parquet;

SET hive.llap.io.enabled=false;
select length(vs) from vsp; -- ok
select length(vs) from vsp_orc; -- ok
select length(vs) from vsp_parquet; -- still bad
{noformat}

  was:
When table and partition schema differ, non-vectorized MapOperator does row by 
row conversion from whatever is read to the table schema.
VectorMapOperator is less consistent... it does the conversion as part of 
populating VRBs in row/serde modes (used to read e.g. text row-by-row or 
natively, and make VRBs); see  VectorDeserializeRow class convert... methods 
for an example. However, the native VRB mode relies on ORC ConvertTreeReader... 
stuff that lives in ORC, and so never converts anything inside VMO.

So, anything running in native VRB mode that is not the vanilla ORC reader will 
produce data with incorrect schema if there were schema changes and partitions 
are present  - there are two such cases right now, LLAP IO with ORC or text 
data, and Parquet. 
It's possible to extend ConvertTreeReader... stuff to LLAP IO ORC that already 
uses TreeReader-s for everything; LLAP IO text and Parquet however will have to 
invent their own conversion.
Therefore, I think the best fix for this is to treat all inputs in VMO the same 
and convert them by default, like the regular MapOperator; and make ORC special 
mode an exception that allows it to bypass the conversion. 
cc [~mmccline]

Test case - varchar column length should be limited after alter table but it 
isn't.
{noformat}
CREATE TABLE schema_evolution_data(insert_num int, boolean1 boolean, tinyint1 
tinyint, smallint1 smallint, int1 int, bigint1 bigint, decimal1 decimal(38,18), 
float1 float, double1 double, string1 varchar(50), string2 varchar(50), date1 
date, timestamp1 timestamp, boolean_str string, tinyint_str string, 
smallint_str str

[jira] [Updated] (HIVE-18763) VectorMapOperator should take into account partition->table serde conversion for all cases

2018-02-21 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18763:

Description: 
When table and partition schema differ, non-vectorized MapOperator does row by 
row conversion from whatever is read to the table schema.
VectorMapOperator is less consistent... it does the conversion as part of 
populating VRBs in row/serde modes (used to read e.g. text row-by-row or 
natively, and make VRBs); see  VectorDeserializeRow class convert... methods 
for an example. However, the native VRB mode relies on ORC ConvertTreeReader... 
stuff that lives in ORC, and so never converts anything inside VMO.

So, anything running in native VRB mode that is not the vanilla ORC reader will 
produce data with incorrect schema if there were schema changes and partitions 
are present  - there are two such cases right now, LLAP IO with ORC or text 
data, and Parquet. 
It's possible to extend ConvertTreeReader... stuff to LLAP IO ORC that already 
uses TreeReader-s for everything; LLAP IO text and Parquet however will have to 
invent their own conversion.
Therefore, I think the best fix for this is to treat all inputs in VMO the same 
and convert them by default, like the regular MapOperator; and make ORC special 
mode an exception that allows it to bypass the conversion. 
cc [~mmccline]

Test case - varchar column length should be limited after alter table but it 
isn't.
{noformat}
CREATE TABLE schema_evolution_data(insert_num int, boolean1 boolean, tinyint1 
tinyint, smallint1 smallint, int1 int, bigint1 bigint, decimal1 decimal(38,18), 
float1 float, double1 double, string1 varchar(50), string2 varchar(50), date1 
date, timestamp1 timestamp, boolean_str string, tinyint_str string, 
smallint_str string, int_str string, bigint_str string, decimal_str string, 
float_str string, double_str string, date_str string, timestamp_str string, 
filler string)
row format delimited fields terminated by '|' stored as textfile;
load data local inpath 
'../../data/files/schema_evolution/schema_evolution_data.txt' overwrite into 
table schema_evolution_data;


drop table if exists vsp;
create table vsp(vs varchar(50)) partitioned by(s varchar(50)) stored as 
textfile;
insert into table vsp partition(s='positive') select string1 from 
schema_evolution_data;
alter table vsp change column vs vs varchar(3);

drop table if exists vsp_orc;
create table vsp_orc(vs varchar(50)) partitioned by(s varchar(50)) stored as 
orc;
insert into table vsp_orc partition(s='positive') select string1 from 
schema_evolution_data;
alter table vsp_orc change column vs vs varchar(3);

drop table if exists vsp_parquet;
create table vsp_parquet(vs varchar(50)) partitioned by(s varchar(50)) stored 
as parquet;
insert into table vsp_parquet partition(s='positive') select string1 from 
schema_evolution_data;
alter table vsp_parquet change column vs vs varchar(3);

SET hive.llap.io.enabled=true;
-- BAD results from all queries; parquet affected regardless of IO.
select length(vs) from vsp; 
select length(vs) from vsp_orc;
select length(vs) from vsp_parquet;

SET hive.llap.io.enabled=false;
select length(vs) from vsp; -- ok
select length(vs) from vsp_orc; -- ok
select length(vs) from vsp_parquet; -- still bad
{noformat}

  was:
When table and partition schema differ, non-vectorized MapOperator does row by 
row conversion from whatever is read to the table schema.
VectorMapOperator is less consistent... it does the conversion as part of 
populating VRBs in row/serde modes (used to read e.g. text row-by-row or 
natively, and make VRBs); see  VectorDeserializeRow class convert... methods 
for an example. However, the native VRB mode relies on ORC ConvertTreeReader... 
stuff that lives in ORC, and so never converts anything inside VMO.

So, anything running in native VRB mode that is not the vanilla ORC reader will 
produce data with incorrect schema - there are two such cases right now, LLAP 
IO with ORC or text data, and Parquet. 
It's possible to extend ConvertTreeReader... stuff to LLAP IO ORC that already 
uses TreeReader-s for everything; LLAP IO text and Parquet however will have to 
invent their own conversion.
Therefore, I think the best fix for this is to treat all inputs in VMO the same 
and convert them by default, like the regular MapOperator; and make ORC special 
mode an exception that allows it to bypass the conversion. 
cc [~mmccline]

Test case - varchar column length should be limited after alter table but it 
isn't.
{noformat}
CREATE TABLE schema_evolution_data(insert_num int, boolean1 boolean, tinyint1 
tinyint, smallint1 smallint, int1 int, bigint1 bigint, decimal1 decimal(38,18), 
float1 float, double1 double, string1 varchar(50), string2 varchar(50), date1 
date, timestamp1 timestamp, boolean_str string, tinyint_str string, 
smallint_str string, int_str string, bigint_str string, decimal_str string, 
float_str string, double_s

[jira] [Updated] (HIVE-18763) VectorMapOperator should take into account partition->table serde conversion for all cases

2018-02-21 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18763:

Description: 
When table and partition schema differ, non-vectorized MapOperator does row by 
row conversion from whatever is read to the table schema.
VectorMapOperator is less consistent... it does the conversion as part of 
populating VRBs in row/serde modes (used to read e.g. text row-by-row or 
natively, and make VRBs); see  VectorDeserializeRow class convert... methods 
for an example. However, the native VRB mode relies on ORC ConvertTreeReader... 
stuff that lives in ORC, and so never converts anything inside VMO.

So, anything running in native VRB mode that is not the vanilla ORC reader will 
produce data with incorrect schema if there were schema changes and partitions 
are present  - there are two such cases right now, LLAP IO with ORC or text 
data, and Parquet. 
It's possible to extend ConvertTreeReader... stuff to LLAP IO ORC that already 
uses TreeReader-s for everything; LLAP IO text and (non-LLAP) Parquet, as well 
as any future users, however, will have to invent their own conversion.
Therefore, I think the best fix for this is to treat all inputs in VMO the same 
and convert them by default, like the regular MapOperator; and make ORC special 
mode an exception that allows it to bypass the conversion. 
cc [~mmccline]

Test case - varchar column length should be limited after alter table but it 
isn't.
{noformat}
CREATE TABLE schema_evolution_data(insert_num int, boolean1 boolean, tinyint1 
tinyint, smallint1 smallint, int1 int, bigint1 bigint, decimal1 decimal(38,18), 
float1 float, double1 double, string1 varchar(50), string2 varchar(50), date1 
date, timestamp1 timestamp, boolean_str string, tinyint_str string, 
smallint_str string, int_str string, bigint_str string, decimal_str string, 
float_str string, double_str string, date_str string, timestamp_str string, 
filler string)
row format delimited fields terminated by '|' stored as textfile;
load data local inpath 
'../../data/files/schema_evolution/schema_evolution_data.txt' overwrite into 
table schema_evolution_data;


drop table if exists vsp;
create table vsp(vs varchar(50)) partitioned by(s varchar(50)) stored as 
textfile;
insert into table vsp partition(s='positive') select string1 from 
schema_evolution_data;
alter table vsp change column vs vs varchar(3);

drop table if exists vsp_orc;
create table vsp_orc(vs varchar(50)) partitioned by(s varchar(50)) stored as 
orc;
insert into table vsp_orc partition(s='positive') select string1 from 
schema_evolution_data;
alter table vsp_orc change column vs vs varchar(3);

drop table if exists vsp_parquet;
create table vsp_parquet(vs varchar(50)) partitioned by(s varchar(50)) stored 
as parquet;
insert into table vsp_parquet partition(s='positive') select string1 from 
schema_evolution_data;
alter table vsp_parquet change column vs vs varchar(3);

SET hive.llap.io.enabled=true;
-- BAD results from all queries; parquet affected regardless of IO.
select length(vs) from vsp; 
select length(vs) from vsp_orc;
select length(vs) from vsp_parquet;

SET hive.llap.io.enabled=false;
select length(vs) from vsp; -- ok
select length(vs) from vsp_orc; -- ok
select length(vs) from vsp_parquet; -- still bad
{noformat}

  was:
When table and partition schema differ, non-vectorized MapOperator does row by 
row conversion from whatever is read to the table schema.
VectorMapOperator is less consistent... it does the conversion as part of 
populating VRBs in row/serde modes (used to read e.g. text row-by-row or 
natively, and make VRBs); see  VectorDeserializeRow class convert... methods 
for an example. However, the native VRB mode relies on ORC ConvertTreeReader... 
stuff that lives in ORC, and so never converts anything inside VMO.

So, anything running in native VRB mode that is not the vanilla ORC reader will 
produce data with incorrect schema if there were schema changes and partitions 
are present  - there are two such cases right now, LLAP IO with ORC or text 
data, and Parquet. 
It's possible to extend ConvertTreeReader... stuff to LLAP IO ORC that already 
uses TreeReader-s for everything; LLAP IO text and Parquet, as well as any 
future users, however, will have to invent their own conversion.
Therefore, I think the best fix for this is to treat all inputs in VMO the same 
and convert them by default, like the regular MapOperator; and make ORC special 
mode an exception that allows it to bypass the conversion. 
cc [~mmccline]

Test case - varchar column length should be limited after alter table but it 
isn't.
{noformat}
CREATE TABLE schema_evolution_data(insert_num int, boolean1 boolean, tinyint1 
tinyint, smallint1 smallint, int1 int, bigint1 bigint, decimal1 decimal(38,18), 
float1 float, double1 double, string1 varchar(50), string2 varchar(50), date1 
date, timestamp1 timestamp, boolean_str strin

[jira] [Commented] (HIVE-18764) ELAPSED_TIME resource plan setting is not getting honored

2018-02-21 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16372248#comment-16372248
 ] 

Sergey Shelukhin commented on HIVE-18764:
-

Nit: the update.. method deals with a bunch of wmContext fields and should 
probably just be inside wmContext.
Otherwise +1

> ELAPSED_TIME resource plan setting is not getting honored
> -
>
> Key: HIVE-18764
> URL: https://issues.apache.org/jira/browse/HIVE-18764
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jagruti Varia
>Assignee: Prasanth Jayachandran
>Priority: Major
> Attachments: HIVE-18764.1.patch
>
>
> Trigger validation for ELAPSED_TIME counter should happen even if session is 
> not created. Currently ELAPSED_TIME counter is populated only after session 
> creation but a query can be waiting to get a session for a long time by the 
> time trigger might have been violated. 
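
The idea in the description — evaluate the elapsed-time trigger against query submission time, not session-creation time — can be sketched standalone as follows. The class and method names here are illustrative only, not the actual Hive workload-management API:

```java
// Sketch: an ELAPSED_TIME-style trigger that can fire even before a
// session exists, because it measures from query submission time.
// Names are hypothetical; the real Hive WM classes differ.
public class ElapsedTimeTrigger {
    private final long limitMs;

    public ElapsedTimeTrigger(long limitMs) {
        this.limitMs = limitMs;
    }

    // Violated when wall-clock time since submission exceeds the limit,
    // regardless of whether a session (and its counters) exists yet.
    public boolean isViolated(long submitTimeMs, long nowMs) {
        return (nowMs - submitTimeMs) > limitMs;
    }

    public static void main(String[] args) {
        ElapsedTimeTrigger trigger = new ElapsedTimeTrigger(1000);
        // Query submitted at t=0, still waiting for a session at t=1500:
        // the trigger should already be considered violated.
        System.out.println(trigger.isViolated(0, 1500));
    }
}
```

The point of the sketch is only the ordering: the violation check depends on submission time alone, so a query stuck waiting for a session is still subject to the trigger.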





[jira] [Updated] (HIVE-18710) extend inheritPerms to ACID in Hive 2.X

2018-02-21 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18710:

Attachment: HIVE-18710.02-branch-2.patch

> extend inheritPerms to ACID in Hive 2.X
> ---
>
> Key: HIVE-18710
> URL: https://issues.apache.org/jira/browse/HIVE-18710
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18710-branch-2.patch, HIVE-18710.01-branch-2.patch, 
> HIVE-18710.02-branch-2.patch
>
>






[jira] [Commented] (HIVE-18710) extend inheritPerms to ACID in Hive 2.X

2018-02-21 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16372335#comment-16372335
 ] 

Sergey Shelukhin commented on HIVE-18710:
-

Updated. Note that there's the same comment you left on both iterations of CR 
that I've left open asking for clarifications :)

> extend inheritPerms to ACID in Hive 2.X
> ---
>
> Key: HIVE-18710
> URL: https://issues.apache.org/jira/browse/HIVE-18710
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18710-branch-2.patch, HIVE-18710.01-branch-2.patch, 
> HIVE-18710.02-branch-2.patch
>
>






[jira] [Comment Edited] (HIVE-18763) VectorMapOperator should take into account partition->table serde conversion for all cases

2018-02-22 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16373272#comment-16373272
 ] 

Sergey Shelukhin edited comment on HIVE-18763 at 2/22/18 7:12 PM:
--

Hey, I was planning to work on it, but not for a while (after some other bugs), 
so feel free if you want to.
My thinking after looking at the row converters was that the actual changes for 
most types are simple and insignificant (varchar lengths is one), so the change 
would be mostly wiring. It should be easy to add a separate VRB converter 
rather than trying to shoehorn this into VectorDeserializeRow or OI converters, 
since both of these deal with rows (row->row or row->VRB). All the necessary 
schema and stuff are available in VectorMapOperator - it is already used for 
the non-native-VRB case.
Then making sure it doesn't apply to cases that do their own conversion (the 
only one right now is ORC) is another small wiring issue. 
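
The conversion the comment describes — most per-type changes being trivial, varchar length enforcement being the canonical one — can be illustrated with a self-contained sketch. This deliberately uses plain arrays rather than real Hive classes (VectorizedRowBatch, VectorDeserializeRow); all names below are illustrative, not Hive APIs:

```java
// Sketch of the partition->table conversion VectorMapOperator should
// apply by default: truncate a partition's string column (written as
// varchar(50)) to the table schema's current varchar(n). This models
// the behavior only; real Hive code operates on BytesColumnVector.
import java.util.Arrays;

public class VarcharBatchConverter {
    // Truncate each value in a column to maxLength characters, as the
    // table's varchar(maxLength) type requires; nulls pass through.
    public static String[] convertColumn(String[] column, int maxLength) {
        String[] out = new String[column.length];
        for (int i = 0; i < column.length; i++) {
            String v = column[i];
            out[i] = (v != null && v.length() > maxLength)
                ? v.substring(0, maxLength) : v;
        }
        return out;
    }

    public static void main(String[] args) {
        // Partition data written under varchar(50); table later altered
        // to varchar(3), as in the vsp test case above.
        String[] partitionData = {"positive", "ok", null};
        String[] converted = convertColumn(partitionData, 3);
        System.out.println(Arrays.toString(converted)); // [pos, ok, null]
    }
}
```

This is the conversion the `select length(vs) from vsp` queries expose: without it, native-VRB readers hand back the full-length partition values and `length(vs)` exceeds 3.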



was (Author: sershe):
Hey, I was planning to work on it, but not for a while (after some other bugs), 
so feel free if you want to.
My thinking after looking at the row converters was that the actual changes for 
most types are simple and insignificant (varchar lengths is one), so the change 
would be mostly wiring. It should be easy to add a separate VRB converter 
rather than trying to shoehorn this into VectorDeserializeRow or OI converters, 
since both of these deal with rows (row->row or row->VRB). All the necessary 
schema and stuff are available in VectorMapOperator - it is already used for 
the non-native-VRB case.
Then making sure it doesn't apply to cases that do their own conversion (the 
only one right now is ORC) is another small wiring issue. 
Under current plan I have for stuff I can do it next week probably... 

> VectorMapOperator should take into account partition->table serde conversion 
> for all cases
> --
>
> Key: HIVE-18763
> URL: https://issues.apache.org/jira/browse/HIVE-18763
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Priority: Major
>
> When table and partition schema differ, non-vectorized MapOperator does row 
> by row conversion from whatever is read to the table schema.
> VectorMapOperator is less consistent... it does the conversion as part of 
> populating VRBs in row/serde modes (used to read e.g. text row-by-row or 
> natively, and make VRBs); see  VectorDeserializeRow class convert... methods 
> for an example. However, the native VRB mode relies on ORC 
> ConvertTreeReader... stuff that lives in ORC, and so never converts anything 
> inside VMO.
> So, anything running in native VRB mode that is not the vanilla ORC reader 
> will produce data with incorrect schema if there were schema changes and 
> partitions are present  - there are two such cases right now, LLAP IO with 
> ORC or text data, and Parquet. 
> It's possible to extend ConvertTreeReader... stuff to LLAP IO ORC that 
> already uses TreeReader-s for everything; LLAP IO text and (non-LLAP) 
> Parquet, as well as any future users, however, will have to invent their own 
> conversion.
> Therefore, I think the best fix for this is to treat all inputs in VMO the 
> same and convert them by default, like the regular MapOperator; and make ORC 
> special mode an exception that allows it to bypass the conversion. 
> cc [~mmccline]
> Test case - varchar column length should be limited after alter table but it 
> isn't.
> {noformat}
> CREATE TABLE schema_evolution_data(insert_num int, boolean1 boolean, tinyint1 
> tinyint, smallint1 smallint, int1 int, bigint1 bigint, decimal1 
> decimal(38,18), float1 float, double1 double, string1 varchar(50), string2 
> varchar(50), date1 date, timestamp1 timestamp, boolean_str string, 
> tinyint_str string, smallint_str string, int_str string, bigint_str string, 
> decimal_str string, float_str string, double_str string, date_str string, 
> timestamp_str string, filler string)
> row format delimited fields terminated by '|' stored as textfile;
> load data local inpath 
> '../../data/files/schema_evolution/schema_evolution_data.txt' overwrite into 
> table schema_evolution_data;
> drop table if exists vsp;
> create table vsp(vs varchar(50)) partitioned by(s varchar(50)) stored as 
> textfile;
> insert into table vsp partition(s='positive') select string1 from 
> schema_evolution_data;
> alter table vsp change column vs vs varchar(3);
> drop table if exists vsp_orc;
> create table vsp_orc(vs varchar(50)) partitioned by(s varchar(50)) stored as 
> orc;
> insert into table vsp_orc partition(s='positive') select string1 from 
> schema_evolution_data;
> alter table vsp_orc change column vs vs varchar(3);
> drop table if exists vsp_parquet;
> create table vsp_parquet(vs var

[jira] [Commented] (HIVE-18763) VectorMapOperator should take into account partition->table serde conversion for all cases

2018-02-22 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16373272#comment-16373272
 ] 

Sergey Shelukhin commented on HIVE-18763:
-

Hey, I was planning to work on it, but not for a while (after some other 
bugs), so feel free to take it if you want to.
My thinking after looking at the row converters was that the actual changes for 
most types are simple and insignificant (varchar length is one), so the change 
would mostly be wiring. It should be easy to add a separate VRB converter 
rather than trying to shoehorn this into VectorDeserializeRow or the OI 
converters, since both of those deal with rows (row->row or row->VRB). All the 
necessary schema information is already available in VectorMapOperator - it is 
already used for the non-native-VRB case.
Then making sure the conversion doesn't apply to cases that do their own 
conversion (the only one right now is ORC) is another small wiring issue.
Under my current plan, I can probably get to this next week... 
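The per-type coercions involved here are indeed small; the varchar case, for 
instance, amounts to truncating values written under the old, longer type. A 
minimal illustration of that coercion (plain Java strings stand in for real 
vector columns; all names below are made up, this is not the actual 
VectorDeserializeRow/VMO code):

```java
// Illustration only: the varchar(n) coercion a partition->table schema
// conversion has to apply; a String[] stands in for a real vector column.
class VarcharTruncatingConverter {
    private final int maxLength; // the table's current varchar length

    VarcharTruncatingConverter(int maxLength) { this.maxLength = maxLength; }

    // Convert one "column" of values written under the old, longer varchar.
    String[] convert(String[] column) {
        String[] out = new String[column.length];
        for (int i = 0; i < column.length; i++) {
            String v = column[i];
            out[i] = (v == null || v.length() <= maxLength)
                ? v : v.substring(0, maxLength);
        }
        return out;
    }

    public static void main(String[] args) {
        // Table altered from varchar(50) to varchar(3): values get truncated.
        VarcharTruncatingConverter c = new VarcharTruncatingConverter(3);
        for (String s : c.convert(new String[] {"positive", "ok", null})) {
            System.out.println(s); // pos, ok, null
        }
    }
}
```

A real converter would operate on VectorizedRowBatch columns driven by the 
target table schema; the per-value logic itself is this trivial, which is why 
the change is mostly wiring.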

> VectorMapOperator should take into account partition->table serde conversion 
> for all cases
> --
>
> Key: HIVE-18763
> URL: https://issues.apache.org/jira/browse/HIVE-18763
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Priority: Major
>
> When table and partition schema differ, non-vectorized MapOperator does row 
> by row conversion from whatever is read to the table schema.
> VectorMapOperator is less consistent... it does the conversion as part of 
> populating VRBs in row/serde modes (used to read e.g. text row-by-row or 
> natively, and make VRBs); see  VectorDeserializeRow class convert... methods 
> for an example. However, the native VRB mode relies on ORC 
> ConvertTreeReader... stuff that lives in ORC, and so never converts anything 
> inside VMO.
> So, anything running in native VRB mode that is not the vanilla ORC reader 
> will produce data with incorrect schema if there were schema changes and 
> partitions are present  - there are two such cases right now, LLAP IO with 
> ORC or text data, and Parquet. 
> It's possible to extend the ConvertTreeReader... stuff to LLAP IO ORC, which 
> already uses TreeReader-s for everything; however, LLAP IO text, (non-LLAP) 
> Parquet, and any future users would have to invent their own conversion.
> Therefore, I think the best fix for this is to treat all inputs in VMO the 
> same and convert them by default, like the regular MapOperator; and make ORC 
> special mode an exception that allows it to bypass the conversion. 
> cc [~mmccline]
> Test case - varchar column length should be limited after alter table but it 
> isn't.
> {noformat}
> CREATE TABLE schema_evolution_data(insert_num int, boolean1 boolean, tinyint1 
> tinyint, smallint1 smallint, int1 int, bigint1 bigint, decimal1 
> decimal(38,18), float1 float, double1 double, string1 varchar(50), string2 
> varchar(50), date1 date, timestamp1 timestamp, boolean_str string, 
> tinyint_str string, smallint_str string, int_str string, bigint_str string, 
> decimal_str string, float_str string, double_str string, date_str string, 
> timestamp_str string, filler string)
> row format delimited fields terminated by '|' stored as textfile;
> load data local inpath 
> '../../data/files/schema_evolution/schema_evolution_data.txt' overwrite into 
> table schema_evolution_data;
> drop table if exists vsp;
> create table vsp(vs varchar(50)) partitioned by(s varchar(50)) stored as 
> textfile;
> insert into table vsp partition(s='positive') select string1 from 
> schema_evolution_data;
> alter table vsp change column vs vs varchar(3);
> drop table if exists vsp_orc;
> create table vsp_orc(vs varchar(50)) partitioned by(s varchar(50)) stored as 
> orc;
> insert into table vsp_orc partition(s='positive') select string1 from 
> schema_evolution_data;
> alter table vsp_orc change column vs vs varchar(3);
> drop table if exists vsp_parquet;
> create table vsp_parquet(vs varchar(50)) partitioned by(s varchar(50)) stored 
> as parquet;
> insert into table vsp_parquet partition(s='positive') select string1 from 
> schema_evolution_data;
> alter table vsp_parquet change column vs vs varchar(3);
> SET hive.llap.io.enabled=true;
> -- BAD results from all queries; parquet affected regardless of IO.
> select length(vs) from vsp; 
> select length(vs) from vsp_orc;
> select length(vs) from vsp_parquet;
> SET hive.llap.io.enabled=false;
> select length(vs) from vsp; -- ok
> select length(vs) from vsp_orc; -- ok
> select length(vs) from vsp_parquet; -- still bad
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18738) LLAP IO ACID - includes handling is broken

2018-02-22 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18738:

Attachment: HIVE-18738.patch

> LLAP IO ACID - includes handling is broken
> --
>
> Key: HIVE-18738
> URL: https://issues.apache.org/jira/browse/HIVE-18738
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepesh Khandelwal
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18738.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18738) LLAP IO ACID - includes handling is broken

2018-02-22 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18738:

Attachment: (was: HIVE-18738.patch)

> LLAP IO ACID - includes handling is broken
> --
>
> Key: HIVE-18738
> URL: https://issues.apache.org/jira/browse/HIVE-18738
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepesh Khandelwal
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18738.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18738) LLAP IO ACID - includes handling is broken

2018-02-22 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18738:

Status: Patch Available  (was: Open)

> LLAP IO ACID - includes handling is broken
> --
>
> Key: HIVE-18738
> URL: https://issues.apache.org/jira/browse/HIVE-18738
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepesh Khandelwal
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18738.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18738) LLAP IO ACID - includes handling is broken

2018-02-22 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16373923#comment-16373923
 ] 

Sergey Shelukhin commented on HIVE-18738:
-

[~teddy.choi] [~prasanth_j] can you take a look?
This moves include handling to one place for LLAP IO, fixes stuff that didn't 
work, and adds a test.

> LLAP IO ACID - includes handling is broken
> --
>
> Key: HIVE-18738
> URL: https://issues.apache.org/jira/browse/HIVE-18738
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepesh Khandelwal
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18738.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18710) extend inheritPerms to ACID in Hive 2.X

2018-02-23 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18710:

   Resolution: Fixed
Fix Version/s: 2.4.0
   Status: Resolved  (was: Patch Available)

Committed to branch-2.
Test failures look unrelated; the ACID test has an ordering change.
Thanks for the review!

> extend inheritPerms to ACID in Hive 2.X
> ---
>
> Key: HIVE-18710
> URL: https://issues.apache.org/jira/browse/HIVE-18710
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Fix For: 2.4.0
>
> Attachments: HIVE-18710-branch-2.patch, HIVE-18710.01-branch-2.patch, 
> HIVE-18710.02-branch-2.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18571) stats issues for MM tables

2018-02-23 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18571:

Attachment: HIVE-18571.02.patch

> stats issues for MM tables
> --
>
> Key: HIVE-18571
> URL: https://issues.apache.org/jira/browse/HIVE-18571
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18571.01.patch, HIVE-18571.02.patch, 
> HIVE-18571.patch
>
>
> There are multiple stats aggregation issues with MM tables.
> Some simple stats are double counted and some stats (simple stats) are 
> invalid for ACID table dirs altogether. 
> I have a patch almost ready, need to fix some more stuff and clean up.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18571) stats issues for MM tables

2018-02-23 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16375243#comment-16375243
 ] 

Sergey Shelukhin commented on HIVE-18571:
-

Rebased the patch and updated it to fix some tests.
Added an AcidUtils method to collect files for stats based on ACID state, which 
can be used for analyze queries.
However, I'm not sure it can be used at any other time (e.g. during 
insert/create), because the current ACID state does not reflect the transaction 
that is in progress.
I think we can tackle that in a followup jira; I left a bunch of TODOs.

> stats issues for MM tables
> --
>
> Key: HIVE-18571
> URL: https://issues.apache.org/jira/browse/HIVE-18571
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18571.01.patch, HIVE-18571.02.patch, 
> HIVE-18571.patch
>
>
> There are multiple stats aggregation issues with MM tables.
> Some simple stats are double counted and some stats (simple stats) are 
> invalid for ACID table dirs altogether. 
> I have a patch almost ready, need to fix some more stuff and clean up.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18795) upgrade accumulo to 1.8.1

2018-02-26 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16377403#comment-16377403
 ] 

Sergey Shelukhin commented on HIVE-18795:
-

Looks like this breaks the Accumulo test driver. IIRC this happened the last 
time someone tried to upgrade it, too... someone needs to fix it :)

> upgrade accumulo to 1.8.1
> -
>
> Key: HIVE-18795
> URL: https://issues.apache.org/jira/browse/HIVE-18795
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Saijin Huang
>Assignee: Saijin Huang
>Priority: Minor
> Attachments: HIVE-18795.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-18796) fix TestSSL

2018-02-26 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-18796:
---

Assignee: Sergey Shelukhin

> fix TestSSL
> ---
>
> Key: HIVE-18796
> URL: https://issues.apache.org/jira/browse/HIVE-18796
> Project: Hive
>  Issue Type: Bug
>  Components: Test
>Reporter: Zoltan Haindrich
>Assignee: Sergey Shelukhin
>Priority: Major
>
> broken by HIVE-18203



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18796) fix TestSSL

2018-02-26 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16377720#comment-16377720
 ] 

Sergey Shelukhin commented on HIVE-18796:
-

It looks like a test setup issue to me that the change exposed.
It tries to open a connection to the metastore, which fails:
{noformat}
2018-02-25T16:05:43,903  INFO [main] metastore.HiveMetaStoreClient: Trying to 
connect to metastore with URI thrift://localhost:59057

org.apache.thrift.transport.TTransportException: 
javax.net.ssl.SSLHandshakeException: java.security.cert.CertificateException: 
No name matching localhost found
at 
org.apache.thrift.transport.TIOStreamTransport.flush(TIOStreamTransport.java:161)
 ~[libthrift-0.9.3.jar:0.9.3]
at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:73) 
~[libthrift-0.9.3.jar:0.9.3]
at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:62) 
~[libthrift-0.9.3.jar:0.9.3]
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.send_set_ugi(ThriftHiveMetastore.java:4591)
 ~[hive-standalone-metastore-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
{noformat}

Let me see if it's easy to fix... otherwise I might just add a bypass for the 
test.
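For context, the "No name matching localhost found" handshake failure above is 
JSSE's hostname verification rejecting a test certificate whose CN/SAN doesn't 
cover localhost. A test-only bypass of the kind alluded to could look like this 
in plain JSSE terms (illustration only, not the actual patch):

```java
import javax.net.ssl.HostnameVerifier;

// Illustration only: a test-only hostname verifier that accepts localhost,
// mirroring the kind of bypass mentioned above. Never use in production.
class TestOnlySslBypass {
    static HostnameVerifier localhostOnlyVerifier() {
        // Ignore the SSLSession entirely and match on the hostname alone.
        return (hostname, session) -> "localhost".equals(hostname);
    }

    public static void main(String[] args) {
        HostnameVerifier v = localhostOnlyVerifier();
        System.out.println(v.verify("localhost", null));   // true
        System.out.println(v.verify("example.com", null)); // false
    }
}
```

The cleaner fix is a test keystore whose certificate carries a SubjectAltName 
for localhost, so the real verification path is exercised.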

> fix TestSSL
> ---
>
> Key: HIVE-18796
> URL: https://issues.apache.org/jira/browse/HIVE-18796
> Project: Hive
>  Issue Type: Bug
>  Components: Test
>Reporter: Zoltan Haindrich
>Assignee: Sergey Shelukhin
>Priority: Major
>
> broken by HIVE-18203



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18796) fix TestSSL

2018-02-26 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18796:

Status: Patch Available  (was: Open)

The patch

> fix TestSSL
> ---
>
> Key: HIVE-18796
> URL: https://issues.apache.org/jira/browse/HIVE-18796
> Project: Hive
>  Issue Type: Bug
>  Components: Test
>Reporter: Zoltan Haindrich
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18796.patch
>
>
> broken by HIVE-18203



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18796) fix TestSSL

2018-02-26 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18796:

Attachment: HIVE-18796.patch

> fix TestSSL
> ---
>
> Key: HIVE-18796
> URL: https://issues.apache.org/jira/browse/HIVE-18796
> Project: Hive
>  Issue Type: Bug
>  Components: Test
>Reporter: Zoltan Haindrich
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18796.patch
>
>
> broken by HIVE-18203



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18192) Introduce WriteID per table rather than using global transaction ID

2018-02-26 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16377934#comment-16377934
 ] 

Sergey Shelukhin commented on HIVE-18192:
-

[~sankarh] [~ekoifman] I'm looking at some ACID stats improvements, and trying 
to get ACID state during an insert query.
When I get the config setting where txn state is stored as string for an insert 
query in autoColumnStats_4 (insert into acid table from non-acid table), the 
setting is "4" - basically it has txn ID, but no write Id for any table.
Is that by design? How does it write into the table without having a write ID 
for that table?

> Introduce WriteID per table rather than using global transaction ID
> ---
>
> Key: HIVE-18192
> URL: https://issues.apache.org/jira/browse/HIVE-18192
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, Transactions
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: ACID, DR, pull-request-available
> Fix For: 3.0.0
>
> Attachments: HIVE-18192.01.patch, HIVE-18192.02.patch, 
> HIVE-18192.03.patch, HIVE-18192.04.patch, HIVE-18192.05.patch, 
> HIVE-18192.06.patch, HIVE-18192.07.patch, HIVE-18192.08.patch, 
> HIVE-18192.09.patch, HIVE-18192.10.patch, HIVE-18192.11.patch, 
> HIVE-18192.12.patch, HIVE-18192.13.patch, HIVE-18192.14.patch, 
> HIVE-18192.15.patch, HIVE-18192.16.patch, HIVE-18192.17.patch
>
>
> To support ACID replication, we will be introducing a per-table write ID 
> which will replace the transaction id in the primary key for each row in an 
> ACID table.
> The current primary key is determined via 
>  
> which will move to 
>  
> Each table modified by the given transaction will have a table-level write 
> ID allocated, and a persisted map of global txn id -> table -> write id has 
> to be maintained to allow snapshot isolation.
> Readers should use the combination of ValidTxnList and 
> ValidWriteIdList(Table) for snapshot isolation.
>  
>  [Hive Replication - ACID 
> Tables.pdf|https://issues.apache.org/jira/secure/attachment/12903157/Hive%20Replication-%20ACID%20Tables.pdf]
>  has a section "Per Table Sequences (Write-Id)" with more details
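The persisted global txn id -> (table -> write id) map with per-table 
sequences described in this issue can be sketched as follows (hypothetical 
class and method names, not the actual TxnStore API):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of per-table write-ID allocation as described in
// HIVE-18192; names are made up and this is not the real TxnStore API.
class WriteIdAllocator {
    // per-table sequence: last write ID handed out for each table
    private final Map<String, Long> lastWriteId = new HashMap<>();
    // persisted map: global txn ID -> (table -> write ID used by that txn)
    private final Map<Long, Map<String, Long>> txnToWriteIds = new HashMap<>();

    // Allocate a write ID the first time a transaction writes to a table.
    long allocateWriteId(long txnId, String table) {
        long writeId = lastWriteId.merge(table, 1L, (prev, one) -> prev + one);
        txnToWriteIds.computeIfAbsent(txnId, t -> new HashMap<>())
                     .put(table, writeId);
        return writeId;
    }

    public static void main(String[] args) {
        WriteIdAllocator alloc = new WriteIdAllocator();
        // Write IDs are per table and independent of the global txn IDs.
        System.out.println(alloc.allocateWriteId(100L, "acid_tbl"));  // 1
        System.out.println(alloc.allocateWriteId(107L, "acid_tbl"));  // 2
        System.out.println(alloc.allocateWriteId(107L, "other_tbl")); // 1
    }
}
```

Readers would then combine the ValidTxnList with the per-table slice of this 
map (the ValidWriteIdList) to get snapshot isolation, as the description says.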



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18571) stats issues for MM tables

2018-02-26 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18571:

Attachment: HIVE-18571.03.patch

> stats issues for MM tables
> --
>
> Key: HIVE-18571
> URL: https://issues.apache.org/jira/browse/HIVE-18571
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18571.01.patch, HIVE-18571.02.patch, 
> HIVE-18571.03.patch, HIVE-18571.patch
>
>
> There are multiple stats aggregation issues with MM tables.
> Some simple stats are double counted and some stats (simple stats) are 
> invalid for ACID table dirs altogether. 
> I have a patch almost ready, need to fix some more stuff and clean up.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18571) stats issues for MM tables

2018-02-26 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16377941#comment-16377941
 ] 

Sergey Shelukhin commented on HIVE-18571:
-

Updated the patch again to use ACID state in more places when updating stats. 
Unfortunately it looks like write ID for the target table of an insert is 
missing during the insert; I left a comment in HIVE-18192: "When I get the 
config setting where txn state is stored as string for an insert query in 
autoColumnStats_4 (insert into acid table from non-acid table), the setting is 
"4" - basically it has txn ID, but no write Id for any table. Is that by 
design? How does it write into the table without having a write ID for that 
table?"

I need to clarify that and update the patch either with (or after) the write 
ID fix, or, based on the next test run, update more tests to clear ACID table 
stats when ACID state cannot be obtained.

[~ekoifman] can you please review this patch?
The write ID issue and some out file updates aside, I think it is ready.

> stats issues for MM tables
> --
>
> Key: HIVE-18571
> URL: https://issues.apache.org/jira/browse/HIVE-18571
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18571.01.patch, HIVE-18571.02.patch, 
> HIVE-18571.03.patch, HIVE-18571.patch
>
>
> There are multiple stats aggregation issues with MM tables.
> Some simple stats are double counted and some stats (simple stats) are 
> invalid for ACID table dirs altogether. 
> I have a patch almost ready, need to fix some more stuff and clean up.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18738) LLAP IO ACID - includes handling is broken

2018-02-26 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16377942#comment-16377942
 ] 

Sergey Shelukhin commented on HIVE-18738:
-

[~prasanth_j] ping?

> LLAP IO ACID - includes handling is broken
> --
>
> Key: HIVE-18738
> URL: https://issues.apache.org/jira/browse/HIVE-18738
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepesh Khandelwal
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18738.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18192) Introduce WriteID per table rather than using global transaction ID

2018-02-26 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16377989#comment-16377989
 ] 

Sergey Shelukhin commented on HIVE-18192:
-

Write IDs. I'm just calling the standard method to get the ValidWriteIds 
object; it didn't have the table I need, so I logged the string it uses, which 
was "4".
The query is
insert into table acid_dtt select cint, cast(cstring1 as varchar(128)) from 
alltypesorc where cint is not null order by cint limit 10;


> Introduce WriteID per table rather than using global transaction ID
> ---
>
> Key: HIVE-18192
> URL: https://issues.apache.org/jira/browse/HIVE-18192
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, Transactions
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: ACID, DR, pull-request-available
> Fix For: 3.0.0
>
> Attachments: HIVE-18192.01.patch, HIVE-18192.02.patch, 
> HIVE-18192.03.patch, HIVE-18192.04.patch, HIVE-18192.05.patch, 
> HIVE-18192.06.patch, HIVE-18192.07.patch, HIVE-18192.08.patch, 
> HIVE-18192.09.patch, HIVE-18192.10.patch, HIVE-18192.11.patch, 
> HIVE-18192.12.patch, HIVE-18192.13.patch, HIVE-18192.14.patch, 
> HIVE-18192.15.patch, HIVE-18192.16.patch, HIVE-18192.17.patch
>
>
> To support ACID replication, we will be introducing a per-table write ID 
> which will replace the transaction id in the primary key for each row in an 
> ACID table.
> The current primary key is determined via 
>  
> which will move to 
>  
> Each table modified by the given transaction will have a table-level write 
> ID allocated, and a persisted map of global txn id -> table -> write id has 
> to be maintained to allow snapshot isolation.
> Readers should use the combination of ValidTxnList and 
> ValidWriteIdList(Table) for snapshot isolation.
>  
>  [Hive Replication - ACID 
> Tables.pdf|https://issues.apache.org/jira/secure/attachment/12903157/Hive%20Replication-%20ACID%20Tables.pdf]
>  has a section "Per Table Sequences (Write-Id)" with more details



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-17937) llap_acid_fast test is flaky

2018-02-27 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-17937:

Attachment: HIVE-17937.05.patch

> llap_acid_fast test is flaky
> 
>
> Key: HIVE-17937
> URL: https://issues.apache.org/jira/browse/HIVE-17937
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Teddy Choi
>Priority: Blocker
> Attachments: HIVE-17937.05.patch, HIVE-17937.2.patch, 
> HIVE-17937.3.patch, HIVE-17937.4.patch, HIVE-17993.patch
>
>
> See for example 
> https://builds.apache.org/job/PreCommit-HIVE-Build/7521/testReport/org.apache.hadoop.hive.cli/TestMiniLlapLocalCliDriver/testCliDriver_llap_acid_fast_/history/
>  (the history link is the same from any build number with a test run, just 
> replace 7521 if this one expires).
> Looks like results change, which may not be good.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-17937) llap_acid_fast test is flaky

2018-02-27 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16379351#comment-16379351
 ] 

Sergey Shelukhin commented on HIVE-17937:
-

I've rerun the tests locally and updated the patch. Query results are 
consistent between the two drivers in both tests.
Will commit on a good test run (of those two tests)

> llap_acid_fast test is flaky
> 
>
> Key: HIVE-17937
> URL: https://issues.apache.org/jira/browse/HIVE-17937
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Teddy Choi
>Priority: Blocker
> Attachments: HIVE-17937.05.patch, HIVE-17937.2.patch, 
> HIVE-17937.3.patch, HIVE-17937.4.patch, HIVE-17993.patch
>
>
> See for example 
> https://builds.apache.org/job/PreCommit-HIVE-Build/7521/testReport/org.apache.hadoop.hive.cli/TestMiniLlapLocalCliDriver/testCliDriver_llap_acid_fast_/history/
>  (the history link is the same from any build number with a test run, just 
> replace 7521 if this one expires).
> Looks like results change, which may not be good.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18796) fix TestSSL

2018-02-27 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16379512#comment-16379512
 ] 

Sergey Shelukhin commented on HIVE-18796:
-

This fixes TestSSL. [~kgyrtkirk] can you review?

> fix TestSSL
> ---
>
> Key: HIVE-18796
> URL: https://issues.apache.org/jira/browse/HIVE-18796
> Project: Hive
>  Issue Type: Bug
>  Components: Test
>Reporter: Zoltan Haindrich
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18796.patch
>
>
> broken by HIVE-18203



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18192) Introduce WriteID per table rather than using global transaction ID

2018-02-27 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16379515#comment-16379515
 ] 

Sergey Shelukhin commented on HIVE-18192:
-

[~sankarh] thanks! Is there a ticket #?

> Introduce WriteID per table rather than using global transaction ID
> ---
>
> Key: HIVE-18192
> URL: https://issues.apache.org/jira/browse/HIVE-18192
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, Transactions
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: ACID, DR, pull-request-available
> Fix For: 3.0.0
>
> Attachments: HIVE-18192.01.patch, HIVE-18192.02.patch, 
> HIVE-18192.03.patch, HIVE-18192.04.patch, HIVE-18192.05.patch, 
> HIVE-18192.06.patch, HIVE-18192.07.patch, HIVE-18192.08.patch, 
> HIVE-18192.09.patch, HIVE-18192.10.patch, HIVE-18192.11.patch, 
> HIVE-18192.12.patch, HIVE-18192.13.patch, HIVE-18192.14.patch, 
> HIVE-18192.15.patch, HIVE-18192.16.patch, HIVE-18192.17.patch
>
>
> To support ACID replication, we will be introducing a per-table write ID 
> which will replace the transaction id in the primary key for each row in an 
> ACID table.
> The current primary key is determined via 
>  
> which will move to 
>  
> Each table modified by the given transaction will have a table-level write 
> ID allocated, and a persisted map of global txn id -> table -> write id has 
> to be maintained to allow snapshot isolation.
> Readers should use the combination of ValidTxnList and 
> ValidWriteIdList(Table) for snapshot isolation.
>  
>  [Hive Replication - ACID 
> Tables.pdf|https://issues.apache.org/jira/secure/attachment/12903157/Hive%20Replication-%20ACID%20Tables.pdf]
>  has a section "Per Table Sequences (Write-Id)" with more details



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18820) Operation doesn't always clean up log4j for operation log

2018-02-27 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18820:

Attachment: HIVE-18820.patch

> Operation doesn't always clean up log4j for operation log
> -
>
> Key: HIVE-18820
> URL: https://issues.apache.org/jira/browse/HIVE-18820
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18820.patch
>
>
> The operation log may only be enabled for some operations; however, as far 
> as I can see, the routing appender still creates appenders for the queries 
> where it's not enabled, and those don't get cleaned up.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-18820) Operation doesn't always clean up log4j for operation log

2018-02-27 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-18820:
---


> Operation doesn't always clean up log4j for operation log
> -
>
> Key: HIVE-18820
> URL: https://issues.apache.org/jira/browse/HIVE-18820
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18820.patch
>
>
> The operation log may only be enabled for some operations; however, as far 
> as I can see, the routing appender still creates appenders for the queries 
> where it's not enabled, and those don't get cleaned up.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18820) Operation doesn't always clean up log4j for operation log

2018-02-27 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18820:

Status: Patch Available  (was: Open)

> Operation doesn't always clean up log4j for operation log
> -
>
> Key: HIVE-18820
> URL: https://issues.apache.org/jira/browse/HIVE-18820
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18820.patch
>
>
> The operation log may only be enabled for some operations; however, as far 
> as I can see, the routing appender still creates appenders for the queries 
> where it's not enabled, and those don't get cleaned up.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18820) Operation doesn't always clean up log4j for operation log

2018-02-27 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16379666#comment-16379666
 ] 

Sergey Shelukhin commented on HIVE-18820:
-

[~thejas] [~aihuaxu] can you take a look? 
Before this patch (with a log line added to the cleanup code), I see tons of 
appenders being created in the memory dump, but only a few cleanup calls.

> Operation doesn't always clean up log4j for operation log
> -
>
> Key: HIVE-18820
> URL: https://issues.apache.org/jira/browse/HIVE-18820
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18820.patch
>
>
> The operation log may only be enabled for some operations; however, as far 
> as I can see, the routing appender still creates appenders for the queries 
> where it's not enabled, and those don't get cleaned up.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18824) ValidWriteIdList config should be defined on tables which has to collect stats after insert.

2018-02-28 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16380854#comment-16380854
 ] 

Sergey Shelukhin commented on HIVE-18824:
-

I may be misunderstanding write IDs, but isn't it the ID of a write? If so, 
every insert, CTAS, etc. query should have a write ID because it writes to a 
table.
So I wonder if the problem is just that it's not included in the list.

> ValidWriteIdList config should be defined on tables which has to collect 
> stats after insert.
> 
>
> Key: HIVE-18824
> URL: https://issues.apache.org/jira/browse/HIVE-18824
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, Transactions
>Affects Versions: 3.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: ACID, isolation
> Fix For: 3.0.0
>
>
> In HIVE-18192, a per-table write ID was introduced, where snapshot isolation 
> is built using ValidWriteIdList on tables which are read within a txn. 
> The ReadEntity list is consulted to decide which tables are read within a txn.
> For an insert operation, the table will be found only in WriteEntity, but the 
> table is also read to collect stats.
> So the ValidWriteIdList needs to be built for tables/partitions that are part 
> of WriteEntity as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18796) fix TestSSL

2018-02-28 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16380872#comment-16380872
 ] 

Sergey Shelukhin commented on HIVE-18796:
-

Well, the test exercises a particular type of broken connection; the change 
just opened the connection earlier, and that connection also failed.

> fix TestSSL
> ---
>
> Key: HIVE-18796
> URL: https://issues.apache.org/jira/browse/HIVE-18796
> Project: Hive
>  Issue Type: Bug
>  Components: Test
>Reporter: Zoltan Haindrich
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18796.patch
>
>
> broken by HIVE-18203



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18796) fix TestSSL

2018-02-28 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18796:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to master. Thanks for the review!

> fix TestSSL
> ---
>
> Key: HIVE-18796
> URL: https://issues.apache.org/jira/browse/HIVE-18796
> Project: Hive
>  Issue Type: Bug
>  Components: Test
>Reporter: Zoltan Haindrich
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18796.patch
>
>
> broken by HIVE-18203



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18796) fix TestSSL

2018-02-28 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18796:

Fix Version/s: 3.0.0

> fix TestSSL
> ---
>
> Key: HIVE-18796
> URL: https://issues.apache.org/jira/browse/HIVE-18796
> Project: Hive
>  Issue Type: Bug
>  Components: Test
>Reporter: Zoltan Haindrich
>Assignee: Sergey Shelukhin
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HIVE-18796.patch
>
>
> broken by HIVE-18203



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18820) Operation doesn't always clean up log4j for operation log

2018-02-28 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18820:

   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed to master. Thanks for the review!

> Operation doesn't always clean up log4j for operation log
> -
>
> Key: HIVE-18820
> URL: https://issues.apache.org/jira/browse/HIVE-18820
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HIVE-18820.patch
>
>
> The operation log may be enabled only for some operations; however, as far 
> as I can see, the routing appender still creates appenders for the queries 
> where it's not enabled, and those don't get cleaned up.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-17937) llap_acid_fast test is flaky

2018-02-28 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-17937:

   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed to master. Thanks for the patch!

> llap_acid_fast test is flaky
> 
>
> Key: HIVE-17937
> URL: https://issues.apache.org/jira/browse/HIVE-17937
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Teddy Choi
>Priority: Blocker
> Fix For: 3.0.0
>
> Attachments: HIVE-17937.05.patch, HIVE-17937.2.patch, 
> HIVE-17937.3.patch, HIVE-17937.4.patch, HIVE-17993.patch
>
>
> See for example 
> https://builds.apache.org/job/PreCommit-HIVE-Build/7521/testReport/org.apache.hadoop.hive.cli/TestMiniLlapLocalCliDriver/testCliDriver_llap_acid_fast_/history/
>  (the history link is the same for any build number with a test run; just 
> replace 7521 if this one expires).
> It looks like the results change, which may not be good.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18738) LLAP IO ACID - includes handling is broken

2018-02-28 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18738:

Attachment: HIVE-18738.01.patch

> LLAP IO ACID - includes handling is broken
> --
>
> Key: HIVE-18738
> URL: https://issues.apache.org/jira/browse/HIVE-18738
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepesh Khandelwal
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18738.01.patch, HIVE-18738.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18738) LLAP IO ACID - includes handling is broken

2018-02-28 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16380995#comment-16380995
 ] 

Sergey Shelukhin commented on HIVE-18738:
-

Rebased the patch and undid some debugging changes that had accidentally made 
it into the patch.
Will wait for a test run with the new patch to make sure this works given the 
write ID changes.

> LLAP IO ACID - includes handling is broken
> --
>
> Key: HIVE-18738
> URL: https://issues.apache.org/jira/browse/HIVE-18738
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepesh Khandelwal
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18738.01.patch, HIVE-18738.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18827) useless dynamic value exceptions strike back

2018-02-28 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16381043#comment-16381043
 ] 

Sergey Shelukhin commented on HIVE-18827:
-

cc [~jdere]

> useless dynamic value exceptions strike back
> 
>
> Key: HIVE-18827
> URL: https://issues.apache.org/jira/browse/HIVE-18827
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Priority: Major
>
> Looking at ~master, I can see tons of exceptions like this in LLAP log:
> {noformat}
> 2018-02-27T14:07:51,989  WARN [IO-Elevator-Thread-12 
> (1515669035295_0909_1_08_000117_0)] impl.RecordReaderImpl: 
> NoDynamicValuesException when evaluating predicate. Skipping ORC PPD. Stats: 
> numberOfValues: 9750
> intStatistics {
>   minimum: 11335
>   maximum: 560
>   sum: 27648854404
> }
> hasNull: true
>  Predicate: (BETWEEN ss_addr_sk 
> DynamicValue(RS_27_customer_address_ca_address_sk_min) 
> DynamicValue(RS_27_customer_address_ca_address_sk_max))
> org.apache.hadoop.hive.ql.plan.DynamicValue$NoDynamicValuesException: Value 
> does not exist in registry: RS_27_customer_address_ca_address_sk_min
>   at 
> org.apache.hadoop.hive.ql.exec.tez.DynamicValueRegistryTez.getValue(DynamicValueRegistryTez.java:77)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.plan.DynamicValue.getValue(DynamicValue.java:137) 
> ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.plan.DynamicValue.getJavaValue(DynamicValue.java:97)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.plan.DynamicValue.getLiteral(DynamicValue.java:93) 
> ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.sarg.SearchArgumentImpl$PredicateLeafImpl.getLiteralList(SearchArgumentImpl.java:120)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.orc.impl.RecordReaderImpl.evaluatePredicateMinMax(RecordReaderImpl.java:553)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.orc.impl.RecordReaderImpl.evaluatePredicateRange(RecordReaderImpl.java:463)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.orc.impl.RecordReaderImpl.evaluatePredicateProto(RecordReaderImpl.java:423)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.orc.impl.RecordReaderImpl$SargApplier.pickRowGroups(RecordReaderImpl.java:848)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.determineRgsToRead(OrcEncodedDataReader.java:835)
>  ~[hive-llap-server-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.performDataRead(OrcEncodedDataReader.java:335)
>  ~[hive-llap-server-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:276)
>  ~[hive-llap-server-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:273)
>  ~[hive-llap-server-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at java.security.AccessController.doPrivileged(Native Method) 
> ~[?:1.8.0_112]
>   at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_112]
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
>  ~[hadoop-common-3.0.0.3.0.0.0-776.jar:?]
>   at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:273)
>  ~[hive-llap-server-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:110)
>  ~[hive-llap-server-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) 
> ~[tez-common-0.9.2-SNAPSHOT.jar:0.9.2-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:110)
>  ~[hive-llap-server-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[?:1.8.0_112]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[?:1.8.0_112]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  ~[?:1.8.0_112]
>   at java.lang.Thread.run(Thread.java:745) [?:1.8.0_112]
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18828) improve error handling for codecs in LLAP IO

2018-02-28 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16381109#comment-16381109
 ] 

Sergey Shelukhin commented on HIVE-18828:
-

A similar patch is needed for ORC.
[~gopalv] can you take a look?

> improve error handling for codecs in LLAP IO
> 
>
> Key: HIVE-18828
> URL: https://issues.apache.org/jira/browse/HIVE-18828
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18828.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-18828) improve error handling for codecs in LLAP IO

2018-02-28 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-18828:
---

Assignee: Sergey Shelukhin

> improve error handling for codecs in LLAP IO
> 
>
> Key: HIVE-18828
> URL: https://issues.apache.org/jira/browse/HIVE-18828
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18828.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18828) improve error handling for codecs in LLAP IO

2018-02-28 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18828:

Status: Patch Available  (was: Open)

> improve error handling for codecs in LLAP IO
> 
>
> Key: HIVE-18828
> URL: https://issues.apache.org/jira/browse/HIVE-18828
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18828.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18828) improve error handling for codecs in LLAP IO

2018-02-28 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18828:

Attachment: HIVE-18828.patch

> improve error handling for codecs in LLAP IO
> 
>
> Key: HIVE-18828
> URL: https://issues.apache.org/jira/browse/HIVE-18828
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18828.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18827) useless dynamic value exceptions strike back

2018-02-28 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16381127#comment-16381127
 ] 

Sergey Shelukhin commented on HIVE-18827:
-

It doesn't fail the task as far as I can see; it just logs a lot of errors.

> useless dynamic value exceptions strike back
> 
>
> Key: HIVE-18827
> URL: https://issues.apache.org/jira/browse/HIVE-18827
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Jason Dere
>Priority: Major
>
> Looking at ~master, I can see tons of exceptions like this in LLAP log:
> {noformat}
> 2018-02-27T14:07:51,989  WARN [IO-Elevator-Thread-12 
> (1515669035295_0909_1_08_000117_0)] impl.RecordReaderImpl: 
> NoDynamicValuesException when evaluating predicate. Skipping ORC PPD. Stats: 
> numberOfValues: 9750
> intStatistics {
>   minimum: 11335
>   maximum: 560
>   sum: 27648854404
> }
> hasNull: true
>  Predicate: (BETWEEN ss_addr_sk 
> DynamicValue(RS_27_customer_address_ca_address_sk_min) 
> DynamicValue(RS_27_customer_address_ca_address_sk_max))
> org.apache.hadoop.hive.ql.plan.DynamicValue$NoDynamicValuesException: Value 
> does not exist in registry: RS_27_customer_address_ca_address_sk_min
>   at 
> org.apache.hadoop.hive.ql.exec.tez.DynamicValueRegistryTez.getValue(DynamicValueRegistryTez.java:77)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.plan.DynamicValue.getValue(DynamicValue.java:137) 
> ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.plan.DynamicValue.getJavaValue(DynamicValue.java:97)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.plan.DynamicValue.getLiteral(DynamicValue.java:93) 
> ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.sarg.SearchArgumentImpl$PredicateLeafImpl.getLiteralList(SearchArgumentImpl.java:120)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.orc.impl.RecordReaderImpl.evaluatePredicateMinMax(RecordReaderImpl.java:553)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.orc.impl.RecordReaderImpl.evaluatePredicateRange(RecordReaderImpl.java:463)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.orc.impl.RecordReaderImpl.evaluatePredicateProto(RecordReaderImpl.java:423)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.orc.impl.RecordReaderImpl$SargApplier.pickRowGroups(RecordReaderImpl.java:848)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.determineRgsToRead(OrcEncodedDataReader.java:835)
>  ~[hive-llap-server-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.performDataRead(OrcEncodedDataReader.java:335)
>  ~[hive-llap-server-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:276)
>  ~[hive-llap-server-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:273)
>  ~[hive-llap-server-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at java.security.AccessController.doPrivileged(Native Method) 
> ~[?:1.8.0_112]
>   at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_112]
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
>  ~[hadoop-common-3.0.0.3.0.0.0-776.jar:?]
>   at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:273)
>  ~[hive-llap-server-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:110)
>  ~[hive-llap-server-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) 
> ~[tez-common-0.9.2-SNAPSHOT.jar:0.9.2-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:110)
>  ~[hive-llap-server-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[?:1.8.0_112]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[?:1.8.0_112]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  ~[?:1.8.0_112]
>   at java.lang.Thread.run(Thread.java:745) [?:1.8.0_112]
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18827) useless dynamic value exceptions strike back

2018-02-28 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16381128#comment-16381128
 ] 

Sergey Shelukhin commented on HIVE-18827:
-

I wonder if we should do both: graceful handling when we can, but also change 
the error handling to not log the call stack? The stack doesn't seem very 
useful.

> useless dynamic value exceptions strike back
> 
>
> Key: HIVE-18827
> URL: https://issues.apache.org/jira/browse/HIVE-18827
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Jason Dere
>Priority: Major
>
> Looking at ~master, I can see tons of exceptions like this in LLAP log:
> {noformat}
> 2018-02-27T14:07:51,989  WARN [IO-Elevator-Thread-12 
> (1515669035295_0909_1_08_000117_0)] impl.RecordReaderImpl: 
> NoDynamicValuesException when evaluating predicate. Skipping ORC PPD. Stats: 
> numberOfValues: 9750
> intStatistics {
>   minimum: 11335
>   maximum: 560
>   sum: 27648854404
> }
> hasNull: true
>  Predicate: (BETWEEN ss_addr_sk 
> DynamicValue(RS_27_customer_address_ca_address_sk_min) 
> DynamicValue(RS_27_customer_address_ca_address_sk_max))
> org.apache.hadoop.hive.ql.plan.DynamicValue$NoDynamicValuesException: Value 
> does not exist in registry: RS_27_customer_address_ca_address_sk_min
>   at 
> org.apache.hadoop.hive.ql.exec.tez.DynamicValueRegistryTez.getValue(DynamicValueRegistryTez.java:77)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.plan.DynamicValue.getValue(DynamicValue.java:137) 
> ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.plan.DynamicValue.getJavaValue(DynamicValue.java:97)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.plan.DynamicValue.getLiteral(DynamicValue.java:93) 
> ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.sarg.SearchArgumentImpl$PredicateLeafImpl.getLiteralList(SearchArgumentImpl.java:120)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.orc.impl.RecordReaderImpl.evaluatePredicateMinMax(RecordReaderImpl.java:553)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.orc.impl.RecordReaderImpl.evaluatePredicateRange(RecordReaderImpl.java:463)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.orc.impl.RecordReaderImpl.evaluatePredicateProto(RecordReaderImpl.java:423)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.orc.impl.RecordReaderImpl$SargApplier.pickRowGroups(RecordReaderImpl.java:848)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.determineRgsToRead(OrcEncodedDataReader.java:835)
>  ~[hive-llap-server-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.performDataRead(OrcEncodedDataReader.java:335)
>  ~[hive-llap-server-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:276)
>  ~[hive-llap-server-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:273)
>  ~[hive-llap-server-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at java.security.AccessController.doPrivileged(Native Method) 
> ~[?:1.8.0_112]
>   at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_112]
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
>  ~[hadoop-common-3.0.0.3.0.0.0-776.jar:?]
>   at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:273)
>  ~[hive-llap-server-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:110)
>  ~[hive-llap-server-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) 
> ~[tez-common-0.9.2-SNAPSHOT.jar:0.9.2-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:110)
>  ~[hive-llap-server-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[?:1.8.0_112]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[?:1.8.0_112]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  ~[?:1.8.0_112]
>   at java.lang.Thread.run(Thread.java:745) [?:1.8.0_112]
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18827) useless dynamic value exceptions strike back

2018-02-28 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16381285#comment-16381285
 ] 

Sergey Shelukhin commented on HIVE-18827:
-

Should it just check for the specific exception? As far as I can tell, we 
don't expect any other exception here that would be normal.
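The handling discussed in this thread, treating only the expected NoDynamicValuesException as a benign "skip PPD" signal, logging it without the call stack, and letting anything else propagate, might look like this sketch. The class and method names are illustrative stand-ins, not the actual ORC RecordReaderImpl code:

```java
// Illustrative stand-in for org.apache.hadoop.hive.ql.plan.DynamicValue$NoDynamicValuesException
class NoDynamicValuesException extends RuntimeException {
    NoDynamicValuesException(String msg) { super(msg); }
}

public class PpdSketch {
    // Returns TRUE when the predicate could be evaluated, null when PPD
    // should be skipped because the dynamic value isn't available yet.
    static Boolean evaluatePredicate(Runnable eval) {
        try {
            eval.run();
            return Boolean.TRUE;
        } catch (NoDynamicValuesException e) {
            // expected case: log the message only, no stack trace
            System.out.println("Skipping ORC PPD: " + e.getMessage());
            return null;
        }
        // any other exception propagates: it would indicate a real bug
    }

    public static void main(String[] args) {
        Boolean r = evaluatePredicate(() -> {
            throw new NoDynamicValuesException(
                "Value does not exist in registry: RS_27_customer_address_ca_address_sk_min");
        });
        System.out.println(r == null ? "PPD skipped" : "PPD applied");
    }
}
```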

> useless dynamic value exceptions strike back
> 
>
> Key: HIVE-18827
> URL: https://issues.apache.org/jira/browse/HIVE-18827
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Jason Dere
>Priority: Major
> Attachments: HIVE-18827.1.patch
>
>
> Looking at ~master, I can see tons of exceptions like this in LLAP log:
> {noformat}
> 2018-02-27T14:07:51,989  WARN [IO-Elevator-Thread-12 
> (1515669035295_0909_1_08_000117_0)] impl.RecordReaderImpl: 
> NoDynamicValuesException when evaluating predicate. Skipping ORC PPD. Stats: 
> numberOfValues: 9750
> intStatistics {
>   minimum: 11335
>   maximum: 560
>   sum: 27648854404
> }
> hasNull: true
>  Predicate: (BETWEEN ss_addr_sk 
> DynamicValue(RS_27_customer_address_ca_address_sk_min) 
> DynamicValue(RS_27_customer_address_ca_address_sk_max))
> org.apache.hadoop.hive.ql.plan.DynamicValue$NoDynamicValuesException: Value 
> does not exist in registry: RS_27_customer_address_ca_address_sk_min
>   at 
> org.apache.hadoop.hive.ql.exec.tez.DynamicValueRegistryTez.getValue(DynamicValueRegistryTez.java:77)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.plan.DynamicValue.getValue(DynamicValue.java:137) 
> ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.plan.DynamicValue.getJavaValue(DynamicValue.java:97)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.plan.DynamicValue.getLiteral(DynamicValue.java:93) 
> ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.sarg.SearchArgumentImpl$PredicateLeafImpl.getLiteralList(SearchArgumentImpl.java:120)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.orc.impl.RecordReaderImpl.evaluatePredicateMinMax(RecordReaderImpl.java:553)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.orc.impl.RecordReaderImpl.evaluatePredicateRange(RecordReaderImpl.java:463)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.orc.impl.RecordReaderImpl.evaluatePredicateProto(RecordReaderImpl.java:423)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.orc.impl.RecordReaderImpl$SargApplier.pickRowGroups(RecordReaderImpl.java:848)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.determineRgsToRead(OrcEncodedDataReader.java:835)
>  ~[hive-llap-server-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.performDataRead(OrcEncodedDataReader.java:335)
>  ~[hive-llap-server-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:276)
>  ~[hive-llap-server-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:273)
>  ~[hive-llap-server-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at java.security.AccessController.doPrivileged(Native Method) 
> ~[?:1.8.0_112]
>   at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_112]
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
>  ~[hadoop-common-3.0.0.3.0.0.0-776.jar:?]
>   at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:273)
>  ~[hive-llap-server-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:110)
>  ~[hive-llap-server-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) 
> ~[tez-common-0.9.2-SNAPSHOT.jar:0.9.2-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:110)
>  ~[hive-llap-server-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[?:1.8.0_112]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[?:1.8.0_112]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  ~[?:1.8.0_112]
>   at java.lang.Thread.run(Thread.java:745) [?:1.8.0_112]
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18571) stats issues for MM tables

2018-02-28 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16381295#comment-16381295
 ] 

Sergey Shelukhin commented on HIVE-18571:
-

The patch is ready for review and I don't expect major changes (other than 
from CR feedback).
However, I'll wait for HIVE-18824 before committing, so that the code that 
uses ACID state to get stats actually works.
That code is already in the patch; however, it currently bails and sets stats 
to 0 because write IDs for the tables being written are not there.
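The bail-out behavior described above can be modeled with a small stdlib sketch. The names (`validWriteIds`, `collectRowStats`) are hypothetical, not the actual Hive API; the point is that when the write-ID snapshot was built only for read entities, stats collection for a just-written table finds no snapshot and falls back to 0:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WriteIdStatsSketch {
    // stands in for the per-table ValidWriteIdList configuration
    static final Map<String, List<Long>> validWriteIds = new HashMap<>();

    static long collectRowStats(String table, long actualRows) {
        List<Long> ids = validWriteIds.get(table);
        if (ids == null) {
            return 0; // bail: no snapshot to read the just-written data against
        }
        return actualRows; // with a valid snapshot, real stats can be collected
    }

    public static void main(String[] args) {
        // before HIVE-18824: only read entities get write-id lists
        System.out.println(collectRowStats("t1", 100)); // prints 0, stats lost
        // after: write entities are included as well
        validWriteIds.put("t1", List.of(1L));
        System.out.println(collectRowStats("t1", 100)); // prints 100
    }
}
```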

> stats issues for MM tables
> --
>
> Key: HIVE-18571
> URL: https://issues.apache.org/jira/browse/HIVE-18571
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18571.01.patch, HIVE-18571.02.patch, 
> HIVE-18571.03.patch, HIVE-18571.patch
>
>
> There are multiple stats aggregation issues with MM tables.
> Some simple stats are double-counted, and some (simple) stats are invalid 
> for ACID table dirs altogether. 
> I have a patch almost ready; I need to fix some more things and clean up.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-18824) ValidWriteIdList config should be defined on tables which has to collect stats after insert.

2018-02-28 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-18824:
---

Assignee: Sergey Shelukhin  (was: Sankar Hariappan)

> ValidWriteIdList config should be defined on tables which has to collect 
> stats after insert.
> 
>
> Key: HIVE-18824
> URL: https://issues.apache.org/jira/browse/HIVE-18824
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, Transactions
>Affects Versions: 3.0.0
>Reporter: Sankar Hariappan
>Assignee: Sergey Shelukhin
>Priority: Major
>  Labels: ACID, isolation
> Fix For: 3.0.0
>
>
> In HIVE-18192, a per-table write ID was introduced, where snapshot isolation 
> is built using ValidWriteIdList on tables which are read within a txn. 
> The ReadEntity list is consulted to decide which tables are read within a txn.
> For an insert operation, the table will be found only in WriteEntity, but the 
> table is also read to collect stats.
> So the ValidWriteIdList needs to be built for tables/partitions that are part 
> of WriteEntity as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18824) ValidWriteIdList config should be defined on tables which has to collect stats after insert.

2018-02-28 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18824:

Status: Patch Available  (was: In Progress)

So should this be sufficient then? 
:)

> ValidWriteIdList config should be defined on tables which has to collect 
> stats after insert.
> 
>
> Key: HIVE-18824
> URL: https://issues.apache.org/jira/browse/HIVE-18824
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, Transactions
>Affects Versions: 3.0.0
>Reporter: Sankar Hariappan
>Assignee: Sergey Shelukhin
>Priority: Major
>  Labels: ACID, isolation
> Fix For: 3.0.0
>
> Attachments: HIVE-18824.patch
>
>
> In HIVE-18192, a per-table write ID was introduced, where snapshot isolation 
> is built using ValidWriteIdList on tables which are read within a txn. 
> The ReadEntity list is consulted to decide which tables are read within a txn.
> For an insert operation, the table will be found only in WriteEntity, but the 
> table is also read to collect stats.
> So the ValidWriteIdList needs to be built for tables/partitions that are part 
> of WriteEntity as well.





[jira] [Updated] (HIVE-18824) ValidWriteIdList config should be defined on tables which has to collect stats after insert.

2018-02-28 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18824:

Attachment: HIVE-18824.patch

> ValidWriteIdList config should be defined on tables which has to collect 
> stats after insert.
> 
>
> Key: HIVE-18824
> URL: https://issues.apache.org/jira/browse/HIVE-18824
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, Transactions
>Affects Versions: 3.0.0
>Reporter: Sankar Hariappan
>Assignee: Sergey Shelukhin
>Priority: Major
>  Labels: ACID, isolation
> Fix For: 3.0.0
>
> Attachments: HIVE-18824.patch
>
>
> In HIVE-18192, per-table write IDs were introduced, with snapshot isolation 
> built using a ValidWriteIdList on tables that are read within a txn. 
> The ReadEntity list is consulted to decide which tables are read within a txn.
> For an insert operation, the table will be found only in the WriteEntity, but 
> the table is also read to collect stats.
> So, the ValidWriteIdList needs to be built for the tables/partitions in the 
> WriteEntity as well.





[jira] [Updated] (HIVE-18683) support ACID/MM subdirectories in LOAD

2018-02-28 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18683:

Summary: support ACID/MM subdirectories in LOAD  (was: support 
subdirectories in LOAD)

> support ACID/MM subdirectories in LOAD
> --
>
> Key: HIVE-18683
> URL: https://issues.apache.org/jira/browse/HIVE-18683
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
>
> Currently, load does not support subdirectories.
> It should be possible to flatten the directory structure when loading.





[jira] [Resolved] (HIVE-18683) support ACID/MM subdirectories in LOAD

2018-02-28 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin resolved HIVE-18683.
-
Resolution: Won't Fix

Actually, upon considering the code and the tests, I don't think we should do 
this.
For MM table exports it would be safe, but for ACID tables the result of a 
load would not be valid.
When ACID ExIM works, it should handle this case.
For Export+Load, we are going to say it's not a supported scenario. For flat 
tables, it's often possible to treat them as a directory, so it works by 
coincidence, as an unsupported case.

> support ACID/MM subdirectories in LOAD
> --
>
> Key: HIVE-18683
> URL: https://issues.apache.org/jira/browse/HIVE-18683
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
>
> Currently, load does not support subdirectories.
> It should be possible to flatten the directory structure when loading.





[jira] [Comment Edited] (HIVE-18683) support ACID/MM subdirectories in LOAD

2018-02-28 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16381390#comment-16381390
 ] 

Sergey Shelukhin edited comment on HIVE-18683 at 3/1/18 2:05 AM:
-

Actually, upon considering the code and the scenarios for tests, I don't think 
we should do this.
For MM table exports it would be safe, but for ACID tables the result of a 
load would not be valid.
When ACID ExIM works, it should handle this case (i.e. to load an exported 
ACID table, import should be used).
Since import is available, Export+Load doesn't make sense; we are going to say 
it's not a supported scenario. For flat tables, it's often possible to treat 
them as a directory, so it works by coincidence, as an unsupported case.


was (Author: sershe):
Actually, upon considering the code and the tests, I don't think we should do 
it.
For MM table exports it would be safe but for ACID tables the result of a load 
would not be valid.
When ACID ExIM works, it should handle this case.
For Export+Load, we are going to say it's not a supported scenario. For flat 
tables, it's often possible to treat them as a directory, so it works by 
coincidence, as an unsupported case

> support ACID/MM subdirectories in LOAD
> --
>
> Key: HIVE-18683
> URL: https://issues.apache.org/jira/browse/HIVE-18683
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
>
> Currently, load does not support subdirectories.
> It should be possible to flatten the directory structure when loading.





[jira] [Assigned] (HIVE-18763) VectorMapOperator should take into account partition->table serde conversion for all cases

2018-02-28 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-18763:
---

Assignee: Sergey Shelukhin

> VectorMapOperator should take into account partition->table serde conversion 
> for all cases
> --
>
> Key: HIVE-18763
> URL: https://issues.apache.org/jira/browse/HIVE-18763
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
>
> When the table and partition schemas differ, the non-vectorized MapOperator 
> does row-by-row conversion from whatever is read to the table schema.
> VectorMapOperator is less consistent... it does the conversion as part of 
> populating VRBs in row/serde modes (used to read e.g. text row-by-row or 
> natively, and make VRBs); see the VectorDeserializeRow class convert... 
> methods for an example. However, the native VRB mode relies on the 
> ConvertTreeReader... machinery that lives in ORC, and so never converts 
> anything inside VMO.
> So, anything running in native VRB mode that is not the vanilla ORC reader 
> will produce data with an incorrect schema if there were schema changes and 
> partitions are present - there are two such cases right now: LLAP IO with 
> ORC or text data, and Parquet.
> It's possible to extend the ConvertTreeReader... machinery to LLAP IO ORC, 
> which already uses TreeReader-s for everything; however, LLAP IO text and 
> (non-LLAP) Parquet, as well as any future users, will have to invent their 
> own conversion.
> Therefore, I think the best fix for this is to treat all inputs in VMO the 
> same and convert them by default, like the regular MapOperator, and make the 
> special ORC mode an exception that allows it to bypass the conversion.
> cc [~mmccline]
> Test case: the varchar column length should be limited after ALTER TABLE, but 
> it isn't.
> {noformat}
> CREATE TABLE schema_evolution_data(insert_num int, boolean1 boolean, tinyint1 
> tinyint, smallint1 smallint, int1 int, bigint1 bigint, decimal1 
> decimal(38,18), float1 float, double1 double, string1 varchar(50), string2 
> varchar(50), date1 date, timestamp1 timestamp, boolean_str string, 
> tinyint_str string, smallint_str string, int_str string, bigint_str string, 
> decimal_str string, float_str string, double_str string, date_str string, 
> timestamp_str string, filler string)
> row format delimited fields terminated by '|' stored as textfile;
> load data local inpath 
> '../../data/files/schema_evolution/schema_evolution_data.txt' overwrite into 
> table schema_evolution_data;
> drop table if exists vsp;
> create table vsp(vs varchar(50)) partitioned by(s varchar(50)) stored as 
> textfile;
> insert into table vsp partition(s='positive') select string1 from 
> schema_evolution_data;
> alter table vsp change column vs vs varchar(3);
> drop table if exists vsp_orc;
> create table vsp_orc(vs varchar(50)) partitioned by(s varchar(50)) stored as 
> orc;
> insert into table vsp_orc partition(s='positive') select string1 from 
> schema_evolution_data;
> alter table vsp_orc change column vs vs varchar(3);
> drop table if exists vsp_parquet;
> create table vsp_parquet(vs varchar(50)) partitioned by(s varchar(50)) stored 
> as parquet;
> insert into table vsp_parquet partition(s='positive') select string1 from 
> schema_evolution_data;
> alter table vsp_parquet change column vs vs varchar(3);
> SET hive.llap.io.enabled=true;
> -- BAD results from all queries; parquet affected regardless of IO.
> select length(vs) from vsp; 
> select length(vs) from vsp_orc;
> select length(vs) from vsp_parquet;
> SET hive.llap.io.enabled=false;
> select length(vs) from vsp; -- ok
> select length(vs) from vsp_orc; -- ok
> select length(vs) from vsp_parquet; -- still bad
> {noformat}
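The failure mode in the test case above comes down to a missing partition->table schema conversion on read. A minimal, hypothetical sketch (illustrative names, not Hive's actual classes) of the varchar-length enforcement that the row-by-row MapOperator path performs and the native-VRB paths skip:

```java
// Hypothetical sketch of the partition->table schema conversion step: values
// read with the old partition schema (varchar(50)) must be adjusted to the
// current table schema (varchar(3)). Names here are illustrative only.
public class VarcharConversionDemo {
    // Enforce the table-level varchar length on a value read from a partition
    // whose file data may still contain longer strings.
    static String convertVarchar(String value, int tableMaxLength) {
        if (value == null || value.length() <= tableMaxLength) {
            return value;
        }
        return value.substring(0, tableMaxLength);
    }

    public static void main(String[] args) {
        // After ALTER TABLE ... CHANGE COLUMN vs vs varchar(3), stored values
        // longer than 3 characters must be truncated on read.
        String[] partitionValues = {"ab", "abcd", "abcdefg"};
        for (String v : partitionValues) {
            System.out.println(convertVarchar(v, 3)); // prints ab, abc, abc
        }
        // A reader that skips this step (the native-VRB paths described
        // above) returns the over-long values unchanged - the reported bug.
    }
}
```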




