[jira] [Assigned] (HIVE-24503) Optimize vector row serde to avoid type check at run time

2020-12-08 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera reassigned HIVE-24503:
--


> Optimize vector row serde to avoid type check at run time 
> --
>
> Key: HIVE-24503
> URL: https://issues.apache.org/jira/browse/HIVE-24503
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
> VectorSerializeRow and VectorDeserializeRow check the type of every column 
> for every row when serializing and deserializing a vectorized batch. This 
> becomes very costly when billions of rows are read or written. The cost can 
> be avoided by doing the type check once at init time and creating 
> type-specific reader/writer classes. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24503) Optimize vector row serde by avoiding type check at run time

2020-12-08 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-24503:
---
Summary: Optimize vector row serde by avoiding type check at run time   
(was: Optimize vector row serde to avoid type check at run time )

> Optimize vector row serde by avoiding type check at run time 
> -
>
> Key: HIVE-24503
> URL: https://issues.apache.org/jira/browse/HIVE-24503
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
> VectorSerializeRow and VectorDeserializeRow check the type of every column 
> for every row when serializing and deserializing a vectorized batch. This 
> becomes very costly when billions of rows are read or written. The cost can 
> be avoided by doing the type check once at init time and creating 
> type-specific reader/writer classes. 





[jira] [Assigned] (HIVE-24471) Add support for combiner in hash mode group aggregation

2020-12-02 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera reassigned HIVE-24471:
--


> Add support for combiner in hash mode group aggregation 
> 
>
> Key: HIVE-24471
> URL: https://issues.apache.org/jira/browse/HIVE-24471
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
> In map-side group aggregation, a partial grouped aggregation is computed to 
> reduce the data written to disk by the map task. In hash aggregation, where 
> the input data is not sorted, a hash table is used. If the hash table grows 
> beyond a configurable limit, its data is flushed to disk and a new hash table 
> is created. If the reduction achieved by the hash table is less than the 
> minimum hash aggregation reduction calculated at compile time, the map-side 
> aggregation is converted to streaming mode. So if the first few batches of 
> records do not yield a significant reduction, the mode is switched to 
> streaming. This can hurt performance if the subsequent batches of records 
> have fewer distinct values. To mitigate this, a combiner can be added to the 
> map task after the keys are sorted. This ensures that aggregation is done 
> where possible and reduces the data written to disk.
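
As a rough illustration of why a combiner over sorted keys helps, the sketch below merges adjacent partial counts that share a key. The `Partial` record shape and the count aggregate are assumptions made for the example, and keys are assumed non-null; the actual Hive operator pipeline is not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not the Hive group-by operator or its combiner.
final class SortedCombinerSketch {

    /** A partial aggregate emitted by the map side: a key and its partial count. */
    static final class Partial {
        final String key;
        final long count;

        Partial(String key, long count) {
            this.key = key;
            this.count = count;
        }
    }

    /**
     * Combines partials that are already sorted by key. Equal keys are adjacent,
     * so a single pass with constant extra state is enough.
     */
    static List<Partial> combine(List<Partial> sortedByKey) {
        List<Partial> out = new ArrayList<>();
        String currentKey = null;
        long sum = 0;
        for (Partial p : sortedByKey) {
            if (currentKey != null && !currentKey.equals(p.key)) {
                out.add(new Partial(currentKey, sum));
                sum = 0;
            }
            currentKey = p.key;
            sum += p.count;
        }
        if (currentKey != null) {
            out.add(new Partial(currentKey, sum));
        }
        return out;
    }

    public static void main(String[] args) {
        // Streaming mode emitted one record per input row; the combiner collapses them.
        List<Partial> streamed = Arrays.asList(
            new Partial("a", 1), new Partial("a", 1), new Partial("b", 1));
        for (Partial p : combine(streamed)) {
            System.out.println(p.key + "=" + p.count);
        }
    }
}
```

Because the input is sorted, the combiner needs no hash table, so it still reduces output after the map side has fallen back to streaming mode.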





[jira] [Resolved] (HIVE-24378) Leading and trailing spaces are not removed before decimal conversion

2020-11-16 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera resolved HIVE-24378.

Resolution: Fixed

Committed to master. Thanks [~pgaref] for review.

> Leading and trailing spaces are not removed before decimal conversion
> -
>
> Key: HIVE-24378
> URL: https://issues.apache.org/jira/browse/HIVE-24378
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24378-1.patch, HIVE-24378.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The decimal conversion does not remove leading and trailing spaces in some 
> scenarios. Because of this, the numbers are converted to null.
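
The failure mode can be reproduced in miniature with `java.math.BigDecimal`, used here only as a stand-in for Hive's own decimal conversion path. The `parseOrNull` helper is hypothetical; it mimics the "invalid text becomes null" behaviour described above.

```java
import java.math.BigDecimal;

// Stand-in sketch using java.math.BigDecimal; Hive's decimal conversion code is not shown.
final class DecimalTrimSketch {

    /** Returns null when the text is not a valid decimal, like a failed SQL conversion. */
    static BigDecimal parseOrNull(String text, boolean trimFirst) {
        if (text == null) {
            return null;
        }
        String candidate = trimFirst ? text.trim() : text;
        try {
            return new BigDecimal(candidate);
        } catch (NumberFormatException e) {
            // BigDecimal rejects surrounding whitespace, so untrimmed input lands here.
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrNull(" 12.50 ", false));
        System.out.println(parseOrNull(" 12.50 ", true));
    }
}
```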





[jira] [Updated] (HIVE-24378) Leading and trailing spaces are not removed before decimal conversion

2020-11-13 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-24378:
---
Attachment: HIVE-24378-1.patch

> Leading and trailing spaces are not removed before decimal conversion
> -
>
> Key: HIVE-24378
> URL: https://issues.apache.org/jira/browse/HIVE-24378
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24378-1.patch, HIVE-24378.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The decimal conversion does not remove leading and trailing spaces in some 
> scenarios. Because of this, the numbers are converted to null.





[jira] [Commented] (HIVE-24373) Wrong predicate is pushed down for view with constant value projection.

2020-11-13 Thread mahesh kumar behera (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231395#comment-17231395
 ] 

mahesh kumar behera commented on HIVE-24373:


Committed to master. Thanks [~jcamachorodriguez] for review.

> Wrong predicate is pushed down for view with constant value projection.
> ---
>
> Key: HIVE-24373
> URL: https://issues.apache.org/jira/browse/HIVE-24373
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24373-explain-paln.txt, HIVE-24373.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> For the query below, the predicate pushed down to one of the table scans is 
> incorrect.
>  
> {code:java}
> set hive.explain.user=false;
> set hive.cbo.enable=false;
> set hive.optimize.ppd=true;
> DROP TABLE arc;
> CREATE table arc(`dt_from` string, `dt_to` string);
> CREATE table loc1(`dt_from` string, `dt_to` string);
> CREATE
>  VIEW view AS
>  SELECT
> '' as DT_FROM,
> uuid() as DT_TO
>  FROM
>loc1
>  UNION ALL
>  SELECT
> dt_from as DT_FROM,
> uuid() as DT_TO
>  FROM
>arc;
> EXPLAIN
> SELECT
>   dt_from, dt_to
> FROM
>   view
> WHERE
>   '2020'  between dt_from and dt_to;
> {code}
>  
> For table loc1, DT_FROM is projected as '', so the predicate "predicate: 
> '2020' BETWEEN '' AND _col1 (type: boolean)" is correct. For table arc, 
> however, the column itself is projected, so the predicate should be 
> "predicate: '2020' BETWEEN _col0 (type: boolean) AND _col1 (type: boolean)".
> The wrong predicate arises because predicates are stored in a map keyed by 
> expression; here the expression is "_col0". When the predicate is pushed down 
> through the union, the same predicate object is used to create the filter 
> expression for both branches. Later, when constant replacement is done, the 
> first filter overwrites the second one.
> We should therefore clone the cached predicate (as is done elsewhere) before 
> using it for a filter, which avoids the overwrite.
>  
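
The aliasing problem described above can be sketched with a toy model. The `Between` class, the cache map and `pushDown` are hypothetical stand-ins, not Hive's predicate pushdown classes; the point is only that mutating a shared cached object affects every filter built from it, while a copy does not.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins; Hive's real predicate and expression classes differ.
final class ClonePredicateSketch {

    /** A tiny mutable BETWEEN expression whose operands can be replaced in place. */
    static final class Between {
        final String value;
        String low;
        String high;

        Between(String value, String low, String high) {
            this.value = value;
            this.low = low;
            this.high = high;
        }

        Between copy() {
            return new Between(value, low, high);
        }

        /** Constant replacement: mutates this expression. */
        void replace(String column, String constant) {
            if (low.equals(column)) {
                low = constant;
            }
            if (high.equals(column)) {
                high = constant;
            }
        }

        @Override
        public String toString() {
            return value + " BETWEEN " + low + " AND " + high;
        }
    }

    /** Returns the filters for the loc1 and arc branches, in that order. */
    static List<String> pushDown(boolean cloneFirst) {
        Map<String, Between> cache = new HashMap<>();
        cache.put("_col0", new Between("'2020'", "_col0", "_col1"));

        Between cached = cache.get("_col0");
        Between forLoc1 = cloneFirst ? cached.copy() : cached;
        Between forArc = cloneFirst ? cached.copy() : cached;

        // Only the loc1 branch projects _col0 as the constant ''.
        forLoc1.replace("_col0", "''");
        return Arrays.asList(forLoc1.toString(), forArc.toString());
    }

    public static void main(String[] args) {
        System.out.println("shared: " + pushDown(false));
        System.out.println("cloned: " + pushDown(true));
    }
}
```

Without the copy, the arc filter also ends up with the constant in place of `_col0`; with the copy, it keeps the column reference.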





[jira] [Resolved] (HIVE-24373) Wrong predicate is pushed down for view with constant value projection.

2020-11-13 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera resolved HIVE-24373.

Resolution: Fixed

> Wrong predicate is pushed down for view with constant value projection.
> ---
>
> Key: HIVE-24373
> URL: https://issues.apache.org/jira/browse/HIVE-24373
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24373-explain-paln.txt, HIVE-24373.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> For the query below, the predicate pushed down to one of the table scans is 
> incorrect.
>  
> {code:java}
> set hive.explain.user=false;
> set hive.cbo.enable=false;
> set hive.optimize.ppd=true;
> DROP TABLE arc;
> CREATE table arc(`dt_from` string, `dt_to` string);
> CREATE table loc1(`dt_from` string, `dt_to` string);
> CREATE
>  VIEW view AS
>  SELECT
> '' as DT_FROM,
> uuid() as DT_TO
>  FROM
>loc1
>  UNION ALL
>  SELECT
> dt_from as DT_FROM,
> uuid() as DT_TO
>  FROM
>arc;
> EXPLAIN
> SELECT
>   dt_from, dt_to
> FROM
>   view
> WHERE
>   '2020'  between dt_from and dt_to;
> {code}
>  
> For table loc1, DT_FROM is projected as '', so the predicate "predicate: 
> '2020' BETWEEN '' AND _col1 (type: boolean)" is correct. For table arc, 
> however, the column itself is projected, so the predicate should be 
> "predicate: '2020' BETWEEN _col0 (type: boolean) AND _col1 (type: boolean)".
> The wrong predicate arises because predicates are stored in a map keyed by 
> expression; here the expression is "_col0". When the predicate is pushed down 
> through the union, the same predicate object is used to create the filter 
> expression for both branches. Later, when constant replacement is done, the 
> first filter overwrites the second one.
> We should therefore clone the cached predicate (as is done elsewhere) before 
> using it for a filter, which avoids the overwrite.
>  





[jira] [Updated] (HIVE-24378) Leading and trailing spaces are not removed before decimal conversion

2020-11-12 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-24378:
---
Description: The decimal conversion is not removing the extra spaces in 
some scenarios. because of this the numbers are getting converted to null.  
(was: The decimal conversion is taking care of removing the extra spaces in 
some scenarios. because of this the numbers are getting converted to null.)

> Leading and trailing spaces are not removed before decimal conversion
> -
>
> Key: HIVE-24378
> URL: https://issues.apache.org/jira/browse/HIVE-24378
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24378.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The decimal conversion does not remove leading and trailing spaces in some 
> scenarios. Because of this, the numbers are converted to null.





[jira] [Updated] (HIVE-24378) Leading and trailing spaces are not removed before decimal conversion

2020-11-12 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-24378:
---
Attachment: HIVE-24378.patch

> Leading and trailing spaces are not removed before decimal conversion
> -
>
> Key: HIVE-24378
> URL: https://issues.apache.org/jira/browse/HIVE-24378
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24378.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The decimal conversion is taking care of removing the extra spaces in some 
> scenarios. because of this the numbers are getting converted to null.





[jira] [Assigned] (HIVE-24378) Leading and trailing spaces are not removed before decimal conversion

2020-11-12 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera reassigned HIVE-24378:
--


> Leading and trailing spaces are not removed before decimal conversion
> -
>
> Key: HIVE-24378
> URL: https://issues.apache.org/jira/browse/HIVE-24378
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
> The decimal conversion is taking care of removing the extra spaces in some 
> scenarios. because of this the numbers are getting converted to null.





[jira] [Updated] (HIVE-24373) Wrong predicate is pushed down for view with constant value projection.

2020-11-11 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-24373:
---
Attachment: HIVE-24373.patch

> Wrong predicate is pushed down for view with constant value projection.
> ---
>
> Key: HIVE-24373
> URL: https://issues.apache.org/jira/browse/HIVE-24373
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
> Attachments: HIVE-24373-explain-paln.txt, HIVE-24373.patch
>
>
> For the query below, the predicate pushed down to one of the table scans is 
> incorrect.
>  
> {code:java}
> set hive.explain.user=false;
> set hive.cbo.enable=false;
> set hive.optimize.ppd=true;
> DROP TABLE arc;
> CREATE table arc(`dt_from` string, `dt_to` string);
> CREATE table loc1(`dt_from` string, `dt_to` string);
> CREATE
>  VIEW view AS
>  SELECT
> '' as DT_FROM,
> uuid() as DT_TO
>  FROM
>loc1
>  UNION ALL
>  SELECT
> dt_from as DT_FROM,
> uuid() as DT_TO
>  FROM
>arc;
> EXPLAIN
> SELECT
>   dt_from, dt_to
> FROM
>   view
> WHERE
>   '2020'  between dt_from and dt_to;
> {code}
>  
> For table loc1, DT_FROM is projected as '', so the predicate "predicate: 
> '2020' BETWEEN '' AND _col1 (type: boolean)" is correct. For table arc, 
> however, the column itself is projected, so the predicate should be 
> "predicate: '2020' BETWEEN _col0 (type: boolean) AND _col1 (type: boolean)".
> The wrong predicate arises because predicates are stored in a map keyed by 
> expression; here the expression is "_col0". When the predicate is pushed down 
> through the union, the same predicate object is used to create the filter 
> expression for both branches. Later, when constant replacement is done, the 
> first filter overwrites the second one.
> We should therefore clone the cached predicate (as is done elsewhere) before 
> using it for a filter, which avoids the overwrite.
>  





[jira] [Updated] (HIVE-24373) Wrong predicate is pushed down for view with constant value projection.

2020-11-11 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-24373:
---
Attachment: HIVE-24373-explain-paln.txt

> Wrong predicate is pushed down for view with constant value projection.
> ---
>
> Key: HIVE-24373
> URL: https://issues.apache.org/jira/browse/HIVE-24373
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
> Attachments: HIVE-24373-explain-paln.txt
>
>
> For the query below, the predicate pushed down to one of the table scans is 
> incorrect.
>  
> {code:java}
> set hive.explain.user=false;
> set hive.cbo.enable=false;
> set hive.optimize.ppd=true;
> DROP TABLE arc;
> CREATE table arc(`dt_from` string, `dt_to` string);
> CREATE table loc1(`dt_from` string, `dt_to` string);
> CREATE
>  VIEW view AS
>  SELECT
> '' as DT_FROM,
> uuid() as DT_TO
>  FROM
>loc1
>  UNION ALL
>  SELECT
> dt_from as DT_FROM,
> uuid() as DT_TO
>  FROM
>arc;
> EXPLAIN
> SELECT
>   dt_from, dt_to
> FROM
>   view
> WHERE
>   '2020'  between dt_from and dt_to;
> {code}
>  
> For table loc1, DT_FROM is projected as '', so the predicate "predicate: 
> '2020' BETWEEN '' AND _col1 (type: boolean)" is correct. For table arc, 
> however, the column itself is projected, so the predicate should be 
> "predicate: '2020' BETWEEN _col0 (type: boolean) AND _col1 (type: boolean)".
> The wrong predicate arises because predicates are stored in a map keyed by 
> expression; here the expression is "_col0". When the predicate is pushed down 
> through the union, the same predicate object is used to create the filter 
> expression for both branches. Later, when constant replacement is done, the 
> first filter overwrites the second one.
> We should therefore clone the cached predicate (as is done elsewhere) before 
> using it for a filter, which avoids the overwrite.
>  





[jira] [Assigned] (HIVE-24373) Wrong predicate is pushed down for view with constant value projection.

2020-11-11 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera reassigned HIVE-24373:
--


> Wrong predicate is pushed down for view with constant value projection.
> ---
>
> Key: HIVE-24373
> URL: https://issues.apache.org/jira/browse/HIVE-24373
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
> For the query below, the predicate pushed down to one of the table scans is 
> incorrect.
>  
> {code:java}
> set hive.explain.user=false;
> set hive.cbo.enable=false;
> set hive.optimize.ppd=true;
> DROP TABLE arc;
> CREATE table arc(`dt_from` string, `dt_to` string);
> CREATE table loc1(`dt_from` string, `dt_to` string);
> CREATE
>  VIEW view AS
>  SELECT
> '' as DT_FROM,
> uuid() as DT_TO
>  FROM
>loc1
>  UNION ALL
>  SELECT
> dt_from as DT_FROM,
> uuid() as DT_TO
>  FROM
>arc;
> EXPLAIN
> SELECT
>   dt_from, dt_to
> FROM
>   view
> WHERE
>   '2020'  between dt_from and dt_to;
> {code}
>  
> For table loc1, DT_FROM is projected as '', so the predicate "predicate: 
> '2020' BETWEEN '' AND _col1 (type: boolean)" is correct. For table arc, 
> however, the column itself is projected, so the predicate should be 
> "predicate: '2020' BETWEEN _col0 (type: boolean) AND _col1 (type: boolean)".
> The wrong predicate arises because predicates are stored in a map keyed by 
> expression; here the expression is "_col0". When the predicate is pushed down 
> through the union, the same predicate object is used to create the filter 
> expression for both branches. Later, when constant replacement is done, the 
> first filter overwrites the second one.
> We should therefore clone the cached predicate (as is done elsewhere) before 
> using it for a filter, which avoids the overwrite.
>  





[jira] [Updated] (HIVE-24362) AST tree processing is suboptimal for tree with large number of nodes

2020-11-10 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-24362:
---
Description: In Hive, a node's children are stored as a list of objects. 
While the children of a node are processed, the list of objects is converted to 
a list of Nodes. This can cause long compilation times if the number of children 
is large (300,000). The list of children can be cached in the AST node to avoid 
this re-computation. The caching part was already fixed as part of HIVE-24031; 
the allocation of the array is fixed in this Jira.  (was: In Hive, a node's 
children are stored as a list of objects. While the children of a node are 
processed, the list of objects is converted to a list of Nodes. This can cause 
long compilation times if the number of children is large. The list of children 
can be cached in the AST node to avoid this re-computation.)

> AST tree processing is suboptimal for tree with large number of nodes
> -
>
> Key: HIVE-24362
> URL: https://issues.apache.org/jira/browse/HIVE-24362
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In Hive, a node's children are stored as a list of objects. While the 
> children of a node are processed, the list of objects is converted to a list 
> of Nodes. This can cause long compilation times if the number of children is 
> large (300,000). The list of children can be cached in the AST node to avoid 
> this re-computation. The caching part was already fixed as part of HIVE-24031; 
> the allocation of the array is fixed in this Jira.
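
A minimal sketch of the caching idea follows. The `Node` class here is hypothetical and unrelated to Hive's ASTNode. Pre-sizing the converted list is only one guess at what "allocation of the array" could refer to, since the description does not say.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tree node; not Hive's ASTNode implementation.
final class CachedChildrenSketch {

    static final class Node {
        private final List<Object> rawChildren = new ArrayList<>();
        private List<Node> cachedChildren; // built lazily, dropped when children change
        int conversions = 0;               // counts how often the conversion actually ran

        void addChild(Node child) {
            rawChildren.add(child);
            cachedChildren = null;
        }

        List<Node> getChildren() {
            if (cachedChildren == null) {
                // Assumption: size the target list up front so a very wide node
                // does not pay for repeated backing-array growth.
                List<Node> converted = new ArrayList<>(rawChildren.size());
                for (Object o : rawChildren) {
                    converted.add((Node) o);
                }
                cachedChildren = converted;
                conversions++;
            }
            return cachedChildren;
        }
    }

    public static void main(String[] args) {
        Node root = new Node();
        for (int i = 0; i < 3; i++) {
            root.addChild(new Node());
        }
        root.getChildren();
        root.getChildren();
        System.out.println("conversions: " + root.conversions);
    }
}
```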





[jira] [Resolved] (HIVE-24362) AST tree processing is suboptimal for tree with large number of nodes

2020-11-10 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera resolved HIVE-24362.

Resolution: Fixed

Pushed to master. Thanks [~pgaref] for review.

> AST tree processing is suboptimal for tree with large number of nodes
> -
>
> Key: HIVE-24362
> URL: https://issues.apache.org/jira/browse/HIVE-24362
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In Hive, a node's children are stored as a list of objects. While the 
> children of a node are processed, the list of objects is converted to a list 
> of Nodes. This can cause long compilation times if the number of children is 
> large. The list of children can be cached in the AST node to avoid this 
> re-computation. 





[jira] [Assigned] (HIVE-24362) AST tree processing is suboptimal for tree with large number of nodes

2020-11-09 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera reassigned HIVE-24362:
--


> AST tree processing is suboptimal for tree with large number of nodes
> -
>
> Key: HIVE-24362
> URL: https://issues.apache.org/jira/browse/HIVE-24362
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
> In Hive, a node's children are stored as a list of objects. While the 
> children of a node are processed, the list of objects is converted to a list 
> of Nodes. This can cause long compilation times if the number of children is 
> large. The list of children can be cached in the AST node to avoid this 
> re-computation. 





[jira] [Assigned] (HIVE-24284) NPE when parsing druid logs using Hive

2020-10-18 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera reassigned HIVE-24284:
--


> NPE when parsing druid logs using Hive
> --
>
> Key: HIVE-24284
> URL: https://issues.apache.org/jira/browse/HIVE-24284
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
> The current syslog parser always expects a valid proc id. But as per RFC3164 
> and RFC5424, the proc id can be omitted. So Hive should handle this by using 
> NILVALUE/empty string when the proc id is null.
>  
> {code:java}
> Caused by: java.lang.NullPointerException: null
> at java.lang.String.<init>(String.java:566)
> at org.apache.hadoop.hive.ql.log.syslog.SyslogParser.createEvent(SyslogParser.java:361)
> at org.apache.hadoop.hive.ql.log.syslog.SyslogParser.readEvent(SyslogParser.java:326)
> at org.apache.hadoop.hive.ql.log.syslog.SyslogSerDe.deserialize(SyslogSerDe.java:95)
>  {code}
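
The proposed handling can be sketched as a null-safe conversion. This assumes the raw proc id reaches the parser as a byte array, which the report does not state; `procIdOrNil` is a hypothetical helper, not a method of SyslogParser. RFC 5424 defines NILVALUE as the single character "-".

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of Hive's SyslogParser.
final class ProcIdSketch {

    /** NILVALUE as defined by RFC 5424. */
    static final String NILVALUE = "-";

    /** Converts the raw proc id, falling back to NILVALUE when the field is absent. */
    static String procIdOrNil(byte[] rawProcId) {
        if (rawProcId == null || rawProcId.length == 0) {
            return NILVALUE;
        }
        return new String(rawProcId, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(procIdOrNil(null));
        System.out.println(procIdOrNil("1234".getBytes(StandardCharsets.UTF_8)));
    }
}
```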





[jira] [Resolved] (HIVE-24198) Map side SMB join is producing wrong result

2020-09-26 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera resolved HIVE-24198.

Release Note: Committed to master
  Resolution: Fixed

> Map side SMB join is producing wrong result
> ---
>
> Key: HIVE-24198
> URL: https://issues.apache.org/jira/browse/HIVE-24198
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> {code:java}
>  CREATE TABLE tbl1_n5(key int, value string) CLUSTERED BY (key) SORTED BY 
> (key) INTO 2 BUCKETS ;
>  CREATE TABLE tbl2_n4(key int, value string) CLUSTERED BY (key) SORTED BY 
> (key) INTO 2 BUCKETS;
>  set hive.auto.convert.join=true;
>  set hive.optimize.bucketmapjoin = true;
>  set hive.optimize.bucketmapjoin.sortedmerge = true;
>  set hive.input.format = 
> org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
>  set hive.auto.convert.sortmerge.join=true;
>  set hive.auto.convert.sortmerge.join.to.mapjoin=false;
>  set hive.auto.convert.join.noconditionaltask.size=1;
>  set hive.optimize.semijoin.conversion = false;
>  insert into tbl2_n4 values (2, 'val_2'), (0, 'val_0'), (0, 'val_0'), (0, 
> 'val_0'), (4, 'val_4') ,(5, 'val_5') ,(5, 'val_5') , (5, 'val_5'), (8, 
> 'val_8'), (9, 'val_9');
>  insert into tbl1_n5 values (2, 'val_2'), (0, 'val_0'), (0, 'val_0'), (0, 
> 'val_0'), (4, 'val_4') ,(5, 'val_5') ,(5, 'val_5') , (5, 'val_5'), (8, 
> 'val_8'), (9, 'val_9');{code}
>  
>  
> {code:java}
>  Select * from (select b.key as key, count(*) as value from tbl1_n5 b where key 
> < 6 group by b.key) subq1 join (select a.key as key, a.value as value from 
> tbl2_n4 a where key < 6) subq2 on subq1.key = subq2.key;{code}
>  
> The above select produces 0,0,0,2,4,5,5,5,5,5,5 instead of 
> 0,0,0,2,4,5,5,5. The input format for sorted tables should be set to 
> BucketizedHiveInputFormat instead of HiveInputFormat. This is currently done 
> only for MapWork; if the root task is a MapJoinWork, it is not handled. As a 
> result, the mapper creates more splits than there are buckets, which yields 
> extra records.
>  





[jira] [Updated] (HIVE-24198) Map side SMB join is producing wrong result

2020-09-24 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-24198:
---
Description: 
{code:java}
 CREATE TABLE tbl1_n5(key int, value string) CLUSTERED BY (key) SORTED BY (key) 
INTO 2 BUCKETS ;
 CREATE TABLE tbl2_n4(key int, value string) CLUSTERED BY (key) SORTED BY (key) 
INTO 2 BUCKETS;

 set hive.auto.convert.join=true;
 set hive.optimize.bucketmapjoin = true;
 set hive.optimize.bucketmapjoin.sortedmerge = true;
 set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
 set hive.auto.convert.sortmerge.join=true;
 set hive.auto.convert.sortmerge.join.to.mapjoin=false;
 set hive.auto.convert.join.noconditionaltask.size=1;
 set hive.optimize.semijoin.conversion = false;

 insert into tbl2_n4 values (2, 'val_2'), (0, 'val_0'), (0, 'val_0'), (0, 
'val_0'), (4, 'val_4') ,(5, 'val_5') ,(5, 'val_5') , (5, 'val_5'), (8, 
'val_8'), (9, 'val_9');

 insert into tbl1_n5 values (2, 'val_2'), (0, 'val_0'), (0, 'val_0'), (0, 
'val_0'), (4, 'val_4') ,(5, 'val_5') ,(5, 'val_5') , (5, 'val_5'), (8, 
'val_8'), (9, 'val_9');{code}
 

 
{code:java}
 Select * from (select b.key as key, count(*) as value from tbl1_n5 b where key < 
6 group by b.key) subq1 join (select a.key as key, a.value as value from 
tbl2_n4 a where key < 6) subq2 on subq1.key = subq2.key;{code}
 

The above select produces 0,0,0,2,4,5,5,5,5,5,5 instead of 0,0,0,2,4,5,5,5. 
The input format for sorted tables should be set to BucketizedHiveInputFormat 
instead of HiveInputFormat. This is currently done only for MapWork; if the 
root task is a MapJoinWork, it is not handled. As a result, the mapper creates 
more splits than there are buckets, which yields extra records.

 

  was:
{code:java}
 CREATE TABLE tbl1_n5(key int, value string) CLUSTERED BY (key) SORTED BY (key) 
INTO 2 BUCKETS ;
 CREATE TABLE tbl2_n4(key int, value string) CLUSTERED BY (key) SORTED BY (key) 
INTO 2 BUCKETS;

 set hive.auto.convert.join=true;
 set hive.optimize.bucketmapjoin = true;
 set hive.optimize.bucketmapjoin.sortedmerge = true;
 set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
 set hive.auto.convert.sortmerge.join=true;
 set hive.auto.convert.sortmerge.join.to.mapjoin=false;
 set hive.auto.convert.join.noconditionaltask.size=1;
 set hive.optimize.semijoin.conversion = false;

 insert into tbl2_n4 values (2, 'val_2'), (0, 'val_0'), (0, 'val_0'), (0, 
'val_0'), (4, 'val_4') ,(5, 'val_5') ,(5, 'val_5') , (5, 'val_5'), (8, 
'val_8'), (9, 'val_9');

 insert into tbl1_n5 values (2, 'val_2'), (0, 'val_0'), (0, 'val_0'), (0, 
'val_0'), (4, 'val_4') ,(5, 'val_5') ,(5, 'val_5') , (5, 'val_5'), (8, 
'val_8'), (9, 'val_9');{code}
 

 
{code:java}
 Select * from (select b.key as key, count(*) as value from tbl1_n5 b where key < 
6 group by b.key) subq1 join (select a.key as key, a.value as value from 
tbl2_n4 a where key < 6) subq2 on subq1.key = subq2.key;{code}
The above select produces 0,0,0,2,4,5,5,5,5,5,5 instead of 0,0,0,2,4,5,5,5.


> Map side SMB join is producing wrong result
> ---
>
> Key: HIVE-24198
> URL: https://issues.apache.org/jira/browse/HIVE-24198
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
> {code:java}
>  CREATE TABLE tbl1_n5(key int, value string) CLUSTERED BY (key) SORTED BY 
> (key) INTO 2 BUCKETS ;
>  CREATE TABLE tbl2_n4(key int, value string) CLUSTERED BY (key) SORTED BY 
> (key) INTO 2 BUCKETS;
>  set hive.auto.convert.join=true;
>  set hive.optimize.bucketmapjoin = true;
>  set hive.optimize.bucketmapjoin.sortedmerge = true;
>  set hive.input.format = 
> org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
>  set hive.auto.convert.sortmerge.join=true;
>  set hive.auto.convert.sortmerge.join.to.mapjoin=false;
>  set hive.auto.convert.join.noconditionaltask.size=1;
>  set hive.optimize.semijoin.conversion = false;
>  insert into tbl2_n4 values (2, 'val_2'), (0, 'val_0'), (0, 'val_0'), (0, 
> 'val_0'), (4, 'val_4') ,(5, 'val_5') ,(5, 'val_5') , (5, 'val_5'), (8, 
> 'val_8'), (9, 'val_9');
>  insert into tbl1_n5 values (2, 'val_2'), (0, 'val_0'), (0, 'val_0'), (0, 
> 'val_0'), (4, 'val_4') ,(5, 'val_5') ,(5, 'val_5') , (5, 'val_5'), (8, 
> 'val_8'), (9, 'val_9');{code}
>  
>  
> {code:java}
>  Select * from (select b.key as key, count(*) as value from tbl1_n5 b where key 
> < 6 group by b.key) subq1 join (select a.key as key, a.value as value from 
> tbl2_n4 a where key < 6) subq2 on subq1.key = subq2.key;{code}
>  
> The above select produces 0,0,0,2,4,5,5,5,5,5,5 instead of 
> 0,0,0,2,4,5,5,5. The input format for sorted tables should be set to 
> BucketizedHiveInputFormat instead of HiveInputFormat. This is done only for 
> MapWork. 

[jira] [Updated] (HIVE-24198) Map side SMB join is producing wrong result

2020-09-24 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-24198:
---
Summary: Map side SMB join is producing wrong result  (was: Map side SMB 
join producing wrong result)

> Map side SMB join is producing wrong result
> ---
>
> Key: HIVE-24198
> URL: https://issues.apache.org/jira/browse/HIVE-24198
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
> {code:java}
>  CREATE TABLE tbl1_n5(key int, value string) CLUSTERED BY (key) SORTED BY 
> (key) INTO 2 BUCKETS ;
>  CREATE TABLE tbl2_n4(key int, value string) CLUSTERED BY (key) SORTED BY 
> (key) INTO 2 BUCKETS;
>  set hive.auto.convert.join=true;
>  set hive.optimize.bucketmapjoin = true;
>  set hive.optimize.bucketmapjoin.sortedmerge = true;
>  set hive.input.format = 
> org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
>  set hive.auto.convert.sortmerge.join=true;
>  set hive.auto.convert.sortmerge.join.to.mapjoin=false;
>  set hive.auto.convert.join.noconditionaltask.size=1;
>  set hive.optimize.semijoin.conversion = false;
>  insert into tbl2_n4 values (2, 'val_2'), (0, 'val_0'), (0, 'val_0'), (0, 
> 'val_0'), (4, 'val_4') ,(5, 'val_5') ,(5, 'val_5') , (5, 'val_5'), (8, 
> 'val_8'), (9, 'val_9');
>  insert into tbl1_n5 values (2, 'val_2'), (0, 'val_0'), (0, 'val_0'), (0, 
> 'val_0'), (4, 'val_4') ,(5, 'val_5') ,(5, 'val_5') , (5, 'val_5'), (8, 
> 'val_8'), (9, 'val_9');{code}
>  
>  
> {code:java}
>  Select * from (select b.key as key, count(*) as value from tbl1_n5 b where key 
> < 6 group by b.key) subq1 join (select a.key as key, a.value as value from 
> tbl2_n4 a where key < 6) subq2 on subq1.key = subq2.key;{code}
> The above select is producing 0,0,0,2,4,5,5,5,5,5,5 instead of 
> 0,0,0,2,4,5,5,5.





[jira] [Updated] (HIVE-24198) Map side SMB join producing wrong result

2020-09-24 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-24198:
---
Summary: Map side SMB join producing wrong result  (was: Map side SMB join 
produceing wrong result)

> Map side SMB join producing wrong result
> 
>
> Key: HIVE-24198
> URL: https://issues.apache.org/jira/browse/HIVE-24198
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
> {code:java}
>  CREATE TABLE tbl1_n5(key int, value string) CLUSTERED BY (key) SORTED BY 
> (key) INTO 2 BUCKETS ;
>  CREATE TABLE tbl2_n4(key int, value string) CLUSTERED BY (key) SORTED BY 
> (key) INTO 2 BUCKETS;
>  set hive.auto.convert.join=true;
>  set hive.optimize.bucketmapjoin = true;
>  set hive.optimize.bucketmapjoin.sortedmerge = true;
>  set hive.input.format = 
> org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
>  set hive.auto.convert.sortmerge.join=true;
>  set hive.auto.convert.sortmerge.join.to.mapjoin=false;
>  set hive.auto.convert.join.noconditionaltask.size=1;
>  set hive.optimize.semijoin.conversion = false;
>  insert into tbl2_n4 values (2, 'val_2'), (0, 'val_0'), (0, 'val_0'), (0, 
> 'val_0'), (4, 'val_4') ,(5, 'val_5') ,(5, 'val_5') , (5, 'val_5'), (8, 
> 'val_8'), (9, 'val_9');
>  insert into tbl1_n5 values (2, 'val_2'), (0, 'val_0'), (0, 'val_0'), (0, 
> 'val_0'), (4, 'val_4') ,(5, 'val_5') ,(5, 'val_5') , (5, 'val_5'), (8, 
> 'val_8'), (9, 'val_9');{code}
>  
>  
> {code:java}
>  Select * from (select b.key as key, count(*) as value from tbl1_n5 b where key 
> < 6 group by b.key) subq1 join (select a.key as key, a.value as value from 
> tbl2_n4 a where key < 6) subq2 on subq1.key = subq2.key;{code}
> The above select is producing 0,0,0,2,4,5,5,5,5,5,5 instead of 
> 0,0,0,2,4,5,5,5.





[jira] [Updated] (HIVE-24198) Map side SMB join produceing wrong result

2020-09-24 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-24198:
---
Description: 
{code:java}
 CREATE TABLE tbl1_n5(key int, value string) CLUSTERED BY (key) SORTED BY (key) 
INTO 2 BUCKETS ;
 CREATE TABLE tbl2_n4(key int, value string) CLUSTERED BY (key) SORTED BY (key) 
INTO 2 BUCKETS;

 set hive.auto.convert.join=true;
 set hive.optimize.bucketmapjoin = true;
 set hive.optimize.bucketmapjoin.sortedmerge = true;
 set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
 set hive.auto.convert.sortmerge.join=true;
 set hive.auto.convert.sortmerge.join.to.mapjoin=false;
 set hive.auto.convert.join.noconditionaltask.size=1;
 set hive.optimize.semijoin.conversion = false;

 insert into tbl2_n4 values (2, 'val_2'), (0, 'val_0'), (0, 'val_0'), (0, 
'val_0'), (4, 'val_4') ,(5, 'val_5') ,(5, 'val_5') , (5, 'val_5'), (8, 
'val_8'), (9, 'val_9');

 insert into tbl1_n5 values (2, 'val_2'), (0, 'val_0'), (0, 'val_0'), (0, 
'val_0'), (4, 'val_4') ,(5, 'val_5') ,(5, 'val_5') , (5, 'val_5'), (8, 
'val_8'), (9, 'val_9');{code}
 

 
{code:java}
 Select * from (select b.key as key, count(*) as value from tbl1_n5 b where key < 
6 group by b.key) subq1 join (select a.key as key, a.value as value from 
tbl2_n4 a where key < 6) subq2 on subq1.key = subq2.key;{code}
The above select is producing 0,0,0,2,4,5,5,5,5,5,5 instead of 0,0,0,2,4,5,5,5.

  was:
CREATE TABLE tbl1_n5(key int, value string) CLUSTERED BY (key) SORTED BY (key) 
INTO 2 BUCKETS ;
CREATE TABLE tbl2_n4(key int, value string) CLUSTERED BY (key) SORTED BY (key) 
INTO 2 BUCKETS;

set hive.auto.convert.join=true;
set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;
set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;

set hive.auto.convert.sortmerge.join=true;
set hive.auto.convert.sortmerge.join.to.mapjoin=false;
set hive.auto.convert.join.noconditionaltask.size=1;

set hive.optimize.semijoin.conversion = false;


insert into tbl2_n4 values (2, 'val_2'), (0, 'val_0'), (0, 'val_0'), (0, 
'val_0'), (4, 'val_4') ,(5, 'val_5') ,(5, 'val_5') , (5, 'val_5'), (8, 
'val_8'), (9, 'val_9');


insert into tbl1_n5 values (2, 'val_2'), (0, 'val_0'), (0, 'val_0'), (0, 
'val_0'), (4, 'val_4') ,(5, 'val_5') ,(5, 'val_5') , (5, 'val_5'), (8, 
'val_8'), (9, 'val_9');

 

Select * from (select b.key as key, count(*) as value from tbl1_n5 b where key 
< 6 group by b.key) subq1 join (select a.key as key, a.value as value from 
tbl2_n4 a where key < 6) subq2 on subq1.key = subq2.key;

 

The above select is producing 0,0,0,2,4,5,5,5,5,5,5 instead of 0,0,0,2,4,5,5,5


> Map side SMB join produceing wrong result
> -
>
> Key: HIVE-24198
> URL: https://issues.apache.org/jira/browse/HIVE-24198
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
> {code:java}
>  CREATE TABLE tbl1_n5(key int, value string) CLUSTERED BY (key) SORTED BY 
> (key) INTO 2 BUCKETS ;
>  CREATE TABLE tbl2_n4(key int, value string) CLUSTERED BY (key) SORTED BY 
> (key) INTO 2 BUCKETS;
>  set hive.auto.convert.join=true;
>  set hive.optimize.bucketmapjoin = true;
>  set hive.optimize.bucketmapjoin.sortedmerge = true;
>  set hive.input.format = 
> org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
>  set hive.auto.convert.sortmerge.join=true;
>  set hive.auto.convert.sortmerge.join.to.mapjoin=false;
>  set hive.auto.convert.join.noconditionaltask.size=1;
>  set hive.optimize.semijoin.conversion = false;
>  insert into tbl2_n4 values (2, 'val_2'), (0, 'val_0'), (0, 'val_0'), (0, 
> 'val_0'), (4, 'val_4') ,(5, 'val_5') ,(5, 'val_5') , (5, 'val_5'), (8, 
> 'val_8'), (9, 'val_9');
>  insert into tbl1_n5 values (2, 'val_2'), (0, 'val_0'), (0, 'val_0'), (0, 
> 'val_0'), (4, 'val_4') ,(5, 'val_5') ,(5, 'val_5') , (5, 'val_5'), (8, 
> 'val_8'), (9, 'val_9');{code}
>  
>  
> {code:java}
>  Select * from (select b.key as key, count(*) as value from tbl1_n5 b where key 
> < 6 group by b.key) subq1 join (select a.key as key, a.value as value from 
> tbl2_n4 a where key < 6) subq2 on subq1.key = subq2.key;{code}
> The above select is producing 0,0,0,2,4,5,5,5,5,5,5 instead of 
> 0,0,0,2,4,5,5,5.





[jira] [Assigned] (HIVE-24198) Map side SMB join produceing wrong result

2020-09-24 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera reassigned HIVE-24198:
--


> Map side SMB join produceing wrong result
> -
>
> Key: HIVE-24198
> URL: https://issues.apache.org/jira/browse/HIVE-24198
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
> CREATE TABLE tbl1_n5(key int, value string) CLUSTERED BY (key) SORTED BY 
> (key) INTO 2 BUCKETS ;
> CREATE TABLE tbl2_n4(key int, value string) CLUSTERED BY (key) SORTED BY 
> (key) INTO 2 BUCKETS;
> set hive.auto.convert.join=true;
> set hive.optimize.bucketmapjoin = true;
> set hive.optimize.bucketmapjoin.sortedmerge = true;
> set hive.input.format = 
> org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.sortmerge.join.to.mapjoin=false;
> set hive.auto.convert.join.noconditionaltask.size=1;
> set hive.optimize.semijoin.conversion = false;
> insert into tbl2_n4 values (2, 'val_2'), (0, 'val_0'), (0, 'val_0'), (0, 
> 'val_0'), (4, 'val_4') ,(5, 'val_5') ,(5, 'val_5') , (5, 'val_5'), (8, 
> 'val_8'), (9, 'val_9');
> insert into tbl1_n5 values (2, 'val_2'), (0, 'val_0'), (0, 'val_0'), (0, 
> 'val_0'), (4, 'val_4') ,(5, 'val_5') ,(5, 'val_5') , (5, 'val_5'), (8, 
> 'val_8'), (9, 'val_9');
>  
> Select * from (select b.key as key, count(*) as value from tbl1_n5 b where 
> key < 6 group by b.key) subq1 join (select a.key as key, a.value as value 
> from tbl2_n4 a where key < 6) subq2 on subq1.key = subq2.key;
>  
> The above select is producing 0,0,0,2,4,5,5,5,5,5,5 instead of 0,0,0,2,4,5,5,5





[jira] [Work started] (HIVE-23981) Use task counter enum to get the approximate counter value

2020-09-02 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-23981 started by mahesh kumar behera.
--
> Use task counter enum to get the approximate counter value
> --
>
> Key: HIVE-23981
> URL: https://issues.apache.org/jira/browse/HIVE-23981
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
>
> The value for APPROXIMATE_INPUT_RECORDS should be obtained using the enum 
> name instead of a static string. Once a Tez release that includes this 
> counter is available, we should change it to 
> org.apache.tez.common.counters.TaskCounter.APPROXIMATE_INPUT_RECORDS.
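
The benefit of going through the enum can be illustrated with a small sketch. The Python below is only an analogy: the TaskCounter class here is a hypothetical stand-in for the Tez enum named above, and counter_value and the counters dictionary are invented for illustration.

```python
from enum import Enum


class TaskCounter(Enum):
    """Hypothetical stand-in for org.apache.tez.common.counters.TaskCounter."""
    APPROXIMATE_INPUT_RECORDS = 1


def counter_value(counters, counter):
    """Look a counter up by its enum member's name, not by a string literal."""
    return counters.get(counter.name, 0)


# Counters as they might be reported by an upstream task (made-up value).
counters = {"APPROXIMATE_INPUT_RECORDS": 1200}
value = counter_value(counters, TaskCounter.APPROXIMATE_INPUT_RECORDS)
print(value)
```

A misspelled string literal would quietly fall back to the default of 0, whereas a misspelled enum member fails as soon as it is referenced.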





[jira] [Reopened] (HIVE-23981) Use task counter enum to get the approximate counter value

2020-09-02 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera reopened HIVE-23981:


> Use task counter enum to get the approximate counter value
> --
>
> Key: HIVE-23981
> URL: https://issues.apache.org/jira/browse/HIVE-23981
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
>
> The value for APPROXIMATE_INPUT_RECORDS should be obtained using the enum 
> name instead of a static string. Once a Tez release that includes this 
> counter is available, we should change it to 
> org.apache.tez.common.counters.TaskCounter.APPROXIMATE_INPUT_RECORDS.





[jira] [Resolved] (HIVE-23953) Use task counter information to compute keycount during hashtable loading

2020-09-02 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera resolved HIVE-23953.

Resolution: Fixed

> Use task counter information to compute keycount during hashtable loading
> -
>
> Key: HIVE-23953
> URL: https://issues.apache.org/jira/browse/HIVE-23953
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> There are cases when the compiler misestimates the key count, and this 
> results in a number of hashtable resizes during runtime.
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastHashTableLoader.java#L128]
> In such cases, it would be good to get "approximate_input_records" (TEZ-4207) 
> counter from upstream to compute the key count more accurately at runtime.
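
How costly a misestimate can be is easy to sketch. The Python below counts the resizes a hash table would go through when its initial capacity comes from a low estimate. The doubling growth policy and the 0.75 load factor are illustrative assumptions and do not describe the actual behaviour of VectorMapJoinFastHashTableLoader.

```python
def resize_count(initial_capacity, actual_keys, load_factor=0.75):
    """Count capacity doublings needed before actual_keys fit under load_factor."""
    capacity = initial_capacity
    resizes = 0
    while actual_keys > capacity * load_factor:
        capacity *= 2  # assumed growth policy: double on every resize
        resizes += 1
    return resizes


# Estimate far too low: the table has to grow many times while loading.
print(resize_count(1024, 1_000_000))
# Estimate close to the real key count: no resize is needed.
print(resize_count(2_000_000, 1_000_000))
```

Under these assumptions, sizing the table from a runtime counter instead of a compile-time estimate avoids every one of those intermediate resizes.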





[jira] [Resolved] (HIVE-23981) Use task counter enum to get the approximate counter value

2020-08-13 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera resolved HIVE-23981.

Resolution: Fixed

> Use task counter enum to get the approximate counter value
> --
>
> Key: HIVE-23981
> URL: https://issues.apache.org/jira/browse/HIVE-23981
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
>
> The value for APPROXIMATE_INPUT_RECORDS should be obtained using the enum 
> name instead of a static string. Once a Tez release that includes this 
> counter is available, we should change it to 
> org.apache.tez.common.counters.TaskCounter.APPROXIMATE_INPUT_RECORDS.





[jira] [Updated] (HIVE-24013) Move anti join conversion after join reordering rule

2020-08-06 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-24013:
---
Description: The current anti join conversion is done before the join 
re-ordering rule. As anti join is not handled in join re-ordering, it is better 
to move the anti join conversion after join re-ordering is done.  (was: The 
current anti join conversion does not check for null filters on the right side 
of the join if they are within OR conditions. Only filters separated by AND 
conditions are supported. For example, queries like "select t1.fld from tbl1 t1 
left join tbl2 t2 on t1.fld = t2.fld where t2.fld is null or t2.fld1 is null" 
are not converted to anti join.)

> Move anti join conversion after join reordering rule
> 
>
> Key: HIVE-24013
> URL: https://issues.apache.org/jira/browse/HIVE-24013
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
> The current anti join conversion is done before the join re-ordering rule. As 
> anti join is not handled in join re-ordering, it is better to move the anti 
> join conversion after join re-ordering is done.





[jira] [Assigned] (HIVE-24013) Move anti join conversion after join reordering rule

2020-08-06 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera reassigned HIVE-24013:
--


> Move anti join conversion after join reordering rule
> 
>
> Key: HIVE-24013
> URL: https://issues.apache.org/jira/browse/HIVE-24013
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
> The current anti join conversion does not check for null filters on the right 
> side of the join if they are within OR conditions. Only filters separated by 
> AND conditions are supported. For example, queries like "select t1.fld from 
> tbl1 t1 left join tbl2 t2 on t1.fld = t2.fld where t2.fld is null or t2.fld1 
> is null" are not converted to anti join.





[jira] [Updated] (HIVE-23992) Support null filter within or clause for Anti Join

2020-08-04 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-23992:
---
Description: The current anti join conversion does not check for null 
filters on the right side of the join if they are within OR conditions. Only 
filters separated by AND conditions are supported. For example, queries like 
"select t1.fld from tbl1 t1 left join tbl2 t2 on t1.fld = t2.fld where t2.fld 
is null or t2.fld1 is null" are not converted to anti join.  (was: The current 
anti join conversion does not support a join condition that is always true. 
Queries like select * from tbl t1 where not exists (select 1 from t2) are not 
converted to anti join.)

> Support null filter within or clause for Anti Join
> --
>
> Key: HIVE-23992
> URL: https://issues.apache.org/jira/browse/HIVE-23992
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
> The current anti join conversion does not check for null filters on the right 
> side of the join if they are within OR conditions. Only filters separated by 
> AND conditions are supported. For example, queries like "select t1.fld from 
> tbl1 t1 left join tbl2 t2 on t1.fld = t2.fld where t2.fld is null or t2.fld1 
> is null" are not converted to anti join.
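
For background, the basic rewrite this issue builds on can be sketched in a few lines. The Python below models only the simplest case, a left outer join followed by a single "t2.fld is null" filter on the join key, and checks that it keeps the same rows as an anti join. The function names are illustrative, NULL is modelled as None, and the OR case with t2.fld1 from the description is not modelled.

```python
def left_outer_join(left, right):
    """Left outer join on equality of the key. NULL never equals anything."""
    rows = []
    for l in left:
        matches = [r for r in right if l is not None and r is not None and l == r]
        if matches:
            rows.extend((l, r) for r in matches)
        else:
            rows.append((l, None))  # unmatched left row, right side padded with NULL
    return rows


def via_null_filter(left, right):
    """select t1.fld from t1 left join t2 on t1.fld = t2.fld where t2.fld is null"""
    return [l for l, r in left_outer_join(left, right) if r is None]


def anti_join(left, right):
    """Keep the left rows that have no matching right row."""
    right_keys = {r for r in right if r is not None}
    return [l for l in left if l not in right_keys]


t1 = [1, 2, 3, None]
t2 = [2, 2, None]
print(via_null_filter(t1, t2))
print(anti_join(t1, t2))
```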





[jira] [Assigned] (HIVE-23992) Support null filter within or clause for Anti Join

2020-08-04 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera reassigned HIVE-23992:
--


> Support null filter within or clause for Anti Join
> --
>
> Key: HIVE-23992
> URL: https://issues.apache.org/jira/browse/HIVE-23992
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
> The current anti join conversion does not support a join condition that is 
> always true. Queries like select * from tbl t1 where not exists (select 1 
> from t2) are not converted to anti join.





[jira] [Updated] (HIVE-23991) Support isAlwaysTrue for Anti Join

2020-08-04 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-23991:
---
Description: The current anti join conversion does not support a join 
condition that is always true. Queries like select * from tbl t1 where not 
exists (select 1 from t2) are not converted to anti join.  (was: The current 
anti join conversion does not support direct conversion of not-exists to anti 
join. The not-exists subquery is first converted to a left outer join and then 
converted to anti join. This may cause some of the optimization rules to be 
skipped.)

> Support isAlwaysTrue for Anti Join
> --
>
> Key: HIVE-23991
> URL: https://issues.apache.org/jira/browse/HIVE-23991
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
> The current anti join conversion does not support a join condition that is 
> always true. Queries like select * from tbl t1 where not exists (select 1 
> from t2) are not converted to anti join.
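
What an anti join with an always-true condition should return can be sketched as follows. This is a minimal Python model of the semantics, not of Hive's planner, and the helper names are invented for illustration.

```python
def anti_join(left, right, condition):
    """Keep each left row for which no right row satisfies the condition."""
    return [l for l in left if not any(condition(l, r) for r in right)]


def always_true(left_row, right_row):
    """Join condition of an uncorrelated NOT EXISTS subquery."""
    return True


t1 = [1, 2, 3]
# select * from t1 where not exists (select 1 from t2), with t2 empty:
print(anti_join(t1, [], always_true))
# The same query once t2 holds at least one row:
print(anti_join(t1, [42], always_true))
```

With an always-true condition the result is all or nothing: every row of t1 when t2 is empty, and no rows otherwise.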





[jira] [Assigned] (HIVE-23991) Support isAlwaysTrue for Anti Join

2020-08-04 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera reassigned HIVE-23991:
--


> Support isAlwaysTrue for Anti Join
> --
>
> Key: HIVE-23991
> URL: https://issues.apache.org/jira/browse/HIVE-23991
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
> The current anti join conversion does not support direct conversion of 
> not-exists to anti join. The not-exists subquery is first converted to a left 
> outer join and then converted to anti join. This may cause some of the 
> optimization rules to be skipped.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23981) Use task counter enum to get the approximate counter value

2020-08-03 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera reassigned HIVE-23981:
--

Assignee: mahesh kumar behera

> Use task counter enum to get the approximate counter value
> --
>
> Key: HIVE-23981
> URL: https://issues.apache.org/jira/browse/HIVE-23981
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
>
> The value for APPROXIMATE_INPUT_RECORDS should be obtained using the enum 
> name instead of a static string. Once a Tez release that includes this 
> counter is available, we should change it to 
> org.apache.tez.common.counters.TaskCounter.APPROXIMATE_INPUT_RECORDS.





[jira] [Assigned] (HIVE-23953) Use task counter information to compute keycount during hashtable loading

2020-08-03 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera reassigned HIVE-23953:
--

Assignee: mahesh kumar behera

> Use task counter information to compute keycount during hashtable loading
> -
>
> Key: HIVE-23953
> URL: https://issues.apache.org/jira/browse/HIVE-23953
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> There are cases when the compiler misestimates the key count, and this 
> results in a number of hashtable resizes during runtime.
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastHashTableLoader.java#L128]
> In such cases, it would be good to get "approximate_input_records" (TEZ-4207) 
> counter from upstream to compute the key count more accurately at runtime.





[jira] [Updated] (HIVE-23981) Use task counter enum to get the approximate counter value

2020-08-03 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-23981:
---
Description: The value for APPROXIMATE_INPUT_RECORDS should be obtained 
using the enum name instead of a static string. Once a Tez release that 
includes this counter is available, we should change it to 
org.apache.tez.common.counters.TaskCounter.APPROXIMATE_INPUT_RECORDS.  (was: 
There are cases when the compiler misestimates the key count, and this results 
in a number of hashtable resizes during runtime.

[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastHashTableLoader.java#L128]

In such cases, it would be good to get "approximate_input_records" (TEZ-4207) 
counter from upstream to compute the key count more accurately at runtime.)

> Use task counter enum to get the approximate counter value
> --
>
> Key: HIVE-23981
> URL: https://issues.apache.org/jira/browse/HIVE-23981
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
>
> The value for APPROXIMATE_INPUT_RECORDS should be obtained using the enum 
> name instead of a static string. Once a Tez release that includes this 
> counter is available, we should change it to 
> org.apache.tez.common.counters.TaskCounter.APPROXIMATE_INPUT_RECORDS.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23933) Add getRowCountInt and getJoinDistinctRowCount support for anti join in calcite.

2020-07-26 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-23933:
---
Summary: Add getRowCountInt and getJoinDistinctRowCount support for  anti 
join in calcite.   (was: Add getRowCountInt support for  anti join in calcite. )

> Add getRowCountInt and getJoinDistinctRowCount support for  anti join in 
> calcite. 
> --
>
> Key: HIVE-23933
> URL: https://issues.apache.org/jira/browse/HIVE-23933
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
> Current Calcite 21 does not support getRowCountInt for anti join. The 
> selectivity calculation for anti join should be different from that for semi 
> join: it should be 1 - (semi join selectivity).
> getJoinDistinctRowCount also needs to be handled in Calcite.
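
The selectivity relationship in the description can be written down directly. The Python below is a minimal sketch of that arithmetic only. The function names are hypothetical and are not Calcite or Hive APIs.

```python
def anti_join_selectivity(semi_join_selectivity):
    """Anti join keeps exactly the left rows that a semi join would drop."""
    if not 0.0 <= semi_join_selectivity <= 1.0:
        raise ValueError("selectivity must lie in [0, 1]")
    return 1.0 - semi_join_selectivity


def estimated_rows(left_row_count, selectivity):
    """Estimated output row count for a given left input size."""
    return left_row_count * selectivity


# If a semi join is estimated to keep 25% of 1000 left rows,
# the matching anti join is estimated to keep the other 75%.
semi = 0.25
anti = anti_join_selectivity(semi)
print(estimated_rows(1000, semi), estimated_rows(1000, anti))
```

The two estimates always add up to the left input's row count, since every left row is kept by exactly one of the two joins.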





[jira] [Updated] (HIVE-23933) Add getRowCountInt support for anti join in calcite.

2020-07-26 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-23933:
---
Description: 
Current Calcite 21 does not support getRowCountInt for anti join. The 
selectivity calculation for anti join should be different from that for semi 
join: it should be 1 - (semi join selectivity).

getJoinDistinctRowCount also needs to be handled in Calcite.

  was:Current Calcite 21 does not support getRowCountInt for anti join. The 
selectivity calculation for anti join should be different from that for semi 
join: it should be 1 - (semi join selectivity).


> Add getRowCountInt support for  anti join in calcite. 
> --
>
> Key: HIVE-23933
> URL: https://issues.apache.org/jira/browse/HIVE-23933
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
> Current Calcite 21 does not support getRowCountInt for anti join. The 
> selectivity calculation for anti join should be different from that for semi 
> join: it should be 1 - (semi join selectivity).
> getJoinDistinctRowCount also needs to be handled in Calcite.





[jira] [Updated] (HIVE-23933) Add getRowCountInt support for anti join in calcite.

2020-07-26 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-23933:
---
Description: Current Calcite 21 does not support getRowCountInt for anti 
join. The selectivity calculation for anti join should be different from that 
for semi join: it should be 1 - (semi join selectivity).  (was: The current 
anti join conversion does not support direct conversion of not-exists to anti 
join. The not-exists subquery is first converted to a left outer join and then 
converted to anti join. This may cause some of the optimization rules to be 
skipped.)

> Add getRowCountInt support for  anti join in calcite. 
> --
>
> Key: HIVE-23933
> URL: https://issues.apache.org/jira/browse/HIVE-23933
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
> Current Calcite 21 does not support getRowCountInt for anti join. The 
> selectivity calculation for anti join should be different from that for semi 
> join: it should be 1 - (semi join selectivity).





[jira] [Assigned] (HIVE-23933) Add getRowCountInt support for anti join in calcite.

2020-07-26 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera reassigned HIVE-23933:
--


> Add getRowCountInt support for  anti join in calcite. 
> --
>
> Key: HIVE-23933
> URL: https://issues.apache.org/jira/browse/HIVE-23933
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
> The current anti join conversion does not support direct conversion of 
> not-exists to anti join. The not-exists subquery is first converted to a left 
> outer join and then converted to anti join. This may cause some of the 
> optimization rules to be skipped.
>  





[jira] [Updated] (HIVE-23928) Support conversion of not-exists to Anti join directly

2020-07-24 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-23928:
---
Description: 
The current anti join conversion does not support direct conversion of 
not-exists to anti join. The not-exists subquery is first converted to a left 
outer join and then converted to anti join. This may cause some of the 
optimization rules to be skipped.

 

  was:
Support HiveJoinProjectTransposeRule for Anti Join

 


> Support conversion of not-exists to Anti join directly
> --
>
> Key: HIVE-23928
> URL: https://issues.apache.org/jira/browse/HIVE-23928
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
> The current anti join conversion does not support direct conversion of 
> not-exists to anti join. The not-exists subquery is first converted to a left 
> outer join and then converted to anti join. This may cause some of the 
> optimization rules to be skipped.
>  





[jira] [Assigned] (HIVE-23928) Support conversion of not-exists to Anti join directly

2020-07-24 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera reassigned HIVE-23928:
--


> Support conversion of not-exists to Anti join directly
> --
>
> Key: HIVE-23928
> URL: https://issues.apache.org/jira/browse/HIVE-23928
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
> Support HiveJoinProjectTransposeRule for Anti Join
>  





[jira] [Updated] (HIVE-23921) Support HiveJoinProjectTransposeRule for Anti Join

2020-07-23 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-23921:
---
Description: 
Support HiveJoinProjectTransposeRule for Anti Join

 

  was:
 If we have a PK-FK join that is only appending columns to the FK side, it 
basically means it is not filtering anything (everything is matching). If that 
is the case, then the ANTIJOIN result would be empty. We could detect this at 
planning time and trigger the rewriting.

 


> Support HiveJoinProjectTransposeRule for Anti Join
> --
>
> Key: HIVE-23921
> URL: https://issues.apache.org/jira/browse/HIVE-23921
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
> Support HiveJoinProjectTransposeRule for Anti Join
>  





[jira] [Assigned] (HIVE-23921) Support HiveJoinProjectTransposeRule for Anti Join

2020-07-23 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera reassigned HIVE-23921:
--


> Support HiveJoinProjectTransposeRule for Anti Join
> --
>
> Key: HIVE-23921
> URL: https://issues.apache.org/jira/browse/HIVE-23921
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
>  If we have a PK-FK join that only appends columns to the FK side, it 
> basically means the join is not filtering anything (every row matches). In 
> that case, the ANTIJOIN result would be empty. We could detect this at 
> planning time and trigger the rewriting.
>  





[jira] [Assigned] (HIVE-23920) Need to handle HiveJoinConstraintsRule for Anti Join

2020-07-23 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera reassigned HIVE-23920:
--


> Need to handle HiveJoinConstraintsRule for Anti Join
> 
>
> Key: HIVE-23920
> URL: https://issues.apache.org/jira/browse/HIVE-23920
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
> Currently in Hive we create a different operator for each kind of join. In 
> Calcite, it all seems to be based on a single Join class in newer releases. 
> So classes like HiveAntiJoin and HiveSemiJoin can be merged into one.
>  





[jira] [Updated] (HIVE-23920) Need to handle HiveJoinConstraintsRule for Anti Join

2020-07-23 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-23920:
---
Description: 
 If we have a PK-FK join that only appends columns to the FK side, it 
basically means the join is not filtering anything (every row matches). In 
that case, the ANTIJOIN result would be empty. We could detect this at 
planning time and trigger the rewriting.

 

  was:
Currently in Hive we create a different operator for each kind of join. In 
Calcite, it all seems to be based on a single Join class in newer releases. So 
classes like HiveAntiJoin and HiveSemiJoin can be merged into one.

 


> Need to handle HiveJoinConstraintsRule for Anti Join
> 
>
> Key: HIVE-23920
> URL: https://issues.apache.org/jira/browse/HIVE-23920
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
>  If we have a PK-FK join that only appends columns to the FK side, it 
> basically means the join is not filtering anything (every row matches). In 
> that case, the ANTIJOIN result would be empty. We could detect this at 
> planning time and trigger the rewriting.
>  
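
The observation in the description can be illustrated with a minimal Python 
sketch. The table contents and the helper name anti_join are hypothetical, and 
the sketch assumes that the FK column contains no NULLs and that every FK 
value references an existing PK. It is not Hive's planner code.

```python
def anti_join(left_rows, right_keys, key):
    """Return the left rows whose join key has no match among right_keys."""
    right = set(right_keys)
    return [row for row in left_rows if row[key] not in right]


# Hypothetical PK side (customers) and FK side (orders). Every FK value
# references an existing PK, and the FK column contains no NULLs.
customer_ids = [1, 2, 3]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 3},
    {"order_id": 12, "customer_id": 3},
]

# Every FK row finds its PK row, so nothing survives the anti join.
unmatched_orders = anti_join(orders, customer_ids, "customer_id")
print(unmatched_orders)
```

If the FK column allowed NULLs, or the constraint were not enforced, rows 
without a match could exist and the result would no longer be guaranteed to 
be empty.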





[jira] [Updated] (HIVE-23919) Merge all kind of Join operator variants (Semi, Anti, Normal) into one.

2020-07-23 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-23919:
---
Description: 
Currently in Hive we create a different operator for each kind of join. In 
Calcite, it all seems to be based on a single Join class in newer releases. So 
classes like HiveAntiJoin and HiveSemiJoin can be merged into one.

 

  was:
For Anti Join, we emit the records if the join condition is not satisfied. For 
the PK-FK rule, we have to explore whether this can be exploited to speed up 
Anti Join processing.

 


> Merge all kind of Join operator variants (Semi, Anti, Normal) into one. 
> 
>
> Key: HIVE-23919
> URL: https://issues.apache.org/jira/browse/HIVE-23919
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
> Currently in Hive we create a different operator for each kind of join. In 
> Calcite, it all seems to be based on a single Join class in newer releases. 
> So classes like HiveAntiJoin and HiveSemiJoin can be merged into one.
>  
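
As a rough illustration of the idea, the Python sketch below uses one operator 
class whose variant is carried as data rather than as a subclass. The names 
JoinType and Join and the row layout are hypothetical; this is neither 
Calcite's nor Hive's actual API.

```python
from dataclasses import dataclass
from enum import Enum


class JoinType(Enum):
    INNER = "inner"
    SEMI = "semi"
    ANTI = "anti"


@dataclass
class Join:
    """Single join operator; the join type selects the behaviour."""
    join_type: JoinType

    def execute(self, left, right, key):
        # Index the right side by join key.
        right_by_key = {}
        for row in right:
            right_by_key.setdefault(row[key], []).append(row)

        out = []
        for row in left:
            matches = right_by_key.get(row[key], [])
            if self.join_type is JoinType.INNER:
                # Inner join emits left and right columns for every match.
                out.extend({**row, **m} for m in matches)
            elif self.join_type is JoinType.SEMI and matches:
                # Semi join emits the left row once if any match exists.
                out.append(row)
            elif self.join_type is JoinType.ANTI and not matches:
                # Anti join emits the left row only if no match exists.
                out.append(row)
        return out


left = [{"k": 1, "a": "x"}, {"k": 2, "a": "y"}]
right = [{"k": 1, "b": "p"}]
semi_rows = Join(JoinType.SEMI).execute(left, right, "k")
anti_rows = Join(JoinType.ANTI).execute(left, right, "k")
inner_rows = Join(JoinType.INNER).execute(left, right, "k")
```

Rules that only care about "a join" can then match the single class, while 
type-specific logic branches on the join type.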





[jira] [Assigned] (HIVE-23919) Merge all kind of Join operator variants (Semi, Anti, Normal) into one.

2020-07-23 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera reassigned HIVE-23919:
--


> Merge all kind of Join operator variants (Semi, Anti, Normal) into one. 
> 
>
> Key: HIVE-23919
> URL: https://issues.apache.org/jira/browse/HIVE-23919
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
> For Anti Join, we emit the records if the join condition is not satisfied. 
> For the PK-FK rule, we have to explore whether this can be exploited to speed 
> up Anti Join processing.
>  





[jira] [Updated] (HIVE-23907) Hash table type should be considered for calculating the Map join table size

2020-07-23 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-23907:
---
Description: 
For some joins, such as anti join and semi join, a hash set is used instead of 
a hash table. This is done because these joins do not emit the right-side 
columns, and an existence check is enough for the join. When we check the 
table size during map join conversion, this information is not considered. 
The hash table size for these joins will be considerably smaller, and thus the 
hash table for a bigger table can fit into memory.

 

  was:
For Anti Join, we emit the records if the join condition is not satisfied. For 
the PK-FK rule, we have to explore whether this can be exploited to speed up 
Anti Join processing.

 


> Hash table type should be considered for calculating the Map join table size
> 
>
> Key: HIVE-23907
> URL: https://issues.apache.org/jira/browse/HIVE-23907
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
> For some joins, such as anti join and semi join, a hash set is used instead 
> of a hash table. This is done because these joins do not emit the right-side 
> columns, and an existence check is enough for the join. When we check the 
> table size during map join conversion, this information is not considered. 
> The hash table size for these joins will be considerably smaller, and thus 
> the hash table for a bigger table can fit into memory.
>  
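
To make the size difference concrete, here is a small Python sketch. The 
helper names build_hash_table and build_hash_set and the sample rows are 
hypothetical; the sketch only counts stored entries and does not model Hive's 
actual memory estimation.

```python
def build_hash_table(rows, key):
    """Key -> list of full rows; needed when right-side columns are emitted."""
    table = {}
    for row in rows:
        table.setdefault(row[key], []).append(row)
    return table


def build_hash_set(rows, key):
    """Keys only; enough when the join needs just an existence check."""
    return {row[key] for row in rows}


# Hypothetical small-side table: 9 rows, 3 distinct keys, each with a payload.
small_side = [{"k": i % 3, "payload": "x" * 10} for i in range(9)]

table = build_hash_table(small_side, "k")
key_set = build_hash_set(small_side, "k")

rows_kept_by_table = sum(len(rows) for rows in table.values())
keys_kept_by_set = len(key_set)

# Anti join probe: a left row qualifies when its key is absent from the set.
left_keys = [0, 5]
anti_result = [k for k in left_keys if k not in key_set]
```

The hash table retains all 9 rows with their payloads, while the hash set 
retains only the 3 distinct keys, which is why a size estimate based on the 
full table overstates the memory needed for semi and anti joins.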





[jira] [Assigned] (HIVE-23907) Hash table type should be considered for calculating the Map join table size

2020-07-23 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera reassigned HIVE-23907:
--


> Hash table type should be considered for calculating the Map join table size
> 
>
> Key: HIVE-23907
> URL: https://issues.apache.org/jira/browse/HIVE-23907
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
> For Anti Join, we emit the records if the join condition is not satisfied. 
> For the PK-FK rule, we have to explore whether this can be exploited to speed 
> up Anti Join processing.
>  





[jira] [Updated] (HIVE-23906) Analyze and implement PK-FK based optimization for Anti join

2020-07-23 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-23906:
---
Description: 
For Anti Join, we emit the records if the join condition is not satisfied. For 
the PK-FK rule, we have to explore whether this can be exploited to speed up 
Anti Join processing.

 

  was:
Currently Hive does not support anti join. A query that needs an anti join is 
converted to a left outer join, and a null filter on the right-side join key 
is added to get the desired result. This causes:
 # Extra computation: the left outer join projects redundant columns from the 
right side, and filtering is then done to remove the redundant rows. This can 
be avoided with an anti join, which projects only the required columns and 
rows from the left-side table.
 # Extra shuffle: with an anti join, duplicate records need not be moved to the 
join node; they can be removed at the child node. This can significantly 
reduce data movement, depending on the number of distinct rows (join keys).
 # Extra memory usage: for a map-based anti join, a hash set is sufficient, as 
only the key is required to check whether a record matches the join 
condition. For a left join, we need the key and the non-key columns as well, 
and thus a hash table is required.

For a query like
{code:sql}
SELECT wr_order_number FROM web_returns LEFT JOIN web_sales ON 
wr_order_number = ws_order_number WHERE ws_order_number IS NULL;{code}
The number of distinct ws_order_number values in the web_sales table in a 
typical 10TB TPC-DS setup is just 10% of the total records. So when we convert 
this query to an anti join, only 600 million rows are moved to the join node 
instead of 7 billion.

In the current patch, just one conversion is done. The pattern 
project->filter->left-join is converted to project->anti-join. This takes 
care of subqueries with a “not exists” clause. Queries with “not exists” are 
first converted to filter + left-join and then converted to anti join. 
Queries with “not in” are not handled in the current patch.

On the execution side, both merge join and map join with vectorized execution 
are supported for anti join.


> Analyze and implement PK-FK based optimization for Anti join
> 
>
> Key: HIVE-23906
> URL: https://issues.apache.org/jira/browse/HIVE-23906
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
>
> For Anti Join, we emit the records if the join condition is not satisfied. 
> For the PK-FK rule, we have to explore whether this can be exploited to speed 
> up Anti Join processing.
>  





[jira] [Updated] (HIVE-23906) Analyze and implement PK-FK based optimization for Anti join

2020-07-23 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-23906:
---
Labels:   (was: pull-request-available)

> Analyze and implement PK-FK based optimization for Anti join
> 
>
> Key: HIVE-23906
> URL: https://issues.apache.org/jira/browse/HIVE-23906
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
> For Anti Join, we emit the records if the join condition is not satisfied. 
> For the PK-FK rule, we have to explore whether this can be exploited to speed 
> up Anti Join processing.
>  





[jira] [Assigned] (HIVE-23906) Analyze and implement PK-FK based optimization for Anti join

2020-07-23 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera reassigned HIVE-23906:
--


> Analyze and implement PK-FK based optimization for Anti join
> 
>
> Key: HIVE-23906
> URL: https://issues.apache.org/jira/browse/HIVE-23906
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
>
> Currently Hive does not support anti join. A query that needs an anti join is 
> converted to a left outer join, and a null filter on the right-side join key 
> is added to get the desired result. This causes:
>  # Extra computation: the left outer join projects redundant columns from 
> the right side, and filtering is then done to remove the redundant rows. 
> This can be avoided with an anti join, which projects only the required 
> columns and rows from the left-side table.
>  # Extra shuffle: with an anti join, duplicate records need not be moved to 
> the join node; they can be removed at the child node. This can significantly 
> reduce data movement, depending on the number of distinct rows (join keys).
>  # Extra memory usage: for a map-based anti join, a hash set is sufficient, 
> as only the key is required to check whether a record matches the join 
> condition. For a left join, we need the key and the non-key columns as well, 
> and thus a hash table is required.
> For a query like
> {code:sql}
> SELECT wr_order_number FROM web_returns LEFT JOIN web_sales ON 
> wr_order_number = ws_order_number WHERE ws_order_number IS NULL;{code}
> The number of distinct ws_order_number values in the web_sales table in a 
> typical 10TB TPC-DS setup is just 10% of the total records. So when we 
> convert this query to an anti join, only 600 million rows are moved to the 
> join node instead of 7 billion.
> In the current patch, just one conversion is done. The pattern 
> project->filter->left-join is converted to project->anti-join. This takes 
> care of subqueries with a “not exists” clause. Queries with “not exists” 
> are first converted to filter + left-join and then converted to anti join. 
> Queries with “not in” are not handled in the current patch.
> On the execution side, both merge join and map join with vectorized 
> execution are supported for anti join.
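
The equivalence that the conversion relies on can be checked with a minimal 
Python sketch. The function names and sample rows are hypothetical, and the 
sketch assumes there are no NULL join keys on either side; it models the 
semantics only, not Hive's operators.

```python
def left_outer_join_then_null_filter(left, right, lkey, rkey):
    """LEFT JOIN ... WHERE <right key> IS NULL, projecting the left key."""
    joined = []
    for l in left:
        matches = [r for r in right if r[rkey] == l[lkey]]
        if matches:
            # One output row per match, carrying the right-side columns.
            joined.extend({**l, **r} for r in matches)
        else:
            # No match: right-side columns are NULL (None here).
            joined.append({**l, rkey: None})
    return [{lkey: row[lkey]} for row in joined if row[rkey] is None]


def anti_join(left, right, lkey, rkey):
    """Emit left rows with no matching right row; only keys are needed."""
    right_keys = {r[rkey] for r in right}
    return [{lkey: l[lkey]} for l in left if l[lkey] not in right_keys]


web_returns = [{"wr_order_number": n} for n in (1, 2, 3, 4)]
web_sales = [{"ws_order_number": n} for n in (1, 1, 3, 3, 3)]

via_left_join = left_outer_join_then_null_filter(
    web_returns, web_sales, "wr_order_number", "ws_order_number")
via_anti_join = anti_join(
    web_returns, web_sales, "wr_order_number", "ws_order_number")
```

Both paths return order numbers 2 and 4, but the left-join path first 
materializes a row for every match before the filter discards them.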





[jira] [Updated] (HIVE-23905) Remove duplicate code in vector map join execution for Anti join and Semi Join.

2020-07-22 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-23905:
---
Description: The execution of anti join and semi join is exactly the same for 
vector operators. The code can be merged into one, and the specialized code 
can be extracted out based on the join type.  (was: 
[TestMapJoinOperator.java|https://github.com/apache/hive/pull/1147/files/ee4390223caf1816ba6c07c1245876dc3c99d1e9#diff-a96ed41dcf0566f31b90b5ac75fbf20b]
 should be updated to add test cases related to anti join.)

> Remove duplicate code in vector map join execution for Anti join and Semi 
> Join.
> ---
>
> Key: HIVE-23905
> URL: https://issues.apache.org/jira/browse/HIVE-23905
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
> The execution of anti join and semi join is exactly the same for vector 
> operators. The code can be merged into one, and the specialized code can be 
> extracted out based on the join type.





[jira] [Assigned] (HIVE-23905) Remove duplicate code in vector map join execution for Anti join and Semi Join.

2020-07-22 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera reassigned HIVE-23905:
--


> Remove duplicate code in vector map join execution for Anti join and Semi 
> Join.
> ---
>
> Key: HIVE-23905
> URL: https://issues.apache.org/jira/browse/HIVE-23905
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
> [TestMapJoinOperator.java|https://github.com/apache/hive/pull/1147/files/ee4390223caf1816ba6c07c1245876dc3c99d1e9#diff-a96ed41dcf0566f31b90b5ac75fbf20b]
>  should be updated to add test cases related to anti join.





[jira] [Updated] (HIVE-23904) Update TestMapJoinOperator for adding anti join test cases.

2020-07-22 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-23904:
---
Description: 
[TestMapJoinOperator.java|https://github.com/apache/hive/pull/1147/files/ee4390223caf1816ba6c07c1245876dc3c99d1e9#diff-a96ed41dcf0566f31b90b5ac75fbf20b]
 should be updated to add test cases related to anti join.  (was: In case of 
anti join, a bloom filter can also be created on the left side ("IN (keylist 
right table)"). But the filter should be "not-in" ("NOT IN (keylist right 
table)"), as we want to select the records from the left side that are not 
present in the right side. However, this may cause wrong results, as a bloom 
filter may have false positives; simply adding NOT is therefore not correct, 
and special handling is required for "NOT IN".

[https://github.com/jmhodges/opposite_of_a_bloom_filter/])

> Update TestMapJoinOperator for adding anti join test cases.
> ---
>
> Key: HIVE-23904
> URL: https://issues.apache.org/jira/browse/HIVE-23904
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
> [TestMapJoinOperator.java|https://github.com/apache/hive/pull/1147/files/ee4390223caf1816ba6c07c1245876dc3c99d1e9#diff-a96ed41dcf0566f31b90b5ac75fbf20b]
>  should be updated to add test cases related to anti join.





[jira] [Assigned] (HIVE-23904) Update TestMapJoinOperator for adding anti join test cases.

2020-07-22 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera reassigned HIVE-23904:
--


> Update TestMapJoinOperator for adding anti join test cases.
> ---
>
> Key: HIVE-23904
> URL: https://issues.apache.org/jira/browse/HIVE-23904
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
> In case of anti join, a bloom filter can also be created on the left side 
> ("IN (keylist right table)"). But the filter should be "not-in" ("NOT IN 
> (keylist right table)"), as we want to select the records from the left side 
> that are not present in the right side. However, this may cause wrong 
> results, as a bloom filter may have false positives; simply adding NOT is 
> therefore not correct, and special handling is required for "NOT IN".
> [https://github.com/jmhodges/opposite_of_a_bloom_filter/]





[jira] [Updated] (HIVE-23903) Support "not-in" for bloom filter

2020-07-22 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-23903:
---
Description: 
In case of anti join, a bloom filter can also be created on the left side ("IN 
(keylist right table)"). But the filter should be "not-in" ("NOT IN (keylist 
right table)"), as we want to select the records from the left side that are 
not present in the right side. However, this may cause wrong results, as a 
bloom filter may have false positives; simply adding NOT is therefore not 
correct, and special handling is required for "NOT IN".

[https://github.com/jmhodges/opposite_of_a_bloom_filter/]

  was:
Currently Hive does not support anti join. A query that needs an anti join is 
converted to a left outer join, and a null filter on the right-side join key 
is added to get the desired result. This causes:
 # Extra computation: the left outer join projects redundant columns from the 
right side, and filtering is then done to remove the redundant rows. This can 
be avoided with an anti join, which projects only the required columns and 
rows from the left-side table.
 # Extra shuffle: with an anti join, duplicate records need not be moved to the 
join node; they can be removed at the child node. This can significantly 
reduce data movement, depending on the number of distinct rows (join keys).
 # Extra memory usage: for a map-based anti join, a hash set is sufficient, as 
only the key is required to check whether a record matches the join 
condition. For a left join, we need the key and the non-key columns as well, 
and thus a hash table is required.

For a query like
{code:sql}
SELECT wr_order_number FROM web_returns LEFT JOIN web_sales ON 
wr_order_number = ws_order_number WHERE ws_order_number IS NULL;{code}
The number of distinct ws_order_number values in the web_sales table in a 
typical 10TB TPC-DS setup is just 10% of the total records. So when we convert 
this query to an anti join, only 600 million rows are moved to the join node 
instead of 7 billion.

In the current patch, just one conversion is done. The pattern 
project->filter->left-join is converted to project->anti-join. This takes 
care of subqueries with a “not exists” clause. Queries with “not exists” are 
first converted to filter + left-join and then converted to anti join. 
Queries with “not in” are not handled in the current patch.

On the execution side, both merge join and map join with vectorized execution 
are supported for anti join.


> Support "not-in" for bloom filter
> -
>
> Key: HIVE-23903
> URL: https://issues.apache.org/jira/browse/HIVE-23903
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
>
> In case of anti join, a bloom filter can also be created on the left side 
> ("IN (keylist right table)"). But the filter should be "not-in" ("NOT IN 
> (keylist right table)"), as we want to select the records from the left side 
> that are not present in the right side. However, this may cause wrong 
> results, as a bloom filter may have false positives; simply adding NOT is 
> therefore not correct, and special handling is required for "NOT IN".
> [https://github.com/jmhodges/opposite_of_a_bloom_filter/]
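
The false-positive problem can be reproduced with a small Python sketch. The 
class ToyBloomFilter, its deliberately weak hash functions, and the sample 
keys are all hypothetical and chosen only to make the collision 
deterministic; this is not Hive's bloom filter implementation.

```python
class ToyBloomFilter:
    """Tiny bloom filter with toy hash functions (illustration only)."""

    def __init__(self, num_bits, multipliers=(3, 7)):
        self.num_bits = num_bits
        self.multipliers = multipliers
        self.bits = [False] * num_bits

    def _positions(self, key):
        # Toy hashes: deterministic so the false positive is reproducible.
        return [(key * m + m) % self.num_bits for m in self.multipliers]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos] for pos in self._positions(key))


right_keys = [1, 2, 3, 4]
left_keys = [1, 2, 5, 7, 9]

bloom = ToyBloomFilter(num_bits=8)
for k in right_keys:
    bloom.add(k)

# Exact anti join: left keys that are absent from the right side.
exact = [k for k in left_keys if k not in set(right_keys)]

# Naive "NOT IN bloom": drops key 9, a false positive that is not on the right.
naive = [k for k in left_keys if not bloom.might_contain(k)]
```

Here the exact result is 5, 7 and 9, while the negated filter returns only 5 
and 7. Because a bloom filter has no false negatives, a "definitely absent" 
answer can be trusted; only the "possibly present" keys would still need an 
exact check.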





[jira] [Updated] (HIVE-23903) Support "not-in" for bloom filter

2020-07-22 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-23903:
---
Labels:   (was: pull-request-available)

> Support "not-in" for bloom filter
> -
>
> Key: HIVE-23903
> URL: https://issues.apache.org/jira/browse/HIVE-23903
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
> In case of anti join, a bloom filter can also be created on the left side 
> ("IN (keylist right table)"). But the filter should be "not-in" ("NOT IN 
> (keylist right table)"), as we want to select the records from the left side 
> that are not present in the right side. However, this may cause wrong 
> results, as a bloom filter may have false positives; simply adding NOT is 
> therefore not correct, and special handling is required for "NOT IN".
> [https://github.com/jmhodges/opposite_of_a_bloom_filter/]





[jira] [Assigned] (HIVE-23903) Support "not-in" for bloom filter

2020-07-22 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera reassigned HIVE-23903:
--


> Support "not-in" for bloom filter
> -
>
> Key: HIVE-23903
> URL: https://issues.apache.org/jira/browse/HIVE-23903
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
>
> Currently Hive does not support anti join. A query that needs an anti join is 
> converted to a left outer join, and a null filter on the right-side join key 
> is added to get the desired result. This causes:
>  # Extra computation: the left outer join projects redundant columns from 
> the right side, and filtering is then done to remove the redundant rows. 
> This can be avoided with an anti join, which projects only the required 
> columns and rows from the left-side table.
>  # Extra shuffle: with an anti join, duplicate records need not be moved to 
> the join node; they can be removed at the child node. This can significantly 
> reduce data movement, depending on the number of distinct rows (join keys).
>  # Extra memory usage: for a map-based anti join, a hash set is sufficient, 
> as only the key is required to check whether a record matches the join 
> condition. For a left join, we need the key and the non-key columns as well, 
> and thus a hash table is required.
> For a query like
> {code:sql}
> SELECT wr_order_number FROM web_returns LEFT JOIN web_sales ON 
> wr_order_number = ws_order_number WHERE ws_order_number IS NULL;{code}
> The number of distinct ws_order_number values in the web_sales table in a 
> typical 10TB TPC-DS setup is just 10% of the total records. So when we 
> convert this query to an anti join, only 600 million rows are moved to the 
> join node instead of 7 billion.
> In the current patch, just one conversion is done. The pattern 
> project->filter->left-join is converted to project->anti-join. This takes 
> care of subqueries with a “not exists” clause. Queries with “not exists” 
> are first converted to filter + left-join and then converted to anti join. 
> Queries with “not in” are not handled in the current patch.
> On the execution side, both merge join and map join with vectorized 
> execution are supported for anti join.





[jira] [Updated] (HIVE-23716) Support Anti Join in Hive

2020-06-18 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-23716:
---
Status: Patch Available  (was: Open)

> Support Anti Join in Hive 
> --
>
> Key: HIVE-23716
> URL: https://issues.apache.org/jira/browse/HIVE-23716
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23716.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently Hive does not support anti join. A query that needs an anti join is 
> converted to a left outer join, and a null filter on the right-side join key 
> is added to get the desired result. This causes:
>  # Extra computation: the left outer join projects redundant columns from 
> the right side, and filtering is then done to remove the redundant rows. 
> This can be avoided with an anti join, which projects only the required 
> columns and rows from the left-side table.
>  # Extra shuffle: with an anti join, duplicate records need not be moved to 
> the join node; they can be removed at the child node. This can significantly 
> reduce data movement, depending on the number of distinct rows (join keys).
>  # Extra memory usage: for a map-based anti join, a hash set is sufficient, 
> as only the key is required to check whether a record matches the join 
> condition. For a left join, we need the key and the non-key columns as well, 
> and thus a hash table is required.
> For a query like
> {code:sql}
> SELECT wr_order_number FROM web_returns LEFT JOIN web_sales ON 
> wr_order_number = ws_order_number WHERE ws_order_number IS NULL;{code}
> The number of distinct ws_order_number values in the web_sales table in a 
> typical 10TB TPC-DS setup is just 10% of the total records. So when we 
> convert this query to an anti join, only 600 million rows are moved to the 
> join node instead of 7 billion.
> In the current patch, just one conversion is done. The pattern 
> project->filter->left-join is converted to project->anti-join. This takes 
> care of subqueries with a “not exists” clause. Queries with “not exists” 
> are first converted to filter + left-join and then converted to anti join. 
> Queries with “not in” are not handled in the current patch.
> On the execution side, both merge join and map join with vectorized 
> execution are supported for anti join.
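
The shuffle saving described in the second point can be sketched in a few 
lines of Python. The key lists are hypothetical and far smaller than the 
TPC-DS figures quoted in the description; the sketch only counts rows and does 
not model Hive's actual shuffle.

```python
# Hypothetical right-side join keys with many duplicates.
right_keys = [1, 1, 1, 2, 2, 3, 3, 3, 3, 3]
left_keys = [1, 4, 5]

# Left outer join: every right-side row must reach the join node.
rows_moved_for_left_join = len(right_keys)

# Anti join: only existence matters, so the child can deduplicate first.
distinct_right_keys = set(right_keys)
rows_moved_for_anti_join = len(distinct_right_keys)

# Both inputs give the same anti join answer.
result_full = [k for k in left_keys if k not in right_keys]
result_dedup = [k for k in left_keys if k not in distinct_right_keys]
```

In this toy case 3 keys are moved instead of 10 rows, and the anti join result 
is unchanged because duplicates on the right side never affect an existence 
check.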





[jira] [Updated] (HIVE-23716) Support Anti Join in Hive

2020-06-18 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-23716:
---
Attachment: HIVE-23716.01.patch

> Support Anti Join in Hive 
> --
>
> Key: HIVE-23716
> URL: https://issues.apache.org/jira/browse/HIVE-23716
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23716.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently Hive does not support anti join. A query that needs an anti join is 
> converted to a left outer join, and a null filter on the right-side join key 
> is added to get the desired result. This causes:
>  # Extra computation: the left outer join projects redundant columns from 
> the right side, and filtering is then done to remove the redundant rows. 
> This can be avoided with an anti join, which projects only the required 
> columns and rows from the left-side table.
>  # Extra shuffle: with an anti join, duplicate records need not be moved to 
> the join node; they can be removed at the child node. This can significantly 
> reduce data movement, depending on the number of distinct rows (join keys).
>  # Extra memory usage: for a map-based anti join, a hash set is sufficient, 
> as only the key is required to check whether a record matches the join 
> condition. For a left join, we need the key and the non-key columns as well, 
> and thus a hash table is required.
> For a query like
> {code:sql}
> SELECT wr_order_number FROM web_returns LEFT JOIN web_sales ON 
> wr_order_number = ws_order_number WHERE ws_order_number IS NULL;{code}
> The number of distinct ws_order_number values in the web_sales table in a 
> typical 10TB TPC-DS setup is just 10% of the total records. So when we 
> convert this query to an anti join, only 600 million rows are moved to the 
> join node instead of 7 billion.
> In the current patch, just one conversion is done. The pattern 
> project->filter->left-join is converted to project->anti-join. This takes 
> care of subqueries with a “not exists” clause. Queries with “not exists” 
> are first converted to filter + left-join and then converted to anti join. 
> Queries with “not in” are not handled in the current patch.
> On the execution side, both merge join and map join with vectorized 
> execution are supported for anti join.
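
The project->filter->left-join to project->anti-join conversion can be 
pictured as a pattern match on a plan tree. The Python sketch below is only a 
toy model: the node classes Scan, Join, Filter and Project, their fields, and 
the function rewrite_to_anti_join are hypothetical and do not correspond to 
the Calcite or Hive rule classes used by the patch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scan:
    table: str
    columns: tuple


@dataclass(frozen=True)
class Join:
    kind: str            # "left" or "anti"
    left: Scan
    right: Scan
    left_key: str
    right_key: str


@dataclass(frozen=True)
class Filter:
    is_null_column: str  # models "WHERE <column> IS NULL"
    child: object


@dataclass(frozen=True)
class Project:
    columns: tuple
    child: object


def rewrite_to_anti_join(plan):
    """Rewrite project->filter->left-join into project->anti-join if safe."""
    if not isinstance(plan, Project) or not isinstance(plan.child, Filter):
        return plan
    flt = plan.child
    join = flt.child
    if not (isinstance(join, Join) and join.kind == "left"):
        return plan
    # The null filter must be on the right-side join key.
    if flt.is_null_column != join.right_key:
        return plan
    # The projection must use left-side columns only.
    if not set(plan.columns) <= set(join.left.columns):
        return plan
    anti = Join("anti", join.left, join.right, join.left_key, join.right_key)
    return Project(plan.columns, anti)


returns = Scan("web_returns", ("wr_order_number",))
sales = Scan("web_sales", ("ws_order_number",))
left_join = Join("left", returns, sales, "wr_order_number", "ws_order_number")
plan = Project(("wr_order_number",), Filter("ws_order_number", left_join))
rewritten = rewrite_to_anti_join(plan)
```

A plan that projects a right-side column is left unchanged, since an anti join 
does not emit right-side columns.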





[jira] [Assigned] (HIVE-23716) Support Anti Join in Hive

2020-06-17 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera reassigned HIVE-23716:
--


> Support Anti Join in Hive 
> --
>
> Key: HIVE-23716
> URL: https://issues.apache.org/jira/browse/HIVE-23716
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>


[jira] [Commented] (HIVE-23561) FIX Arrow Decimal serialization for native VectorRowBatches

2020-06-09 Thread mahesh kumar behera (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17129119#comment-17129119
 ] 

mahesh kumar behera commented on HIVE-23561:


+1 

Looks good to me.

> FIX Arrow Decimal serialization for native VectorRowBatches
> ---
>
> Key: HIVE-23561
> URL: https://issues.apache.org/jira/browse/HIVE-23561
> Project: Hive
>  Issue Type: Bug
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
> Attachments: HIVE-23561.01.patch
>
>
> Arrow Serializer does not properly handle Decimal primitive values when the 
> selected array is used.
> In more detail, decimalValueSetter should be setting the value at 
> *arrowIndex[i]* as the value at *hiveIndex[j]*, however currently it's using 
> the _same_ index!
> https://github.com/apache/hive/blob/eac25e711ea750bc52f41da7ed3c32bfe36d4f67/ql/src/java/org/apache/hadoop/hive/ql/io/arrow/Serializer.java#L926
> This works fine for cases where i == j (selected is not used) but returns 
> wrong decimal row values when i != j.
> This ticket fixes this inconsistency and adds tests with selected indexes for 
> all supported types
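
The index mismatch described above can be shown with plain arrays. This is a sketch, not the Serializer code: the class and method names are hypothetical, and BigDecimal arrays stand in for the Hive column vector and the Arrow vector. The output position i and the source position j = selected[i] differ whenever the selected array is in use.

```java
import java.math.BigDecimal;
import java.util.Arrays;

/** Illustrative only: plain arrays stand in for the Hive and Arrow vectors. */
class SelectedIndexSketch {

    /** Correct mapping: output slot i takes the value at source slot j. */
    static BigDecimal[] copySelected(BigDecimal[] hiveValues, int[] selected,
                                     boolean selectedInUse, int size) {
        BigDecimal[] arrowValues = new BigDecimal[size];
        for (int i = 0; i < size; i++) {
            int j = selectedInUse ? selected[i] : i;
            arrowValues[i] = hiveValues[j];
        }
        return arrowValues;
    }

    /** The behaviour the ticket describes: the same index is used on both sides. */
    static BigDecimal[] copyWithSameIndex(BigDecimal[] hiveValues, int size) {
        BigDecimal[] arrowValues = new BigDecimal[size];
        for (int i = 0; i < size; i++) {
            arrowValues[i] = hiveValues[i];
        }
        return arrowValues;
    }

    public static void main(String[] args) {
        BigDecimal[] hiveValues = {
            new BigDecimal("1.10"), new BigDecimal("2.20"),
            new BigDecimal("3.30"), new BigDecimal("4.40")
        };
        int[] selected = {1, 3}; // only rows 1 and 3 survive a filter
        System.out.println(Arrays.toString(copySelected(hiveValues, selected, true, 2))); // [2.20, 4.40]
        System.out.println(Arrays.toString(copyWithSameIndex(hiveValues, 2)));            // [1.10, 2.20]
    }
}
```

When selectedInUse is false the two methods agree, which matches the ticket's observation that the problem only shows up when i != j.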





[jira] [Commented] (HIVE-23123) Disable export/import of views and materialized views

2020-04-17 Thread mahesh kumar behera (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085578#comment-17085578
 ] 

mahesh kumar behera commented on HIVE-23123:


+1 

[^HIVE-23123.03.patch] looks good to me.

> Disable export/import of views and materialized views
> -
>
> Key: HIVE-23123
> URL: https://issues.apache.org/jira/browse/HIVE-23123
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23123.01.patch, HIVE-23123.02.patch, 
> HIVE-23123.03.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> According to 
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ImportExport]
>  import and export can be done by using the
> {code:java}
> export table ...
> import table ... 
> {code}
> commands. The document doesn't mention views or materialized views at all, 
> and in fact we don't support commands like
> {code:java}
> export view ...
> import view ...
> export materialized view ...
> import materialized view ... 
> {code}
> These cannot be parsed at all. The word "table" is often used in a broader 
> sense, though, to mean all table-like entities, including views and 
> materialized views. For example, the various Table classes may represent any 
> of these as well.
> If I try to export a view with the export table ... command, it succeeds. A 
> _metadata file is created, but no data directory, which is what we'd 
> expect. If I try to import it back, an exception is thrown because the data 
> directory is missing:
> {code:java}
> java.lang.AssertionError: null==getPath() for exim_view
>  at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:3088)
>  at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:419)
>  at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
>  at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
>  at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:364)
>  at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:335)
>  at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246)
>  at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109)
>  at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:722)
>  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:491)
>  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:485) 
> {code}
> Still, the view is imported successfully, as data movement wasn't even 
> necessary.
> If we try to export a materialized view that is transactional, this 
> exception occurs:
> {code:java}
> org.apache.hadoop.hive.ql.parse.SemanticException: org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found exim_materialized_view_da21d41a_9fe4_4446_9c72_d251496abf9d
>  at org.apache.hadoop.hive.ql.parse.AcidExportSemanticAnalyzer.analyzeAcidExport(AcidExportSemanticAnalyzer.java:163)
>  at org.apache.hadoop.hive.ql.parse.AcidExportSemanticAnalyzer.analyze(AcidExportSemanticAnalyzer.java:71)
>  at org.apache.hadoop.hive.ql.parse.RewriteSemanticAnalyzer.analyzeInternal(RewriteSemanticAnalyzer.java:72)
>  at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:289)
>  at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:220)
>  at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104)
>  at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:183)
>  at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:601)
>  at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:547)
>  at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:541) 
> {code}
> So the export process cannot handle it, as the temporary table is not 
> created.
>  
> The import command handling has a lot of code dedicated to importing views 
> and materialized views, which suggests that we support importing (and thus 
> implicitly also exporting) views and materialized views.
>  
> So the conclusion is that we have to decide whether we support 
> exporting/importing views and materialized views.
> If we decide not to support them, then:
>  - the export process should throw an exception if a view or materialized 
> view is the subject
>  - the code specific to view imports should be removed
> If we decide to support them, then:
>  - the commands mentioned above should be introduced
>  - an exception should be thrown if the wrong command is used (e.g. export 
> view on a table)
>  - the exceptions mentioned above should be fixed
>  
> I prefer not to support them; I don't think we should support exporting / 
> importing views. 
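
The "do not support" option above amounts to an early check on the kind of object being exported. A minimal sketch of such a guard follows; the enum, the method, and the exception type are hypothetical and chosen for illustration, they are not taken from the Hive code base or from the attached patches.

```java
/** Hypothetical sketch of an export guard; names are illustrative, not Hive's API. */
class ExportGuardSketch {

    /** Assumed classification of table-like entities. */
    enum TableKind { MANAGED_TABLE, EXTERNAL_TABLE, VIRTUAL_VIEW, MATERIALIZED_VIEW }

    /** Rejects views and materialized views before any export work starts. */
    static void checkExportable(String name, TableKind kind) {
        if (kind == TableKind.VIRTUAL_VIEW || kind == TableKind.MATERIALIZED_VIEW) {
            throw new UnsupportedOperationException(
                "Export is not supported for views or materialized views: " + name);
        }
    }

    public static void main(String[] args) {
        checkExportable("exim_table", TableKind.MANAGED_TABLE); // passes silently
        try {
            checkExportable("exim_view", TableKind.VIRTUAL_VIEW);
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing at this point would replace the AssertionError on import and the "Table not found" error on export with one explicit message.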

[jira] [Updated] (HIVE-23173) User login success/failed attempts should be logged

2020-04-13 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-23173:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

[^HIVE-23173.3.patch] committed to master. Thanks [~nareshpr] for the fix.

> User login success/failed attempts should be logged
> ---
>
> Key: HIVE-23173
> URL: https://issues.apache.org/jira/browse/HIVE-23173
> Project: Hive
>  Issue Type: Improvement
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Minor
> Attachments: HIVE-23173.1.patch, HIVE-23173.2.patch, 
> HIVE-23173.3.patch
>
>
> User login success & failure attempts should be logged in server logs





[jira] [Commented] (HIVE-23173) User login success/failed attempts should be logged

2020-04-13 Thread mahesh kumar behera (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082147#comment-17082147
 ] 

mahesh kumar behera commented on HIVE-23173:


Why were the test code changes made?

> User login success/failed attempts should be logged
> ---
>
> Key: HIVE-23173
> URL: https://issues.apache.org/jira/browse/HIVE-23173
> Project: Hive
>  Issue Type: Improvement
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Minor
> Attachments: HIVE-23173.1.patch, HIVE-23173.2.patch, 
> HIVE-23173.3.patch
>
>
> User login success & failure attempts should be logged in server logs





[jira] [Updated] (HIVE-23060) Query failing with error "Grouping sets expression is not in GROUP BY key. Error encountered near token"

2020-03-24 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-23060:
---
Status: Patch Available  (was: Open)

> Query failing with error "Grouping sets expression is not in GROUP BY key. 
> Error encountered near token"
> 
>
> Key: HIVE-23060
> URL: https://issues.apache.org/jira/browse/HIVE-23060
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Luis E Martinez-Poblete
>Assignee: mahesh kumar behera
>Priority: Major
> Attachments: HIVE-23060.01.patch, HIVE-23060.02.patch, 
> HIVE-23060.03.patch
>
>
> Synopsis:
> =========
> A query fails with the error "Grouping sets expression is not in GROUP BY 
> key. Error encountered near token".
> Problem:
> ========
> A Hive query on a view fails with the following error:
> Error while compiling statement: FAILED: SemanticException 35:21 [Error 
> 10213]: Grouping sets expression is not in GROUP BY key. Error encountered 
> near token 'l0_equities_region_id'
> Reproduction case:
> {noformat}
> create database test; 
> create table test.case665558 (c1 string, c2 string);
> -- Working query  
> select
>case
>   when GROUPING__ID = 255 then `c1`
>end as `col_1`,
>case
>   when GROUPING__ID = 255 then 3
>end as `col_2`,
>`c1`,
>`c2`
> from
>`test`.`case665558`
> group by
>`c1`,
>`c2`
> GROUPING SETS 
>(
>   (`c1`),
>   (`c1`, `c2`)
>);
>
> create view   test.viewcase665558 
> as
> select
>case
>   when GROUPING__ID = 255 then `c1`
>end as `col_1`,
>case
>   when GROUPING__ID = 255 then 3
>end as `col_2`,
>`c1`,
>`c2`
> from
>`test`.`case665558`
> group by
>`c1`,
>`c2`
> GROUPING SETS 
>(
>   (`c1`),
>   (`c1`, `c2`)
>);   
>
> Select * from test.viewcase665558 ;
> Error: Error while compiling statement: FAILED: SemanticException 17:1 [Error 
> 10213]: Grouping sets expression is not in GROUP BY key. Error encountered 
> near token 'c1' (state=42000,code=4)
> {noformat}
> The issue occurs because, when the view is created, the table name is added 
> to the columns in the select list and the group by clause, but not to the 
> columns in GROUPING SETS. This seems to confuse Hive:
> {noformat}
> +------------------------------------------------+
> |                 createtab_stmt                 |
> +------------------------------------------------+
> | CREATE VIEW `test.viewcase665558` AS select    |
> | case                                           |
> | when GROUPING__ID = 255 then `case665558`.`c1` |
> | end as `col_1`,                                |
> | case                                           |
> | when GROUPING__ID = 255 then 3                 |
> | end as `col_2`,                                |
> | `case665558`.`c1`,                             |
> | `case665558`.`c2`                              |
> | from                                           |
> | `test`.`case665558`                            |
> | group by                                       |
> | `case665558`.`c1`,                             |
> | `case665558`.`c2`                              |
> | GROUPING SETS                                  |
> | (                                              |
> | (c1),                                          |
> | (c1, c2)                                       |
> | )                                              |
> +------------------------------------------------+
> {noformat}
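
In the stored view text above, the GROUP BY keys are written as `case665558`.`c1` while the grouping sets still say c1. The toy below shows why a comparison between the two forms can fail and how normalising both sides makes them match. It is only an illustration under loud assumptions: it compares strings, whereas Hive works on parsed expressions, and the class and method names are invented. It is not a description of the actual patch.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** Toy string-level illustration; Hive itself compares parsed expressions. */
class GroupingSetNameSketch {

    /** Strips backticks and an optional table qualifier: `t`.`c1` becomes c1. */
    static String unqualified(String column) {
        String bare = column.replace("`", "");
        int dot = bare.lastIndexOf('.');
        return dot < 0 ? bare : bare.substring(dot + 1);
    }

    /** Naive check: every grouping-set column must appear verbatim among the keys. */
    static boolean matchesTextually(List<String> groupByKeys, List<String> groupingSetCols) {
        return groupByKeys.containsAll(groupingSetCols);
    }

    /** Same check after reducing both sides to unqualified column names. */
    static boolean matchesNormalized(List<String> groupByKeys, List<String> groupingSetCols) {
        Set<String> keys = groupByKeys.stream()
            .map(GroupingSetNameSketch::unqualified)
            .collect(Collectors.toSet());
        return groupingSetCols.stream()
            .map(GroupingSetNameSketch::unqualified)
            .allMatch(keys::contains);
    }

    public static void main(String[] args) {
        List<String> groupByKeys = List.of("`case665558`.`c1`", "`case665558`.`c2`");
        List<String> groupingSetCols = List.of("c1", "c2");
        System.out.println(matchesTextually(groupByKeys, groupingSetCols));  // false
        System.out.println(matchesNormalized(groupByKeys, groupingSetCols)); // true
    }
}
```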





[jira] [Updated] (HIVE-23060) Query failing with error "Grouping sets expression is not in GROUP BY key. Error encountered near token"

2020-03-24 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-23060:
---
Attachment: HIVE-23060.03.patch

> Query failing with error "Grouping sets expression is not in GROUP BY key. 
> Error encountered near token"
> 
>
> Key: HIVE-23060
> URL: https://issues.apache.org/jira/browse/HIVE-23060
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Luis E Martinez-Poblete
>Assignee: mahesh kumar behera
>Priority: Major
> Attachments: HIVE-23060.01.patch, HIVE-23060.02.patch, 
> HIVE-23060.03.patch
>
>


[jira] [Updated] (HIVE-23060) Query failing with error "Grouping sets expression is not in GROUP BY key. Error encountered near token"

2020-03-24 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-23060:
---
Status: Open  (was: Patch Available)

> Query failing with error "Grouping sets expression is not in GROUP BY key. 
> Error encountered near token"
> 
>
> Key: HIVE-23060
> URL: https://issues.apache.org/jira/browse/HIVE-23060
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Luis E Martinez-Poblete
>Assignee: mahesh kumar behera
>Priority: Major
> Attachments: HIVE-23060.01.patch, HIVE-23060.02.patch, 
> HIVE-23060.03.patch
>
>


[jira] [Updated] (HIVE-23060) Query failing with error "Grouping sets expression is not in GROUP BY key. Error encountered near token"

2020-03-23 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-23060:
---
Attachment: HIVE-23060.02.patch

> Query failing with error "Grouping sets expression is not in GROUP BY key. 
> Error encountered near token"
> 
>
> Key: HIVE-23060
> URL: https://issues.apache.org/jira/browse/HIVE-23060
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Luis E Martinez-Poblete
>Assignee: mahesh kumar behera
>Priority: Major
> Attachments: HIVE-23060.01.patch, HIVE-23060.02.patch
>
>


[jira] [Updated] (HIVE-23060) Query failing with error "Grouping sets expression is not in GROUP BY key. Error encountered near token"

2020-03-23 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-23060:
---
Status: Patch Available  (was: Open)

> Query failing with error "Grouping sets expression is not in GROUP BY key. 
> Error encountered near token"
> 
>
> Key: HIVE-23060
> URL: https://issues.apache.org/jira/browse/HIVE-23060
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Luis E Martinez-Poblete
>Assignee: mahesh kumar behera
>Priority: Major
> Attachments: HIVE-23060.01.patch, HIVE-23060.02.patch
>
>


[jira] [Updated] (HIVE-23060) Query failing with error "Grouping sets expression is not in GROUP BY key. Error encountered near token"

2020-03-23 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-23060:
---
Status: Open  (was: Patch Available)

> Query failing with error "Grouping sets expression is not in GROUP BY key. 
> Error encountered near token"
> 
>
> Key: HIVE-23060
> URL: https://issues.apache.org/jira/browse/HIVE-23060
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Luis E Martinez-Poblete
>Assignee: mahesh kumar behera
>Priority: Major
> Attachments: HIVE-23060.01.patch
>
>


[jira] [Updated] (HIVE-23060) Query failing with error "Grouping sets expression is not in GROUP BY key. Error encountered near token"

2020-03-23 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-23060:
---
Attachment: HIVE-23060.01.patch

> Query failing with error "Grouping sets expression is not in GROUP BY key. 
> Error encountered near token"
> 
>
> Key: HIVE-23060
> URL: https://issues.apache.org/jira/browse/HIVE-23060
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Luis E Martinez-Poblete
>Assignee: mahesh kumar behera
>Priority: Major
> Attachments: HIVE-23060.01.patch
>
>


[jira] [Updated] (HIVE-23060) Query failing with error "Grouping sets expression is not in GROUP BY key. Error encountered near token"

2020-03-23 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-23060:
---
Status: Patch Available  (was: Open)

> Query failing with error "Grouping sets expression is not in GROUP BY key. 
> Error encountered near token"
> 
>
> Key: HIVE-23060
> URL: https://issues.apache.org/jira/browse/HIVE-23060
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Luis E Martinez-Poblete
>Assignee: mahesh kumar behera
>Priority: Major
> Attachments: HIVE-23060.01.patch
>
>
> Synopsis:
> =
> Query failing with error "Grouping sets expression is not in GROUP BY key. 
> Error encountered near token"
> Problem:
> 
> A Hive query in a view which fails with the following error:
> Error while compiling statement: FAILED: SemanticException 35:21 [Error 
> 10213]: Grouping sets expression is not in GROUP BY key. Error encountered 
> near token 'l0_equities_region_id'
> Reproduction case:
> {noformat}
> create database test; 
> create table test.case665558 (c1 string, c2 string);
> -- Working query  
> select
>case
>   when GROUPING__ID = 255 then `c1`
>end as `col_1`,
>case
>   when GROUPING__ID = 255 then 3
>end as `col_2`,
>`c1`,
>`c2`
> from
>`test`.`case665558`
> group by
>`c1`,
>`c2`
> GROUPING SETS 
>(
>   (`c1`),
>   (`c1`, `c2`)
>);
>
> create view   test.viewcase665558 
> as
> select
>case
>   when GROUPING__ID = 255 then `c1`
>end as `col_1`,
>case
>   when GROUPING__ID = 255 then 3
>end as `col_2`,
>`c1`,
>`c2`
> from
>`test`.`case665558`
> group by
>`c1`,
>`c2`
> GROUPING SETS 
>(
>   (`c1`),
>   (`c1`, `c2`)
>);   
>
> Select * from test.viewcase665558 ;
> Error: Error while compiling statement: FAILED: SemanticException 17:1 [Error 
> 10213]: Grouping sets expression is not in GROUP BY key. Error encountered 
> near token 'c1' (state=42000,code=4)
> {noformat}
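> The issue arises because, when the view is created, Hive qualifies the columns 
> in the SELECT and GROUP BY clauses with the table name but leaves the columns 
> in GROUPING SETS unqualified. This mismatch seems to be confusing Hive: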
> {noformat}
> +-+--+
> | createtab_stmt  |
> +-+--+
> | CREATE VIEW `test.viewcase665558` AS select |
> | case|
> | when GROUPING__ID = 255 then `case665558`.`c1`  |
> | end as `col_1`, |
> | case|
> | when GROUPING__ID = 255 then 3  |
> | end as `col_2`, |
> | `case665558`.`c1`,  |
> | `case665558`.`c2`   |
> | from|
> | `test`.`case665558` |
> | group by|
> | `case665558`.`c1`,  |
> | `case665558`.`c2`   |
> | GROUPING SETS   |
> | (   |
> | (c1),   |
> | (c1, c2)|
> | )   |
> +-+--+
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23034) Arrow serializer should not keep the reference of arrow offset and validity buffers

2020-03-18 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-23034:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

[^HIVE-23034.01.patch] committed to master. Thanks [~ShubhamChaurasia] for 
fixing it and [~thejas] for review.

> Arrow serializer should not keep the reference of arrow offset and validity 
> buffers
> ---
>
> Key: HIVE-23034
> URL: https://issues.apache.org/jira/browse/HIVE-23034
> Project: Hive
>  Issue Type: Bug
>  Components: llap, Serializers/Deserializers
>Reporter: Shubham Chaurasia
>Assignee: Shubham Chaurasia
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23034.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, part of the writeList() method in the Arrow serializer is 
> implemented like this - 
> {code:java}
> final ArrowBuf offsetBuffer = arrowVector.getOffsetBuffer();
> int nextOffset = 0;
> for (int rowIndex = 0; rowIndex < size; rowIndex++) {
>   int selectedIndex = rowIndex;
>   if (vectorizedRowBatch.selectedInUse) {
>     selectedIndex = vectorizedRowBatch.selected[rowIndex];
>   }
>   if (hiveVector.isNull[selectedIndex]) {
>     offsetBuffer.setInt(rowIndex * OFFSET_WIDTH, nextOffset);
>   } else {
>     offsetBuffer.setInt(rowIndex * OFFSET_WIDTH, nextOffset);
>     nextOffset += (int) hiveVector.lengths[selectedIndex];
>     arrowVector.setNotNull(rowIndex);
>   }
> }
> offsetBuffer.setInt(size * OFFSET_WIDTH, nextOffset);
> {code}
> 1) Here we obtain a reference to the offset buffer ( {{final ArrowBuf 
> offsetBuffer = arrowVector.getOffsetBuffer();}} ) and keep updating the arrow 
> vector and the offset buffer through it. 
> Problem - 
> {{arrowVector.setNotNull(rowIndex)}} checks the index on every call and 
> reallocates the offset and validity buffers when a threshold is crossed. It 
> updates its internal references and releases the old buffers (which 
> decrements their reference count), so the reference obtained in 1) becomes 
> stale. If we then try to read or write through the old buffer, we see - 
> {code:java}
> Caused by: io.netty.util.IllegalReferenceCountException: refCnt: 0
>   at 
> io.netty.buffer.AbstractByteBuf.ensureAccessible(AbstractByteBuf.java:1413)
>   at io.netty.buffer.ArrowBuf.checkIndexD(ArrowBuf.java:131)
>   at io.netty.buffer.ArrowBuf.chk(ArrowBuf.java:162)
>   at io.netty.buffer.ArrowBuf.setInt(ArrowBuf.java:656)
>   at 
> org.apache.hadoop.hive.ql.io.arrow.Serializer.writeList(Serializer.java:432)
>   at 
> org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:285)
>   at 
> org.apache.hadoop.hive.ql.io.arrow.Serializer.writeStruct(Serializer.java:352)
>   at 
> org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:288)
>   at 
> org.apache.hadoop.hive.ql.io.arrow.Serializer.writeList(Serializer.java:419)
>   at 
> org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:285)
>   at 
> org.apache.hadoop.hive.ql.io.arrow.Serializer.serializeBatch(Serializer.java:205)
> {code}
>  
> Solution - 
> This can be fixed by fetching the buffers again ( 
> {{arrowVector.getOffsetBuffer()}} ) each time we want to update them. 
> In our internal tests, this failure is seen very frequently on arrow 0.8.0 
> but not on 0.10.0. It should still be handled the same way for 0.10.0, as 
> that version does the same thing.
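
The stale-reference problem and the proposed fix can be illustrated with a small, self-contained sketch. It deliberately does not use the Arrow API: `ToyBuffer` and `GrowableVector` are hypothetical stand-ins that only mimic the behavior described in the report (a buffer that is released and replaced when the vector grows), and the starting capacity of 2 is an arbitrary assumption chosen to force a reallocation quickly.

```java
import java.util.Arrays;

final class StaleBufferDemo {

    /** Hypothetical buffer that refuses writes once it has been released. */
    static final class ToyBuffer {
        private final int[] data;
        private boolean released;

        ToyBuffer(int capacity) { data = new int[capacity]; }

        void setInt(int index, int value) {
            if (released) {
                throw new IllegalStateException("refCnt: 0");
            }
            data[index] = value;
        }

        int getInt(int index) { return data[index]; }
    }

    /** Hypothetical vector that swaps in a bigger offset buffer as it grows. */
    static final class GrowableVector {
        private ToyBuffer offsets = new ToyBuffer(2);

        ToyBuffer getOffsetBuffer() { return offsets; }

        /** Keeps room for index + 1; releases the old buffer on growth. */
        void setNotNull(int index) {
            while (index + 1 >= offsets.data.length) {
                ToyBuffer bigger = new ToyBuffer(offsets.data.length * 2);
                System.arraycopy(offsets.data, 0, bigger.data, 0, offsets.data.length);
                offsets.released = true;
                offsets = bigger;
            }
        }
    }

    /** Fetches the offset buffer again before every write. */
    static int[] writeOffsetsRefetching(int[] lengths) {
        GrowableVector vector = new GrowableVector();
        int nextOffset = 0;
        for (int row = 0; row < lengths.length; row++) {
            vector.getOffsetBuffer().setInt(row, nextOffset);
            nextOffset += lengths[row];
            vector.setNotNull(row);
        }
        vector.getOffsetBuffer().setInt(lengths.length, nextOffset);
        int[] result = new int[lengths.length + 1];
        for (int i = 0; i < result.length; i++) {
            result[i] = vector.getOffsetBuffer().getInt(i);
        }
        return result;
    }

    /** Caches the buffer once; returns true if a write hits a released buffer. */
    static boolean staleReferenceFails(int[] lengths) {
        GrowableVector vector = new GrowableVector();
        ToyBuffer cached = vector.getOffsetBuffer();
        int nextOffset = 0;
        try {
            for (int row = 0; row < lengths.length; row++) {
                cached.setInt(row, nextOffset);
                nextOffset += lengths[row];
                vector.setNotNull(row);
            }
            cached.setInt(lengths.length, nextOffset);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int[] lengths = {2, 0, 3};
        System.out.println(Arrays.toString(writeOffsetsRefetching(lengths)));
        System.out.println(staleReferenceFails(lengths));
    }
}
```

With three rows the toy vector grows once, so the cached reference fails on the third write while the re-fetching variant completes. In the real serializer the same idea would mean calling `arrowVector.getOffsetBuffer()` at each update instead of holding the result in a local variable.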





[jira] [Updated] (HIVE-23022) Arrow deserializer should ensure size of hive vector equal to arrow vector

2020-03-18 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-23022:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

[^HIVE-23022.01.patch] committed to master. Thanks [~ShubhamChaurasia] for 
fixing it.

> Arrow deserializer should ensure size of hive vector equal to arrow vector
> --
>
> Key: HIVE-23022
> URL: https://issues.apache.org/jira/browse/HIVE-23022
> Project: Hive
>  Issue Type: Bug
>  Components: llap, Serializers/Deserializers
>Reporter: Shubham Chaurasia
>Assignee: Shubham Chaurasia
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23022.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The Arrow deserializer - {{org.apache.hadoop.hive.ql.io.arrow.Deserializer}} - 
> in some cases does not set the size of the hive vector correctly. The hive 
> vector must be at least as large as the arrow vector to be able to read 
> (accommodate) it fully.
> The following exception can be seen when we read (using 
> {{LlapArrowRowInputFormat}} ) a table that contains complex types (a struct 
> nested in an array, to be specific) and the number of rows in the table 
> exceeds the default batch/vector size (1024).
> {code:java}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024
>   at 
> org.apache.hadoop.hive.ql.io.arrow.Deserializer.readStruct(Deserializer.java:440)
>   at 
> org.apache.hadoop.hive.ql.io.arrow.Deserializer.read(Deserializer.java:143)
>   at 
> org.apache.hadoop.hive.ql.io.arrow.Deserializer.readList(Deserializer.java:394)
>   at 
> org.apache.hadoop.hive.ql.io.arrow.Deserializer.read(Deserializer.java:137)
>   at 
> org.apache.hadoop.hive.ql.io.arrow.Deserializer.deserialize(Deserializer.java:122)
>   at 
> org.apache.hadoop.hive.ql.io.arrow.ArrowColumnarBatchSerDe.deserialize(ArrowColumnarBatchSerDe.java:284)
>   at 
> org.apache.hadoop.hive.llap.LlapArrowRowRecordReader.next(LlapArrowRowRecordReader.java:75)
>   ... 23 more
> {code}
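
The requirement above (the hive vector must be at least as large as the arrow vector) can be sketched without any Hive or Arrow dependency. `ToyLongVector` is a hypothetical stand-in for a Hive column vector, and its `ensureSize` method is an assumption made for this sketch, not a quote of the committed patch.

```java
import java.util.Arrays;

final class EnsureSizeDemo {
    static final int DEFAULT_BATCH_SIZE = 1024;

    /** Hypothetical column vector backed by fixed-size arrays. */
    static final class ToyLongVector {
        long[] vector = new long[DEFAULT_BATCH_SIZE];
        boolean[] isNull = new boolean[DEFAULT_BATCH_SIZE];

        /** Grows the backing arrays when the requested size exceeds them. */
        void ensureSize(int size) {
            if (size > vector.length) {
                vector = Arrays.copyOf(vector, size);
                isNull = Arrays.copyOf(isNull, size);
            }
        }
    }

    /** Copies the source values after making sure the target can hold them. */
    static ToyLongVector read(long[] arrowValues) {
        ToyLongVector hiveVector = new ToyLongVector();
        // Without this call, index 1024 and above would throw
        // ArrayIndexOutOfBoundsException, as in the trace above.
        hiveVector.ensureSize(arrowValues.length);
        for (int i = 0; i < arrowValues.length; i++) {
            hiveVector.vector[i] = arrowValues[i];
        }
        return hiveVector;
    }

    public static void main(String[] args) {
        long[] source = new long[1500];
        for (int i = 0; i < source.length; i++) {
            source[i] = i;
        }
        ToyLongVector result = read(source);
        System.out.println(result.vector.length + " " + result.vector[1499]);
    }
}
```

Nested types are the case to watch: the child vector of a list can hold more values than the parent batch has rows, so sizing the child by the row count alone is not enough.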





[jira] [Commented] (HIVE-23022) Arrow deserializer should ensure size of hive vector equal to arrow vector

2020-03-15 Thread mahesh kumar behera (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17059943#comment-17059943
 ] 

mahesh kumar behera commented on HIVE-23022:


+1

[^HIVE-23022.01.patch] looks fine to me.

> Arrow deserializer should ensure size of hive vector equal to arrow vector
> --
>
> Key: HIVE-23022
> URL: https://issues.apache.org/jira/browse/HIVE-23022
> Project: Hive
>  Issue Type: Bug
>  Components: llap, Serializers/Deserializers
>Reporter: Shubham Chaurasia
>Assignee: Shubham Chaurasia
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23022.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The Arrow deserializer - {{org.apache.hadoop.hive.ql.io.arrow.Deserializer}} - 
> in some cases does not set the size of the hive vector correctly. The hive 
> vector must be at least as large as the arrow vector to be able to read 
> (accommodate) it fully.
> The following exception can be seen when we read (using 
> {{LlapArrowRowInputFormat}} ) a table that contains complex types (a struct 
> nested in an array, to be specific) and the number of rows in the table 
> exceeds the default batch/vector size (1024).
> {code:java}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024
>   at 
> org.apache.hadoop.hive.ql.io.arrow.Deserializer.readStruct(Deserializer.java:440)
>   at 
> org.apache.hadoop.hive.ql.io.arrow.Deserializer.read(Deserializer.java:143)
>   at 
> org.apache.hadoop.hive.ql.io.arrow.Deserializer.readList(Deserializer.java:394)
>   at 
> org.apache.hadoop.hive.ql.io.arrow.Deserializer.read(Deserializer.java:137)
>   at 
> org.apache.hadoop.hive.ql.io.arrow.Deserializer.deserialize(Deserializer.java:122)
>   at 
> org.apache.hadoop.hive.ql.io.arrow.ArrowColumnarBatchSerDe.deserialize(ArrowColumnarBatchSerDe.java:284)
>   at 
> org.apache.hadoop.hive.llap.LlapArrowRowRecordReader.next(LlapArrowRowRecordReader.java:75)
>   ... 23 more
> {code}





[jira] [Updated] (HIVE-22998) Dump partition info if hive.repl.dump.metadata.only.for.external.table conf is enabled

2020-03-12 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-22998:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

[^HIVE-22998.12.patch] committed to master. Thanks [~aasha] .

> Dump partition info if hive.repl.dump.metadata.only.for.external.table conf 
> is enabled 
> ---
>
> Key: HIVE-22998
> URL: https://issues.apache.org/jira/browse/HIVE-22998
> Project: Hive
>  Issue Type: Bug
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22998.01.patch, HIVE-22998.02.patch, 
> HIVE-22998.03.patch, HIVE-22998.04.patch, HIVE-22998.05.patch, 
> HIVE-22998.06.patch, HIVE-22998.07.patch, HIVE-22998.08.patch, 
> HIVE-22998.09.patch, HIVE-22998.10.patch, HIVE-22998.11.patch, 
> HIVE-22998.12.patch, HIVE-22998.patch
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>






[jira] [Updated] (HIVE-22860) Support metadata only replication for external tables

2020-02-18 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-22860:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

[^HIVE-22860.02.patch] committed to master. Thanks [~aasha].

> Support metadata only replication for external tables
> -
>
> Key: HIVE-22860
> URL: https://issues.apache.org/jira/browse/HIVE-22860
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22860.01.patch, HIVE-22860.02.patch, 
> HIVE-22860.patch, HIVE-22860.patch, HIVE-22860.patch, HIVE-22860.patch, 
> HIVE-22860.patch, HIVE-22860.patch, HIVE-22860.patch, HIVE-22860.patch, 
> HIVE-22860.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>






[jira] [Updated] (HIVE-22844) Validate cm configs, add retries in fs apis for cm

2020-02-17 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-22844:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

[^HIVE-22844.02.patch] committed to master. Thanks [~aasha] for fixing it.

> Validate cm configs, add retries in fs apis for cm
> --
>
> Key: HIVE-22844
> URL: https://issues.apache.org/jira/browse/HIVE-22844
> Project: Hive
>  Issue Type: Bug
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22844.01.patch, HIVE-22844.02.patch, 
> HIVE-22844.patch, HIVE-22844.patch, HIVE-22844.patch, HIVE-22844.patch, 
> HIVE-22844.patch, HIVE-22844.patch, HIVE-22844.patch, HIVE-22844.patch, 
> HIVE-22844.patch, HIVE-22844.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> # Retry the create cm root logic
> # Rename encryptionZones to cmRootLocations to be more accurate
> # Check cmRootEncrypted.isAbsolute() first, before creating anything
> # Validate that fallbackNonEncryptedCmRootDir is really not encrypted
> # Refactor deleteTableData logic
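
The first item in the list above (retrying a file system call) can be sketched generically. This is only an illustration of a bounded retry loop: the helper name `retry`, the attempt count, and the fixed delay are assumptions made for this sketch and are not taken from the patch.

```java
import java.util.function.Supplier;

final class RetrySketch {

    /** Runs the action up to maxAttempts times, pausing between failed attempts. */
    static <T> T retry(int maxAttempts, long delayMillis, Supplier<T> action) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while retrying", ie);
                }
            }
        }
        throw last; // every attempt failed
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String result = retry(3, 10L, () -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("transient failure");
            }
            return "created";
        });
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

A real implementation would probably retry only on exceptions known to be transient and would rethrow the rest immediately.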





[jira] [Updated] (HIVE-22890) Repl load fails if table name contains _function

2020-02-16 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-22890:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

[^HIVE-22890.patch] committed to master, thanks [~aasha] for fixing the issue.

> Repl load fails if table name contains _function
> 
>
> Key: HIVE-22890
> URL: https://issues.apache.org/jira/browse/HIVE-22890
> Project: Hive
>  Issue Type: Bug
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22890.patch, HIVE-22890.patch, HIVE-22890.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Repl load tries to load a function if the table name contains _function. 
> Similarly for the constants below:
> {code:java}
> public static final String FUNCTIONS_ROOT_DIR_NAME = "_functions";
> {code}
> The code just checks for contains(FUNCTIONS_ROOT_DIR_NAME). So if any 
> table or db name contains _functions, repl load takes the Function Load flow 
> and fails.
>  
> {code:java}
> org.apache.hadoop.hive.ql.parse.SemanticException: Invalid path
> org.apache.hadoop.hive.ql.parse.SemanticException: Invalid path
>   at org.apache.hadoop.hive.ql.exec.repl.bootstrap.load.LoadFunction.tasks(LoadFunction.java:94) ~[hive-exec-3.1.0.3.1.5.1-2.jar:3.1.1000-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.exec.repl.ReplLoadTask.executeBootStrapLoad(ReplLoadTask.java:238) ~[hive-exec-3.1.0.3.1.5.1-2.jar:3.1.1000-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.exec.repl.ReplLoadTask.execute(ReplLoadTask.java:110) ~[hive-exec-3.1.0.3.1.5.1-2.jar:3.1.1000-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:212) ~[hive-exec-3.1.0.3.1.5.1-2.jar:3.1.1000-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:103) ~[hive-exec-3.1.0.3.1.5.1-2.jar:3.1.1000-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:82) ~[hive-exec-3.1.0.3.1.5.1-2.jar:3.1.1000-SNAPSHOT]
> Caused by: java.lang.NullPointerException
>   at org.apache.hadoop.hive.ql.exec.repl.bootstrap.load.LoadFunction.isFunctionAlreadyLoaded(LoadFunction.java:105) ~[hive-exec-3.1.0.3.1.5.1-2.jar:3.1.1000-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.exec.repl.bootstrap.load.LoadFunction.tasks(LoadFunction.java:81) ~[hive-exec-3.1.0.3.1.5.1-2.jar:3.1.1000-SNAPSHOT]
>   ... 5 more
> {code}
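
The difference between a substring check and a check on whole path components can be shown in a few lines. This is a sketch under assumptions: the example paths and the method names `looseMatch` and `componentMatch` are made up for illustration, and the committed fix may take a different approach.

```java
final class FunctionsDirCheck {
    static final String FUNCTIONS_ROOT_DIR_NAME = "_functions";

    /** The problematic check: any path that merely contains the text matches. */
    static boolean looseMatch(String path) {
        return path.contains(FUNCTIONS_ROOT_DIR_NAME);
    }

    /** Stricter check: some directory component must equal the constant exactly. */
    static boolean componentMatch(String path) {
        for (String part : path.split("/")) {
            if (part.equals(FUNCTIONS_ROOT_DIR_NAME)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String tablePath = "/dump/db1/my_functions_tbl";   // hypothetical table directory
        String functionPath = "/dump/db1/_functions/f1";   // hypothetical function directory
        System.out.println(looseMatch(tablePath));
        System.out.println(componentMatch(tablePath));
        System.out.println(componentMatch(functionPath));
    }
}
```

The loose check treats the table directory as a function directory, which is the wrong flow described in the report, while the component check accepts only the real `_functions` directory.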





[jira] [Updated] (HIVE-22856) Hive LLAP LlapArrowBatchRecordReader skipping remaining batches when ArrowStreamReader returns a 0 length batch.

2020-02-13 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-22856:
---
Description: 
LlapArrowBatchRecordReader returns false when the ArrowStreamReader 
loadNextBatch returns a column vector with 0 length. But we should keep reading 
data until loadNextBatch returns false. Some batches may return a column vector 
of length 0, but we should ignore them and wait for the next batch.

A batch size of 0 is possible when a split read by the ORC reader contains 
only deleted or aborted data. The VectorizedOrcAcidRowBatchReader reads the 
data from the split info and then filters out the rows which are not visible to 
the read transaction. So it may happen that none of the records satisfy the 
filter. In that case VectorizedOrcAcidRowBatchReader sends a batch of size 0. 
With a batch size of 0, VectorFileSinkArrowOperator creates a batch of just 
metadata and sets the value count to 0. The client should ignore this kind of 
batch and wait for the next batch.

  was:
LlapArrowBatchRecordReader returns false when the ArrowStreamReader 
loadNextBatch returns column vector with 0 length. But we should keep reading 
data until loadNextBatch returns false. Some batch may return column vector of 
length 0, but we should ignore and wait for the next batch.

The batch size of 0 is possible in the case when a split read by ORC reader has 
all deleted or aborted data. In that case VectorizedOrcAcidRowBatchReader sends 
a batch size of 0. With 0 batch size, VectorFileSinkArrowOperator creates a 
batch of just metadata and set the value count to 0. This kind of batch should 
be ignore by the client and should wait for next batch.


> Hive LLAP LlapArrowBatchRecordReader skipping remaining batches when 
> ArrowStreamReader returns a 0 length batch.
> 
>
> Key: HIVE-22856
> URL: https://issues.apache.org/jira/browse/HIVE-22856
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
> Attachments: HIVE-22856.01.patch, HIVE-22856.02.patch
>
>
> LlapArrowBatchRecordReader returns false when the ArrowStreamReader 
> loadNextBatch returns a column vector with 0 length. But we should keep reading 
> data until loadNextBatch returns false. Some batches may return a column vector 
> of length 0, but we should ignore them and wait for the next batch.
> A batch size of 0 is possible when a split read by the ORC reader contains 
> only deleted or aborted data. The VectorizedOrcAcidRowBatchReader reads 
> the data from the split info and then filters out the rows which are not 
> visible to the read transaction. So it may happen that none of the records 
> satisfy the filter. In that case VectorizedOrcAcidRowBatchReader sends a batch 
> of size 0. With a batch size of 0, VectorFileSinkArrowOperator creates a batch 
> of just metadata and sets the value count to 0. The client should ignore this 
> kind of batch and wait for the next batch.
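
The reading behavior asked for above (skip zero-length batches, stop only when the stream ends) can be sketched as a small loop. `BatchReader` is a hypothetical interface that mirrors only the boolean contract described in the report, and `fromCounts` is a made-up test double. Neither is the Arrow or LLAP API.

```java
final class SkipEmptyBatches {

    /** Hypothetical reader: loadNextBatch() returns false only at end of stream. */
    interface BatchReader {
        boolean loadNextBatch();
        int rowCount(); // rows in the batch that was just loaded
    }

    /** Advances to the next non-empty batch; returns false only at end of stream. */
    static boolean nextNonEmpty(BatchReader reader) {
        while (reader.loadNextBatch()) {
            if (reader.rowCount() > 0) {
                return true;
            }
            // A zero-length batch carries only metadata: skip it and keep reading.
        }
        return false;
    }

    /** Test double that replays the given per-batch row counts. */
    static BatchReader fromCounts(int... counts) {
        return new BatchReader() {
            private int index = -1;

            @Override
            public boolean loadNextBatch() {
                if (index + 1 >= counts.length) {
                    return false;
                }
                index++;
                return true;
            }

            @Override
            public int rowCount() {
                return counts[index];
            }
        };
    }

    /** Sums the rows of all non-empty batches. */
    static int totalRows(BatchReader reader) {
        int total = 0;
        while (nextNonEmpty(reader)) {
            total += reader.rowCount();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(totalRows(fromCounts(3, 0, 0, 5)));
    }
}
```

A reader that returned false on the first zero-length batch would report 3 rows for this input and silently drop the last 5, which is the data loss the issue describes.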





[jira] [Updated] (HIVE-22856) Hive LLAP LlapArrowBatchRecordReader skipping remaining batches when ArrowStreamReader returns a 0 length batch.

2020-02-13 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-22856:
---
Description: 
LlapArrowBatchRecordReader returns false when the ArrowStreamReader 
loadNextBatch returns a column vector with 0 length. But we should keep reading 
data until loadNextBatch returns false. Some batches may return a column vector 
of length 0, but we should ignore them and wait for the next batch.

A batch size of 0 is possible when a split read by the ORC reader contains 
only deleted or aborted data. In that case VectorizedOrcAcidRowBatchReader 
sends a batch of size 0. With a batch size of 0, VectorFileSinkArrowOperator 
creates a batch of just metadata and sets the value count to 0. The client 
should ignore this kind of batch and wait for the next batch.

  was:LlapArrowBatchRecordReader returns false when the ArrowStreamReader 
loadNextBatch returns column vector with 0 length. But we should keep reading 
data until loadNextBatch returns false. Some batch may return column vector of 
length 0, but we should ignore and wait for the next batch.


> Hive LLAP LlapArrowBatchRecordReader skipping remaining batches when 
> ArrowStreamReader returns a 0 length batch.
> 
>
> Key: HIVE-22856
> URL: https://issues.apache.org/jira/browse/HIVE-22856
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
> Attachments: HIVE-22856.01.patch, HIVE-22856.02.patch
>
>
> LlapArrowBatchRecordReader returns false when the ArrowStreamReader 
> loadNextBatch returns a column vector with 0 length. But we should keep reading 
> data until loadNextBatch returns false. Some batches may return a column vector 
> of length 0, but we should ignore them and wait for the next batch.
> A batch size of 0 is possible when a split read by the ORC reader contains 
> only deleted or aborted data. In that case VectorizedOrcAcidRowBatchReader 
> sends a batch of size 0. With a batch size of 0, VectorFileSinkArrowOperator 
> creates a batch of just metadata and sets the value count to 0. The client 
> should ignore this kind of batch and wait for the next batch.





[jira] [Updated] (HIVE-22856) Hive LLAP LlapArrowBatchRecordReader skipping remaining batches when ArrowStreamReader returns a 0 length batch.

2020-02-12 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-22856:
---
Status: Patch Available  (was: Open)

> Hive LLAP LlapArrowBatchRecordReader skipping remaining batches when 
> ArrowStreamReader returns a 0 length batch.
> 
>
> Key: HIVE-22856
> URL: https://issues.apache.org/jira/browse/HIVE-22856
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
> Attachments: HIVE-22856.01.patch, HIVE-22856.02.patch
>
>
> LlapArrowBatchRecordReader returns false when the ArrowStreamReader 
> loadNextBatch returns a column vector with 0 length. But we should keep reading 
> data until loadNextBatch returns false. Some batches may return a column vector 
> of length 0, but we should ignore them and wait for the next batch.





[jira] [Updated] (HIVE-22856) Hive LLAP LlapArrowBatchRecordReader skipping remaining batches when ArrowStreamReader returns a 0 length batch.

2020-02-12 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-22856:
---
Attachment: HIVE-22856.02.patch

> Hive LLAP LlapArrowBatchRecordReader skipping remaining batches when 
> ArrowStreamReader returns a 0 length batch.
> 
>
> Key: HIVE-22856
> URL: https://issues.apache.org/jira/browse/HIVE-22856
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
> Attachments: HIVE-22856.01.patch, HIVE-22856.02.patch
>
>
> LlapArrowBatchRecordReader returns false when the ArrowStreamReader 
> loadNextBatch returns a column vector with 0 length. But we should keep reading 
> data until loadNextBatch returns false. Some batches may return a column vector 
> of length 0, but we should ignore them and wait for the next batch.





[jira] [Updated] (HIVE-22856) Hive LLAP LlapArrowBatchRecordReader skipping remaining batches when ArrowStreamReader returns a 0 length batch.

2020-02-12 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-22856:
---
Status: Open  (was: Patch Available)

> Hive LLAP LlapArrowBatchRecordReader skipping remaining batches when 
> ArrowStreamReader returns a 0 length batch.
> 
>
> Key: HIVE-22856
> URL: https://issues.apache.org/jira/browse/HIVE-22856
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
> Attachments: HIVE-22856.01.patch
>
>
> LlapArrowBatchRecordReader returns false when the ArrowStreamReader 
> loadNextBatch returns a column vector with 0 length. But we should keep reading 
> data until loadNextBatch returns false. Some batches may return a column vector 
> of length 0, but we should ignore them and wait for the next batch.





[jira] [Updated] (HIVE-22856) Hive LLAP LlapArrowBatchRecordReader skipping remaining batches when ArrowStreamReader returns a 0 length batch.

2020-02-11 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-22856:
---
Status: Patch Available  (was: Open)

> Hive LLAP LlapArrowBatchRecordReader skipping remaining batches when 
> ArrowStreamReader returns a 0 length batch.
> 
>
> Key: HIVE-22856
> URL: https://issues.apache.org/jira/browse/HIVE-22856
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
> Attachments: HIVE-22856.01.patch
>
>
> LlapArrowBatchRecordReader returns false when the ArrowStreamReader 
> loadNextBatch returns a column vector with 0 length. But we should keep reading 
> data until loadNextBatch returns false. Some batches may return a column vector 
> of length 0, but we should ignore them and wait for the next batch.





[jira] [Updated] (HIVE-22856) Hive LLAP LlapArrowBatchRecordReader skipping remaining batches when ArrowStreamReader returns a 0 length batch.

2020-02-11 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-22856:
---
Status: Open  (was: Patch Available)

> Hive LLAP LlapArrowBatchRecordReader skipping remaining batches when 
> ArrowStreamReader returns a 0 length batch.
> 
>
> Key: HIVE-22856
> URL: https://issues.apache.org/jira/browse/HIVE-22856
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
> Attachments: HIVE-22856.01.patch
>
>
> LlapArrowBatchRecordReader returns false when the ArrowStreamReader 
> loadNextBatch returns a column vector with 0 length. But we should keep reading 
> data until loadNextBatch returns false. Some batches may return a column vector 
> of length 0, but we should ignore them and wait for the next batch.





[jira] [Updated] (HIVE-22856) Hive LLAP LlapArrowBatchRecordReader skipping remaining batches when ArrowStreamReader returns a 0 length batch.

2020-02-07 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-22856:
---
Status: Patch Available  (was: Open)

> Hive LLAP LlapArrowBatchRecordReader skipping remaining batches when 
> ArrowStreamReader returns a 0 length batch.
> 
>
> Key: HIVE-22856
> URL: https://issues.apache.org/jira/browse/HIVE-22856
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
> Attachments: HIVE-22856.01.patch
>
>
> LlapArrowBatchRecordReader returns false when the ArrowStreamReader 
> loadNextBatch returns a column vector with 0 length. But we should keep reading 
> data until loadNextBatch returns false. Some batches may return a column vector 
> of length 0, but we should ignore them and wait for the next batch.





[jira] [Updated] (HIVE-22856) Hive LLAP LlapArrowBatchRecordReader skipping remaining batches when ArrowStreamReader returns a 0 length batch.

2020-02-07 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-22856:
---
Attachment: HIVE-22856.01.patch

> Hive LLAP LlapArrowBatchRecordReader skipping remaining batches when 
> ArrowStreamReader returns a 0 length batch.
> 
>
> Key: HIVE-22856
> URL: https://issues.apache.org/jira/browse/HIVE-22856
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
> Attachments: HIVE-22856.01.patch
>
>
> LlapArrowBatchRecordReader returns false when the ArrowStreamReader 
> loadNextBatch returns a column vector with 0 length. But we should keep reading 
> data until loadNextBatch returns false. Some batches may return a column vector 
> of length 0, but we should ignore them and wait for the next batch.





[jira] [Updated] (HIVE-22856) Hive LLAP LlapArrowBatchRecordReader skipping remaining batches when ArrowStreamReader returns a 0 length batch.

2020-02-07 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-22856:
---
Summary: Hive LLAP LlapArrowBatchRecordReader skipping remaining batches 
when ArrowStreamReader returns a 0 length batch.  (was: Hive LLAP external 
client not reading data from ArrowStreamReader fully)

> Hive LLAP LlapArrowBatchRecordReader skipping remaining batches when 
> ArrowStreamReader returns a 0 length batch.
> 
>
> Key: HIVE-22856
> URL: https://issues.apache.org/jira/browse/HIVE-22856
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
> LlapArrowBatchRecordReader returns false when the ArrowStreamReader 
> loadNextBatch returns a column vector with 0 length. But we should keep reading 
> data until loadNextBatch returns false. Some batches may return a column vector 
> of length 0, but we should ignore them and wait for the next batch.





[jira] [Assigned] (HIVE-22856) Hive LLAP external client not reading data from ArrowStreamReader fully

2020-02-07 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera reassigned HIVE-22856:
--


> Hive LLAP external client not reading data from ArrowStreamReader fully
> ---
>
> Key: HIVE-22856
> URL: https://issues.apache.org/jira/browse/HIVE-22856
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
> LlapArrowBatchRecordReader returns false when the ArrowStreamReader 
> loadNextBatch returns a column vector with 0 length. But we should keep reading 
> data until loadNextBatch returns false. Some batches may return a column vector 
> of length 0, but we should ignore them and wait for the next batch.





[jira] [Updated] (HIVE-22736) Support replication across multiple encryption zones

2020-02-06 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-22736:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Support replication across multiple encryption zones
> 
>
> Key: HIVE-22736
> URL: https://issues.apache.org/jira/browse/HIVE-22736
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22736.patch, HIVE-22736.patch, HIVE-22736.patch, 
> HIVE-22736.patch, HIVE-22736.patch, HIVE-22736.patch, HIVE-22736.patch, 
> HIVE-22736.patch, HIVE-22736.patch, HIVE-22736.patch, HIVE-22736.patch, 
> HIVE-22736.patch, HIVE-22736.patch, HIVE-22736.patch, HIVE-22736.patch, 
> HIVE-22736.patch, HIVE-22736.patch, HIVE-22736.patch, HIVE-22736.patch, 
> HIVE-22736.patch, HIVE-22736.patch, HIVE-22736.patch, HIVE-22736.patch, 
> HIVE-22736.patch, HIVE-22736.patch, HIVE-22736.patch, HIVE-22736.patch, 
> HIVE-22736.patch, HIVE-22736.patch, HIVE-22736.patch
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>






[jira] [Commented] (HIVE-22736) Support replication across multiple encryption zones

2020-02-06 Thread mahesh kumar behera (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17032067#comment-17032067
 ] 

mahesh kumar behera commented on HIVE-22736:


Patch [^HIVE-22736.patch] has been committed to master. Thanks, [~aasha].

> Support replication across multiple encryption zones
> 
>
> Key: HIVE-22736
> URL: https://issues.apache.org/jira/browse/HIVE-22736
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22736.patch, HIVE-22736.patch, HIVE-22736.patch, 
> HIVE-22736.patch, HIVE-22736.patch, HIVE-22736.patch, HIVE-22736.patch, 
> HIVE-22736.patch, HIVE-22736.patch, HIVE-22736.patch, HIVE-22736.patch, 
> HIVE-22736.patch, HIVE-22736.patch, HIVE-22736.patch, HIVE-22736.patch, 
> HIVE-22736.patch, HIVE-22736.patch, HIVE-22736.patch, HIVE-22736.patch, 
> HIVE-22736.patch, HIVE-22736.patch, HIVE-22736.patch, HIVE-22736.patch, 
> HIVE-22736.patch, HIVE-22736.patch, HIVE-22736.patch, HIVE-22736.patch, 
> HIVE-22736.patch, HIVE-22736.patch, HIVE-22736.patch
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>






[jira] [Updated] (HIVE-22736) Support replication across multiple encryption zones

2020-01-21 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-22736:
---
Status: Patch Available  (was: In Progress)

> Support replication across multiple encryption zones
> 
>
> Key: HIVE-22736
> URL: https://issues.apache.org/jira/browse/HIVE-22736
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22736.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (HIVE-22736) Support replication across multiple encryption zones

2020-01-21 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-22736:
---
Status: In Progress  (was: Patch Available)

> Support replication across multiple encryption zones
> 
>
> Key: HIVE-22736
> URL: https://issues.apache.org/jira/browse/HIVE-22736
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22736.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (HIVE-22736) Support replication across multiple encryption zones

2020-01-21 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-22736:
---
Status: Patch Available  (was: Open)

> Support replication across multiple encryption zones
> 
>
> Key: HIVE-22736
> URL: https://issues.apache.org/jira/browse/HIVE-22736
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22736.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (HIVE-22736) Support replication across multiple encryption zones

2020-01-21 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-22736:
---
Status: Open  (was: Patch Available)

> Support replication across multiple encryption zones
> 
>
> Key: HIVE-22736
> URL: https://issues.apache.org/jira/browse/HIVE-22736
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22736.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Assigned] (HIVE-22736) Support replication across multiple encryption zones

2020-01-21 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera reassigned HIVE-22736:
--

Assignee: mahesh kumar behera  (was: Aasha Medhi)

> Support replication across multiple encryption zones
> 
>
> Key: HIVE-22736
> URL: https://issues.apache.org/jira/browse/HIVE-22736
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22736.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>





