[jira] [Resolved] (IMPALA-13262) Predicate pushdown causes incorrect results in join condition

2024-08-21 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao resolved IMPALA-13262.
--
Resolution: Fixed

The fix has been merged to master.

> Predicate pushdown causes incorrect results in join condition
> -------------------------------------------------------------
>
> Key: IMPALA-13262
> URL: https://issues.apache.org/jira/browse/IMPALA-13262
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>  Labels: correctness
> Fix For: Impala 4.5.0
>
>
> We found that in some scenarios Apache Impala 
> ([https://github.com/apache/impala/commit/c539874]) could incorrectly push 
> predicates down to scan nodes, which in turn produces wrong results. The 
> following is a concrete example that reproduces the issue.
> {code:sql}
> create database impala_13262;
> use impala_13262;
> create table department ( dept_no integer, dept_rank integer, start_date 
> timestamp,end_date timestamp);
> insert into department values(1,1,'2024-01-01','2024-01-02');
> insert into department values(1,2,'2024-01-02','2024-01-03');
> insert into department values(1,3,'2024-01-03','2024-01-03');
> create table employee (employee_no integer, depart_no integer);
> insert into employee values (1,1);
> // The following query should return 0 rows. However, Apache Impala produces 
> // one row.
> select * from employee t1
> inner join (
> select * from
> (
> select dept_no,dept_rank,start_date,end_date
> ,row_number() over(partition by dept_no order by dept_rank) rn
> from department
> ) t2
> where rn=1
> ) t2
> on t1.depart_no=t2.dept_no
> where t2.start_date=t2.end_date;
> set explain_level=2;
> // In the output of the EXPLAIN statement, we found that the predicate 
> // "start_date = end_date" was pushed down to the scan node, which is wrong.
> | 01:SCAN HDFS [impala_13262.department, RANDOM]
> |    HDFS partitions=1/1 files=3 size=132B
> |    predicates: start_date = end_date
> |    stored statistics:
> |      table: rows=unavailable size=unavailable
> |      columns: unavailable
> |    extrapolated-rows=disabled max-scan-range-rows=unavailable
> |    mem-estimate=32.00MB mem-reservation=8.00KB thread-reservation=1
> |    tuple-ids=1 row-size=40B cardinality=1
> |    in pipelines: 01(GETNEXT)
> +------------------------------------------------------------------------
> {code}
>  
> +*Edit:*+
> The following is a smaller case that reproduces the issue. The correct result 
> should be 0 rows, but Impala returns 1 row as above.
> {code:sql}
> select * from
> (
> select dept_no,dept_rank,start_date,end_date
> ,row_number() over(partition by dept_no order by dept_rank) rn
> from department
> ) t2
> where rn=1 and t2.start_date=t2.end_date;
> {code}
>  
> Recall that the contents of the inline view '{*}t2{*}' above are as follows.
> {code}
> +---------+-----------+---------------------+---------------------+----+
> | dept_no | dept_rank | start_date          | end_date            | rn |
> +---------+-----------+---------------------+---------------------+----+
> | 1       | 1         | 2024-01-01 00:00:00 | 2024-01-02 00:00:00 | 1  |
> | 1       | 2         | 2024-01-02 00:00:00 | 2024-01-03 00:00:00 | 2  |
> | 1       | 3         | 2024-01-03 00:00:00 | 2024-01-03 00:00:00 | 3  |
> +---------+-----------+---------------------+---------------------+----+
> {code}
>  
> On the other hand, the following query, without the conjunct '{*}rn=1{*}', 
> returns the correct result, namely the row with '{*}rn{*}' equal to *3* 
> above. It appears that adding the '{*}rn=1{*}' predicate triggers the 
> incorrect pushdown of '{*}t2.start_date=t2.end_date{*}' to the scan node of 
> the table '{*}department{*}'.
> {code:sql}
> select * from
> (
> select dept_no,dept_rank,start_date,end_date
> ,row_number() over(partition by dept_no order by dept_rank) rn
> from department
> ) t2
> where t2.start_date=t2.end_date;
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

[jira] [Updated] (IMPALA-13262) Predicate pushdown causes incorrect results in join condition

2024-08-21 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-13262:
-
Fix Version/s: Impala 4.5.0





[jira] [Comment Edited] (IMPALA-13262) Predicate pushdown causes incorrect results in join condition

2024-08-19 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17875012#comment-17875012
 ] 

Fang-Yu Rao edited comment on IMPALA-13262 at 8/19/24 11:01 PM:


Thanks [~MikaelSmith]!

 
{quote}Wouldn't that limit the ScanNode to return only the 3rd row?
{quote}
This is correct. If we push the predicate '{*}start_date = end_date{*}' to the 
scan node of the table '{*}department{*}' we will get the 3rd row as shown 
below.
{code}
+---------+-----------+---------------------+---------------------+
| dept_no | dept_rank | start_date          | end_date            |
+---------+-----------+---------------------+---------------------+
| 1       | 3         | 2024-01-03 00:00:00 | 2024-01-03 00:00:00 |
+---------+-----------+---------------------+---------------------+
{code}
Later on, when we apply the analytic function '{*}row_number(){*}' to this 
returned row, the row is assigned row number 1, thus satisfying the (analytic) 
conjunct '{*}rn = 1{*}', and we get this row as a result.

 

However, if we do not push the predicate '{*}start_date = end_date{*}' down to 
the scan node of the table '{*}department{*}', we get all 3 rows, to which we 
then apply the analytic function '{*}row_number(){*}'. This time a different 
row is assigned row number 1, and that row does not satisfy 
'{*}start_date = end_date{*}', so no row is returned.
{code}
+---------+-----------+---------------------+---------------------+----+
| dept_no | dept_rank | start_date          | end_date            | rn |
+---------+-----------+---------------------+---------------------+----+
| 1       | 1         | 2024-01-01 00:00:00 | 2024-01-02 00:00:00 | 1  |
| 1       | 2         | 2024-01-02 00:00:00 | 2024-01-03 00:00:00 | 2  |
| 1       | 3         | 2024-01-03 00:00:00 | 2024-01-03 00:00:00 | 3  |
+---------+-----------+---------------------+---------------------+----+
{code}
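The difference between the two evaluation orders can be simulated outside Impala. The following Python sketch (helper names are made up; it only mirrors the 'department' example data, not Impala internals) applies the predicate before and after row_number():

```python
# Minimal simulation of why pushing "start_date = end_date" below
# row_number() changes the result. Dates are plain strings.

rows = [
    {"dept_no": 1, "dept_rank": 1, "start_date": "2024-01-01", "end_date": "2024-01-02"},
    {"dept_no": 1, "dept_rank": 2, "start_date": "2024-01-02", "end_date": "2024-01-03"},
    {"dept_no": 1, "dept_rank": 3, "start_date": "2024-01-03", "end_date": "2024-01-03"},
]

def row_number(rows):
    """Assign rn per dept_no partition, ordered by dept_rank."""
    out = []
    counts = {}
    for r in sorted(rows, key=lambda r: (r["dept_no"], r["dept_rank"])):
        rn = counts.get(r["dept_no"], 0) + 1
        counts[r["dept_no"]] = rn
        out.append({**r, "rn": rn})
    return out

pred = lambda r: r["start_date"] == r["end_date"]

# Correct order: compute rn over ALL rows, then filter.
correct = [r for r in row_number(rows) if r["rn"] == 1 and pred(r)]

# Incorrect pushdown: filter at the "scan" first, then compute rn.
pushed = [r for r in row_number([r for r in rows if pred(r)]) if r["rn"] == 1]

print(len(correct))  # 0: the rn=1 row has start_date != end_date
print(len(pushed))   # 1: the single surviving row is renumbered rn=1
```

The pushed-down variant renumbers the surviving third row to rn=1, which is exactly the wrong row Impala returned.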




[jira] [Commented] (IMPALA-13262) Predicate pushdown causes incorrect results in join condition

2024-08-19 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17875012#comment-17875012
 ] 

Fang-Yu Rao commented on IMPALA-13262:
--


[jira] [Commented] (IMPALA-13262) Predicate pushdown causes incorrect results in join condition

2024-08-11 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17872740#comment-17872740
 ] 

Fang-Yu Rao commented on IMPALA-13262:
--

I seem to have found a workaround that rewrites the conjunct 
'{*}start_date=end_date{*}'. In short, we could rewrite this conjunct as 
'{*}not start_date > end_date and not start_date < end_date{*}', assuming the 
values in these 2 columns are all non-NULL.

I verified on 
[IMPALA-13252|https://github.com/apache/impala/commit/5b7ed40d52bb63a5dda0f12f83370b0fbcaaca26]
 that after this query rewrite, the unwanted conjunct is no longer pushed 
down to the scan node of the table '{*}department{*}'.
{code:sql}
Query: select * from employee t1
inner join (
select * from
(
select dept_no,dept_rank,start_date,end_date
,row_number() over(partition by dept_no order by dept_rank) rn
from department
) t2
where rn=1
) t2
on t1.depart_no=t2.dept_no
where not t2.start_date > t2.end_date
and not t2.start_date < t2.end_date
Query submitted at: 2024-08-11 15:00:52 (Coordinator: 
http://fangyu-upstream-dev.gce.cloudera.com:25000)
Query state can be monitored at: 
http://fangyu-upstream-dev.gce.cloudera.com:25000/query_plan?query_id=aa4fb0f36398286a:05a46633
Fetched 0 row(s) in 0.13s
{code}
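As a quick sanity check on the rewrite itself (plain Python, not Impala; it does not model SQL NULL semantics, hence the non-NULL assumption above):

```python
import itertools

# For values drawn from a total order, "a = b" and
# "not a > b and not a < b" select exactly the same rows.
def eq(a, b):
    return a == b

def rewritten(a, b):
    return (not a > b) and (not a < b)

vals = ["2024-01-01", "2024-01-02", "2024-01-03"]
for a, b in itertools.product(vals, repeat=2):
    assert eq(a, b) == rewritten(a, b)
print("equivalent on all non-NULL pairs")
```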


[jira] [Commented] (IMPALA-13262) Predicate pushdown causes incorrect results in join condition

2024-08-11 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17872721#comment-17872721
 ] 

Fang-Yu Rao commented on IMPALA-13262:
--

Attaching a debugger to a running Impala frontend (on 
[IMPALA-13252|https://github.com/apache/impala/commit/5b7ed40d52bb63a5dda0f12f83370b0fbcaaca26])
 using the smaller test case to reproduce the issue, I found the place where we 
added the conjunct '{*}start_date=end_date{*}' which in turn was pushed down to 
the HDFS scan node of the table '{*}department{*}' within the inline view.
 # Within 
[SingleNodePlanner#createInlineViewPlan()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java#L1198],
 we call [migrateConjunctsToInlineView(analyzer, 
inlineViewRef)|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java#L1208].
 # Within 
[SingleNodePlanner#migrateConjunctsToInlineView()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java#L1374],
 we call [migrateOrCopyConjunctsToInlineView(analyzer, inlineViewRef, tids, 
analyticPreds, 
unassignedConjuncts)|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java#L1391]
 when there is an analytical predicate ('{*}rn=1{*}') to migrate into the 
inline view.
 # Within 
[SingleNodePlanner#migrateOrCopyConjunctsToInlineView()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java#L1411],
 we call [addConjunctsIntoInlineView(analyzer, inlineViewRef, 
evalInInlineViewPreds)|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java#L1431].
 It is within this call that '{*}start_date=end_date{*}' is added to the bound 
predicates for the underlying table '{*}department{*}', so that this predicate 
is applied to the scan node.

 
More specifically, in 
[SingleNodePlanner#addConjunctsIntoInlineView()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java#L1476],
 when we call [analyzer.createEquivConjuncts(inlineViewRef.getId(), 
preds)|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java#L1481],
 an additional predicate '{*}start_date=end_date{*}' is added to the last 
input argument '{*}preds{*}'.

 

Later on in the same method (SingleNodePlanner#addConjunctsIntoInlineView()), 
[inlineViewRef.getAnalyzer().registerConjuncts(viewPredicates)|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java#L1529]
 registers the conjunct '{*}start_date=end_date{*}', so that 
'{*}analyzer.getBoundPredicates(new TupleId(0)){*}' contains 
'{*}start_date=end_date{*}', which is later applied as a conjunct when the 
HDFS scan node for the table '{*}department{*}' is created.
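For intuition, the usual safety condition for evaluating a conjunct below an analytic function is that the conjunct references only the PARTITION BY columns: such a filter can only drop whole partitions and cannot renumber rows within one. A hedged sketch of that rule (plain Python, not the actual planner code; the function and parameter names are made up):

```python
# Sketch of the general safety rule for pushing a conjunct below an
# analytic function: safe only when the conjunct references nothing
# but the PARTITION BY columns.

def safe_to_push_below_analytic(pred_cols, partition_by_cols):
    """True iff every column the predicate references is a partitioning column."""
    return set(pred_cols) <= set(partition_by_cols)

# 'start_date = end_date' references non-partitioning columns -> unsafe.
print(safe_to_push_below_analytic({"start_date", "end_date"}, {"dept_no"}))  # False
# 'dept_no = 1' drops a whole partition -> safe.
print(safe_to_push_below_analytic({"dept_no"}, {"dept_no"}))  # True
```

Under this rule, '{*}start_date=end_date{*}' would never be migrated below the '{*}row_number(){*}' in the inline view.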

 

For easy reference, a smaller test case to reproduce the issue is given in the 
following, which does not involve a join.
{code:sql}
select * from
(
select dept_no,dept_rank,start_date,end_date
,row_number() over(partition by dept_no order by dept_rank) rn
from department
) t2
where rn=1 and t2.start_date=t2.end_date;
{code}
 


[jira] [Resolved] (IMPALA-13276) Revise the documentation of query option 'RUNTIME_FILTER_WAIT_TIME_MS'

2024-08-10 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao resolved IMPALA-13276.
--
Resolution: Fixed

> Revise the documentation of query option 'RUNTIME_FILTER_WAIT_TIME_MS'
> ----------------------------------------------------------------------
>
> Key: IMPALA-13276
> URL: https://issues.apache.org/jira/browse/IMPALA-13276
> Project: IMPALA
>  Issue Type: Documentation
>  Components: Docs
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>
> The documentation of the query option 'RUNTIME_FILTER_WAIT_TIME_MS' at 
> [https://github.com/apache/impala/blob/master/docs/topics/impala_runtime_filter_wait_time_ms.xml#L37-L43],
>  quoted in the following, describes the meaning of this query option.
> {code:java}
>   The RUNTIME_FILTER_WAIT_TIME_MS query option
>   adjusts the settings for the runtime filtering feature.
>   It specifies a time in milliseconds that each scan node waits for
>   runtime filters to be produced by other plan fragments.
> {code}
>  
> However, the description above is not entirely accurate, in that the wait 
> time is measured from the time when a runtime filter was registered (within 
> [QueryState::InitFilterBank()|https://github.com/apache/impala/blob/master/be/src/runtime/query-state.cc#L381])
>  rather than from the time when a scan node calls 
> [ScanNode::WaitForRuntimeFilters()|https://github.com/apache/impala/blob/master/be/src/exec/scan-node.cc#L212].
>  For instance, if a scan node starts so late that, by the time 
> ScanNode::WaitForRuntimeFilters() is called, the time elapsed since the 
> registration of the runtime filter already exceeds the value of 
> 'RUNTIME_FILTER_WAIT_TIME_MS', the scan node will not wait for the runtime 
> filter at all. Refer to 
> [https://github.com/apache/impala/blob/master/be/src/runtime/runtime-filter.cc#L86-L87]
>  for further details.
> {code:java}
> bool RuntimeFilter::WaitForArrival(int32_t timeout_ms) const {
>   unique_lock l(arrival_mutex_);
>   while (arrival_time_.Load() == 0) {
> int64_t ms_since_registration = MonotonicMillis() - registration_time_;
> int64_t ms_remaining = timeout_ms - ms_since_registration;
> if (ms_remaining <= 0) break;
> if (injection_delay_ > 0) SleepForMs(injection_delay_);
> arrival_cv_.WaitFor(l, ms_remaining * MICROS_PER_MILLI);
>   }
>   return arrival_time_.Load() != 0;
> }
> {code}
> We should revise the documentation to make it a bit clearer.
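The timing described above can be illustrated with a minimal sketch (plain Python, not Impala's C++; the function and parameter names are illustrative only): the wait budget is measured from filter registration, so a late-starting scan may have little or no time left to wait.

```python
# Remaining wait budget for a scan node, measured from the moment the
# runtime filter was REGISTERED, not from when the scan starts waiting.

def remaining_wait_ms(wait_time_ms, registration_time_ms, now_ms):
    ms_since_registration = now_ms - registration_time_ms
    return max(0, wait_time_ms - ms_since_registration)

# Scan starts waiting 100 ms after registration: 900 ms of budget left.
print(remaining_wait_ms(1000, 0, 100))   # 900
# Scan starts 1500 ms after registration: budget already exhausted, no wait.
print(remaining_wait_ms(1000, 0, 1500))  # 0
```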






[jira] [Updated] (IMPALA-13276) Revise the documentation of query option 'RUNTIME_FILTER_WAIT_TIME_MS'

2024-08-10 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-13276:
-
Target Version: Impala 4.4.1

> Revise the documentation of query option 'RUNTIME_FILTER_WAIT_TIME_MS'
> --
>
> Key: IMPALA-13276
> URL: https://issues.apache.org/jira/browse/IMPALA-13276
> Project: IMPALA
>  Issue Type: Documentation
>  Components: Docs
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>
> The documentation of the query option 'RUNTIME_FILTER_WAIT_TIME_MS' at 
> [https://github.com/apache/impala/blob/master/docs/topics/impala_runtime_filter_wait_time_ms.xml#L37-L43]
>  as provided in the following describes the meaning of this query option.
> {code:java}
>   The RUNTIME_FILTER_WAIT_TIME_MS query option
>   adjusts the settings for the runtime filtering feature.
>   It specifies a time in milliseconds that each scan node waits for
>   runtime filters to be produced by other plan fragments.
> {code}
>  
> However the description above is not entirely accurate in that the wait time 
> is with respect to the time when a runtime filter was registered (within 
> [QueryState::InitFilterBank()|https://github.com/apache/impala/blob/master/be/src/runtime/query-state.cc#L381])
>  instead of the time when a scan node is calling 
> [ScanNode::WaitForRuntimeFilters()|https://github.com/apache/impala/blob/master/be/src/exec/scan-node.cc#L212].
>  For instance if a scan node started so late that when 
> ScanNode::WaitForRuntimeFilters() was called, the amount of time passed since 
> the registration of this runtime filter was already greater than the value of 
> 'RUNTIME_FILTER_WAIT_TIME_MS', this scan node would not be waiting for the 
> runtime filter. Refer to 
> [https://github.com/apache/impala/blob/master/be/src/runtime/runtime-filter.cc#L86-L87]
>  for further details.
> {code:java}
> bool RuntimeFilter::WaitForArrival(int32_t timeout_ms) const {
>   unique_lock l(arrival_mutex_);
>   while (arrival_time_.Load() == 0) {
> int64_t ms_since_registration = MonotonicMillis() - registration_time_;
> int64_t ms_remaining = timeout_ms - ms_since_registration;
> if (ms_remaining <= 0) break;
> if (injection_delay_ > 0) SleepForMs(injection_delay_);
> arrival_cv_.WaitFor(l, ms_remaining * MICROS_PER_MILLI);
>   }
>   return arrival_time_.Load() != 0;
> }
> {code}
> We should revise the documentation to make it a bit clearer.
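To make the semantics above concrete, the timing arithmetic can be sketched in a few lines of Python. This is an illustrative toy model, not Impala code; the function name and arguments are hypothetical.

```python
def effective_wait_ms(timeout_ms: int, registration_time_ms: int, now_ms: int) -> int:
    """Remaining wait budget, measured from filter registration.

    Mirrors the arithmetic in RuntimeFilter::WaitForArrival(): the timeout
    is relative to when the filter was registered, not to when the scan
    node starts waiting.
    """
    ms_since_registration = now_ms - registration_time_ms
    return max(0, timeout_ms - ms_since_registration)

# Filter registered at t=0 ms; the scan node only starts waiting at t=200 ms.
# With RUNTIME_FILTER_WAIT_TIME_MS=1000, at most 800 ms of waiting remain.
print(effective_wait_ms(1000, 0, 200))   # 800
# A scan node that starts after the whole budget has elapsed does not wait.
print(effective_wait_ms(1000, 0, 1500))  # 0
```

In other words, a slow-starting scan node consumes its wait budget before it ever begins to wait, which is exactly the subtlety the revised documentation should call out.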



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-13276) Revise the documentation of query option 'RUNTIME_FILTER_WAIT_TIME_MS'

2024-08-10 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-13276:
-
Labels: 4.4.1  (was: )




[jira] [Updated] (IMPALA-13276) Revise the documentation of query option 'RUNTIME_FILTER_WAIT_TIME_MS'

2024-08-10 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-13276:
-
Labels:   (was: 4.4.1)




[jira] [Updated] (IMPALA-13276) Revise the documentation of query option 'RUNTIME_FILTER_WAIT_TIME_MS'

2024-08-10 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-13276:
-
Epic Color: ghx-label-10  (was: ghx-label-11)




[jira] [Updated] (IMPALA-13262) Predicate pushdown causes incorrect results in join condition

2024-08-09 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-13262:
-
Description: 
We found that in some scenarios Apache Impala 
([https://github.com/apache/impala/commit/c539874]) could incorrectly push 
predicates down to scan nodes, which in turn produces wrong results. The 
following is a concrete example to reproduce the issue.
{code:sql}
create database impala_13262;
use impala_13262;

create table department (dept_no integer, dept_rank integer,
  start_date timestamp, end_date timestamp);

insert into department values(1,1,'2024-01-01','2024-01-02');
insert into department values(1,2,'2024-01-02','2024-01-03');
insert into department values(1,3,'2024-01-03','2024-01-03');

create table employee (employee_no integer, depart_no integer);

insert into employee values (1,1);

-- The following query should return 0 rows. However, Apache Impala produces
-- one row.

select * from employee t1
inner join (
  select * from (
    select dept_no, dept_rank, start_date, end_date,
           row_number() over (partition by dept_no order by dept_rank) rn
    from department
  ) t2
  where rn = 1
) t2
on t1.depart_no = t2.dept_no
where t2.start_date = t2.end_date;

set explain_level=2;

-- In the output of the EXPLAIN statement, we found that the predicate
-- "start_date = end_date" was pushed down to the scan node, which is wrong.

| 01:SCAN HDFS [impala_13262.department, RANDOM]                       |
|    HDFS partitions=1/1 files=3 size=132B                             |
|    predicates: start_date = end_date                                 |
|    stored statistics:                                                |
|      table: rows=unavailable size=unavailable                        |
|      columns: unavailable                                            |
|    extrapolated-rows=disabled max-scan-range-rows=unavailable        |
|    mem-estimate=32.00MB mem-reservation=8.00KB thread-reservation=1  |
|    tuple-ids=1 row-size=40B cardinality=1                            |
|    in pipelines: 01(GETNEXT)                                         |
+----------------------------------------------------------------------+
{code}
 

+*Edit:*+

The following is a smaller case that reproduces the issue. The correct result 
should be 0 rows, but Impala returns 1 row as above.
{code:sql}
select * from (
  select dept_no, dept_rank, start_date, end_date,
         row_number() over (partition by dept_no order by dept_rank) rn
  from department
) t2
where rn = 1 and t2.start_date = t2.end_date;
{code}
 

Recall that the contents of the inline view '{*}t2{*}' above are as follows.
{code}
+---------+-----------+---------------------+---------------------+----+
| dept_no | dept_rank | start_date          | end_date            | rn |
+---------+-----------+---------------------+---------------------+----+
| 1       | 1         | 2024-01-01 00:00:00 | 2024-01-02 00:00:00 | 1  |
| 1       | 2         | 2024-01-02 00:00:00 | 2024-01-03 00:00:00 | 2  |
| 1       | 3         | 2024-01-03 00:00:00 | 2024-01-03 00:00:00 | 3  |
+---------+-----------+---------------------+---------------------+----+
 

On the other hand, the following query without the conjunct '{*}rn=1{*}' 
returns the correct result, namely the row with '{*}rn{*}' equal to *3* 
above. It almost looks like adding the '{*}rn=1{*}' predicate is what 
triggers the incorrect pushdown of '{*}t2.start_date=t2.end_date{*}' to the 
scan node of the table '{*}department{*}'.
{code:sql}
select * from (
  select dept_no, dept_rank, start_date, end_date,
         row_number() over (partition by dept_no order by dept_rank) rn
  from department
) t2
where t2.start_date = t2.end_date;
{code}
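The effect of the misplaced predicate can be simulated in a few lines of Python: filtering on start_date = end_date before the analytic function (as the buggy pushed-down plan does) keeps only the dept_rank=3 row and assigns it rn=1, whereas the correct evaluation order returns no rows. This is an illustrative simulation, not Impala code.

```python
rows = [
    # (dept_no, dept_rank, start_date, end_date), as in the example above
    (1, 1, "2024-01-01", "2024-01-02"),
    (1, 2, "2024-01-02", "2024-01-03"),
    (1, 3, "2024-01-03", "2024-01-03"),
]

def with_row_number(rows):
    """row_number() over (partition by dept_no order by dept_rank)."""
    counts = {}
    out = []
    for dept_no, dept_rank, start, end in sorted(rows):
        counts[dept_no] = counts.get(dept_no, 0) + 1
        out.append((dept_no, dept_rank, start, end, counts[dept_no]))
    return out

same_dates = lambda r: r[2] == r[3]  # start_date = end_date

# Correct order: compute rn first, then apply both predicates.
correct = [r for r in with_row_number(rows) if r[4] == 1 and same_dates(r)]
# Buggy plan: the predicate is pushed below the analytic function, so the
# surviving dept_rank=3 row is renumbered to rn=1 and wrongly returned.
pushed = [r for r in with_row_number([r for r in rows if same_dates(r)])
          if r[4] == 1]

print(len(correct))  # 0 rows, the right answer
print(len(pushed))   # 1 row, matching the bug
```

The simulation prints 0 for the correct plan and 1 for the pushed-down plan, matching the wrong result Impala returns.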


[jira] [Comment Edited] (IMPALA-13262) Predicate pushdown causes incorrect results in join condition

2024-08-09 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17872477#comment-17872477
 ] 

Fang-Yu Rao edited comment on IMPALA-13262 at 8/9/24 10:15 PM:
---

I started a git bisect from [IMPALA-9132: Explain statements should not cause 
nullptr in 
LogLineageRecord()|https://github.com/apache/impala/commit/f49f8d8a32] (which 
is not affected by the bug), and it identified IMPALA-9979: part 2 as the 
culprit.

In addition, setting '{*}ANALYTIC_RANK_PUSHDOWN_THRESHOLD{*}' to *0* does not 
work around this issue.
{code}
fangyurao@fangyu:~/Impala_for_FE$ git bisect bad
b42c64993d46893488a667fb9c425548fdf964ab is the first bad commit
commit b42c64993d46893488a667fb9c425548fdf964ab
Author: Tim Armstrong
Date:   Tue Feb 2 14:02:12 2021 -0800

    IMPALA-9979: part 2: partitioned top-n
{code}




[jira] [Commented] (IMPALA-13262) Predicate pushdown causes incorrect results in join condition

2024-08-09 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17872477#comment-17872477
 ] 

Fang-Yu Rao commented on IMPALA-13262:
--




[jira] [Resolved] (IMPALA-13250) Document ENABLED_RUNTIME_FILTER_TYPES query option

2024-08-08 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao resolved IMPALA-13250.
--
Resolution: Fixed

The documentation has been added.

> Document ENABLED_RUNTIME_FILTER_TYPES query option
> --
>
> Key: IMPALA-13250
> URL: https://issues.apache.org/jira/browse/IMPALA-13250
> Project: IMPALA
>  Issue Type: Documentation
>Affects Versions: Impala 4.0.0
>Reporter: Michael Smith
>Assignee: Fang-Yu Rao
>Priority: Major
>







[jira] [Updated] (IMPALA-13276) Revise the documentation of query option 'RUNTIME_FILTER_WAIT_TIME_MS'

2024-08-05 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-13276:
-
Issue Type: Documentation  (was: Task)




[jira] [Updated] (IMPALA-13276) Revise the documentation of query option 'RUNTIME_FILTER_WAIT_TIME_MS'

2024-08-05 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-13276:
-
Summary: Revise the documentation of query option 
'RUNTIME_FILTER_WAIT_TIME_MS'  (was: Revise the description of the query option 
of 'RUNTIME_FILTER_WAIT_TIME_MS')




[jira] [Created] (IMPALA-13276) Revise the description of the query option of 'RUNTIME_FILTER_WAIT_TIME_MS'

2024-08-05 Thread Fang-Yu Rao (Jira)
Fang-Yu Rao created IMPALA-13276:


 Summary: Revise the description of the query option of 
'RUNTIME_FILTER_WAIT_TIME_MS'
 Key: IMPALA-13276
 URL: https://issues.apache.org/jira/browse/IMPALA-13276
 Project: IMPALA
  Issue Type: Task
  Components: Docs
Reporter: Fang-Yu Rao
Assignee: Fang-Yu Rao





[jira] [Updated] (IMPALA-13262) Predicate pushdown causes incorrect results in join condition

2024-07-31 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-13262:
-
Labels: correctness  (was: )

> Predicate pushdown causes incorrect results in join condition
> -
>
> Key: IMPALA-13262
> URL: https://issues.apache.org/jira/browse/IMPALA-13262
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>  Labels: correctness
>
> We found that in some scenario Apache Impala 
> (https://github.com/apache/impala/commit/c539874) could incorrectly push 
> predicates to scan nodes, which in turn produces the wrong result. The 
> following is a concrete example to reproduce the issue.
> {code:sql}
> create database impala_13262;
> use impala_13262;
> create table department ( dept_no integer, dept_rank integer, start_date 
> timestamp,end_date timestamp);
> insert into department values(1,1,'2024-01-01','2024-01-02');
> insert into department values(1,2,'2024-01-02','2024-01-03');
> insert into department values(1,3,'2024-01-03','2024-01-03');
> create table employee (employee_no integer, depart_no integer);
> insert into employee values (1,1);
> // The following query should return 0 row. However Apache Impala produces 
> one row.
> select * from employee t1
> inner join (
> select * from
> (
> select dept_no,dept_rank,start_date,end_date
> ,row_number() over(partition by dept_no order by dept_rank) rn
> from department
> ) t2
> where rn=1
> ) t2
> on t1.depart_no=t2.dept_no
> where t2.start_date=t2.end_date;
> set explain_level=2;
> // In the output of the EXPLAIN statement, we found that the predicate 
> "start_data = end_date" was pushed
> // down to the scan node, which is wrong.
> | 01:SCAN HDFS [impala_13262.department, RANDOM]                              |
> |    HDFS partitions=1/1 files=3 size=132B                                    |
> |    predicates: start_date = end_date                                        |
> |    stored statistics:                                                       |
> |      table: rows=unavailable size=unavailable                               |
> |      columns: unavailable                                                   |
> |    extrapolated-rows=disabled max-scan-range-rows=unavailable               |
> |    mem-estimate=32.00MB mem-reservation=8.00KB thread-reservation=1         |
> |    tuple-ids=1 row-size=40B cardinality=1                                   |
> |    in pipelines: 01(GETNEXT)                                                |
> +------------------------------------------------------------------------------+
> {code}
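The unsafe reordering can be reproduced outside Impala with a small sketch (plain Python; the data mirrors the repro above, and `top_rank_rows` is a hypothetical stand-in for the `row_number() ... where rn=1` subquery): filtering on `start_date = end_date` before computing the analytic function selects a different top-ranked row per `dept_no` than filtering after it, which is exactly why the predicate must not be pushed below the analytic node.

```python
# Rows from the repro: (dept_no, dept_rank, start_date, end_date)
department = [
    (1, 1, "2024-01-01", "2024-01-02"),
    (1, 2, "2024-01-02", "2024-01-03"),
    (1, 3, "2024-01-03", "2024-01-03"),
]

def top_rank_rows(rows):
    """Stand-in for: row_number() over (partition by dept_no order by dept_rank),
    keeping only the rows where rn = 1 (lowest dept_rank per dept_no)."""
    best = {}
    for dept_no, dept_rank, start, end in rows:
        if dept_no not in best or dept_rank < best[dept_no][1]:
            best[dept_no] = (dept_no, dept_rank, start, end)
    return list(best.values())

# Correct evaluation order: analytic function first, then the outer predicate.
correct = [r for r in top_rank_rows(department) if r[2] == r[3]]

# What the pushed-down plan effectively does: filter first, then rank.
pushed_down = top_rank_rows([r for r in department if r[2] == r[3]])

print(correct)      # [] -> joining against this yields 0 rows, as expected
print(pushed_down)  # one spurious row survives, matching the bug
```

With the filter applied first, the rank-1 row `(1, 1, '2024-01-01', '2024-01-02')` is eliminated before ranking, so a later row is promoted to rank 1 and wrongly passes the predicate.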



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-13262) Predicate pushdown causes incorrect results in join condition

2024-07-30 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-13262:
-
Description: 
We found that in some scenarios Apache Impala 
(https://github.com/apache/impala/commit/c539874) could incorrectly push 
predicates down to scan nodes, which in turn produces wrong results. The 
following is a concrete example that reproduces the issue.
{code:sql}
create database impala_13262;
use impala_13262;

create table department ( dept_no integer, dept_rank integer, start_date 
timestamp,end_date timestamp);

insert into department values(1,1,'2024-01-01','2024-01-02');
insert into department values(1,2,'2024-01-02','2024-01-03');
insert into department values(1,3,'2024-01-03','2024-01-03');

create table employee (employee_no integer, depart_no integer);

insert into employee values (1,1);

-- The following query should return 0 rows. However, Apache Impala produces 
-- one row.

select * from employee t1
inner join (
select * from
(
select dept_no,dept_rank,start_date,end_date
,row_number() over(partition by dept_no order by dept_rank) rn
from department
) t2
where rn=1
) t2
on t1.depart_no=t2.dept_no
where t2.start_date=t2.end_date;

set explain_level=2;

-- In the output of the EXPLAIN statement, we found that the predicate 
-- "start_date = end_date" was pushed down to the scan node, which is wrong.

| 01:SCAN HDFS [impala_13262.department, RANDOM]                              |
|    HDFS partitions=1/1 files=3 size=132B                                    |
|    predicates: start_date = end_date                                        |
|    stored statistics:                                                       |
|      table: rows=unavailable size=unavailable                               |
|      columns: unavailable                                                   |
|    extrapolated-rows=disabled max-scan-range-rows=unavailable               |
|    mem-estimate=32.00MB mem-reservation=8.00KB thread-reservation=1         |
|    tuple-ids=1 row-size=40B cardinality=1                                   |
|    in pipelines: 01(GETNEXT)                                                |
+------------------------------------------------------------------------------+
{code}



> Predicate pushdown causes incorrect results in join condition

[jira] [Updated] (IMPALA-13262) Predicate pushdown causes incorrect results in join condition

2024-07-30 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-13262:
-
Description: 
We found that in some scenarios Apache Impala could incorrectly push predicates 
down to scan nodes, which in turn produces wrong results. The following is a 
concrete example that reproduces the issue.
{code:sql}
create database impala_13262;
use impala_13262;

create table department ( dept_no integer, dept_rank integer, start_date 
timestamp,end_date timestamp);

insert into department values(1,1,'2024-01-01','2024-01-02');
insert into department values(1,2,'2024-01-02','2024-01-03');
insert into department values(1,3,'2024-01-03','2024-01-03');

create table employee (employee_no integer, depart_no integer);

insert into employee values (1,1);

-- The following query should return 0 rows. However, Apache Impala produces 
-- one row.

select * from employee t1
inner join (
select * from
(
select dept_no,dept_rank,start_date,end_date
,row_number() over(partition by dept_no order by dept_rank) rn
from department
) t2
where rn=1
) t2
on t1.depart_no=t2.dept_no
where t2.start_date=t2.end_date;

set explain_level=2;

-- In the output of the EXPLAIN statement, we found that the predicate 
-- "start_date = end_date" was pushed down to the scan node, which is wrong.

| 01:SCAN HDFS [impala_13262.department, RANDOM]                              |
|    HDFS partitions=1/1 files=3 size=132B                                    |
|    predicates: start_date = end_date                                        |
|    stored statistics:                                                       |
|      table: rows=unavailable size=unavailable                               |
|      columns: unavailable                                                   |
|    extrapolated-rows=disabled max-scan-range-rows=unavailable               |
|    mem-estimate=32.00MB mem-reservation=8.00KB thread-reservation=1         |
|    tuple-ids=1 row-size=40B cardinality=1                                   |
|    in pipelines: 01(GETNEXT)                                                |
+------------------------------------------------------------------------------+
{code}

{code}


> Predicate pushdown causes incorrect results in join condition
> 

[jira] [Updated] (IMPALA-13262) Predicate pushdown causes incorrect results in join condition

2024-07-30 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-13262:
-
Description: 
We found that in some scenarios Apache Impala could incorrectly push predicates 
down to scan nodes, which in turn produces wrong results. The following is a 
concrete example that reproduces the issue.
{code:sql}
create database impala_13262;
use impala_13262;

create table department ( dept_no integer, dept_rank integer, start_date 
timestamp,end_date timestamp);

insert into department values(1,1,'2024-01-01','2024-01-02');
insert into department values(1,2,'2024-01-02','2024-01-03');
insert into department values(1,3,'2024-01-03','2024-01-03');

create table employee (employee_no integer, depart_no integer);

insert into employee values (1,1);

-- The following should return 0 rows. However, Apache Impala produces one row.

select * from employee t1
inner join (
select * from
(
select dept_no,dept_rank,start_date,end_date
,row_number() over(partition by dept_no order by dept_rank) rn
from department
) t2
where rn=1
) t2
on t1.depart_no=t2.dept_no
where t2.start_date=t2.end_date;

set explain_level=2;

-- In the output of the EXPLAIN statement, we found that the predicate 
-- "start_date = end_date" was pushed down to the scan node, which is wrong.

| 01:SCAN HDFS [impala_13262.department, RANDOM]                              |
|    HDFS partitions=1/1 files=3 size=132B                                    |
|    predicates: start_date = end_date                                        |
|    stored statistics:                                                       |
|      table: rows=unavailable size=unavailable                               |
|      columns: unavailable                                                   |
|    extrapolated-rows=disabled max-scan-range-rows=unavailable               |
|    mem-estimate=32.00MB mem-reservation=8.00KB thread-reservation=1         |
|    tuple-ids=1 row-size=40B cardinality=1                                   |
|    in pipelines: 01(GETNEXT)                                                |
+------------------------------------------------------------------------------+
{code}



> Predicate pushdown causes incorrect results in join condition
> -
>
> Key: IMPALA-13262
> URL: https://issues.apache.org/jira/browse/IMPALA-13262
> Project: IMPALA
>  Issue Type: Bug
>

[jira] [Updated] (IMPALA-13262) Predicate pushdown causes incorrect results in join condition

2024-07-30 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-13262:
-
Description: 
We found that in some scenarios Apache Impala could incorrectly push predicates 
down to scan nodes, which in turn produces wrong results. The following is a 
concrete example that reproduces the issue.
{code:sql}
create table department ( dept_no integer, dept_rank integer, start_date 
timestamp,end_date timestamp);

insert into department values(1,1,'2024-01-01','2024-01-02');
insert into department values(1,2,'2024-01-02','2024-01-03');
insert into department values(1,3,'2024-01-03','2024-01-03');

create table employee (employee_no integer, depart_no integer);

insert into employee values (1,1);

-- The following should return 0 rows. However, Apache Impala produces one row.

select * from employee t1
inner join (
select * from
(
select dept_no,dept_rank,start_date,end_date
,row_number() over(partition by dept_no order by dept_rank) rn
from department
) t2
where rn=1
) t2
on t1.depart_no=t2.dept_no
where t2.start_date=t2.end_date;

-- In the EXPLAIN output below, the predicate "start_date = end_date" was 
-- wrongly pushed down to the scan node.

| 01:SCAN HDFS [impala_13262.department, RANDOM]                              |
|    HDFS partitions=1/1 files=3 size=132B                                    |
|    predicates: start_date = end_date                                        |
|    stored statistics:                                                       |
|      table: rows=unavailable size=unavailable                               |
|      columns: unavailable                                                   |
|    extrapolated-rows=disabled max-scan-range-rows=unavailable               |
|    mem-estimate=32.00MB mem-reservation=8.00KB thread-reservation=1         |
|    tuple-ids=1 row-size=40B cardinality=1                                   |
|    in pipelines: 01(GETNEXT)                                                |
+------------------------------------------------------------------------------+
{code}



> Predicate pushdown causes incorrect results in join condition
> -
>
> Key: IMPALA-13262
> URL: https://issues.apache.org/jira/browse/IMPALA-13262
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>
> We found that in some scenarios Apache Impala could incorrectly push 
> predicates down to scan nodes, which in turn produces wrong results. The 
> following is a concrete example that reproduces the issue.
> {code:sql}
> create table department ( dept_no integer, dept_rank integer, start_date 
> timestamp,end_date timestamp);
> insert into department values(1,1,'2024-01-01','2024-01-02');
> insert into department values(1,2,'2024-01-02','2024-01-03');
> insert into department values(1,3,'2024-01-03','2024-01-03');
> create table employee (employee_no integer, depart_no integer);
> insert into employee values (1,1);
> -- The following should return 0 rows. However, Apache Impala produces one row.
> select * from employee t1
> inner join (
> select * from
> (
> select dept_no,dept_rank,start_date,end_date
> ,row_number() over(partition by dept_no order by dept_rank) rn
> from department
> ) t2
> where rn=1
> ) t2
> on t1.depart_no=t2.dept_no
> where t2.start_date=t2.end_date;
> | 01:SCAN HDFS [impala_13262.department, RANDOM]                              
>                           |
> |    HDFS partitions=1/1 files=3 size=132B                                    
>                           |
> |    predicates: start_date = end_date                                        
>  

[jira] [Created] (IMPALA-13262) Predicate pushdown causes incorrect results in join condition

2024-07-30 Thread Fang-Yu Rao (Jira)
Fang-Yu Rao created IMPALA-13262:


 Summary: Predicate pushdown causes incorrect results in join 
condition
 Key: IMPALA-13262
 URL: https://issues.apache.org/jira/browse/IMPALA-13262
 Project: IMPALA
  Issue Type: Bug
Reporter: Fang-Yu Rao
Assignee: Fang-Yu Rao


We found that in some scenarios Apache Impala could incorrectly push predicates 
down to scan nodes, which in turn produces wrong results. The following is a 
concrete example that reproduces the issue.
{code:sql}
create table department ( dept_no integer, dept_rank integer, start_date 
timestamp,end_date timestamp);

insert into department values(1,1,'2024-01-01','2024-01-02');
insert into department values(1,2,'2024-01-02','2024-01-03');
insert into department values(1,3,'2024-01-03','2024-01-03');

create table employee (employee_no integer, depart_no integer);

insert into employee values (1,1);

-- The following should return 0 rows. However, Apache Impala produces one row.

select * from employee t1
inner join (
select * from
(
select dept_no,dept_rank,start_date,end_date
,row_number() over(partition by dept_no order by dept_rank) rn
from department
) t2
where rn=1
) t2
on t1.depart_no=t2.dept_no
where t2.start_date=t2.end_date;


{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-13169) Specify cluster id before starting HiveServer2 after HIVE-28324

2024-06-19 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-13169:
-
Description: 
After HIVE-28324, starting HiveServer2 requires that a cluster id be passed to 
it, either via an environment variable or via a command-line Java property. We 
should provide HiveServer2 with the cluster id before we bump CDP_BUILD_NUMBER 
to a CDP Hive dependency that includes this Hive change.
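As a rough sketch of the two delivery mechanisms (all names below are hypothetical placeholders chosen for illustration; the actual environment variable and property names are defined by HIVE-28324):

```shell
# Hypothetical placeholder names; the real ones are defined by HIVE-28324.
# Option 1: an environment variable read by the HiveServer2 startup scripts.
export CLUSTER_ID="impala-minicluster"

# Option 2: a Java system property passed on the HiveServer2 command line.
hive --service hiveserver2 --hiveconf hive.cluster.id="impala-minicluster"
```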





> Specify cluster id before starting HiveServer2 after HIVE-28324
> ---
>
> Key: IMPALA-13169
> URL: https://issues.apache.org/jira/browse/IMPALA-13169
> Project: IMPALA
>  Issue Type: Task
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>
> After HIVE-28324, starting HiveServer2 requires that a cluster id be passed 
> to it, either via an environment variable or via a command-line Java 
> property. We should provide HiveServer2 with the cluster id before we bump 
> CDP_BUILD_NUMBER to a CDP Hive dependency that includes this Hive change.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-13169) Specify cluster id before starting HiveServer2 after HIVE-28324

2024-06-19 Thread Fang-Yu Rao (Jira)
Fang-Yu Rao created IMPALA-13169:


 Summary: Specify cluster id before starting HiveServer2 after 
HIVE-28324
 Key: IMPALA-13169
 URL: https://issues.apache.org/jira/browse/IMPALA-13169
 Project: IMPALA
  Issue Type: Task
Reporter: Fang-Yu Rao
Assignee: Fang-Yu Rao


After HIVE-28324, starting HiveServer2 requires that a cluster id be passed to 
it, either via an environment variable or via a command-line Java property. We 
should provide HiveServer2 with the cluster id before we bump up 
CDP_BUILD_NUMBER to a build that includes this Hive change.




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-13167) Impala's coordinator could not be connected after a restart in custom cluster test in the ASAN build on ARM

2024-06-18 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-13167:
-
Description: 
In an internal Jenkins run, we found that Impala's coordinator could become 
unreachable after a restart that followed the coordinator hitting a DCHECK 
during a custom cluster test in the ASAN build on ARM.

Specifically, in that Jenkins run, we found that Impala's coordinator hit the 
DCHECK in [RuntimeProfile::EventSequence::Start(int64_t 
start_time_ns)|https://github.com/apache/impala/blob/master/be/src/util/runtime-profile-counters.h#L656]
 while running a query in 
[ranger_column_masking_complex_types.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking_complex_types.test#L724-L732]
 that was run by 
[test_column_masking()|https://github.com/apache/impala/blob/master/tests/authorization/test_ranger.py#L1916].
 This is a known issue as described in IMPALA-4631.

Since the Impala daemons and the catalog server are restarted for each test in 
test_ranger.py, the next test run after test_column_masking() should most 
likely have passed. However, that was not the case. We found that the following 
few tests (e.g., test_block_metadata_update()) in test_ranger.py failed because 
Impala's pytest framework was not able to connect to the coordinator, with the 
following error.
{code:java}
-- 2024-06-18 08:49:43,350 INFO MainThread: Starting cluster with command: 
/data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/bin/start-impala-cluster.py
 '--state_store_args=--statestore_update_frequency_ms=50 
--statestore_priority_update_frequency_ms=50 
--statestore_heartbeat_frequency_ms=50' --cluster_size=3 --num_coordinators=3 
--log_dir=/data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/logs/custom_cluster_tests
 --log_level=1 '--impalad_args=--server-name=server1 --ranger_service_type=hive 
--ranger_app_id=impala --authorization_provider=ranger ' 
'--state_store_args=None ' '--catalogd_args=--server-name=server1 
--ranger_service_type=hive --ranger_app_id=impala 
--authorization_provider=ranger ' --impalad_args=--default_query_options=
08:49:43 MainThread: Found 0 impalad/0 statestored/0 catalogd process(es)
08:49:43 MainThread: Starting State Store logging to 
/data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/logs/custom_cluster_tests/statestored.INFO
08:49:43 MainThread: Starting Catalog Service logging to 
/data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/logs/custom_cluster_tests/catalogd.INFO
08:49:44 MainThread: Starting Impala Daemon logging to 
/data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/logs/custom_cluster_tests/impalad.INFO
08:49:44 MainThread: Starting Impala Daemon logging to 
/data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/logs/custom_cluster_tests/impalad_node1.INFO
08:49:44 MainThread: Starting Impala Daemon logging to 
/data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/logs/custom_cluster_tests/impalad_node2.INFO
08:49:47 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
08:49:47 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
08:49:47 MainThread: Getting num_known_live_backends from 
impala-ec2-rhel88-m7g-4xlarge-ondemand-1d18.vpc.cloudera.com:25000
08:49:47 MainThread: Debug webpage not yet available: 
HTTPConnectionPool(host='impala-ec2-rhel88-m7g-4xlarge-ondemand-1d18.vpc.cloudera.com',
 port=25000): Max retries exceeded with url: /backends?json (Caused by 
NewConnectionError(': Failed to establish a new connection: [Errno 111] Connection 
refused',))
08:49:49 MainThread: Debug webpage did not become available in expected time.
08:49:49 MainThread: Waiting for num_known_live_backends=3. Current value: None
08:49:50 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
08:49:50 MainThread: Getting num_known_live_backends from 
impala-ec2-rhel88-m7g-4xlarge-ondemand-1d18.vpc.cloudera.com:25000
08:49:50 MainThread: Waiting for num_known_live_backends=3. Current value: 0
08:49:51 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
08:49:51 MainThread: Getting num_known_live_backends from 
impala-ec2-rhel88-m7g-4xlarge-ondemand-1d18.vpc.cloudera.com:25000
08:49:51 MainThread: num_known_live_backends has reached value: 3
08:49:51 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
08:49:51 MainThread: Getting num_known_live_backends from 
impala-ec2-rhel88-m7g-4xlarge-ondemand-1d18.vpc.cloudera.com:25001
08:49:51 MainThread: num_known_live_backends has reached value: 3
08:49:52 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
08:49:52 MainThread: Getting num_known_live_backends from 
impala-ec2-rhel88-m7g-4xlarge-ondemand-1d18.vpc.cloudera.com:25002

[jira] [Created] (IMPALA-13167) Impala's coordinator could not be connected after a restart in custom cluster test in the ASAN build on ARM

2024-06-18 Thread Fang-Yu Rao (Jira)
Fang-Yu Rao created IMPALA-13167:


 Summary: Impala's coordinator could not be connected after a 
restart in custom cluster test in the ASAN build on ARM
 Key: IMPALA-13167
 URL: https://issues.apache.org/jira/browse/IMPALA-13167
 Project: IMPALA
  Issue Type: Bug
Reporter: Fang-Yu Rao
Assignee: Fang-Yu Rao


In an internal Jenkins run, we found that Impala's coordinator could become 
unreachable after a restart that followed the coordinator hitting a DCHECK 
during a custom cluster test in the ASAN build on ARM.

Specifically, in that Jenkins run, we found that Impala's coordinator hit the 
DCHECK in [RuntimeProfile::EventSequence::Start(int64_t 
start_time_ns)|https://github.com/apache/impala/blob/master/be/src/util/runtime-profile-counters.h#L656]
 while running a query in 
[ranger_column_masking_complex_types.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking_complex_types.test#L724-L732]
 that was run by 
[test_column_masking()|https://github.com/apache/impala/blob/master/tests/authorization/test_ranger.py#L1916].

Since the Impala daemons and the catalog server are restarted for each test in 
test_ranger.py, the next test run after test_column_masking() should most 
likely have passed. However, that was not the case. We found that the following 
few tests (e.g., test_block_metadata_update()) in test_ranger.py failed because 
Impala's pytest framework was not able to connect to the coordinator, with the 
following error.
{code:java}
-- 2024-06-18 08:49:43,350 INFO MainThread: Starting cluster with command: 
/data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/bin/start-impala-cluster.py
 '--state_store_args=--statestore_update_frequency_ms=50 
--statestore_priority_update_frequency_ms=50 
--statestore_heartbeat_frequency_ms=50' --cluster_size=3 --num_coordinators=3 
--log_dir=/data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/logs/custom_cluster_tests
 --log_level=1 '--impalad_args=--server-name=server1 --ranger_service_type=hive 
--ranger_app_id=impala --authorization_provider=ranger ' 
'--state_store_args=None ' '--catalogd_args=--server-name=server1 
--ranger_service_type=hive --ranger_app_id=impala 
--authorization_provider=ranger ' --impalad_args=--default_query_options=
08:49:43 MainThread: Found 0 impalad/0 statestored/0 catalogd process(es)
08:49:43 MainThread: Starting State Store logging to 
/data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/logs/custom_cluster_tests/statestored.INFO
08:49:43 MainThread: Starting Catalog Service logging to 
/data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/logs/custom_cluster_tests/catalogd.INFO
08:49:44 MainThread: Starting Impala Daemon logging to 
/data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/logs/custom_cluster_tests/impalad.INFO
08:49:44 MainThread: Starting Impala Daemon logging to 
/data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/logs/custom_cluster_tests/impalad_node1.INFO
08:49:44 MainThread: Starting Impala Daemon logging to 
/data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/logs/custom_cluster_tests/impalad_node2.INFO
08:49:47 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
08:49:47 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
08:49:47 MainThread: Getting num_known_live_backends from 
impala-ec2-rhel88-m7g-4xlarge-ondemand-1d18.vpc.cloudera.com:25000
08:49:47 MainThread: Debug webpage not yet available: 
HTTPConnectionPool(host='impala-ec2-rhel88-m7g-4xlarge-ondemand-1d18.vpc.cloudera.com',
 port=25000): Max retries exceeded with url: /backends?json (Caused by 
NewConnectionError(': Failed to establish a new connection: [Errno 111] Connection 
refused',))
08:49:49 MainThread: Debug webpage did not become available in expected time.
08:49:49 MainThread: Waiting for num_known_live_backends=3. Current value: None
08:49:50 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
08:49:50 MainThread: Getting num_known_live_backends from 
impala-ec2-rhel88-m7g-4xlarge-ondemand-1d18.vpc.cloudera.com:25000
08:49:50 MainThread: Waiting for num_known_live_backends=3. Current value: 0
08:49:51 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
08:49:51 MainThread: Getting num_known_live_backends from 
impala-ec2-rhel88-m7g-4xlarge-ondemand-1d18.vpc.cloudera.com:25000
08:49:51 MainThread: num_known_live_backends has reached value: 3
08:49:51 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
08:49:51 MainThread: Getting num_known_live_backends from 
impala-ec2-rhel88-m7g-4xlarge-ondemand-1d18.vpc.cloudera.com:25001
08:49:51 MainThread: num_known_live_backends has reached value: 3
08:49:52 MainThread: Found 3 impa

[jira] [Updated] (IMPALA-13165) Impala daemon crashed with OMException in Ozone build

2024-06-18 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-13165:
-
Description: 
We found from an internal build that the Impala daemon crashed with a lot of 
OMExceptions in an Ozone build.

For instance, the backend test 
[Multi8RandomSpillToRemoteMix()|https://github.com/apache/impala/blob/master/be/src/runtime/bufferpool/buffer-pool-test.cc#L2065C24-L2070]
 failed with the following stack trace, collected from the generated minidump, 
which is also provided in 
[^generate_junitxml.finalize.minidumps.20240616_21_41_14.xml].
{code}
Thread 502 (crashed)
 0  libc.so.6 + 0x36387
rax = 0x   rdx = 0x0006
rcx = 0x   rbx = 0x0607d920
rsi = 0x0cfa   rdi = 0x28ec
rbp = 0x7fd6662f06e0   rsp = 0x7fd6662f0428
 r8 = 0xr9 = 0x7fd6662f02e0
r10 = 0x0008   r11 = 0x0202
r12 = 0x0607d920   r13 = 0x0607d980
r14 = 0x0152   r15 = 0x0223
rip = 0x7fd77dbd1387
Found by: given as instruction pointer in context
 1  libc.so.6 + 0x37a78
rbp = 0x7fd6662f06e0   rsp = 0x7fd6662f0430
rip = 0x7fd77dbd2a78
Found by: stack scanning
 2  buffer-pool-test!google_breakpad::ExceptionHandler::HandleSignal(int, 
siginfo_t*, void*) + 0x1a0
rbp = 0x7fd6662f06e0   rsp = 0x7fd6662f04b8
rip = 0x03a29e40
Found by: stack scanning
 3  buffer-pool-test!tcmalloc::ThreadCache::FetchFromCentralCache(unsigned int, 
int, void* (*)(unsigned long)) + 0x68
rbp = 0x7fd6662f06e0   rsp = 0x7fd6662f04f0
rip = 0x03b6f858
Found by: stack scanning
 4  buffer-pool-test!tcmalloc::malloc_oom(unsigned long) + 0xc0
rbp = 0x7fd6662f06e0   rsp = 0x7fd6662f0500
rip = 0x03d07f20
Found by: stack scanning
 5  buffer-pool-test!google::(anonymous namespace)::FailureSignalHandler(int, 
siginfo_t*, void*) [clone .part.0] + 0xad0
rbp = 0x7fd6662f06e0   rsp = 0x7fd6662f0558
rip = 0x039faa00
Found by: stack scanning
 6  buffer-pool-test!google::DumpStackTraceAndExit() [clone .cold] + 0x5
rbp = 0x7fd6662f06e0   rsp = 0x7fd6662f0560
rip = 0x00f00e4f
Found by: stack scanning
 7  libstdc++.so.6 + 0x13aa48
rbp = 0x7fd6662f06e0   rsp = 0x7fd6662f0570
rip = 0x7fd78132ea48
Found by: stack scanning
 8  libstdc++.so.6 + 0x13aa48
rbp = 0x7fd6662f06e0   rsp = 0x7fd6662f0580
rip = 0x7fd78132ea48
Found by: stack scanning
 9  libstdc++.so.6 + 0x11f8e2
rbp = 0x7fd6662f06e0   rsp = 0x7fd6662f05b0
rip = 0x7fd7813138e2
Found by: stack scanning
10  
buffer-pool-test!google::LogDestination::WaitForSinks(google::LogMessage::LogMessageData*)
 + 0x110
rbp = 0x7fd6662f06e0   rsp = 0x7fd6662f05e0
rip = 0x039f6460
Found by: stack scanning
11  buffer-pool-test!google::LogMessage::Fail() + 0xd 
rbp = 0x7fd6662f06e0   rsp = 0x7fd6662f0610
rip = 0x039ef6bd
Found by: stack scanning
12  buffer-pool-test!google::LogMessage::SendToLog() + 0x244
rbp = 0x7fd6662f06e0   rsp = 0x7fd6662f0620
rip = 0x039f15f4
Found by: stack scanning
13  libstdc++.so.6 + 0x12cae4
rbp = 0x7fd6662f06e0   rsp = 0x7fd6662f0640
rip = 0x7fd781320ae4
Found by: stack scanning
14  buffer-pool-test!_fini + 0x19b3
rbp = 0x7fd6662f06e0   rsp = 0x7fd6662f0648
rip = 0x03d0cb03
Found by: stack scanning
15  buffer-pool-test!_fini + 0xa7c14
rbp = 0x7fd6662f06e0   rsp = 0x7fd6662f0658
rip = 0x03db2d64
Found by: stack scanning
16  buffer-pool-test!google::LogMessage::Flush() + 0x1ec
rsp = 0x7fd6662f06f0   rip = 0x039ef09c
Found by: stack scanning
17  libstdc++.so.6 + 0x12cae4
rsp = 0x7fd6662f0730   rip = 0x7fd781320ae4
Found by: stack scanning
18  buffer-pool-test!google::LogMessageFatal::~LogMessageFatal() + 0x9 
rsp = 0x7fd6662f0790   rip = 0x039f1b19
Found by: stack scanning
19  
buffer-pool-test!impala::BufferPoolTest::TestRandomInternalImpl(impala::BufferPool*,
 impala::TmpFileGroup*, impala::MemTracker*, 
std::mersenne_twister_engine*, int, bool) [buffer-pool.h : 338 + 0x8]
rsp = 0x7fd6662f07a0   rip = 0x00f8721f
Found by: stack scanning
{code}

During the crash we also saw quite a few OMExceptions in the console output.
{code}
08:46:11 
hdfsOpenFile(ofs://localhost:9862/impala/tmp/impala-scratch/a44cc3c871369491_8dcaa671747530a3__/impala-scratch-ae339172-59d6-41ef-9a6a-249c4d9ff537):

[jira] [Updated] (IMPALA-13165) Impala daemon crashed with OMException in Ozone build

2024-06-18 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-13165:
-
Attachment: generate_junitxml.finalize.minidumps.20240616_21_41_14.xml

> Impala daemon crashed with OMException in Ozone build
> -
>
> Key: IMPALA-13165
> URL: https://issues.apache.org/jira/browse/IMPALA-13165
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Fang-Yu Rao
>Assignee: Yida Wu
>Priority: Major
>  Labels: broken-build
> Attachments: 
> generate_junitxml.finalize.minidumps.20240616_21_41_14.xml
>
>
> We found from an internal build that Impala daemon crashed with a lot of 
> OMException in an Ozone build.
> For instance, the backend test 
> [Multi8RandomSpillToRemoteMix()|https://github.com/apache/impala/blob/master/be/src/runtime/bufferpool/buffer-pool-test.cc#L2065C24-L2070]
>  failed with the following stack trace collected from the generated minidump.
> {code}
> Thread 502 (crashed)
>  0  libc.so.6 + 0x36387
> rax = 0x   rdx = 0x0006
> rcx = 0x   rbx = 0x0607d920
> rsi = 0x0cfa   rdi = 0x28ec
> rbp = 0x7fd6662f06e0   rsp = 0x7fd6662f0428
>  r8 = 0xr9 = 0x7fd6662f02e0
> r10 = 0x0008   r11 = 0x0202
> r12 = 0x0607d920   r13 = 0x0607d980
> r14 = 0x0152   r15 = 0x0223
> rip = 0x7fd77dbd1387
> Found by: given as instruction pointer in context
>  1  libc.so.6 + 0x37a78
> rbp = 0x7fd6662f06e0   rsp = 0x7fd6662f0430
> rip = 0x7fd77dbd2a78
> Found by: stack scanning
>  2  buffer-pool-test!google_breakpad::ExceptionHandler::HandleSignal(int, 
> siginfo_t*, void*) + 0x1a0
> rbp = 0x7fd6662f06e0   rsp = 0x7fd6662f04b8
> rip = 0x03a29e40
> Found by: stack scanning
>  3  buffer-pool-test!tcmalloc::ThreadCache::FetchFromCentralCache(unsigned 
> int, int, void* (*)(unsigned long)) + 0x68
> rbp = 0x7fd6662f06e0   rsp = 0x7fd6662f04f0
> rip = 0x03b6f858
> Found by: stack scanning
>  4  buffer-pool-test!tcmalloc::malloc_oom(unsigned long) + 0xc0
> rbp = 0x7fd6662f06e0   rsp = 0x7fd6662f0500
> rip = 0x03d07f20
> Found by: stack scanning
>  5  buffer-pool-test!google::(anonymous namespace)::FailureSignalHandler(int, 
> siginfo_t*, void*) [clone .part.0] + 0xad0
> rbp = 0x7fd6662f06e0   rsp = 0x7fd6662f0558
> rip = 0x039faa00
> Found by: stack scanning
>  6  buffer-pool-test!google::DumpStackTraceAndExit() [clone .cold] + 0x5
> rbp = 0x7fd6662f06e0   rsp = 0x7fd6662f0560
> rip = 0x00f00e4f
> Found by: stack scanning
>  7  libstdc++.so.6 + 0x13aa48
> rbp = 0x7fd6662f06e0   rsp = 0x7fd6662f0570
> rip = 0x7fd78132ea48
> Found by: stack scanning
>  8  libstdc++.so.6 + 0x13aa48
> rbp = 0x7fd6662f06e0   rsp = 0x7fd6662f0580
> rip = 0x7fd78132ea48
> Found by: stack scanning
>  9  libstdc++.so.6 + 0x11f8e2
> rbp = 0x7fd6662f06e0   rsp = 0x7fd6662f05b0
> rip = 0x7fd7813138e2
> Found by: stack scanning
> 10  
> buffer-pool-test!google::LogDestination::WaitForSinks(google::LogMessage::LogMessageData*)
>  + 0x110
> rbp = 0x7fd6662f06e0   rsp = 0x7fd6662f05e0
> rip = 0x039f6460
> Found by: stack scanning
> 11  buffer-pool-test!google::LogMessage::Fail() + 0xd 
> rbp = 0x7fd6662f06e0   rsp = 0x7fd6662f0610
> rip = 0x039ef6bd
> Found by: stack scanning
> 12  buffer-pool-test!google::LogMessage::SendToLog() + 0x244
> rbp = 0x7fd6662f06e0   rsp = 0x7fd6662f0620
> rip = 0x039f15f4
> Found by: stack scanning
> 13  libstdc++.so.6 + 0x12cae4
> rbp = 0x7fd6662f06e0   rsp = 0x7fd6662f0640
> rip = 0x7fd781320ae4
> Found by: stack scanning
> 14  buffer-pool-test!_fini + 0x19b3
> rbp = 0x7fd6662f06e0   rsp = 0x7fd6662f0648
> rip = 0x03d0cb03
> Found by: stack scanning
> 15  buffer-pool-test!_fini + 0xa7c14
> rbp = 0x7fd6662f06e0   rsp = 0x7fd6662f0658
> rip = 0x03db2d64
> Found by: stack scanning
> 16  buffer-pool-test!google::LogMessage::Flush() + 0x1ec
> rsp = 0x7fd6662f06f0   rip = 0x039ef09c
> Found by: stack scanning
> 17  libstdc++.so.6 + 0x12cae4
> rsp = 0x7fd6662f0730   rip = 0x7fd781320ae4
> Found by: stack scanning
> 18  buffer-pool-test!google::LogMessageFatal::~LogMessageFatal() + 0x9 
> rsp = 0x7fd6662f0790   rip = 0x039f1b19
> Found by: stack scanning
> 19  
> buffer-pool-test!impala::BufferPoolTest::TestRandom

[jira] [Created] (IMPALA-13165) Impala daemon crashed with OMException in Ozone build

2024-06-18 Thread Fang-Yu Rao (Jira)
Fang-Yu Rao created IMPALA-13165:


 Summary: Impala daemon crashed with OMException in Ozone build
 Key: IMPALA-13165
 URL: https://issues.apache.org/jira/browse/IMPALA-13165
 Project: IMPALA
  Issue Type: Bug
Reporter: Fang-Yu Rao
Assignee: Yida Wu


We found from an internal build that the Impala daemon crashed with a lot of 
OMExceptions in an Ozone build.

For instance, the backend test 
[Multi8RandomSpillToRemoteMix()|https://github.com/apache/impala/blob/master/be/src/runtime/bufferpool/buffer-pool-test.cc#L2065C24-L2070]
 failed with the following stack trace collected from the generated minidump.
{code}
Thread 502 (crashed)
 0  libc.so.6 + 0x36387
rax = 0x   rdx = 0x0006
rcx = 0x   rbx = 0x0607d920
rsi = 0x0cfa   rdi = 0x28ec
rbp = 0x7fd6662f06e0   rsp = 0x7fd6662f0428
 r8 = 0xr9 = 0x7fd6662f02e0
r10 = 0x0008   r11 = 0x0202
r12 = 0x0607d920   r13 = 0x0607d980
r14 = 0x0152   r15 = 0x0223
rip = 0x7fd77dbd1387
Found by: given as instruction pointer in context
 1  libc.so.6 + 0x37a78
rbp = 0x7fd6662f06e0   rsp = 0x7fd6662f0430
rip = 0x7fd77dbd2a78
Found by: stack scanning
 2  buffer-pool-test!google_breakpad::ExceptionHandler::HandleSignal(int, 
siginfo_t*, void*) + 0x1a0
rbp = 0x7fd6662f06e0   rsp = 0x7fd6662f04b8
rip = 0x03a29e40
Found by: stack scanning
 3  buffer-pool-test!tcmalloc::ThreadCache::FetchFromCentralCache(unsigned int, 
int, void* (*)(unsigned long)) + 0x68
rbp = 0x7fd6662f06e0   rsp = 0x7fd6662f04f0
rip = 0x03b6f858
Found by: stack scanning
 4  buffer-pool-test!tcmalloc::malloc_oom(unsigned long) + 0xc0
rbp = 0x7fd6662f06e0   rsp = 0x7fd6662f0500
rip = 0x03d07f20
Found by: stack scanning
 5  buffer-pool-test!google::(anonymous namespace)::FailureSignalHandler(int, 
siginfo_t*, void*) [clone .part.0] + 0xad0
rbp = 0x7fd6662f06e0   rsp = 0x7fd6662f0558
rip = 0x039faa00
Found by: stack scanning
 6  buffer-pool-test!google::DumpStackTraceAndExit() [clone .cold] + 0x5
rbp = 0x7fd6662f06e0   rsp = 0x7fd6662f0560
rip = 0x00f00e4f
Found by: stack scanning
 7  libstdc++.so.6 + 0x13aa48
rbp = 0x7fd6662f06e0   rsp = 0x7fd6662f0570
rip = 0x7fd78132ea48
Found by: stack scanning
 8  libstdc++.so.6 + 0x13aa48
rbp = 0x7fd6662f06e0   rsp = 0x7fd6662f0580
rip = 0x7fd78132ea48
Found by: stack scanning
 9  libstdc++.so.6 + 0x11f8e2
rbp = 0x7fd6662f06e0   rsp = 0x7fd6662f05b0
rip = 0x7fd7813138e2
Found by: stack scanning
10  
buffer-pool-test!google::LogDestination::WaitForSinks(google::LogMessage::LogMessageData*)
 + 0x110
rbp = 0x7fd6662f06e0   rsp = 0x7fd6662f05e0
rip = 0x039f6460
Found by: stack scanning
11  buffer-pool-test!google::LogMessage::Fail() + 0xd 
rbp = 0x7fd6662f06e0   rsp = 0x7fd6662f0610
rip = 0x039ef6bd
Found by: stack scanning
12  buffer-pool-test!google::LogMessage::SendToLog() + 0x244
rbp = 0x7fd6662f06e0   rsp = 0x7fd6662f0620
rip = 0x039f15f4
Found by: stack scanning
13  libstdc++.so.6 + 0x12cae4
rbp = 0x7fd6662f06e0   rsp = 0x7fd6662f0640
rip = 0x7fd781320ae4
Found by: stack scanning
14  buffer-pool-test!_fini + 0x19b3
rbp = 0x7fd6662f06e0   rsp = 0x7fd6662f0648
rip = 0x03d0cb03
Found by: stack scanning
15  buffer-pool-test!_fini + 0xa7c14
rbp = 0x7fd6662f06e0   rsp = 0x7fd6662f0658
rip = 0x03db2d64
Found by: stack scanning
16  buffer-pool-test!google::LogMessage::Flush() + 0x1ec
rsp = 0x7fd6662f06f0   rip = 0x039ef09c
Found by: stack scanning
17  libstdc++.so.6 + 0x12cae4
rsp = 0x7fd6662f0730   rip = 0x7fd781320ae4
Found by: stack scanning
18  buffer-pool-test!google::LogMessageFatal::~LogMessageFatal() + 0x9 
rsp = 0x7fd6662f0790   rip = 0x039f1b19
Found by: stack scanning
19  
buffer-pool-test!impala::BufferPoolTest::TestRandomInternalImpl(impala::BufferPool*,
 impala::TmpFileGroup*, impala::MemTracker*, 
std::mersenne_twister_engine*, int, bool) [buffer-pool.h : 338 + 0x8]
rsp = 0x7fd6662f07a0   rip = 0x00f8721f
Found by: stack scanning
{code}

During the crash we also saw quite a few OMExceptions in the console output.
{code}
08:46:11 
hdfsOpenFile(ofs://localhost:9862/impala/tmp/impala-scratch/a44cc3c871369491_8dcaa671747530a3__0

[jira] [Updated] (IMPALA-12616) test_restart_catalogd_while_handling_rpc_response* tests fail not reaching expected states

2024-06-17 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-12616:
-
Labels: broken-build  (was: )

> test_restart_catalogd_while_handling_rpc_response* tests fail not reaching 
> expected states
> --
>
> Key: IMPALA-12616
> URL: https://issues.apache.org/jira/browse/IMPALA-12616
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 1.4.2
>Reporter: Andrew Sherman
>Assignee: Daniel Becker
>Priority: Critical
>  Labels: broken-build
> Fix For: Impala 4.5.0
>
>
> There are failures in both 
> custom_cluster.test_restart_services.TestRestart.test_restart_catalogd_while_handling_rpc_response_with_timeout
>  and 
> custom_cluster.test_restart_services.TestRestart.test_restart_catalogd_while_handling_rpc_response_with_max_iters,
>  both look the same:
> {code:java}
> custom_cluster/test_restart_services.py:232: in 
> test_restart_catalogd_while_handling_rpc_response_with_timeout
> self.wait_for_state(handle, self.client.QUERY_STATES["FINISHED"], 
> max_wait_time)
> common/impala_test_suite.py:1181: in wait_for_state
> self.wait_for_any_state(handle, [expected_state], timeout, client)
> common/impala_test_suite.py:1199: in wait_for_any_state
> raise Timeout(timeout_msg)
> E   Timeout: query '6a4e0bad9b511ccf:bf93de68' did not reach one of 
> the expected states [4], last known state 5
> {code}
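The wait_for_state()/wait_for_any_state() helpers in the traceback above poll the query state until an expected state is reached or a timeout expires. A minimal sketch of that polling pattern (a simplification; the real helper in impala_test_suite.py takes a query handle and client rather than a callable):

```python
import time


class Timeout(Exception):
    """Raised when none of the expected states is reached within the deadline."""


def wait_for_any_state(get_state, expected_states, timeout_s, poll_interval_s=0.5):
    """Poll get_state() until it returns one of expected_states or time runs out."""
    deadline = time.time() + timeout_s
    last_state = None
    while time.time() < deadline:
        last_state = get_state()
        if last_state in expected_states:
            return last_state
        time.sleep(poll_interval_s)
    raise Timeout("did not reach one of the expected states %s, last known state %s"
                  % (expected_states, last_state))
```

In the failure above, the query never returned state 4 (FINISHED, per the test client's QUERY_STATES mapping) within max_wait_time, so the helper raised Timeout with the last known state, 5.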



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-13162) test_load_data and test_drop_partition_encrypt could fail because the Hadoop KMS server could not be connected

2024-06-17 Thread Fang-Yu Rao (Jira)
Fang-Yu Rao created IMPALA-13162:


 Summary: test_load_data and test_drop_partition_encrypt could fail 
because the Hadoop KMS server could not be connected
 Key: IMPALA-13162
 URL: https://issues.apache.org/jira/browse/IMPALA-13162
 Project: IMPALA
  Issue Type: Bug
Reporter: Fang-Yu Rao
Assignee: Fang-Yu Rao


We found that 
[test_load_data()|https://github.com/apache/impala/blob/master/tests/metadata/test_hdfs_encryption.py#L110]
 and 
[test_drop_partition_encrypt()|https://github.com/apache/impala/blob/master/tests/metadata/test_hdfs_encryption.py#L148]
 could fail because the Hadoop KMS server could not be connected. This does not 
occur very often, but it is worth creating a ticket to keep track of it.

+*Error Message*+
{code:java}
AssertionError: Error executing hdfs crypto:  Picked up JAVA_TOOL_OPTIONS:  
-javaagent:/data/jenkins/workspace/impala-asf-master-core/repos/Impala/fe/target/dependency/jamm-0.4.0.jar
   RemoteException: Failed to connect to: 
http://localhost:9600/kms/v1/key/testkey1/_metadataassert 2 == 0
{code}
+*Stacktrace*+
{code:java}
/data/jenkins/workspace/impala-asf-master-core/repos/Impala/tests/metadata/test_hdfs_encryption.py:124:
 in test_load_data
assert rc == 0, 'Error executing hdfs crypto: %s %s' % (stdout, stderr)
E   AssertionError: Error executing hdfs crypto:  Picked up JAVA_TOOL_OPTIONS:  
-javaagent:/data/jenkins/workspace/impala-asf-master-core/repos/Impala/fe/target/dependency/jamm-0.4.0.jar
E RemoteException: Failed to connect to: 
http://localhost:9600/kms/v1/key/testkey1/_metadata
E 
E   assert 2 == 0
{code}
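Because the failure is transient KMS connectivity rather than a product bug, one possible mitigation is to retry the external {{hdfs crypto}} command a few times before failing the test. A hedged sketch of such a wrapper ({{run_with_retries}} is a hypothetical helper, not part of the Impala test framework):

```python
import subprocess
import time


def run_with_retries(cmd, attempts=3, delay_s=2):
    """Run an external command, retrying on a non-zero return code to paper
    over transient failures such as an unreachable KMS endpoint."""
    for attempt in range(attempts):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        if proc.returncode == 0:
            return proc
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return proc  # caller still asserts on proc.returncode, as the test does
```

The failing call site in test_hdfs_encryption.py could then assert on the returned process's returncode exactly as it does today.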






[jira] [Resolved] (IMPALA-12921) Consider adding support for locally built Ranger

2024-06-17 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao resolved IMPALA-12921.
--
Resolution: Fixed

Resolve the issue since the fix has been merged.

> Consider adding support for locally built Ranger
> 
>
> Key: IMPALA-12921
> URL: https://issues.apache.org/jira/browse/IMPALA-12921
> Project: IMPALA
>  Issue Type: Task
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
> Fix For: Impala 4.5.0
>
>
> It would be nice to be able to support a locally built Ranger in Impala's 
> minicluster, as this would facilitate the testing of features that require 
> changes to both components.
> *+Edit:+*
> Making the current Apache Impala on *master* (tip is
> {*}IMPALA-12925{*}: Fix decimal data type for external JDBC table) to support 
> Ranger on *master* (tip is 
> {*}RANGER-4745{*}: Enhance handling of subAccess authorization in Ranger HDFS 
> plugin) may be too ambitious.
> The signatures of some classes are already incompatible. For instance, on the 
> Impala side, Impala instantiates *RangerAccessRequestImpl* via the following 
> code, which takes 4 input arguments.
> {code:java}
> RangerAccessRequest req = new RangerAccessRequestImpl(resource,
> SELECT_ACCESS_TYPE, user.getShortName(), getUserGroups(user));
> {code}
> However, the current signature of RangerAccessRequestImpl's constructor on 
> the master of Apache Ranger is the following. It can be seen that 5 input 
> arguments are needed instead.
> {code:java}
> public RangerAccessRequestImpl(RangerAccessResource resource, String 
> accessType, String user, Set<String> userGroups, Set<String> userRoles)
> {code}
> It may be more practical to support Ranger on an earlier version, e.g., 
> [https://github.com/apache/ranger/blob/release-ranger-2.4.0].






[jira] [Updated] (IMPALA-12921) Consider adding support for locally built Ranger

2024-06-17 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-12921:
-
Fix Version/s: Impala 4.5.0

> Consider adding support for locally built Ranger
> 
>
> Key: IMPALA-12921
> URL: https://issues.apache.org/jira/browse/IMPALA-12921
> Project: IMPALA
>  Issue Type: Task
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
> Fix For: Impala 4.5.0
>
>
> It would be nice to be able to support a locally built Ranger in Impala's 
> minicluster, as this would facilitate the testing of features that require 
> changes to both components.
> *+Edit:+*
> Making the current Apache Impala on *master* (tip is
> {*}IMPALA-12925{*}: Fix decimal data type for external JDBC table) to support 
> Ranger on *master* (tip is 
> {*}RANGER-4745{*}: Enhance handling of subAccess authorization in Ranger HDFS 
> plugin) may be too ambitious.
> The signatures of some classes are already incompatible. For instance, on the 
> Impala side, Impala instantiates *RangerAccessRequestImpl* via the following 
> code, which takes 4 input arguments.
> {code:java}
> RangerAccessRequest req = new RangerAccessRequestImpl(resource,
> SELECT_ACCESS_TYPE, user.getShortName(), getUserGroups(user));
> {code}
> However, the current signature of RangerAccessRequestImpl's constructor on 
> the master of Apache Ranger is the following. It can be seen that 5 input 
> arguments are needed instead.
> {code:java}
> public RangerAccessRequestImpl(RangerAccessResource resource, String 
> accessType, String user, Set<String> userGroups, Set<String> userRoles)
> {code}
> It may be more practical to support Ranger on an earlier version, e.g., 
> [https://github.com/apache/ranger/blob/release-ranger-2.4.0].






[jira] [Updated] (IMPALA-12985) Use the new constructor when instantiating RangerAccessRequestImpl

2024-06-17 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-12985:
-
Fix Version/s: Impala 4.5.0

> Use the new constructor when instantiating RangerAccessRequestImpl
> --
>
> Key: IMPALA-12985
> URL: https://issues.apache.org/jira/browse/IMPALA-12985
> Project: IMPALA
>  Issue Type: Task
>  Components: Frontend
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
> Fix For: Impala 4.5.0
>
>
> After RANGER-2763, we changed the signature of the class 
> RangerAccessRequestImpl by adding an additional input argument 'userRoles', 
> as shown in the following.
> {code:java}
> public RangerAccessRequestImpl(RangerAccessResource resource, String 
> accessType, String user, Set<String> userGroups, Set<String> userRoles) {
> ...
> {code}
> The new signature is also provided in CDP Ranger. Thus to unblock 
> IMPALA-12921 or to be able to build Apache Impala with locally built Apache 
> Ranger, it may be faster to switch to the new signature on the Impala side 
> than waiting for RANGER-4770 to be resolved on the Ranger side.
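The switch is mechanical: an old four-argument call site passes an empty role set to the new five-argument constructor, which preserves the previous behavior when roles are not used. A sketch of the idea in Python with stand-in classes (only the class name RangerAccessRequestImpl mirrors a real Ranger name; the fields and simplified types are illustrative):

```python
class RangerAccessResource:
    """Stand-in for Ranger's RangerAccessResource; illustrative only."""


class RangerAccessRequestImpl:
    """Stand-in mirroring the post-RANGER-2763 five-argument constructor."""

    def __init__(self, resource, access_type, user, user_groups, user_roles):
        self.resource = resource
        self.access_type = access_type
        self.user = user
        self.user_groups = user_groups
        self.user_roles = user_roles


# An old four-argument call site is adapted by appending an empty role set,
# which is behavior-preserving when roles play no part in authorization.
req = RangerAccessRequestImpl(RangerAccessResource(), "select", "alice",
                              {"analysts"}, set())
```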






[jira] [Resolved] (IMPALA-12985) Use the new constructor when instantiating RangerAccessRequestImpl

2024-06-17 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao resolved IMPALA-12985.
--
Resolution: Fixed

Resolve the issue since the fix has been merged.

> Use the new constructor when instantiating RangerAccessRequestImpl
> --
>
> Key: IMPALA-12985
> URL: https://issues.apache.org/jira/browse/IMPALA-12985
> Project: IMPALA
>  Issue Type: Task
>  Components: Frontend
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
> Fix For: Impala 4.5.0
>
>
> After RANGER-2763, we changed the signature of the class 
> RangerAccessRequestImpl by adding an additional input argument 'userRoles', 
> as shown in the following.
> {code:java}
> public RangerAccessRequestImpl(RangerAccessResource resource, String 
> accessType, String user, Set<String> userGroups, Set<String> userRoles) {
> ...
> {code}
> The new signature is also provided in CDP Ranger. Thus to unblock 
> IMPALA-12921 or to be able to build Apache Impala with locally built Apache 
> Ranger, it may be faster to switch to the new signature on the Impala side 
> than waiting for RANGER-4770 to be resolved on the Ranger side.






[jira] [Resolved] (IMPALA-11871) INSERT statement does not respect Ranger policies for HDFS

2024-06-11 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao resolved IMPALA-11871.
--
Resolution: Fixed

Resolve the issue since the fix has been merged.

> INSERT statement does not respect Ranger policies for HDFS
> --
>
> Key: IMPALA-11871
> URL: https://issues.apache.org/jira/browse/IMPALA-11871
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>
> In a cluster with Ranger auth (and with legacy catalog mode), even if you 
> provide RWX to cm_hdfs -> all-path for the user impala, inserting into a 
> table whose HDFS POSIX permissions happen to exclude impala access will 
> result in an
> {noformat}
> "AnalysisException: Unable to INSERT into target table (default.t1) because 
> Impala does not have WRITE access to HDFS location: 
> hdfs://nightly-71x-vx-2.nightly-71x-vx.root.hwx.site:8020/warehouse/tablespace/external/hive/t1"{noformat}
>  
> {noformat}
> [root@nightly-71x-vx-3 ~]# hdfs dfs -getfacl 
> /warehouse/tablespace/external/hive/t1
> file: /warehouse/tablespace/external/hive/t1 
> owner: hive 
> group: supergroup
> user::rwx
> user:impala:rwx #effective:r-x
> group::rwx #effective:r-x
> mask::r-x
> other::---
> default:user::rwx
> default:user:impala:rwx
> default:group::rwx
> default:mask::rwx
> default:other::--- {noformat}
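The `#effective:r-x` annotations in the listing above follow from the POSIX ACL rule that a named user or group entry's rights are intersected with the mask entry. A minimal sketch of that rule in plain Java (the `effective` helper is illustrative, not an HDFS API):

```java
public class EffectiveAclSketch {
    // A named ACL entry's effective rights are the intersection of the
    // entry's own rights and the mask entry (POSIX ACL evaluation rule).
    static String effective(String entry, String mask) {
        char[] out = new char[3];
        for (int i = 0; i < 3; i++) {
            out[i] = (entry.charAt(i) != '-' && mask.charAt(i) != '-')
                    ? entry.charAt(i) : '-';
        }
        return new String(out);
    }

    public static void main(String[] args) {
        // user:impala:rwx with mask::r-x -> effective r-x: the write bit is
        // masked out, which is exactly why the analysis-time check fails.
        System.out.println(effective("rwx", "r-x")); // prints r-x
    }
}
```

So even though `user:impala` is granted `rwx`, the `mask::r-x` entry strips the write bit at evaluation time.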
> ~~
> ANALYSIS
> Stack trace from a version of Cloudera's distribution of Impala (impalad 
> version 3.4.0-SNAPSHOT RELEASE (build 
> {*}db20b59a093c17ea4699117155d58fe874f7d68f{*})):
> {noformat}
> at 
> org.apache.impala.catalog.FeFsTable$Utils.checkWriteAccess(FeFsTable.java:585)
> at 
> org.apache.impala.analysis.InsertStmt.analyzeWriteAccess(InsertStmt.java:545)
> at org.apache.impala.analysis.InsertStmt.analyze(InsertStmt.java:391)
> at 
> org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:463)
> at 
> org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:426)
> at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:1570)
> at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1536)
> at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1506)
> at 
> org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:155){noformat}
> The exception occurs at analysis time, so I tested and succeeded in writing 
> directly into the said directory.
> {noformat}
> [root@nightly-71x-vx-3 ~]# hdfs dfs -touchz 
> /warehouse/tablespace/external/hive/t1/test
> [root@nightly-71x-vx-3 ~]# hdfs dfs -ls 
> /warehouse/tablespace/external/hive/t1/
> Found 8 items
> -rw-rw----+ 3 hive supergroup 417 2023-01-27 17:37 
> /warehouse/tablespace/external/hive/t1/00_0
> -rw-rw----+ 3 hive supergroup 417 2023-01-27 17:44 
> /warehouse/tablespace/external/hive/t1/00_0_copy_1
> -rw-rw----+ 3 hive supergroup 417 2023-01-27 17:49 
> /warehouse/tablespace/external/hive/t1/00_0_copy_2
> -rw-rw----+ 3 hive supergroup 417 2023-01-27 17:53 
> /warehouse/tablespace/external/hive/t1/00_0_copy_3
> -rw-rw----+ 3 impala hive 355 2023-01-27 17:17 
> /warehouse/tablespace/external/hive/t1/4c4477c12c51ad96-3126b52d_2029811630_data.0.parq
> -rw-rw----+ 3 impala hive 355 2023-01-27 17:39 
> /warehouse/tablespace/external/hive/t1/9945b25bb37d1ff2-473c1478_574471191_data.0.parq
> drwxrwx---+ - impala hive 0 2023-01-27 17:39 
> /warehouse/tablespace/external/hive/t1/_impala_insert_staging
> -rw-rw----+ 3 impala supergroup 0 2023-01-27 18:01 
> /warehouse/tablespace/external/hive/t1/test{noformat}
> Reviewing the code[1], I traced the {{TAccessLevel}} to the catalogd. And if 
> I add user impala to group supergroup on the catalogd host, this query will 
> succeed past the authorization.
> Additionally, this query does not trip up during analysis when catalog v2 is 
> enabled because the method {{getFirstLocationWithoutWriteAccess()}} is not 
> implemented there yet and always returns null[2].
> [1] 
> [https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L494-L504]
> [2] 
> [https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java#L295-L298]
> ~~
> Ideally, when Ranger authorization is in place, we should:
> 1) Not check access level during analysis
> 2) Incorporate Ranger ACLs during analysis
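The two points above can be sketched as a single analysis-time decision: when an authorization provider such as Ranger manages file access, consult it instead of the HDFS POSIX/ACL bits. All type and method names below (`AuthzProvider`, `writableAtAnalysis`, etc.) are illustrative stand-ins, not actual Impala or Ranger APIs:

```java
public class WriteAccessCheckSketch {
    // Stand-in types modeling the proposal; none of these exist in Impala.
    interface AuthzProvider {
        boolean managesFileAccess();
        boolean canWrite(String path);
    }

    interface FsPermissions {
        boolean posixWritable(String path);
    }

    // If a provider like Ranger is in charge of file access, its ACLs decide;
    // otherwise fall back to the legacy POSIX/ACL check.
    static boolean writableAtAnalysis(String path, AuthzProvider authz, FsPermissions fs) {
        if (authz != null && authz.managesFileAccess()) {
            return authz.canWrite(path);  // Ranger ACLs decide, not the HDFS mask
        }
        return fs.posixWritable(path);    // legacy behavior: POSIX/ACL bits decide
    }

    public static void main(String[] args) {
        AuthzProvider ranger = new AuthzProvider() {
            public boolean managesFileAccess() { return true; }
            public boolean canWrite(String path) { return true; }  // RWX granted via Ranger
        };
        FsPermissions fs = path -> false;  // effective mask excludes impala's write bit
        // With Ranger consulted, the INSERT would pass analysis despite the mask.
        System.out.println(writableAtAnalysis("/warehouse/tablespace/external/hive/t1", ranger, fs));
    }
}
```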






[jira] [Commented] (IMPALA-12266) Sporadic failure after migrating a table to Iceberg

2024-06-09 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17853512#comment-17853512
 ] 

Fang-Yu Rao commented on IMPALA-12266:
--

Encountered this failure again at 
[https://jenkins.impala.io/job/ubuntu-20.04-dockerised-tests/1873/testReport/junit/query_test.test_iceberg/TestIcebergTable/test_convert_table_protocol__beeswax___exec_optiontest_replan___1___batch_size___0___num_nodes___0___disable_codegen_rows_threshold___0___disable_codegen___False___abort_on_error___1___exec_single_node_rows_threshold___0table_format__parquet_none_/]
  in a Jenkins job against [https://gerrit.cloudera.org/c/21160/], which did 
not change Impala's behavior in this area.

> Sporadic failure after migrating a table to Iceberg
> ---
>
> Key: IMPALA-12266
> URL: https://issues.apache.org/jira/browse/IMPALA-12266
> Project: IMPALA
>  Issue Type: Bug
>  Components: fe
>Affects Versions: Impala 4.2.0
>Reporter: Tamas Mate
>Assignee: Gabor Kaszab
>Priority: Critical
>  Labels: impala-iceberg
> Attachments: 
> catalogd.bd40020df22b.invalid-user.log.INFO.20230704-181939.1, 
> impalad.6c0f48d9ce66.invalid-user.log.INFO.20230704-181940.1
>
>
> TestIcebergTable.test_convert_table test failed in a recent verify job's 
> dockerised tests:
> https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/7629
> {code:none}
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> EINNER EXCEPTION: 
> EMESSAGE: AnalysisException: Failed to load metadata for table: 
> 'parquet_nopartitioned'
> E   CAUSED BY: TableLoadingException: Could not load table 
> test_convert_table_cdba7383.parquet_nopartitioned from catalog
> E   CAUSED BY: TException: 
> TGetPartialCatalogObjectResponse(status:TStatus(status_code:GENERAL, 
> error_msgs:[NullPointerException: null]), lookup_status:OK)
> {code}
> {code:none}
> E0704 19:09:22.980131   833 JniUtil.java:183] 
> 7145c21173f2c47b:2579db55] Error in Getting partial catalog object of 
> TABLE:test_convert_table_cdba7383.parquet_nopartitioned. Time spent: 49ms
> I0704 19:09:22.980309   833 jni-util.cc:288] 
> 7145c21173f2c47b:2579db55] java.lang.NullPointerException
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.replaceTableIfUnchanged(CatalogServiceCatalog.java:2357)
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.getOrLoadTable(CatalogServiceCatalog.java:2300)
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.doGetPartialCatalogObject(CatalogServiceCatalog.java:3587)
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.getPartialCatalogObject(CatalogServiceCatalog.java:3513)
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.getPartialCatalogObject(CatalogServiceCatalog.java:3480)
>   at 
> org.apache.impala.service.JniCatalog.lambda$getPartialCatalogObject$11(JniCatalog.java:397)
>   at 
> org.apache.impala.service.JniCatalogOp.lambda$execAndSerialize$1(JniCatalogOp.java:90)
>   at org.apache.impala.service.JniCatalogOp.execOp(JniCatalogOp.java:58)
>   at 
> org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:89)
>   at 
> org.apache.impala.service.JniCatalogOp.execAndSerializeSilentStartAndFinish(JniCatalogOp.java:109)
>   at 
> org.apache.impala.service.JniCatalog.execAndSerializeSilentStartAndFinish(JniCatalog.java:238)
>   at 
> org.apache.impala.service.JniCatalog.getPartialCatalogObject(JniCatalog.java:396)
> I0704 19:09:22.980324   833 status.cc:129] 7145c21173f2c47b:2579db55] 
> NullPointerException: null
> @  0x1012f9f  impala::Status::Status()
> @  0x187f964  impala::JniUtil::GetJniExceptionMsg()
> @   0xfee920  impala::JniCall::Call<>()
> @   0xfccd0f  impala::Catalog::GetPartialCatalogObject()
> @   0xfb55a5  
> impala::CatalogServiceThriftIf::GetPartialCatalogObject()
> @   0xf7a691  
> impala::CatalogServiceProcessorT<>::process_GetPartialCatalogObject()
> @   0xf82151  impala::CatalogServiceProcessorT<>::dispatchCall()
> @   0xee330f  apache::thrift::TDispatchProcessor::process()
> @  0x1329246  
> apache::thrift::server::TAcceptQueueServer::Task::run()
> @  0x1315a89  impala::ThriftThread::RunRunnable()
> @  0x131773d  
> boost::detail::function::void_function_obj_invoker0<>::invoke()
> @  0x195ba8c  impala::Thread::SuperviseThread()
> @  0x195c895  boost::detail::thread_data<>::run()
> @  0x23a03a7  thread_proxy
> @ 0x7faaad2a66ba  start_thread
> @ 0x7f2c151d  clone
> E0704 19:09:23.006968   833 catalog-server.cc:278] 
> 7145c21173f2c47b:2579db55] NullPointerException

[jira] [Comment Edited] (IMPALA-12190) Renaming table will cause losing privileges for non-admin users

2024-05-21 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848439#comment-17848439
 ] 

Fang-Yu Rao edited comment on IMPALA-12190 at 5/22/24 6:39 AM:
---

This JIRA does not seem to be straightforward to resolve on the Impala side 
alone because the error handling could be tricky. I think we may need Apache 
Ranger to provide an API that could take care of this for us (Apache Impala). 
Specifically, it would be great if there is a Ranger API that is able to modify 
the policies accordingly when the catalog server alters the name of a table. 
For instance, when the catalog server is executing ALTER TABLE RENAME, the 
catalog server also sends to the Ranger server via Impala's Ranger plug-in a 
request to change the name of the table in Ranger's policy repository if there 
is a policy matching this table. Ranger stores its policies in its backend 
database, so it would be much easier for Ranger to manage this operation, 
especially when an error/exception occurs during the execution of the operation.

 

If we'd like to resolve this from Apache Impala alone, then we have to be able 
to do the following properly.
 # Retrieve the policy matching the name of the table whose name is going to be 
altered.
 # For each grantee principal (which could be a user, group, or a role) in the 
policy retrieved above, invoke the REVOKE API to revoke this grantee's 
privileges on the old table (the table before the renaming) and then invoke the 
GRANT API to grant those previously revoked privileges to this grantee on the 
new table (the table with the new name). A grantee could have multiple 
privileges on the table so multiple REVOKE/GRANT API calls could be required.

It seems a bit tricky to handle the errors that occur during the 2nd step 
described above. For instance, assume that a grantee has only one privilege 
granted on the old table: what should the catalog server do when the GRANT API 
call fails after its corresponding REVOKE API call? Should we roll back the 
REVOKE API call? Or should we retry the GRANT API call?

The policy for a table could also involve multiple principals. What should we 
do when the operation corresponding to a grantee principal fails?
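Assuming per-grantee REVOKE/GRANT calls are the mechanism, the migration in the two steps above can be modeled against a toy in-memory policy store; none of the names below are real Ranger APIs, and the map just stands in for Ranger's backend database:

```java
import java.util.*;

public class RenamePolicySketch {
    // Toy policy store: table name -> (grantee -> set of privileges).
    static void migrateOnRename(Map<String, Map<String, Set<String>>> store,
                                String oldTable, String newTable) {
        // Step 1: retrieve the policy matching the old table name.
        Map<String, Set<String>> policy = store.remove(oldTable);  // "revoke" on old name
        if (policy == null) return;  // no policy matches the renamed table
        // Step 2: per grantee, re-grant the revoked privileges on the new name.
        for (Map.Entry<String, Set<String>> grantee : policy.entrySet()) {
            store.computeIfAbsent(newTable, k -> new HashMap<>())
                 .computeIfAbsent(grantee.getKey(), k -> new HashSet<>())
                 .addAll(grantee.getValue());
        }
        // In this map the two steps are effectively atomic; against live
        // REVOKE/GRANT APIs they are not, and a failure between them can leave
        // a grantee with no privileges at all.
    }

    public static void main(String[] args) {
        Map<String, Map<String, Set<String>>> store = new HashMap<>();
        Map<String, Set<String>> policy = new HashMap<>();
        policy.put("non_owner", new HashSet<>(Set.of("select")));
        store.put("functional.alltypes", policy);
        migrateOnRename(store, "functional.alltypes", "functional.alltypes_renamed");
        System.out.println(store.get("functional.alltypes_renamed"));  // {non_owner=[select]}
    }
}
```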

 

On the other hand, there does not seem to be a Ranger API that allows us to 
retrieve the exact policy matching a given table name.
There is a Ranger API that could return an access control list (ACL) given the 
name of a resource, e.g., the table "functional.alltypes". A place where we 
call this is within RangerImpaladAuthorizationManager#getPrivileges() 
([plugin_.get().getResourceACLs(request)|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/authorization/ranger/RangerImpaladAuthorizationManager.java#L367]),
 which could be triggered by a statement like "SHOW GRANT USER non_owner ON 
TABLE functional.alltypes".

For instance, given the table name "functional.alltypes", we could get a 
HashMap called "userACLs", and the contents of this map could look like the 
following. Note that in the following, only the first map corresponds to the 
policy in which the resource is exactly the table "functional.alltypes". This 
policy was created by an administrative user via "GRANT SELECT ON TABLE 
functional.alltypes to USER non_owner". The rest of the maps were inferred by 
other policies. Take the 2nd map: the user "hdfs" has privileges on the 
table "functional.alltypes" through the policy that grants "hdfs" the ALL 
privilege on all the databases, tables, and columns.
 # "non_owner" -> \{"select" -> "ALLOWED"}
 # "hdfs" -> \{"all" -> "ALLOWED", "drop" -> "ALLOWED", ...}
 # "admin" -> \{"drop" -> "ALLOWED", "all" -> "ALLOWED", ...}
 # "\{OWNER}" -> \{"all" -> "ALLOWED", "drop" -> "ALLOWED", ...}
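The shape of that map, rebuilt with plain collections (keys and verdicts copied from the four items above; the elided "..." privileges are omitted, and this is not output from a live Ranger instance):

```java
import java.util.*;

public class UserAclsSketch {
    // Rebuild the "userACLs" map described above: grantee -> (access type -> verdict).
    static Map<String, Map<String, String>> userAcls() {
        Map<String, Map<String, String>> acls = new LinkedHashMap<>();
        acls.put("non_owner", Map.of("select", "ALLOWED"));
        acls.put("hdfs", Map.of("all", "ALLOWED", "drop", "ALLOWED"));
        acls.put("admin", Map.of("drop", "ALLOWED", "all", "ALLOWED"));
        acls.put("{OWNER}", Map.of("all", "ALLOWED", "drop", "ALLOWED"));
        return acls;
    }

    public static void main(String[] args) {
        // The map aggregates every matching policy, so nothing in it records
        // which entry came from the exact table-level policy and which was
        // inferred from a wildcard policy -- the provenance a rename fix needs.
        System.out.println(userAcls().get("non_owner"));  // {select=ALLOWED}
    }
}
```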

Tagged [~stigahuang] and [~csringhofer] here since they are also experts in 
this area on the Impala side.

Tagged [~rmani] and [~abhayk] here too since they are the experts on the Ranger 
side.


was (Author: fangyurao):
This JIRA does not seem to be straightforward to resolve on the Impala side 
alone because the error handling could be tricky. I think we may need Apache 
Ranger to provide an API that could take care of this for us (Apache Impala). 
Specifically, it would be great if there is a Ranger API that is able to modify 
the policies accordingly when the catalog server alters the name of a table. 
For instance, when the catalog server is executing ALTER TABLE RENAME, the 
catalog server also sends to the Ranger server via Impala's Ranger plug-in a 
request to change the name of the table in Ranger's policy repository if there 
is a policy matching this table. Ranger stores its policies in its backend 
database, so it would be much easier for Ranger to manage this operation, 
especially when there is an error/exception  that occurs during the execution 

[jira] [Comment Edited] (IMPALA-12190) Renaming table will cause losing privileges for non-admin users

2024-05-21 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848439#comment-17848439
 ] 

Fang-Yu Rao edited comment on IMPALA-12190 at 5/22/24 6:36 AM:
---

This JIRA does not seem to be straightforward to resolve on the Impala side 
alone because the error handling could be tricky. I think we may need Apache 
Ranger to provide an API that could take care of this for us (Apache Impala). 
Specifically, it would be great if there is a Ranger API that is able to modify 
the policies accordingly when the catalog server alters the name of a table. 
For instance, when the catalog server is executing ALTER TABLE RENAME, the 
catalog server also sends to the Ranger server via Impala's Ranger plug-in a 
request to change the name of the table in Ranger's policy repository if there 
is a policy matching this table. Ranger stores its policies in its backend 
database, so it would be much easier for Ranger to manage this operation, 
especially when an error/exception occurs during the execution of the operation.

 

If we'd like to resolve this from Apache Impala alone, then we have to be able 
to do the following properly.
 # Retrieve the policy matching the name of the table whose name is going to be 
altered.
 # For each grantee principal (which could be a user, group, or a role) in the 
policy retrieved above, invoke the REVOKE API to revoke this grantee's 
privileges on the old table (the table before the renaming) and then invoke the 
GRANT API to grant those previously revoked privileges to this grantee on the 
new table (the table with the new name). A grantee could have multiple 
privileges on the table so multiple REVOKE/GRANT API calls could be required.

It seems a bit tricky to handle the errors that occur during the 2nd step 
described above. For instance, assume that a grantee has only one privilege 
granted on the old table: what should the catalog server do when the GRANT API 
call fails after its corresponding REVOKE API call? Should we roll back the 
REVOKE API call? Or should we retry the GRANT API call?

The policy for a table could also involve multiple principals. What should we 
do when the operation corresponding to a grantee principal fails?

 

On the other hand, there does not seem to be a Ranger API that allows us to 
retrieve the exact policy matching a given table name.
There is a Ranger API that could return an access control list (ACL) given the 
name of a resource, e.g., the table "functional.alltypes". A place where we 
call this is within RangerImpaladAuthorizationManager#getPrivileges() 
([plugin_.get().getResourceACLs(request)|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/authorization/ranger/RangerImpaladAuthorizationManager.java#L367]),
 which could be triggered by a statement like "SHOW GRANT USER non_owner ON 
TABLE functional.alltypes".

For instance, given the table name "functional.alltypes", we could get a 
HashMap called "userACLs", and the contents of this map could look like the 
following. Note that in the following, only the first map corresponds to the 
policy in which the resource is exactly the table "functional.alltypes". This 
policy was created by an administrative user via "GRANT SELECT ON TABLE 
functional.alltypes to USER non_owner". The rest of the maps were inferred by 
other policies. Take the 2nd map: the user "hdfs" has privileges on the 
table "functional.alltypes" through the policy that grants "hdfs" the ALL 
privilege on all the databases, tables, and columns.
 # "non_owner" -> \{"select" -> "ALLOWED"}
 # "hdfs" -> \{"all" -> "ALLOWED", "drop" -> "ALLOWED", ...}
 # "admin" -> \{"drop" -> "ALLOWED", "all" -> "ALLOWED", ...}
 # "\{OWNER}" -> \{"all" -> "ALLOWED", "drop" -> "ALLOWED", ...}


was (Author: fangyurao):
This JIRA does not seem to be straightforward to resolve on the Impala side 
alone because the error handling could be tricky. I think we may need Apache 
Ranger to provide an API that could take care of this for us (Apache Impala). 
Specifically, it would be great if there is a Ranger API that is able to modify 
the policies accordingly when the catalog server alters the name of a table. 
For instance, when the catalog server is executing ALTER TABLE RENAME, the 
catalog server also sends to the Ranger server via Impala's Ranger plug-in a 
request to change the name of the table in Ranger's policy repository if there 
is a policy matching this table. Ranger stores its policies in its backend 
database, so it would be much easier for Ranger to manage this operation, 
especially when there is an error/exception  that occurs during the execution 
of the operation.

 

If we'd like to resolve this from Apache Impala alone, then we have to be able 
to do the following properly.
 # Retrieve the policy matching the name of the table whose name

[jira] [Comment Edited] (IMPALA-12190) Renaming table will cause losing privileges for non-admin users

2024-05-21 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848439#comment-17848439
 ] 

Fang-Yu Rao edited comment on IMPALA-12190 at 5/22/24 6:35 AM:
---

This JIRA does not seem to be straightforward to resolve on the Impala side 
alone because the error handling could be tricky. I think we may need Apache 
Ranger to provide an API that could take care of this for us (Apache Impala). 
Specifically, it would be great if there is a Ranger API that is able to modify 
the policies accordingly when the catalog server alters the name of a table. 
For instance, when the catalog server is executing ALTER TABLE RENAME, the 
catalog server also sends to the Ranger server via Impala's Ranger plug-in a 
request to change the name of the table in Ranger's policy repository if there 
is a policy matching this table. Ranger stores its policies in its backend 
database, so it would be much easier for Ranger to manage this operation, 
especially when an error/exception occurs during the execution of the operation.

 

If we'd like to resolve this from Apache Impala alone, then we have to be able 
to do the following properly.
 # Retrieve the policy matching the name of the table whose name is going to be 
altered.
 # For each grantee principal (which could be a user, group, or a role) in the 
policy retrieved above, invoke the REVOKE API to revoke this grantee's 
privileges on the old table (the table before the renaming) and then invoke the 
GRANT API to grant those previously revoked privileges to this grantee on the 
new table (the table with the new name). A grantee could have multiple 
privileges on the table so multiple REVOKE/GRANT could be required.

It seems a bit tricky to handle the errors that occur during the 2nd step 
described above. For instance, assume that a grantee has only one privilege 
granted on the old table: what should the catalog server do when the GRANT API 
call fails after its corresponding REVOKE API call? Should we roll back the 
REVOKE API call? Or should we retry the GRANT API call?

The policy for a table could also involve multiple principals. What should we 
do when the operation corresponding to a grantee principal fails?

 

On the other hand, there does not seem to be a Ranger API that allows us to 
retrieve the exact policy matching a given table name.
There is a Ranger API that could return an access control list (ACL) given the 
name of a resource, e.g., the table "functional.alltypes". A place where we 
call this is within RangerImpaladAuthorizationManager#getPrivileges() 
([plugin_.get().getResourceACLs(request)|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/authorization/ranger/RangerImpaladAuthorizationManager.java#L367]),
 which could be triggered by a statement like "SHOW GRANT USER non_owner ON 
TABLE functional.alltypes".

For instance, given the table name "functional.alltypes", we could get a 
HashMap called "userACLs", and the contents of this map could look like the 
following. Note that in the following, only the first map corresponds to the 
policy in which the resource is exactly the table "functional.alltypes". This 
policy was created by an administrative user via "GRANT SELECT ON TABLE 
functional.alltypes to USER non_owner". The rest of the maps were inferred by 
other policies. Take the 2nd map: the user "hdfs" has privileges on the 
table "functional.alltypes" through the policy that grants "hdfs" the ALL 
privilege on all the databases, tables, and columns.
 # "non_owner" -> \{"select" -> "ALLOWED"}
 # "hdfs" -> \{"all" -> "ALLOWED", "drop" -> "ALLOWED", ...}
 # "admin" -> \{"drop" -> "ALLOWED", "all" -> "ALLOWED", ...}
 # "\{OWNER}" -> \{"all" -> "ALLOWED", "drop" -> "ALLOWED", ...}


was (Author: fangyurao):
This JIRA does not seem to be straightforward to resolve on the Impala side 
alone because the error handling could be tricky. I think we may need Apache 
Ranger to provide an API that could take care of this for us (Apache Impala). 
Specifically, it would be great if there is a Ranger API that is able to modify 
the policies accordingly when the catalog server alters the name of a table. 
For instance, when the catalog server is executing ALTER TABLE RENAME, the 
catalog server also sends to the Ranger server via Impala's Ranger plug-in a 
request to change the name of the table in Ranger's policy repository if there 
is a policy matching this table. Ranger stores its policies in its backend 
database, so it would be much easier for Ranger to manage this operation, 
especially when there is an error/exception  that occurs during the execution 
of the operation.

 

If we'd like to resolve this from Apache Impala alone, then we have to be able 
to do the following properly.
 # Retrieve the policy matching the name of the table whose name is going 

[jira] [Comment Edited] (IMPALA-12190) Renaming table will cause losing privileges for non-admin users

2024-05-21 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848439#comment-17848439
 ] 

Fang-Yu Rao edited comment on IMPALA-12190 at 5/22/24 6:34 AM:
---

This JIRA does not seem to be straightforward to resolve on the Impala side 
alone because the error handling could be tricky. I think we may need Apache 
Ranger to provide an API that could take care of this for us (Apache Impala). 
Specifically, it would be great if there is a Ranger API that is able to modify 
the policies accordingly when the catalog server alters the name of a table. 
For instance, when the catalog server is executing ALTER TABLE RENAME, the 
catalog server also sends to the Ranger server via Impala's Ranger plug-in a 
request to change the name of the table in Ranger's policy repository if there 
is a policy matching this table. Ranger stores its policies in its backend 
database, so it would be much easier for Ranger to manage this operation, 
especially when an error/exception occurs during the execution of the operation.

 

If we'd like to resolve this from Apache Impala alone, then we have to be able 
to do the following properly.
 # Retrieve the policy matching the name of the table whose name is going to be 
altered.
 # For each grantee principal (which could be a user, group, or a role) in the 
policy retrieved above, invoke the REVOKE API to revoke this 
grantee's privileges on the old table (the table before the renaming) and then 
invoke the GRANT API to grant those previously revoked privileges to this 
grantee on the new table (the table with the new name). A grantee could have 
multiple privileges on the table so multiple REVOKE/GRANT could be required.

It seems a bit tricky to handle the errors that occur during the 2nd step 
described above. For instance, assume that a grantee has only one privilege 
granted on the old table: what should the catalog server do when the GRANT API 
call fails after its corresponding REVOKE API call? Should we roll back the 
REVOKE API call? Or should we retry the GRANT API call?

The policy for a table could also involve multiple principals. What should we 
do when the operation corresponding to a grantee principal fails?

 

On the other hand, there does not seem to be a Ranger API that allows us to 
retrieve the exact policy matching a given table name.
There is a Ranger API that could return an access control list (ACL) given the 
name of a resource, e.g., the table "functional.alltypes". A place where we 
call this is within RangerImpaladAuthorizationManager#getPrivileges() 
([plugin_.get().getResourceACLs(request)|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/authorization/ranger/RangerImpaladAuthorizationManager.java#L367]),
 which could be triggered by a statement like "SHOW GRANT USER non_owner ON 
TABLE functional.alltypes".

For instance, given the table name "functional.alltypes", we could get a 
HashMap called "userACLs", and the contents of this map could look like the 
following. Note that in the following, only the first map corresponds to the 
policy in which the resource is exactly the table "functional.alltypes". This 
policy was created by an administrative user via "GRANT SELECT ON TABLE 
functional.alltypes to USER non_owner". The rest of the maps were inferred by 
other policies. Take the 2nd map: the user "hdfs" has privileges on the 
table "functional.alltypes" through the policy that grants "hdfs" the ALL 
privilege on all the databases, tables, and columns.
 # "non_owner" -> \{"select" -> "ALLOWED"}
 # "hdfs" -> \{"all" -> "ALLOWED", "drop" -> "ALLOWED", ...}
 # "admin" -> \{"drop" -> "ALLOWED", "all" -> "ALLOWED", ...}
 # "\{OWNER}" -> \{"all" -> "ALLOWED", "drop" -> "ALLOWED", ...}


was (Author: fangyurao):
This JIRA does not seem to be straightforward to resolve on the Impala side 
alone because the error handling could be tricky. I think we may need Apache 
Ranger to provide an API that could take care of this for us (Apache Impala). 
Specifically, it would be great if there is a Ranger API that is able to modify 
the policies accordingly when the catalog server alters the name of a table. 
For instance, when the catalog server is executing ALTER TABLE RENAME, the 
catalog server also sends to the Ranger server via Impala's Ranger plug-in a 
request to change the name of the table in Ranger's policy repository if there 
is a policy matching this table. Ranger stores its policies in its backend 
database, so it would be much easier for Ranger to manage this operation, 
especially when there is an error/exception  that occurs during the execution 
of the operation.

 

If we'd like to resolve this from Apache Impala alone, then we have to be able 
to do the following properly.
 # Retrieve the policy matching the name of the table whose 

[jira] [Comment Edited] (IMPALA-12190) Renaming table will cause losing privileges for non-admin users

2024-05-21 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848439#comment-17848439
 ] 

Fang-Yu Rao edited comment on IMPALA-12190 at 5/22/24 4:15 AM:
---

This JIRA does not seem to be straightforward to resolve on the Impala side 
alone because the error handling could be tricky. I think we may need Apache 
Ranger to provide an API that could take care of this for us (Apache Impala). 
Specifically, it would be great if there is a Ranger API that is able to modify 
the policies accordingly when the catalog server alters the name of a table. 
For instance, when the catalog server is executing ALTER TABLE RENAME, the 
catalog server also sends to the Ranger server via Impala's Ranger plug-in a 
request to change the name of the table in Ranger's policy repository if there 
is a policy matching this table. Ranger stores its policies in its backend 
database, so it would be much easier for Ranger to manage this operation, 
especially when an error/exception occurs during the execution of the operation.

 

If we'd like to resolve this from Apache Impala alone, then we have to be able 
to do the following properly.
 # Retrieve the policy matching the name of the table whose name is going to be 
altered.
 # For each grantee principal (which could be a user, group, or a role) in the 
policy retrieved above, issue a REVOKE statement to revoke this 
grantee's privileges on the old table (the table before the renaming) and then 
issue a GRANT statement to grant those previously revoked privileges to this 
grantee on the new table (the table with the new name). A grantee could have 
multiple privileges on the table so multiple REVOKE/GRANT could be required.

It seems a bit tricky to handle the errors that occur during the 2nd step 
described above. For instance, assume that a grantee has only one privilege 
granted on the old table: what should the catalog server do when the GRANT 
command fails after its corresponding REVOKE command? Should we roll back the 
REVOKE command? Or should we retry the GRANT command?

The policy for a table could also involve multiple principals. What should we 
do when the operation corresponding to a grantee principal fails?

 

On the other hand, there does not seem to be a Ranger API that allows us to 
retrieve the exact policy matching a given table name.
There is a Ranger API that could return an access control list (ACL) given the 
name of a resource, e.g., the table "functional.alltypes". A place where we 
call this is within RangerImpaladAuthorizationManager#getPrivileges() 
([plugin_.get().getResourceACLs(request)|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/authorization/ranger/RangerImpaladAuthorizationManager.java#L367]),
 which could be triggered by a statement like "SHOW GRANT USER non_owner ON 
TABLE functional.alltypes".

For instance, given the table name "functional.alltypes", we could get a 
HashMap called "userACLs", and the contents of this map could look like the 
following. Note that in the following, only the first map corresponds to the 
policy in which the resource is exactly the table "functional.alltypes". This 
policy was created by an administrative user via "GRANT SELECT ON TABLE 
functional.alltypes to USER non_owner". The rest of the maps were inferred by 
other policies. Take the 2nd map: the user "hdfs" has privileges on the 
table "functional.alltypes" through the policy that grants "hdfs" the ALL 
privilege on all the databases, tables, and columns.
 # "non_owner" -> \{"select" -> "ALLOWED"}
 # "hdfs" -> \{"all" -> "ALLOWED", "drop" -> "ALLOWED", ...}
 # "admin" -> \{"drop" -> "ALLOWED", "all" -> "ALLOWED", ...}
 # "\{OWNER}" -> \{"all" -> "ALLOWED", "drop" -> "ALLOWED", ...}
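
To make the shape of this result concrete, the map could be modeled as below 
(hand-built illustrative data mirroring the example above; this is not the 
output of an actual Ranger call):

```java
import java.util.Map;

// Hand-built model of the "userACLs" shape: user name -> (access type -> result).
public class Main {
  public static void main(String[] args) {
    Map<String, Map<String, String>> userAcls = Map.of(
        "non_owner", Map.of("select", "ALLOWED"),
        "hdfs", Map.of("all", "ALLOWED", "drop", "ALLOWED"),
        "admin", Map.of("drop", "ALLOWED", "all", "ALLOWED"),
        "{OWNER}", Map.of("all", "ALLOWED", "drop", "ALLOWED"));

    // From this aggregated view alone there is no way to tell whether an
    // entry came from a policy scoped exactly to "functional.alltypes" or
    // from a broader wildcard policy; that is the limitation noted above.
    System.out.println(userAcls.get("non_owner").get("select"));
  }
}
```

The flattening of per-policy provenance into one map per user is what makes it 
hard to recover "the exact policy matching a given table name" from this API.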


was (Author: fangyurao):
This JIRA does not seem to be straightforward to resolve on the Impala side 
alone because the error handling could be tricky. I think we may need Apache 
Ranger to provide an API that could take care of this for us (Apache Impala). 
Specifically, it would be great if there is a Ranger API that is able to modify 
the policies accordingly when the catalog server alters the name of a table. 
For instance, when the catalog server is executing ALTER TABLE RENAME, the 
catalog server also sends to the Ranger server via Impala's Ranger plug-in a 
request to change the name of the table in Ranger's policy repository if there 
is a policy matching this table. Ranger stores its policies in its backend 
database, so it would be much easier for Ranger to manage this operation, 
especially when an error/exception occurs during the execution of the operation.

 

If we'd like to resolve this from Apache Impala alone, then we have to be able 
to do the following properly.
 # Retrieve the policy matching the name of the table whose

[jira] [Commented] (IMPALA-12190) Renaming table will cause losing privileges for non-admin users

2024-05-21 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848439#comment-17848439
 ] 

Fang-Yu Rao commented on IMPALA-12190:
--

This JIRA does not seem to be straightforward to resolve on the Impala side 
alone because the error handling could be tricky. I think we may need Apache 
Ranger to provide an API that could take care of this for us (Apache Impala). 
Specifically, it would be great if there is a Ranger API that is able to modify 
the policies accordingly when the catalog server alters the name of a table. 
For instance, when the catalog server is executing ALTER TABLE RENAME, the 
catalog server also sends to the Ranger server via Impala's Ranger plug-in a 
request to change the name of the table in Ranger's policy repository if there 
is a policy matching this table. Ranger stores its policies in its backend 
database, so it would be much easier for Ranger to manage this operation, 
especially when an error/exception occurs during the execution of the operation.

 

If we'd like to resolve this from Apache Impala alone, then we have to be able 
to do the following properly.
 # Retrieve the policy matching the name of the table whose name is going to be 
altered.
 # For each grantee principal (which could be a user, a group, or a role) in 
the policy retrieved above, issue a REVOKE statement to revoke this 
grantee's privileges on the old table (the table before the renaming) and then 
issue a GRANT statement to grant those previously revoked privileges to this 
grantee on the new table (the table with the new name). A grantee could have 
multiple privileges on the table, so multiple REVOKE/GRANT statements could be required.
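
As a rough sketch, the two steps above could look like the following (the 
method name `buildMigrationStatements` and the data shapes are hypothetical 
illustrations for this discussion, not actual Impala or Ranger APIs):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class Main {
  // Hypothetical sketch of step 2: for each grantee in the retrieved policy,
  // emit a REVOKE on the old table name and a matching GRANT on the new name.
  // A grantee could hold several privileges, hence one REVOKE/GRANT pair each.
  static List<String> buildMigrationStatements(String oldTable, String newTable,
      Map<String, List<String>> privilegesByGrantee) {
    List<String> stmts = new ArrayList<>();
    privilegesByGrantee.forEach((grantee, privileges) -> {
      for (String priv : privileges) {
        stmts.add("REVOKE " + priv + " ON TABLE " + oldTable
            + " FROM USER " + grantee);
        stmts.add("GRANT " + priv + " ON TABLE " + newTable
            + " TO USER " + grantee);
      }
    });
    return stmts;
  }

  public static void main(String[] args) {
    buildMigrationStatements("db.old_t", "db.new_t",
        Map.of("non_owner", List.of("SELECT", "INSERT")))
        .forEach(System.out::println);
  }
}
```

The sketch is deliberately non-transactional, which makes the error-handling 
question concrete: if a GRANT fails after its REVOKE has been issued, nothing 
in this loop restores the revoked privilege.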

It seems a bit tricky to handle the errors that occur during the 2nd step 
described above. For instance, assume that a grantee has only one privilege 
granted on the old table: what should the catalog server do when the GRANT 
command fails after its corresponding REVOKE command has succeeded? Should we 
roll back the REVOKE command, or retry the GRANT command?

The policy for a table could also involve multiple principals. What should we 
do when the operation corresponding to a grantee principal fails?

 

On the other hand, there does not seem to be a Ranger API that allows us to 
retrieve the exact policy matching a given table name.
There is a Ranger API that could return an access control list (ACL) given the 
name of a resource, e.g., the table "functional.alltypes". A place where we 
call this is within RangerImpaladAuthorizationManager#getPrivileges() 
([plugin_.get().getResourceACLs(request)|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/authorization/ranger/RangerImpaladAuthorizationManager.java#L367]),
 which could be triggered by a statement like "SHOW GRANT USER non_owner ON 
TABLE functional.alltypes".

For instance, given the table name "functional.alltypes", we could get a 
HashMap called "userACLs", and the contents of this map could look like the 
following. Note that in the following, only the first map corresponds to the 
policy in which the resource is exactly the table "functional.alltypes". This 
policy was created by an administrative user via "GRANT SELECT ON TABLE 
functional.alltypes TO USER non_owner". The rest of the maps were inferred from 
other policies. Take the 2nd map: the user "hdfs" has privileges on the table 
"functional.alltypes" through the policy that grants "hdfs" the ALL privilege 
on all databases, tables, and columns.
 # "non_owner" -> \{"select" -> "ALLOWED"}
 # "hdfs" -> \{"all" -> "ALLOWED", "drop" -> "ALLOWED", ...}
 # "admin" -> \{"drop" -> "ALLOWED", "all" -> "ALLOWED", ...}
 # "\{OWNER}" -> \{"all" -> "ALLOWED", "drop" -> "ALLOWED", ...}

> Renaming table will cause losing privileges for non-admin users
> ---
>
> Key: IMPALA-12190
> URL: https://issues.apache.org/jira/browse/IMPALA-12190
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Gabor Kaszab
>Assignee: Sai Hemanth Gantasala
>Priority: Critical
>  Labels: alter-table, authorization, ranger
>
> Let's say user 'a' gets some privileges on table 't'. When this table gets 
> renamed (even by user 'a') then user 'a' loses its privileges on that table.
>  
> Repro steps:
>  # Start impala with Ranger
>  # start impala-shell as admin (-u admin)
>  # create table tmp (i int, s string) stored as parquet;
>  # grant all on table tmp to user ;
>  # grant all on table tmp to user ;
> {code:java}
> Query: show grant user  on table tmp
> +++--+---++-+--+-+-+---+--+-+
> | principal_type | principal_n

[jira] [Resolved] (IMPALA-11622) Impala load data command fails when the impala user has access on source file through Ranger policy

2024-05-07 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao resolved IMPALA-11622.
--
Resolution: Duplicate

This is a duplicate of IMPALA-10272, which has already been resolved.

> Impala load data command fails when the impala user has access on source file 
> through Ranger policy
> ---
>
> Key: IMPALA-11622
> URL: https://issues.apache.org/jira/browse/IMPALA-11622
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Abhishek
>Priority: Major
>
> When trying to run the LOAD DATA command in Impala, 
> if the impala user has access to the source file through a Ranger HDFS policy,
> then the LOAD DATA command fails.
> If the impala user has access to the source file through HDFS ACLs,
> then the LOAD DATA command executes successfully.
> Steps to reproduce :-
> Ranger policy setup
> HDFS policies
> Policy 1 :-
> All access policy for HDFS user
> user - hdfs
> resources - * , recursive=true
> access - all access allowed
> Policy 2 :-
> Access for impala user on /root_test_dir/test_dir_2
> user - impala 
> resources - /root_test_dir/test_dir_2 , recursive = true
> access - all access allowed
> Hadoop SQL policies
> Policy 1 : All access policy for hrt_qa, hive and impala user
> users - hrt_qa, impala, hive
> resources - db - *, table - *, column - *
> access - all access allowed
> Policy 2 : Url policy for hrt_qa user
> users - hrt_qa
> resources :- url - *
> access - all access allowed
> Data setup :-
> In HDFS,
> create the following directories as the hdfs user
> {code:java|bgColor=#f4f5f7}
> /root_test_dir
> /root_test_dir/test_dir_1
> /root_test_dir/test_dir_2{code}
> Create a text file temp.txt on the local machine with any content (for 
> example: Hello World)
> Then copy the temp.txt file to the HDFS dirs /root_test_dir/test_dir_1 and 
> /root_test_dir/test_dir_2 
> Set the ACLs for /root_test_dir/test_dir_1 to 777 recursively
> {code:java|bgColor=#f4f5f7}
> hdfs dfs -chmod -R 777 /root_test_dir/test_dir_1 {code}
>  
> Set the ACLs for /root_test_dir/test_dir_2 to 000 recursively
> {code:java|bgColor=#f4f5f7}
> hdfs dfs -chmod -R 000 /root_test_dir/test_dir_2{code}
> (Run all the hdfs commands as the hdfs user)
> In Impala-shell, as hrt_qa user
> create a test_db and create a test_table under test_db.
> {code:java|bgColor=#f4f5f7}
> CREATE TABLE test_db.test_table(c0 string) STORED AS TEXTFILE 
> TBLPROPERTIES('transactional'='false'){code}
>  
> Run the LOAD DATA command as hrt_qa user :-
> {code:java|bgColor=#f4f5f7}
> test_db> LOAD DATA INPATH '/root_test_dir/test_dir_1/temp.txt' INTO TABLE 
> test_db.test_table
>                                                            > ;
> Query: LOAD DATA INPATH '/root_test_dir/test_dir_1/temp.txt' INTO TABLE 
> test_db.test_table
> +----------------------------------------------------------+
> | summary                                                  |
> +----------------------------------------------------------+
> | Loaded 1 file(s). Total files in destination location: 1 |
> +----------------------------------------------------------+
> Fetched 1 row(s) in 6.56s {code}
> Failing case :-
> {code:java}
> test_db> LOAD DATA INPATH '/root_test_dir/test_dir_2/temp.txt' INTO TABLE 
> test_db.test_table; Query: LOAD DATA INPATH 
> '/root_test_dir/test_dir_2/temp.txt' INTO TABLE test_db.test_table ERROR: 
> AccessControlException: Permission denied: user=impala, access=READ, 
> inode="/warehouse/tablespace/external/hive/test_db.db/test_table/.tmp_4b9b3a83-f4f9-4363-81ae-21f5c170c1bd/temp.txt":hdfs:supergroup:--
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13009) Possible leak of partition updates when the table has failed DDL and recovered by INVALIDATE METADATA

2024-04-17 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17838405#comment-17838405
 ] 

Fang-Yu Rao commented on IMPALA-13009:
--

Thanks for the detailed steps to reproduce the issue [~stigahuang]!

I have tried your latest script at 
https://issues.apache.org/jira/browse/IMPALA-13009?focusedCommentId=17838211&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17838211
 and found that I could also reproduce the issue after restarting only the 
Impala daemons (via "{*}bin/start-impala-cluster.py -r{*}") even though we 
don't have the command that removes the HDFS path from outside of Impala. I was 
using Apache Impala on a recent master where the tip commit is IMPALA-12996 
(Add support for DATE in Iceberg metadata tables).
{code:java}
I0417 16:06:57.716398 16131 ImpaladCatalog.java:232] Adding: 
TABLE:default.my_part version: 1723 size: 1557
I0417 16:06:57.719789 16131 ImpaladCatalog.java:232] Adding: CATALOG_SERVICE_ID 
version: 1723 size: 60
I0417 16:06:57.720358 16131 ImpaladCatalog.java:257] Adding 9 partition(s): 
HDFS_PARTITION:default.my_part:(p=1,p=2,...,p=9), versions=[1706, 1712, 1718], 
size=(avg=588, min=588, max=588, sum=5292)
E0417 16:06:57.917488 16131 ImpaladCatalog.java:264] Error adding catalog 
object: Received stale partition in a statestore update: 
THdfsPartition(partitionKeyExprs:[TExpr(nodes:[TExprNode(node_type:INT_LITERAL, 
type:TColumnType(types:[TTypeNode(type:SCALAR, 
scalar_type:TScalarType(type:INT))]), num_children:0, is_constant:true, 
int_literal:TIntLiteral(value:1), is_codegen_disabled:false)])], 
location:THdfsPartitionLocation(prefix_index:0, suffix:p=1), id:0, 
file_desc:[THdfsFileDesc(file_desc_data:18 00 00 00 00 00 00 00 00 00 0E 00 1C 
00 18 00 10 00 00 00 08 00 04 00 0E 00 00 00 18 00 00 00 A9 E7 4F EE 8E 01 00 
00 02 00 00 00 00 00 00 00 0C 00 00 00 01 00 00 00 4C 00 00 00 37 00 00 00 61 
61 34 36 34 66 61 66 35 61 31 37 36 65 39 65 2D 36 63 66 31 63 38 34 61 30 30 
30 30 30 30 30 30 5F 31 37 31 31 36 38 30 30 38 32 5F 64 61 74 61 2E 30 2E 74 
78 74 00 0C 00 14 00 00 00 0C 00...)], access_level:READ_WRITE, 
stats:TTableStats(num_rows:-1), is_marked_cached:false, 
hms_parameters:{transient_lastDdlTime=1713395198, totalSize=2, 
numFilesErasureCoded=0, numFiles=1}, num_blocks:1, total_file_size_bytes:2, 
has_incremental_stats:false, write_id:0, db_name:default, tbl_name:my_part, 
partition_name:p=1, 
hdfs_storage_descriptor:THdfsStorageDescriptor(lineDelim:10, fieldDelim:1, 
collectionDelim:1, mapKeyDelim:1, escapeChar:0, quoteChar:1, fileFormat:TEXT, 
blockSize:0))
Java exception follows:
java.lang.IllegalStateException: Received stale partition in a statestore 
update: 
THdfsPartition(partitionKeyExprs:[TExpr(nodes:[TExprNode(node_type:INT_LITERAL, 
type:TColumnType(types:[TTypeNode(type:SCALAR, 
scalar_type:TScalarType(type:INT))]), num_children:0, is_constant:true, 
int_literal:TIntLiteral(value:1), is_codegen_disabled:false)])], 
location:THdfsPartitionLocation(prefix_index:0, suffix:p=1), id:0, 
file_desc:[THdfsFileDesc(file_desc_data:18 00 00 00 00 00 00 00 00 00 0E 00 1C 
00 18 00 10 00 00 00 08 00 04 00 0E 00 00 00 18 00 00 00 A9 E7 4F EE 8E 01 00 
00 02 00 00 00 00 00 00 00 0C 00 00 00 01 00 00 00 4C 00 00 00 37 00 00 00 61 
61 34 36 34 66 61 66 35 61 31 37 36 65 39 65 2D 36 63 66 31 63 38 34 61 30 30 
30 30 30 30 30 30 5F 31 37 31 31 36 38 30 30 38 32 5F 64 61 74 61 2E 30 2E 74 
78 74 00 0C 00 14 00 00 00 0C 00...)], access_level:READ_WRITE, 
stats:TTableStats(num_rows:-1), is_marked_cached:false, 
hms_parameters:{transient_lastDdlTime=1713395198, totalSize=2, 
numFilesErasureCoded=0, numFiles=1}, num_blocks:1, total_file_size_bytes:2, 
has_incremental_stats:false, write_id:0, db_name:default, tbl_name:my_part, 
partition_name:p=1, 
hdfs_storage_descriptor:THdfsStorageDescriptor(lineDelim:10, fieldDelim:1, 
collectionDelim:1, mapKeyDelim:1, escapeChar:0, quoteChar:1, fileFormat:TEXT, 
blockSize:0))
at 
com.google.common.base.Preconditions.checkState(Preconditions.java:512)
at 
org.apache.impala.catalog.ImpaladCatalog.addTable(ImpaladCatalog.java:523)
at 
org.apache.impala.catalog.ImpaladCatalog.addCatalogObject(ImpaladCatalog.java:334)
at 
org.apache.impala.catalog.ImpaladCatalog.updateCatalog(ImpaladCatalog.java:262)
at 
org.apache.impala.service.FeCatalogManager$CatalogdImpl.updateCatalogCache(FeCatalogManager.java:120)
at 
org.apache.impala.service.Frontend.updateCatalogCache(Frontend.java:565)
at 
org.apache.impala.service.JniFrontend.updateCatalogCache(JniFrontend.java:196)
{code}

> Possible leak of partition updates when the table has failed DDL and 
> recovered by INVALIDATE METADATA
> -
>
> Key: IMPALA-13009
>

[jira] [Created] (IMPALA-12994) Revise the implementation of FsPermissionChecker to take Ranger policies into consideration

2024-04-10 Thread Fang-Yu Rao (Jira)
Fang-Yu Rao created IMPALA-12994:


 Summary: Revise the implementation of FsPermissionChecker to take 
Ranger policies into consideration
 Key: IMPALA-12994
 URL: https://issues.apache.org/jira/browse/IMPALA-12994
 Project: IMPALA
  Issue Type: Task
  Components: Frontend
Reporter: Fang-Yu Rao
Assignee: Fang-Yu Rao


Impala's current implementation of 
[FsPermissionChecker|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/util/FsPermissionChecker.java]
 does not take into consideration the Ranger policies on HDFS or the underlying 
file system, which could result in an unwanted AnalysisException during the 
query analysis phase, as reported in IMPALA-11871 and IMPALA-12291. We should 
consider revising FsPermissionChecker to consider the Ranger policies on the 
storage layer as well.






[jira] [Updated] (IMPALA-12985) Use the new constructor when instantiating RangerAccessRequestImpl

2024-04-08 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-12985:
-
Description: 
After RANGER-2763, the signature of the class RangerAccessRequestImpl changed 
by adding an additional input argument 'userRoles', as shown in the following.
{code:java}
public RangerAccessRequestImpl(RangerAccessResource resource, String 
accessType, String user, Set<String> userGroups, Set<String> userRoles) {
...
{code}
The new signature is also provided in CDP Ranger. Thus, to unblock 
IMPALA-12921, or to be able to build Apache Impala with a locally built Apache 
Ranger, it may be faster to switch to the new signature on the Impala side than 
to wait for RANGER-4770 to be resolved on the Ranger side.
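
A minimal sketch of the corresponding call-site change, using stub classes in 
place of the real org.apache.ranger types so the snippet is self-contained and 
illustrative only:

```java
import java.util.Collections;
import java.util.Set;

public class Main {
  // Stub standing in for Ranger's RangerAccessResource; not the real class.
  static class RangerAccessResource {}

  // Stub mirroring the post-RANGER-2763 five-argument constructor.
  static class RangerAccessRequestImpl {
    final Set<String> userRoles;
    RangerAccessRequestImpl(RangerAccessResource resource, String accessType,
        String user, Set<String> userGroups, Set<String> userRoles) {
      this.userRoles = userRoles;
    }
  }

  public static void main(String[] args) {
    // An old four-argument call site can be migrated by passing an empty
    // role set, keeping behavior unchanged for role-less requests.
    RangerAccessRequestImpl req = new RangerAccessRequestImpl(
        new RangerAccessResource(), "select", "alice",
        Set.of("analysts"), Collections.emptySet());
    System.out.println(req.userRoles.isEmpty());
  }
}
```

Passing an explicit empty set keeps the old behavior while compiling against 
the new signature.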

  was:
After RANGER-2763, the signature of the class RangerAccessRequestImpl changed 
by adding an additional input argument 'userRoles', as shown in the following.
{code:java}
public RangerAccessRequestImpl(RangerAccessResource resource, String 
accessType, String user, Set<String> userGroups, Set<String> userRoles) {
...
{code}
The new signature is also provided in CDP Ranger. Thus to unblock IMPALA-12921 
or to be able to build Apache Impala with Apache Ranger, it may be faster to 
switch to the new signature on the Impala side.


> Use the new constructor when instantiating RangerAccessRequestImpl
> --
>
> Key: IMPALA-12985
> URL: https://issues.apache.org/jira/browse/IMPALA-12985
> Project: IMPALA
>  Issue Type: Task
>  Components: Frontend
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>
> After RANGER-2763, the signature of the class RangerAccessRequestImpl changed 
> by adding an additional input argument 'userRoles', as shown in the following.
> {code:java}
> public RangerAccessRequestImpl(RangerAccessResource resource, String 
> accessType, String user, Set<String> userGroups, Set<String> userRoles) {
> ...
> {code}
> The new signature is also provided in CDP Ranger. Thus, to unblock 
> IMPALA-12921, or to be able to build Apache Impala with a locally built 
> Apache Ranger, it may be faster to switch to the new signature on the Impala 
> side than to wait for RANGER-4770 to be resolved on the Ranger side.






[jira] [Created] (IMPALA-12985) Use the new constructor when instantiating RangerAccessRequestImpl

2024-04-08 Thread Fang-Yu Rao (Jira)
Fang-Yu Rao created IMPALA-12985:


 Summary: Use the new constructor when instantiating 
RangerAccessRequestImpl
 Key: IMPALA-12985
 URL: https://issues.apache.org/jira/browse/IMPALA-12985
 Project: IMPALA
  Issue Type: Task
  Components: Frontend
Reporter: Fang-Yu Rao
Assignee: Fang-Yu Rao


After RANGER-2763, the signature of the class RangerAccessRequestImpl changed 
by adding an additional input argument 'userRoles', as shown in the following.
{code:java}
public RangerAccessRequestImpl(RangerAccessResource resource, String 
accessType, String user, Set<String> userGroups, Set<String> userRoles) {
...
{code}
The new signature is also provided in CDP Ranger. Thus to unblock IMPALA-12921 
or to be able to build Apache Impala with Apache Ranger, it may be faster to 
switch to the new signature on the Impala side.






[jira] [Updated] (IMPALA-12921) Consider adding support for locally built Ranger

2024-04-05 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-12921:
-
Description: 
It would be nice to be able to support locally built Ranger in Impala's 
minicluster in that it would facilitate the testing of features that require 
changes to both components.

*+Edit:+*
Making the current Apache Impala on *master* (tip: 
{*}IMPALA-12925{*}: Fix decimal data type for external JDBC table) support 
Ranger on *master* (tip: 
{*}RANGER-4745{*}: Enhance handling of subAccess authorization in Ranger HDFS 
plugin) may be too ambitious.

The signatures of some classes are already incompatible. For instance, on the 
Impala side, Impala instantiates *RangerAccessRequestImpl* via the following 
code, passing 4 input arguments.
{code:java}
RangerAccessRequest req = new RangerAccessRequestImpl(resource,
SELECT_ACCESS_TYPE, user.getShortName(), getUserGroups(user));
{code}
However, the current signature of RangerAccessRequestImpl's constructor on the 
master branch of Apache Ranger is the following, which requires 5 input 
arguments instead.
{code:java}
public RangerAccessRequestImpl(RangerAccessResource resource, String 
accessType, String user, Set<String> userGroups, Set<String> userRoles)
{code}
It may be more practical to support Ranger on an earlier version, e.g., 
[https://github.com/apache/ranger/blob/release-ranger-2.4.0].

  was:It would be nice to be able to support locally built Ranger in Impala's 
minicluster in that it would facilitate the testing of features that require 
changes to both components.


> Consider adding support for locally built Ranger
> 
>
> Key: IMPALA-12921
> URL: https://issues.apache.org/jira/browse/IMPALA-12921
> Project: IMPALA
>  Issue Type: Task
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>
> It would be nice to be able to support locally built Ranger in Impala's 
> minicluster in that it would facilitate the testing of features that require 
> changes to both components.
> *+Edit:+*
> Making the current Apache Impala on *master* (tip: 
> {*}IMPALA-12925{*}: Fix decimal data type for external JDBC table) support 
> Ranger on *master* (tip: 
> {*}RANGER-4745{*}: Enhance handling of subAccess authorization in Ranger HDFS 
> plugin) may be too ambitious.
> The signatures of some classes are already incompatible. For instance, on the 
> Impala side, Impala instantiates *RangerAccessRequestImpl* via the following 
> code, passing 4 input arguments.
> {code:java}
> RangerAccessRequest req = new RangerAccessRequestImpl(resource,
> SELECT_ACCESS_TYPE, user.getShortName(), getUserGroups(user));
> {code}
> However, the current signature of RangerAccessRequestImpl's constructor on 
> the master branch of Apache Ranger is the following, which requires 5 input 
> arguments instead.
> {code:java}
> public RangerAccessRequestImpl(RangerAccessResource resource, String 
> accessType, String user, Set<String> userGroups, Set<String> userRoles)
> {code}
> It may be more practical to support Ranger on an earlier version, e.g., 
> [https://github.com/apache/ranger/blob/release-ranger-2.4.0].






[jira] [Resolved] (IMPALA-12291) Insert statement fails even if hdfs ranger policy allows it

2024-04-01 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao resolved IMPALA-12291.
--
Resolution: Duplicate

This seems to be a duplicate of IMPALA-11871. We could probably continue our 
discussion there. I will also review the patch at 
https://gerrit.cloudera.org/c/20221/ and see how we could proceed.

cc: [~khr9603], [~stigahuang], [~amansinha]

> Insert statement fails even if hdfs ranger policy allows it
> ---
>
> Key: IMPALA-12291
> URL: https://issues.apache.org/jira/browse/IMPALA-12291
> Project: IMPALA
>  Issue Type: Bug
>  Components: fe, Security
> Environment: - Impala Version (4.1.0)
> - Ranger admin version (2.0)
> - Hive version (3.1.2)
>Reporter: halim kim
>Assignee: halim kim
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Apache Ranger is a framework for providing security and authorization in the 
> Hadoop platform.
> Impala can also utilize Apache Ranger via Ranger Hive policies.
> The problem is that INSERT or some other queries are not executed even if you 
> enable the Ranger HDFS plugin and set a proper allow condition for Impala 
> query execution.
> You can see an error log like the one below.
> {code:java}
> AnalysisException: Unable to INSERT into target table (testdb.testtable) 
> because Impala does not have WRITE access to HDFS location: 
> hdfs://testcluster/warehouse/testdb.db/testtable
> {code}
> This happens when the Ranger HDFS plugin is enabled but the impala user does 
> not have HDFS POSIX permission. 
> For example, in the case that the DB file owner, group, and permission are 
> set to hdfs:hdfs r-xr-xr-- and the Ranger plugin policies (HDFS, Hive, and 
> Impala) allow impala to execute the query, the INSERT query will fail.
> In my opinion, the main cause is that the Impala FE component checks HDFS 
> POSIX permissions instead of the Ranger policy. 
> Similar issue : https://issues.apache.org/jira/browse/IMPALA-10272
> I'm working on resolving this issue by adding HDFS Ranger policy checking 
> code.






[jira] [Commented] (IMPALA-11871) INSERT statement does not respect Ranger policies for HDFS

2024-04-01 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17832957#comment-17832957
 ] 

Fang-Yu Rao commented on IMPALA-11871:
--

After reading some past JIRA's in this area, I think it should be safe to skip 
{*}analyzeWriteAccess{*}() for the *INSERT* statement (or add a startup flag to 
disable it). Before the fix is ready, we could add the following to the 
*core-site.xml* consumed by the catalog server to allow an authorized user (by 
Ranger via Impala's frontend) to insert values into an HDFS table in the 
{*}legacy catalog mode{*}. Recall that the catalog server would consider the 
service user, usually named '{*}impala{*}', as a super user as long as the user 
'{*}impala{*}' belongs to the super group specified by 
'{*}dfs.permissions.superusergroup{*}'.
{code:xml}
  <property>
    <name>dfs.permissions.superusergroup</name>
    <value>...</value>
    <final>true</final>
  </property>
{code}
This is still secure when Ranger is the authorization provider because of the 
following.
 # For the INSERT statement, Impala's frontend makes sure the logged-in user 
(not necessarily the service user '{*}impala{*}') is granted the necessary 
privilege on the target table. The respective audit log entry is also produced 
whether or not the query is authorized even though we skip 
{*}analyzeWriteAccess{*}().
 # For a query that has been authorized by Impala's frontend and sent to the 
backend for execution, if Impala's backend interacts with the underlying 
services, e.g., HDFS, as the service user '{*}impala{*}', then this service 
user should always be considered as a super user or a user in a super group.

 
+*Detailed Analysis*+
We started performing such permissions checking in [IMPALA-1279: Check ACLs 
for INSERT and LOAD 
statements|https://github.com/cloudera/Impala/commit/0b32bbd899d988f1cd5c526597932b67f4c35cce]
 when we were using Sentry as the authorization provider. The reason to 
implement IMPALA-1279 was also mentioned in the description of that JIRA and is 
excerpted below for easy reference. In short, we would like to fail a query as 
early as possible if there could be a permissions-related issue.
{quote}Impala checks permissions for LOAD and INSERT statements before 
executing them to allow for early-exit if the query would not succeed. However, 
it does not take extended ACLs in CDH5 into account.

When a directory has restrictive Posix permissions (e.g. 000), but has an ACL 
allowing writes, Impala should allow INSERTs and LOADs to happen to that 
directory. Instead, the early check will disallow them.

If the checks were disabled, the queries would execute (or not!) correctly, 
because we delegate to libhdfs or the DistributedFileSystem API to actually 
perform the operations we need.
{quote}
We hand-crafted the permissions checker within Impala. Specifically, in our 
[implementation|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/util/FsPermissionChecker.java#L206-L222],
 Hadoop ACL entries take precedence over the POSIX permissions, and we did 
*not* take into consideration the policies that could be defined on the HDFS 
path when the authorization provider is Ranger.
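
The precedence just described can be illustrated with a deliberately 
simplified model (this is not the actual FsPermissionChecker code, and it 
ignores Ranger entirely, which is exactly the gap being discussed):

```java
import java.util.Map;

public class Main {
  // Simplified model: a named ACL entry for the principal, if present,
  // takes precedence over the fallback POSIX permission string.
  static boolean canWrite(String principal, Map<String, String> namedAclEntries,
      String posixPerms) {
    String effective = namedAclEntries.getOrDefault(principal, posixPerms);
    return effective.indexOf('w') >= 0;
  }

  public static void main(String[] args) {
    // No named entry for 'impala', so the default "group::r-x" bits apply
    // and the WRITE check fails, even though a Ranger SQL policy may have
    // authorized the INSERT.
    System.out.println(canWrite("impala", Map.of(), "r-x"));
    // A named ACL entry granting rwx would win over restrictive POSIX bits.
    System.out.println(canWrite("impala", Map.of("impala", "rwx"), "---"));
  }
}
```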

Due to how we implemented 
[FsPermissionChecker|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/util/FsPermissionChecker.java],
 it's possible that even though a logged-in user has been authorized to execute 
an INSERT statement into a table via a policy added to Ranger's repository of 
SQL, the query could fail during the analysis, simply because the service user, 
usually named '{*}impala{*}', could not pass the permissions checker. For 
instance, this could occur if the table to insert into was created by another 
query engine, e.g., HiveServer2 (HS2), and the table is thus owned by another 
service user, e.g., '{*}hive{*}'. In addition, we have an ACL entry of 
"{*}group::r-x{*}" by default when the table was created. The current 
implementation of Impala's permissions checker would deny the service user 
'{*}impala{*}' write access to the table even though the user '{*}impala{*}' 
is in the group '{*}hive{*}', as shown in the following.
{code:java}
[r...@ccycloud-4.engesc24485d02.root.comops.site ~]# hdfs dfs -getfacl 
# file: 
# owner: hive
# group: hive
user::rwx
group::r-x
other::r-x
 
[r...@ccycloud-4.engesc24485d02.root.comops.site impalad]# groups impala
impala : impala hive {code}
 
In 
[IMPALA-3143|https://github.com/apache/impala/commit/a0ad1868bda902fd914bc2be39eb9629a6eceb76],
 we allowed an administrator to specify the name of the super group (from 
catalog server's perspective). Once the *current user* belongs to the specified 
super group denoted via '{*}DFS_PERMISSIONS_SUPERUSERGROUP_KEY{*}' 
("{*}dfs.permissions.superusergroup{*}"), which defaulted to 
'{*}DFS_PERMISSIONS_SUPERUSERGROUP_DEFAULT{*}' ("{*}supergroup{*}"), then the 
catalog server would grant the WRITE request against the corresponding table 
from the current user. Refer t

[jira] [Comment Edited] (IMPALA-11871) INSERT statement does not respect Ranger policies for HDFS

2024-03-25 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830738#comment-17830738
 ] 

Fang-Yu Rao edited comment on IMPALA-11871 at 3/26/24 5:17 AM:
---

Hi [~MikaelSmith], my current understanding is that this is not a regression 
from earlier releases. It's more like a feature request for usability.

The method that is performing the permissions checking 
({*}analyzeWriteAccess{*}()) was added in IMPALA-7311. The purpose, I guess, 
was to make sure the Impala service has the necessary write permissions as 
early as possible, i.e., during the query analysis phase (vs. the query 
execution phase).

After Impala started supporting Ranger as its authorization provider, ideally, 
a cluster administrator should be able to manage the permissions on HDFS via 
either a) Ranger's policy repository for HDFS, or b) the HDFS Access Control 
Lists (HDFS ACLs). But at the moment, Impala's coordinator unconditionally 
performs the permissions-checking without checking Ranger's policy repository 
of HDFS.

IMPALA-10272 resolved a similar issue for the LOAD DATA statement. We could 
resolve this JIRA using the same approach there, where Impala's frontend calls 
*hadoop.fs.FileSystem.access(Path path, FsAction mode)* to check the actual 
access permissions, which could also reflect the permissions managed via 
Ranger's HDFS policy repository.


was (Author: fangyurao):
Hi [~MikaelSmith], my current understanding is that this is not a regression 
from earlier releases. It's more like a feature request for usability.

The method that performs the permissions checking 
({*}analyzeWriteAccess{*}()) was added in IMPALA-7311. The purpose, I guess, 
was to make sure the Impala service has the necessary write permissions as 
early as possible, i.e., during the query analysis phase (vs. in the query 
execution phase).

After Impala started supporting Ranger as its authorization provider, ideally, 
a cluster administrator should be able to manage the permissions on HDFS via 
either a) Ranger's policy repository for HDFS, or b) the HDFS Access Control 
Lists (HDFS ACLs). But at the moment, Impala's coordinator unconditionally 
performs the permissions-checking without consulting Ranger's policy repository 
for HDFS.

IMPALA-10272 resolved a similar issue for the LOAD DATA statement. We could 
resolve this JIRA using the same approach there, where Impala's frontend calls 
*hadoop.fs.FileSystem.access(Path path, FsAction mode)* to check the actual 
access permissions, which could also reflect the permissions managed via 
Ranger's HDFS policy repository.

> INSERT statement does not respect Ranger policies for HDFS
> --
>
> Key: IMPALA-11871
> URL: https://issues.apache.org/jira/browse/IMPALA-11871
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>
> In a cluster with Ranger auth (and with legacy catalog mode), even if you 
> provide RWX to cm_hdfs -> all-path for the user impala, inserting into a 
> table whose HDFS POSIX permissions happen to exclude impala access will 
> result in an
> {noformat}
> "AnalysisException: Unable to INSERT into target table (default.t1) because 
> Impala does not have WRITE access to HDFS location: 
> hdfs://nightly-71x-vx-2.nightly-71x-vx.root.hwx.site:8020/warehouse/tablespace/external/hive/t1"{noformat}
>  
> {noformat}
> [root@nightly-71x-vx-3 ~]# hdfs dfs -getfacl 
> /warehouse/tablespace/external/hive/t1
> file: /warehouse/tablespace/external/hive/t1 
> owner: hive 
> group: supergroup
> user::rwx
> user:impala:rwx #effective:r-x
> group::rwx #effective:r-x
> mask::r-x
> other::---
> default:user::rwx
> default:user:impala:rwx
> default:group::rwx
> default:mask::rwx
> default:other::--- {noformat}
> ~~
> ANALYSIS
> Stack trace from a version of Cloudera's distribution of Impala (impalad 
> version 3.4.0-SNAPSHOT RELEASE (build 
> {*}db20b59a093c17ea4699117155d58fe874f7d68f{*})):
> {noformat}
> at 
> org.apache.impala.catalog.FeFsTable$Utils.checkWriteAccess(FeFsTable.java:585)
> at 
> org.apache.impala.analysis.InsertStmt.analyzeWriteAccess(InsertStmt.java:545)
> at org.apache.impala.analysis.InsertStmt.analyze(InsertStmt.java:391)
> at 
> org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:463)
> at 
> org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:426)
> at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:1570)
> at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1536)
> at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1506)
> at 
> org.apache.impala.service.JniFrontend.createEx
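
The "#effective" annotations in the getfacl listing quoted above follow the POSIX ACL masking rule: the effective permissions of named-user, named-group, and owning-group entries are the intersection of the entry with the mask:: entry (the owner and other entries are not masked). A minimal illustrative model of that rule, not HDFS code:

```python
def effective_perms(entry: str, mask: str) -> str:
    """Intersect an ACL entry with the ACL mask, both in positional 'rwx' form.

    At each position the only possible non-dash character is that position's
    letter, so keeping a character only where entry and mask agree computes
    the intersection of the two permission sets.
    """
    return "".join(e if e == m else "-" for e, m in zip(entry, mask))

# user:impala:rwx with mask::r-x yields effective r-x, i.e. no WRITE bit for
# impala, which is exactly what the analysis-time check in INSERT trips over.
print(effective_perms("rwx", "r-x"))  # r-x
print(effective_perms("rwx", "rwx"))  # rwx
```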

[jira] [Commented] (IMPALA-11871) INSERT statement does not respect Ranger policies for HDFS

2024-03-25 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830738#comment-17830738
 ] 

Fang-Yu Rao commented on IMPALA-11871:
--

Hi [~MikaelSmith], my current understanding is that this is not a regression 
from earlier releases. It's more like a feature request for usability.

The method that performs the permissions checking 
({*}analyzeWriteAccess{*}()) was added in IMPALA-7311. The purpose, I guess, 
was to make sure the Impala service has the necessary write permissions as 
early as possible, i.e., during the query analysis phase (vs. in the query 
execution phase).

After Impala started supporting Ranger as its authorization provider, ideally, 
a cluster administrator should be able to manage the permissions on HDFS via 
either a) Ranger's policy repository for HDFS, or b) the HDFS Access Control 
Lists (HDFS ACLs). But at the moment, Impala's coordinator unconditionally 
performs the permissions-checking without consulting Ranger's policy repository 
for HDFS.

IMPALA-10272 resolved a similar issue for the LOAD DATA statement. We could 
resolve this JIRA using the same approach there, where Impala's frontend calls 
*hadoop.fs.FileSystem.access(Path path, FsAction mode)* to check the actual 
access permissions, which could also reflect the permissions managed via 
Ranger's HDFS policy repository.

> INSERT statement does not respect Ranger policies for HDFS
> --
>
> Key: IMPALA-11871
> URL: https://issues.apache.org/jira/browse/IMPALA-11871
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>
> In a cluster with Ranger auth (and with legacy catalog mode), even if you 
> provide RWX to cm_hdfs -> all-path for the user impala, inserting into a 
> table whose HDFS POSIX permissions happen to exclude impala access will 
> result in an
> {noformat}
> "AnalysisException: Unable to INSERT into target table (default.t1) because 
> Impala does not have WRITE access to HDFS location: 
> hdfs://nightly-71x-vx-2.nightly-71x-vx.root.hwx.site:8020/warehouse/tablespace/external/hive/t1"{noformat}
>  
> {noformat}
> [root@nightly-71x-vx-3 ~]# hdfs dfs -getfacl 
> /warehouse/tablespace/external/hive/t1
> file: /warehouse/tablespace/external/hive/t1 
> owner: hive 
> group: supergroup
> user::rwx
> user:impala:rwx #effective:r-x
> group::rwx #effective:r-x
> mask::r-x
> other::---
> default:user::rwx
> default:user:impala:rwx
> default:group::rwx
> default:mask::rwx
> default:other::--- {noformat}
> ~~
> ANALYSIS
> Stack trace from a version of Cloudera's distribution of Impala (impalad 
> version 3.4.0-SNAPSHOT RELEASE (build 
> {*}db20b59a093c17ea4699117155d58fe874f7d68f{*})):
> {noformat}
> at 
> org.apache.impala.catalog.FeFsTable$Utils.checkWriteAccess(FeFsTable.java:585)
> at 
> org.apache.impala.analysis.InsertStmt.analyzeWriteAccess(InsertStmt.java:545)
> at org.apache.impala.analysis.InsertStmt.analyze(InsertStmt.java:391)
> at 
> org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:463)
> at 
> org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:426)
> at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:1570)
> at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1536)
> at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1506)
> at 
> org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:155){noformat}
> The exception occurs at analysis time, so I tested and succeeded in writing 
> directly into the said directory.
> {noformat}
> [root@nightly-71x-vx-3 ~]# hdfs dfs -touchz 
> /warehouse/tablespace/external/hive/t1/test
> [root@nightly-71x-vx-3 ~]# hdfs dfs -ls 
> /warehouse/tablespace/external/hive/t1/
> Found 8 items
> rw-rw---+ 3 hive supergroup 417 2023-01-27 17:37 
> /warehouse/tablespace/external/hive/t1/00_0
> rw-rw---+ 3 hive supergroup 417 2023-01-27 17:44 
> /warehouse/tablespace/external/hive/t1/00_0_copy_1
> rw-rw---+ 3 hive supergroup 417 2023-01-27 17:49 
> /warehouse/tablespace/external/hive/t1/00_0_copy_2
> rw-rw---+ 3 hive supergroup 417 2023-01-27 17:53 
> /warehouse/tablespace/external/hive/t1/00_0_copy_3
> rw-rw---+ 3 impala hive 355 2023-01-27 17:17 
> /warehouse/tablespace/external/hive/t1/4c4477c12c51ad96-3126b52d_2029811630_data.0.parq
> rw-rw---+ 3 impala hive 355 2023-01-27 17:39 
> /warehouse/tablespace/external/hive/t1/9945b25bb37d1ff2-473c1478_574471191_data.0.parq
> drwxrwx---+ - impala hive 0 2023-01-27 17:39 
> /warehouse/tablespace/external/hive/t1/_impala_insert_staging
> rw-rw---+ 3 impala supergroup 0 2023-01-27 18:01 
> /warehouse/tablespace/ex

[jira] [Created] (IMPALA-12921) Consider adding support for locally built Ranger

2024-03-18 Thread Fang-Yu Rao (Jira)
Fang-Yu Rao created IMPALA-12921:


 Summary: Consider adding support for locally built Ranger
 Key: IMPALA-12921
 URL: https://issues.apache.org/jira/browse/IMPALA-12921
 Project: IMPALA
  Issue Type: Task
Reporter: Fang-Yu Rao
Assignee: Fang-Yu Rao


It would be nice to be able to support locally built Ranger in Impala's 
minicluster in that it would facilitate the testing of features that require 
changes to both components.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Comment Edited] (IMPALA-12830) test_webserver_hide_logs_link() could fail in the exhaustive build

2024-02-21 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819426#comment-17819426
 ] 

Fang-Yu Rao edited comment on IMPALA-12830 at 2/22/24 12:43 AM:


This issue seems to be similar to IMPALA-12170.

cc: [~stigahuang]


was (Author: fangyurao):
This issue seems to be similar to IMPALA-12170.

> test_webserver_hide_logs_link() could fail in the exhaustive build
> --
>
> Key: IMPALA-12830
> URL: https://issues.apache.org/jira/browse/IMPALA-12830
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Fang-Yu Rao
>Assignee: Saurabh Katiyal
>Priority: Major
>  Labels: broken-build
>
> We found in an internal Jenkins run that test_webserver_hide_logs_link() 
> could fail in the exhaustive build with the following error.
> +*Error Message*+
> {code:java}
> AssertionError: bad links from webui port 25020 assert ['/', 
> '/catal...g_level', ...] == ['/', '/catalo...g_level', ...]   At index 2 
> diff: u'/events' != '/hadoop-varz'   Full diff:   - [u'/',   ?  -   + ['/',   
> -  u'/catalog',   ?  -   +  '/catalog',   -  u'/events',   -  
> u'/hadoop-varz',   ?  -   +  '/hadoop-varz',   +  '/events',   -  u'/jmx',   
> ?  -   +  '/jmx',   -  u'/log_level',   ?  -   +  '/log_level',   -  
> u'/memz',   ?  -   +  '/memz',   -  u'/metrics',   ?  -   +  '/metrics',   -  
> u'/operations',   ?  -   +  '/operations',   -  u'/profile_docs',   ?  -   +  
> '/profile_docs',   -  u'/rpcz',   ?  -   +  '/rpcz',   -  u'/threadz',   ?  - 
>   +  '/threadz',   -  u'/varz']   ?  -   +  '/varz']
> {code}
> +*Stacktrace*+
> {code:java}
> custom_cluster/test_web_pages.py:248: in test_webserver_hide_logs_link
> assert found_links == expected_catalog_links, msg
> E   AssertionError: bad links from webui port 25020
> E   assert ['/', '/catal...g_level', ...] == ['/', '/catalo...g_level', ...]
> E At index 2 diff: u'/events' != '/hadoop-varz'
> E Full diff:
> E - [u'/',
> E ?  -
> E + ['/',
> E -  u'/catalog',
> E ?  -
> E +  '/catalog',
> E -  u'/events',
> E -  u'/hadoop-varz',
> E ?  -
> E +  '/hadoop-varz',
> E +  '/events',
> E -  u'/jmx',
> E ?  -
> E +  '/jmx',
> E -  u'/log_level',
> E ?  -
> E +  '/log_level',
> E -  u'/memz',
> E ?  -
> E +  '/memz',
> E -  u'/metrics',
> E ?  -
> E +  '/metrics',
> E -  u'/operations',
> E ?  -
> E +  '/operations',
> E -  u'/profile_docs',
> E ?  -
> E +  '/profile_docs',
> E -  u'/rpcz',
> E ?  -
> E +  '/rpcz',
> E -  u'/threadz',
> E ?  -
> E +  '/threadz',
> E -  u'/varz']
> E ?  -
> E +  '/varz']
> {code}
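
The failing assertion above compares the two link lists element by element, so it fails on a pure ordering difference ('/events' vs '/hadoop-varz' at index 2; the u'' prefixes are only Python 2 repr noise). One common way to make such a check order-insensitive is to sort both sides first. The lists below are a shortened, hypothetical subset, not the actual test data:

```python
# What the webserver returned vs. what the test hard-coded: the same links,
# but '/events' and '/hadoop-varz' swapped.
found_links = ['/', '/catalog', '/events', '/hadoop-varz', '/jmx']
expected_links = ['/', '/catalog', '/hadoop-varz', '/events', '/jmx']

assert found_links != expected_links                  # positional compare fails
assert sorted(found_links) == sorted(expected_links)  # same links, order ignored
```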






[jira] [Commented] (IMPALA-12830) test_webserver_hide_logs_link() could fail in the exhaustive build

2024-02-21 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819426#comment-17819426
 ] 

Fang-Yu Rao commented on IMPALA-12830:
--

This issue seems to be similar to IMPALA-12170.

> test_webserver_hide_logs_link() could fail in the exhaustive build
> --
>
> Key: IMPALA-12830
> URL: https://issues.apache.org/jira/browse/IMPALA-12830
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Fang-Yu Rao
>Assignee: Saurabh Katiyal
>Priority: Major
>  Labels: broken-build
>
> We found in an internal Jenkins run that test_webserver_hide_logs_link() 
> could fail in the exhaustive build with the following error.
> +*Error Message*+
> {code:java}
> AssertionError: bad links from webui port 25020 assert ['/', 
> '/catal...g_level', ...] == ['/', '/catalo...g_level', ...]   At index 2 
> diff: u'/events' != '/hadoop-varz'   Full diff:   - [u'/',   ?  -   + ['/',   
> -  u'/catalog',   ?  -   +  '/catalog',   -  u'/events',   -  
> u'/hadoop-varz',   ?  -   +  '/hadoop-varz',   +  '/events',   -  u'/jmx',   
> ?  -   +  '/jmx',   -  u'/log_level',   ?  -   +  '/log_level',   -  
> u'/memz',   ?  -   +  '/memz',   -  u'/metrics',   ?  -   +  '/metrics',   -  
> u'/operations',   ?  -   +  '/operations',   -  u'/profile_docs',   ?  -   +  
> '/profile_docs',   -  u'/rpcz',   ?  -   +  '/rpcz',   -  u'/threadz',   ?  - 
>   +  '/threadz',   -  u'/varz']   ?  -   +  '/varz']
> {code}
> +*Stacktrace*+
> {code:java}
> custom_cluster/test_web_pages.py:248: in test_webserver_hide_logs_link
> assert found_links == expected_catalog_links, msg
> E   AssertionError: bad links from webui port 25020
> E   assert ['/', '/catal...g_level', ...] == ['/', '/catalo...g_level', ...]
> E At index 2 diff: u'/events' != '/hadoop-varz'
> E Full diff:
> E - [u'/',
> E ?  -
> E + ['/',
> E -  u'/catalog',
> E ?  -
> E +  '/catalog',
> E -  u'/events',
> E -  u'/hadoop-varz',
> E ?  -
> E +  '/hadoop-varz',
> E +  '/events',
> E -  u'/jmx',
> E ?  -
> E +  '/jmx',
> E -  u'/log_level',
> E ?  -
> E +  '/log_level',
> E -  u'/memz',
> E ?  -
> E +  '/memz',
> E -  u'/metrics',
> E ?  -
> E +  '/metrics',
> E -  u'/operations',
> E ?  -
> E +  '/operations',
> E -  u'/profile_docs',
> E ?  -
> E +  '/profile_docs',
> E -  u'/rpcz',
> E ?  -
> E +  '/rpcz',
> E -  u'/threadz',
> E ?  -
> E +  '/threadz',
> E -  u'/varz']
> E ?  -
> E +  '/varz']
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-12830) test_webserver_hide_logs_link() could fail in the exhaustive build

2024-02-21 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819425#comment-17819425
 ] 

Fang-Yu Rao commented on IMPALA-12830:
--

Hi [~skatiyal], assigned the JIRA to you since you revised the test case in 
IMPALA-9086 (Show Hive configurations in /hadoop-varz page) and thus may be 
more familiar with the context. Please feel free to re-assign as you see 
appropriate. Thanks!

> test_webserver_hide_logs_link() could fail in the exhaustive build
> --
>
> Key: IMPALA-12830
> URL: https://issues.apache.org/jira/browse/IMPALA-12830
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Fang-Yu Rao
>Assignee: Saurabh Katiyal
>Priority: Major
>  Labels: broken-build
>
> We found in an internal Jenkins run that test_webserver_hide_logs_link() 
> could fail in the exhaustive build with the following error.
> +*Error Message*+
> {code:java}
> AssertionError: bad links from webui port 25020 assert ['/', 
> '/catal...g_level', ...] == ['/', '/catalo...g_level', ...]   At index 2 
> diff: u'/events' != '/hadoop-varz'   Full diff:   - [u'/',   ?  -   + ['/',   
> -  u'/catalog',   ?  -   +  '/catalog',   -  u'/events',   -  
> u'/hadoop-varz',   ?  -   +  '/hadoop-varz',   +  '/events',   -  u'/jmx',   
> ?  -   +  '/jmx',   -  u'/log_level',   ?  -   +  '/log_level',   -  
> u'/memz',   ?  -   +  '/memz',   -  u'/metrics',   ?  -   +  '/metrics',   -  
> u'/operations',   ?  -   +  '/operations',   -  u'/profile_docs',   ?  -   +  
> '/profile_docs',   -  u'/rpcz',   ?  -   +  '/rpcz',   -  u'/threadz',   ?  - 
>   +  '/threadz',   -  u'/varz']   ?  -   +  '/varz']
> {code}
> +*Stacktrace*+
> {code:java}
> custom_cluster/test_web_pages.py:248: in test_webserver_hide_logs_link
> assert found_links == expected_catalog_links, msg
> E   AssertionError: bad links from webui port 25020
> E   assert ['/', '/catal...g_level', ...] == ['/', '/catalo...g_level', ...]
> E At index 2 diff: u'/events' != '/hadoop-varz'
> E Full diff:
> E - [u'/',
> E ?  -
> E + ['/',
> E -  u'/catalog',
> E ?  -
> E +  '/catalog',
> E -  u'/events',
> E -  u'/hadoop-varz',
> E ?  -
> E +  '/hadoop-varz',
> E +  '/events',
> E -  u'/jmx',
> E ?  -
> E +  '/jmx',
> E -  u'/log_level',
> E ?  -
> E +  '/log_level',
> E -  u'/memz',
> E ?  -
> E +  '/memz',
> E -  u'/metrics',
> E ?  -
> E +  '/metrics',
> E -  u'/operations',
> E ?  -
> E +  '/operations',
> E -  u'/profile_docs',
> E ?  -
> E +  '/profile_docs',
> E -  u'/rpcz',
> E ?  -
> E +  '/rpcz',
> E -  u'/threadz',
> E ?  -
> E +  '/threadz',
> E -  u'/varz']
> E ?  -
> E +  '/varz']
> {code}






[jira] [Created] (IMPALA-12830) test_web_pages() could fail in the exhaustive build

2024-02-21 Thread Fang-Yu Rao (Jira)
Fang-Yu Rao created IMPALA-12830:


 Summary: test_web_pages() could fail in the exhaustive build
 Key: IMPALA-12830
 URL: https://issues.apache.org/jira/browse/IMPALA-12830
 Project: IMPALA
  Issue Type: Bug
Reporter: Fang-Yu Rao
Assignee: Saurabh Katiyal


We found in an internal Jenkins run that test_web_pages() could fail in the 
exhaustive build with the following error.
+*Error Message*+
{code}
AssertionError: bad links from webui port 25020 assert ['/', 
'/catal...g_level', ...] == ['/', '/catalo...g_level', ...]   At index 2 diff: 
u'/events' != '/hadoop-varz'   Full diff:   - [u'/',   ?  -   + ['/',   -  
u'/catalog',   ?  -   +  '/catalog',   -  u'/events',   -  u'/hadoop-varz',   ? 
 -   +  '/hadoop-varz',   +  '/events',   -  u'/jmx',   ?  -   +  '/jmx',   -  
u'/log_level',   ?  -   +  '/log_level',   -  u'/memz',   ?  -   +  '/memz',   
-  u'/metrics',   ?  -   +  '/metrics',   -  u'/operations',   ?  -   +  
'/operations',   -  u'/profile_docs',   ?  -   +  '/profile_docs',   -  
u'/rpcz',   ?  -   +  '/rpcz',   -  u'/threadz',   ?  -   +  '/threadz',   -  
u'/varz']   ?  -   +  '/varz']
{code}

+*Stacktrace*+
{code}
custom_cluster/test_web_pages.py:248: in test_webserver_hide_logs_link
assert found_links == expected_catalog_links, msg
E   AssertionError: bad links from webui port 25020
E   assert ['/', '/catal...g_level', ...] == ['/', '/catalo...g_level', ...]
E At index 2 diff: u'/events' != '/hadoop-varz'
E Full diff:
E - [u'/',
E ?  -
E + ['/',
E -  u'/catalog',
E ?  -
E +  '/catalog',
E -  u'/events',
E -  u'/hadoop-varz',
E ?  -
E +  '/hadoop-varz',
E +  '/events',
E -  u'/jmx',
E ?  -
E +  '/jmx',
E -  u'/log_level',
E ?  -
E +  '/log_level',
E -  u'/memz',
E ?  -
E +  '/memz',
E -  u'/metrics',
E ?  -
E +  '/metrics',
E -  u'/operations',
E ?  -
E +  '/operations',
E -  u'/profile_docs',
E ?  -
E +  '/profile_docs',
E -  u'/rpcz',
E ?  -
E +  '/rpcz',
E -  u'/threadz',
E ?  -
E +  '/threadz',
E -  u'/varz']
E ?  -
E +  '/varz']
{code}






[jira] [Updated] (IMPALA-12830) test_webserver_hide_logs_link() could fail in the exhaustive build

2024-02-21 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-12830:
-
Summary: test_webserver_hide_logs_link() could fail in the exhaustive build 
 (was: test_web_pages() could fail in the exhaustive build)

> test_webserver_hide_logs_link() could fail in the exhaustive build
> --
>
> Key: IMPALA-12830
> URL: https://issues.apache.org/jira/browse/IMPALA-12830
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Fang-Yu Rao
>Assignee: Saurabh Katiyal
>Priority: Major
>  Labels: broken-build
>
> We found in an internal Jenkins run that test_web_pages() could fail in the 
> exhaustive build with the following error.
> +*Error Message*+
> {code}
> AssertionError: bad links from webui port 25020 assert ['/', 
> '/catal...g_level', ...] == ['/', '/catalo...g_level', ...]   At index 2 
> diff: u'/events' != '/hadoop-varz'   Full diff:   - [u'/',   ?  -   + ['/',   
> -  u'/catalog',   ?  -   +  '/catalog',   -  u'/events',   -  
> u'/hadoop-varz',   ?  -   +  '/hadoop-varz',   +  '/events',   -  u'/jmx',   
> ?  -   +  '/jmx',   -  u'/log_level',   ?  -   +  '/log_level',   -  
> u'/memz',   ?  -   +  '/memz',   -  u'/metrics',   ?  -   +  '/metrics',   -  
> u'/operations',   ?  -   +  '/operations',   -  u'/profile_docs',   ?  -   +  
> '/profile_docs',   -  u'/rpcz',   ?  -   +  '/rpcz',   -  u'/threadz',   ?  - 
>   +  '/threadz',   -  u'/varz']   ?  -   +  '/varz']
> {code}
> +*Stacktrace*+
> {code}
> custom_cluster/test_web_pages.py:248: in test_webserver_hide_logs_link
> assert found_links == expected_catalog_links, msg
> E   AssertionError: bad links from webui port 25020
> E   assert ['/', '/catal...g_level', ...] == ['/', '/catalo...g_level', ...]
> E At index 2 diff: u'/events' != '/hadoop-varz'
> E Full diff:
> E - [u'/',
> E ?  -
> E + ['/',
> E -  u'/catalog',
> E ?  -
> E +  '/catalog',
> E -  u'/events',
> E -  u'/hadoop-varz',
> E ?  -
> E +  '/hadoop-varz',
> E +  '/events',
> E -  u'/jmx',
> E ?  -
> E +  '/jmx',
> E -  u'/log_level',
> E ?  -
> E +  '/log_level',
> E -  u'/memz',
> E ?  -
> E +  '/memz',
> E -  u'/metrics',
> E ?  -
> E +  '/metrics',
> E -  u'/operations',
> E ?  -
> E +  '/operations',
> E -  u'/profile_docs',
> E ?  -
> E +  '/profile_docs',
> E -  u'/rpcz',
> E ?  -
> E +  '/rpcz',
> E -  u'/threadz',
> E ?  -
> E +  '/threadz',
> E -  u'/varz']
> E ?  -
> E +  '/varz']
> {code}






[jira] [Updated] (IMPALA-12830) test_webserver_hide_logs_link() could fail in the exhaustive build

2024-02-21 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-12830:
-
Description: 
We found in an internal Jenkins run that test_webserver_hide_logs_link() could 
fail in the exhaustive build with the following error.
+*Error Message*+
{code:java}
AssertionError: bad links from webui port 25020 assert ['/', 
'/catal...g_level', ...] == ['/', '/catalo...g_level', ...]   At index 2 diff: 
u'/events' != '/hadoop-varz'   Full diff:   - [u'/',   ?  -   + ['/',   -  
u'/catalog',   ?  -   +  '/catalog',   -  u'/events',   -  u'/hadoop-varz',   ? 
 -   +  '/hadoop-varz',   +  '/events',   -  u'/jmx',   ?  -   +  '/jmx',   -  
u'/log_level',   ?  -   +  '/log_level',   -  u'/memz',   ?  -   +  '/memz',   
-  u'/metrics',   ?  -   +  '/metrics',   -  u'/operations',   ?  -   +  
'/operations',   -  u'/profile_docs',   ?  -   +  '/profile_docs',   -  
u'/rpcz',   ?  -   +  '/rpcz',   -  u'/threadz',   ?  -   +  '/threadz',   -  
u'/varz']   ?  -   +  '/varz']
{code}
+*Stacktrace*+
{code:java}
custom_cluster/test_web_pages.py:248: in test_webserver_hide_logs_link
assert found_links == expected_catalog_links, msg
E   AssertionError: bad links from webui port 25020
E   assert ['/', '/catal...g_level', ...] == ['/', '/catalo...g_level', ...]
E At index 2 diff: u'/events' != '/hadoop-varz'
E Full diff:
E - [u'/',
E ?  -
E + ['/',
E -  u'/catalog',
E ?  -
E +  '/catalog',
E -  u'/events',
E -  u'/hadoop-varz',
E ?  -
E +  '/hadoop-varz',
E +  '/events',
E -  u'/jmx',
E ?  -
E +  '/jmx',
E -  u'/log_level',
E ?  -
E +  '/log_level',
E -  u'/memz',
E ?  -
E +  '/memz',
E -  u'/metrics',
E ?  -
E +  '/metrics',
E -  u'/operations',
E ?  -
E +  '/operations',
E -  u'/profile_docs',
E ?  -
E +  '/profile_docs',
E -  u'/rpcz',
E ?  -
E +  '/rpcz',
E -  u'/threadz',
E ?  -
E +  '/threadz',
E -  u'/varz']
E ?  -
E +  '/varz']
{code}

  was:
We found in an internal Jenkins run that test_web_pages() could fail in the 
exhaustive build with the following error.
+*Error Message*+
{code}
AssertionError: bad links from webui port 25020 assert ['/', 
'/catal...g_level', ...] == ['/', '/catalo...g_level', ...]   At index 2 diff: 
u'/events' != '/hadoop-varz'   Full diff:   - [u'/',   ?  -   + ['/',   -  
u'/catalog',   ?  -   +  '/catalog',   -  u'/events',   -  u'/hadoop-varz',   ? 
 -   +  '/hadoop-varz',   +  '/events',   -  u'/jmx',   ?  -   +  '/jmx',   -  
u'/log_level',   ?  -   +  '/log_level',   -  u'/memz',   ?  -   +  '/memz',   
-  u'/metrics',   ?  -   +  '/metrics',   -  u'/operations',   ?  -   +  
'/operations',   -  u'/profile_docs',   ?  -   +  '/profile_docs',   -  
u'/rpcz',   ?  -   +  '/rpcz',   -  u'/threadz',   ?  -   +  '/threadz',   -  
u'/varz']   ?  -   +  '/varz']
{code}

+*Stacktrace*+
{code}
custom_cluster/test_web_pages.py:248: in test_webserver_hide_logs_link
assert found_links == expected_catalog_links, msg
E   AssertionError: bad links from webui port 25020
E   assert ['/', '/catal...g_level', ...] == ['/', '/catalo...g_level', ...]
E At index 2 diff: u'/events' != '/hadoop-varz'
E Full diff:
E - [u'/',
E ?  -
E + ['/',
E -  u'/catalog',
E ?  -
E +  '/catalog',
E -  u'/events',
E -  u'/hadoop-varz',
E ?  -
E +  '/hadoop-varz',
E +  '/events',
E -  u'/jmx',
E ?  -
E +  '/jmx',
E -  u'/log_level',
E ?  -
E +  '/log_level',
E -  u'/memz',
E ?  -
E +  '/memz',
E -  u'/metrics',
E ?  -
E +  '/metrics',
E -  u'/operations',
E ?  -
E +  '/operations',
E -  u'/profile_docs',
E ?  -
E +  '/profile_docs',
E -  u'/rpcz',
E ?  -
E +  '/rpcz',
E -  u'/threadz',
E ?  -
E +  '/threadz',
E -  u'/varz']
E ?  -
E +  '/varz']
{code}


> test_webserver_hide_logs_link() could fail in the exhaustive build
> --
>
> Key: IMPALA-12830
> URL: https://issues.apache.org/jira/browse/IMPALA-12830
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Fang-Yu Rao
>Assignee: Saurabh Katiyal
>Priority: Major
>  Labels: broken-build
>
> We found in an internal Jenkins run that test_webserver_hide_logs_link() 
> could fail in the exhaustive build with the following error.
> +*Error Message*+
> {code:java}
> AssertionError: bad links from webui port 25020 assert ['/', 
> '/catal...g_level', ...] == ['/', '/catalo...g_level', ...]   At index 2 
> diff: u'/events' != '/hadoop-varz'   Full diff:   - [u'/',   ?  -   + ['/',   
> -  u'/catalog',   ?  -   +  '/catalog',   -  u'/events',   -  
> u'/hadoop-varz',   ?  -   +  '/hadoop-varz'

[jira] [Commented] (IMPALA-12819) InaccessibleObjectException found during LocalCatalogTest

2024-02-17 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17818215#comment-17818215
 ] 

Fang-Yu Rao commented on IMPALA-12819:
--

Hi [~MikaelSmith], assigned the JIRA to you since you helped with IMPALA-11260 
earlier and may be more familiar with the context. Please re-assign the ticket 
as you see appropriate. Thanks!


> InaccessibleObjectException found during LocalCatalogTest
> -
>
> Key: IMPALA-12819
> URL: https://issues.apache.org/jira/browse/IMPALA-12819
> Project: IMPALA
>  Issue Type: Bug
>  Components: fe
>Affects Versions: Impala 4.4.0
>Reporter: Fang-Yu Rao
>Assignee: Michael Smith
>Priority: Major
>  Labels: broken-build
>
> We found in an internal build that during LocalCatalogTest we could encounter 
> InaccessibleObjectException. This was found by the test 
> [test_no_inaccessible_objects|https://github.com/apache/impala/blob/master/tests/verifiers/test_banned_log_messages.py#L40C7-L40C35]
> {code:java}
> W0217 01:31:14.108255 18119 ObjectGraphWalker.java:251] The JVM is preventing 
> Ehcache from accessing the subgraph beneath 'private final 
> jdk.internal.platform.CgroupV1Metrics 
> jdk.internal.platform.CgroupV1MetricsImpl.metrics' - cache sizes may be 
> underestimated as a result
> Java exception follows:
> java.lang.reflect.InaccessibleObjectException: Unable to make field private 
> final jdk.internal.platform.CgroupV1Metrics 
> jdk.internal.platform.CgroupV1MetricsImpl.metrics accessible: module 
> java.base does not "opens jdk.internal.platform" to unnamed module @2c89cd7f
> at 
> java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:340)
> at 
> java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:280)
> at 
> java.base/java.lang.reflect.Field.checkCanSetAccessible(Field.java:176)
> at java.base/java.lang.reflect.Field.setAccessible(Field.java:170)
> at 
> org.ehcache.sizeof.ObjectGraphWalker.getAllFields(ObjectGraphWalker.java:245)
> at 
> org.ehcache.sizeof.ObjectGraphWalker.getFilteredFields(ObjectGraphWalker.java:204)
> at 
> org.ehcache.sizeof.ObjectGraphWalker.walk(ObjectGraphWalker.java:159)
> at org.ehcache.sizeof.SizeOf.deepSizeOf(SizeOf.java:74)
> at 
> org.apache.impala.catalog.local.CatalogdMetaProvider$SizeOfWeigher.weigh(CatalogdMetaProvider.java:2234)
> at com.google.common.cache.LocalCache$Segment.setValue(LocalCache.java:2043)
> at com.google.common.cache.LocalCache$Segment.replace(LocalCache.java:2990)
> at com.google.common.cache.LocalCache.replace(LocalCache.java:4324)
> at org.apache.impala.catalog.local.CatalogdMetaProvider.loadWithCaching(CatalogdMetaProvider.java:569)
> at org.apache.impala.catalog.local.CatalogdMetaProvider.loadIcebergApiTable(CatalogdMetaProvider.java:1160)
> at org.apache.impala.catalog.local.LocalIcebergTable.loadIcebergTableViaMetaProvider(LocalIcebergTable.java:96)
> at org.apache.impala.catalog.local.LocalTable.load(LocalTable.java:131)
> at org.apache.impala.catalog.local.LocalTable.load(LocalTable.java:114)
> at org.apache.impala.catalog.local.LocalDb.getTable(LocalDb.java:148)
> at org.apache.impala.catalog.local.LocalCatalog.getTable(LocalCatalog.java:139)
> at org.apache.impala.catalog.local.LocalCatalogTest.testLoadIcebergFileDescriptors(LocalCatalogTest.java:280)
> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base/java.lang.reflect.Method.invoke(Method.java:566)
> at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>

[jira] [Created] (IMPALA-12819) InaccessibleObjectException found during LocalCatalogTest

2024-02-17 Thread Fang-Yu Rao (Jira)
Fang-Yu Rao created IMPALA-12819:


 Summary: InaccessibleObjectException found during LocalCatalogTest
 Key: IMPALA-12819
 URL: https://issues.apache.org/jira/browse/IMPALA-12819
 Project: IMPALA
  Issue Type: Bug
  Components: fe
Affects Versions: Impala 4.4.0
Reporter: Fang-Yu Rao
Assignee: Michael Smith


We found in an internal build that LocalCatalogTest could encounter an 
InaccessibleObjectException. This was caught by the test 
[test_no_inaccessible_objects|https://github.com/apache/impala/blob/master/tests/verifiers/test_banned_log_messages.py#L40C7-L40C35].
{code:java}
W0217 01:31:14.108255 18119 ObjectGraphWalker.java:251] The JVM is preventing 
Ehcache from accessing the subgraph beneath 'private final 
jdk.internal.platform.CgroupV1Metrics 
jdk.internal.platform.CgroupV1MetricsImpl.metrics' - cache sizes may be 
underestimated as a result
Java exception follows:
java.lang.reflect.InaccessibleObjectException: Unable to make field private 
final jdk.internal.platform.CgroupV1Metrics 
jdk.internal.platform.CgroupV1MetricsImpl.metrics accessible: module java.base 
does not "opens jdk.internal.platform" to unnamed module @2c89cd7f
at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:340)
at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:280)
at java.base/java.lang.reflect.Field.checkCanSetAccessible(Field.java:176)
at java.base/java.lang.reflect.Field.setAccessible(Field.java:170)
at org.ehcache.sizeof.ObjectGraphWalker.getAllFields(ObjectGraphWalker.java:245)
at org.ehcache.sizeof.ObjectGraphWalker.getFilteredFields(ObjectGraphWalker.java:204)
at org.ehcache.sizeof.ObjectGraphWalker.walk(ObjectGraphWalker.java:159)
at org.ehcache.sizeof.SizeOf.deepSizeOf(SizeOf.java:74)
at org.apache.impala.catalog.local.CatalogdMetaProvider$SizeOfWeigher.weigh(CatalogdMetaProvider.java:2234)
at com.google.common.cache.LocalCache$Segment.setValue(LocalCache.java:2043)
at com.google.common.cache.LocalCache$Segment.replace(LocalCache.java:2990)
at com.google.common.cache.LocalCache.replace(LocalCache.java:4324)
at org.apache.impala.catalog.local.CatalogdMetaProvider.loadWithCaching(CatalogdMetaProvider.java:569)
at org.apache.impala.catalog.local.CatalogdMetaProvider.loadIcebergApiTable(CatalogdMetaProvider.java:1160)
at org.apache.impala.catalog.local.LocalIcebergTable.loadIcebergTableViaMetaProvider(LocalIcebergTable.java:96)
at org.apache.impala.catalog.local.LocalTable.load(LocalTable.java:131)
at org.apache.impala.catalog.local.LocalTable.load(LocalTable.java:114)
at org.apache.impala.catalog.local.LocalDb.getTable(LocalDb.java:148)
at org.apache.impala.catalog.local.LocalCatalog.getTable(LocalCatalog.java:139)
at org.apache.impala.catalog.local.LocalCatalogTest.testLoadIcebergFileDescriptors(LocalCatalogTest.java:280)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:316)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:240)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4
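
The exception above is the usual JDK 9+ strong-encapsulation failure: java.base does not open jdk.internal.platform to the unnamed module, so Ehcache's reflective size-of walker cannot inspect those fields. One common mitigation is sketched below; the --add-opens flag syntax is standard JDK 9+, but the MAVEN_OPTS wiring is an illustrative assumption, not necessarily the fix adopted for this JIRA:

```shell
# Open jdk.internal.platform to the unnamed module so Ehcache's
# ObjectGraphWalker can reflect on CgroupV1MetricsImpl.metrics.
# Wiring via MAVEN_OPTS is illustrative; the flag could equally be
# added to the surefire argLine.
export MAVEN_OPTS="--add-opens java.base/jdk.internal.platform=ALL-UNNAMED"
```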

[jira] [Updated] (IMPALA-11743) Support the OWNER privilege for UDFs in Impala

2024-01-05 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-11743:
-
Summary: Support the OWNER privilege for UDFs in Impala  (was: Investigate 
how to support the OWNER privilege for UDFs in Impala)

> Support the OWNER privilege for UDFs in Impala
> --
>
> Key: IMPALA-11743
> URL: https://issues.apache.org/jira/browse/IMPALA-11743
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>
> Currently in Impala a user allowed to create a UDF in a database still has to 
> be explicitly granted the necessary privileges to execute the UDF later in a 
> SELECT query. It would be more convenient if the ownership information of a 
> UDF could also be retrieved during the query analysis of such SELECT queries 
> so that the owner/creator of a UDF will be allowed to execute the UDF without 
> being explicitly granted the necessary privileges on the UDF.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-12578) Pass the owner user to Ranger plug-in in GRANT and REVOKE statements for databases, tables, and columns

2024-01-05 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17803733#comment-17803733
 ] 

Fang-Yu Rao commented on IMPALA-12578:
--

I separated the case of UDFs from this JIRA because Impala currently has no 
concept of an owner for a UDF. As seen in IMPALA-11743, the changes needed to 
support UDF ownership will be involved, so it is better to track the case of 
UDFs in a separate JIRA.

> Pass the owner user to Ranger plug-in in GRANT and REVOKE statements for 
> databases, tables, and columns
> ---
>
> Key: IMPALA-12578
> URL: https://issues.apache.org/jira/browse/IMPALA-12578
> Project: IMPALA
>  Issue Type: New Feature
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>
> Starting from RANGER-1200, Ranger supports the notion of the OWNER user, 
> which allows each user to perform any operation on the resources owned by it. 
> This avoids the need for creating a new policy that grants the OWNER user the 
> privileges on every newly created resource. Refer to 
> [apache-ranger-policy-model|https://blogsarchive.apache.org/ranger/entry/apache-ranger-policy-model#:~:text=allow%20each%20user%20to%20access%20all,all].
> Currently for the GRANT and REVOKE statements, Impala does not pass the owner 
> of the resource to the Ranger plug-in and thus a non-administrative user 
> could not grant/revoke privileges on a resource to/from another user even 
> though this non-administrative user owns the resource. We should pass the 
> ownership information to the Ranger plug-in to make authorization management 
> easier in Impala.
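
The check this JIRA asks for can be sketched as follows. This is a minimal illustration of the intended behavior; the function and parameter names are assumptions, not Ranger's or Impala's actual API:

```python
from typing import Optional

# Illustrative sketch, not Ranger's actual API: a non-administrative user
# may manage grants on a resource only if the plug-in knows they own it.
def may_administer_grants(user: str, resource_owner: Optional[str],
                          is_admin: bool) -> bool:
    # Admins can always manage grants; otherwise ownership must have been
    # passed to the plug-in and must match the requesting user.
    return is_admin or (resource_owner is not None and user == resource_owner)

# Today Impala omits the owner (resource_owner=None), so the owner is denied.
print(may_administer_grants("alice", None, False))     # -> False
# With ownership passed through, the owner can grant/revoke on the resource.
print(may_administer_grants("alice", "alice", False))  # -> True
```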



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-12685) Pass the owner user to Ranger plug-in in GRANT and REVOKE statements for UDFs

2024-01-05 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-12685:
-
Summary: Pass the owner user to Ranger plug-in in GRANT and REVOKE 
statements for UDFs  (was: Pass the owner user to Ranger plug-in in GRANT and 
REVOKE statements for UDF)

> Pass the owner user to Ranger plug-in in GRANT and REVOKE statements for UDFs
> -
>
> Key: IMPALA-12685
> URL: https://issues.apache.org/jira/browse/IMPALA-12685
> Project: IMPALA
>  Issue Type: New Feature
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>
> This is the follow-up to IMPALA-12578, where we tackle the cases of 
> databases, tables, and columns.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-12685) Pass the owner user to Ranger plug-in in GRANT and REVOKE statements for UDF

2024-01-05 Thread Fang-Yu Rao (Jira)
Fang-Yu Rao created IMPALA-12685:


 Summary: Pass the owner user to Ranger plug-in in GRANT and REVOKE 
statements for UDF
 Key: IMPALA-12685
 URL: https://issues.apache.org/jira/browse/IMPALA-12685
 Project: IMPALA
  Issue Type: New Feature
Reporter: Fang-Yu Rao
Assignee: Fang-Yu Rao


This is the follow-up to IMPALA-12578, where we tackle the cases of databases, 
tables, and columns.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-12578) Pass the owner user to Ranger plug-in in GRANT and REVOKE statements for databases, tables, and columns

2024-01-05 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-12578:
-
Summary: Pass the owner user to Ranger plug-in in GRANT and REVOKE 
statements for databases, tables, and columns  (was: Pass the owner user to the 
Ranger plug-in in GRANT and REVOKE statements)

> Pass the owner user to Ranger plug-in in GRANT and REVOKE statements for 
> databases, tables, and columns
> ---
>
> Key: IMPALA-12578
> URL: https://issues.apache.org/jira/browse/IMPALA-12578
> Project: IMPALA
>  Issue Type: New Feature
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>
> Starting from RANGER-1200, Ranger supports the notion of the OWNER user, 
> which allows each user to perform any operation on the resources owned by it. 
> This avoids the need for creating a new policy that grants the OWNER user the 
> privileges on every newly created resource. Refer to 
> [apache-ranger-policy-model|https://blogsarchive.apache.org/ranger/entry/apache-ranger-policy-model#:~:text=allow%20each%20user%20to%20access%20all,all].
> Currently for the GRANT and REVOKE statements, Impala does not pass the owner 
> of the resource to the Ranger plug-in and thus a non-administrative user 
> could not grant/revoke privileges on a resource to/from another user even 
> though this non-administrative user owns the resource. We should pass the 
> ownership information to the Ranger plug-in to make authorization management 
> easier in Impala.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Comment Edited] (IMPALA-11743) Investigate how to support the OWNER privilege for UDFs in Impala

2024-01-05 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17803730#comment-17803730
 ] 

Fang-Yu Rao edited comment on IMPALA-11743 at 1/6/24 12:16 AM:
---

This JIRA is related to IMPALA-12578 where we would like to pass to the Ranger 
plug-in the owner of a resource involved in a GRANT/REVOKE statement.

Specifically, in the case when the resource is a user-defined function (UDF), 
Impala has to load this piece of information when instantiating user-defined 
functions in 
[CatalogServiceCatalog.java#loadJavaFunctions()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L1812C16-L1836]
 so that the owner of a UDF will be available in Impala's internal 
representation of it, i.e., 
[Function.java|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/Function.java].

On a related note, in 
[hive_metastore.thrift|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift],
 Hive already has an 'ownerName' field for a user-defined function.
{code:java}
struct Function {
  1: string            functionName,
  2: string            dbName,
  3: string            className,
  4: string            ownerName,
  5: PrincipalType     ownerType,
  6: i32               createTime,
  7: FunctionType      functionType,
  8: list<ResourceUri> resourceUris,
  9: optional string   catName
}
{code}
 
On the other hand, when an authorized user is creating a persistent UDF via 
Impala, Impala should also pass the requesting user as the owner of the UDF to 
Hive MetaStore. This way Impala will be able to load the owner of a UDF in 
CatalogServiceCatalog.java#loadJavaFunctions() mentioned above.
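
The end-to-end effect described above can be illustrated with a small self-contained sketch; the class and function names here are hypothetical, not Impala's actual internals. Once the owner loaded from the Hive MetaStore is stored alongside the UDF, the authorization check can let the owner execute it without an explicit grant:

```python
from dataclasses import dataclass

# Hypothetical model, not Impala's actual classes: a UDF carrying the
# ownerName loaded from the Hive MetaStore Function object.
@dataclass(frozen=True)
class Udf:
    db_name: str
    fn_name: str
    owner_name: str

def can_execute(user: str, udf: Udf, granted_users: set) -> bool:
    # The owner may always execute; other users need an explicit grant.
    return user == udf.owner_name or user in granted_users

udf = Udf("functional", "my_udf", "alice")
print(can_execute("alice", udf, set()))    # owner, no grant -> True
print(can_execute("bob", udf, set()))      # non-owner, no grant -> False
print(can_execute("bob", udf, {"bob"}))    # non-owner with grant -> True
```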



was (Author: fangyurao):
This JIRA is related to IMPALA-12578 where we would like to pass to the Ranger 
plug-in the owner of a resource involved in a GRANT/REVOKE statement.

Specifically, in the case when the resource is a user-defined function (UDF), 
Impala has to load this piece of information when instantiating user-defined 
functions in 
[CatalogServiceCatalog.java#loadJavaFunctions()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L1812C16-L1836]
 so that the owner of a UDF will be available in Impala's internal 
representation of it, i.e., 
[Function.java|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/Function.java].

On a related note, in 
[hive_metastore.thrift|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift],
 Hive already has an 'ownerName' field for a user-defined function.
{code:java}
struct Function {
  1: string            functionName,
  2: string            dbName,
  3: string            className,
  4: string            ownerName,
  5: PrincipalType     ownerType,
  6: i32               createTime,
  7: FunctionType      functionType,
  8: list<ResourceUri> resourceUris,
  9: optional string   catName
}
{code}
 

> Investigate how to support the OWNER privilege for UDFs in Impala
> -
>
> Key: IMPALA-11743
> URL: https://issues.apache.org/jira/browse/IMPALA-11743
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>
> Currently in Impala a user allowed to create a UDF in a database still has to 
> be explicitly granted the necessary privileges to execute the UDF later in a 
> SELECT query. It would be more convenient if the ownership information of a 
> UDF could also be retrieved during the query analysis of such SELECT queries 
> so that the owner/creator of a UDF will be allowed to execute the UDF without 
> being explicitly granted the necessary privileges on the UDF.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-11743) Investigate how to support the OWNER privilege for UDFs in Impala

2024-01-05 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17803730#comment-17803730
 ] 

Fang-Yu Rao commented on IMPALA-11743:
--

This JIRA is related to IMPALA-12578 where we would like to pass to the Ranger 
plug-in the owner of a resource involved in a GRANT/REVOKE statement.

Specifically, in the case when the resource is a user-defined function (UDF), 
Impala has to load this piece of information when instantiating user-defined 
functions in 
[CatalogServiceCatalog.java#loadJavaFunctions()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L1812C16-L1836]
 so that the owner of a UDF will be available in Impala's internal 
representation of it, i.e., 
[Function.java|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/Function.java].

On a related note, in 
[hive_metastore.thrift|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift],
 Hive already has an 'ownerName' field for a user-defined function.
{code:java}
struct Function {
  1: string            functionName,
  2: string            dbName,
  3: string            className,
  4: string            ownerName,
  5: PrincipalType     ownerType,
  6: i32               createTime,
  7: FunctionType      functionType,
  8: list<ResourceUri> resourceUris,
  9: optional string   catName
}
{code}
 

> Investigate how to support the OWNER privilege for UDFs in Impala
> -
>
> Key: IMPALA-11743
> URL: https://issues.apache.org/jira/browse/IMPALA-11743
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>
> Currently in Impala a user allowed to create a UDF in a database still has to 
> be explicitly granted the necessary privileges to execute the UDF later in a 
> SELECT query. It would be more convenient if the ownership information of a 
> UDF could also be retrieved during the query analysis of such SELECT queries 
> so that the owner/creator of a UDF will be allowed to execute the UDF without 
> being explicitly granted the necessary privileges on the UDF.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Reopened] (IMPALA-12554) Create only one Ranger policy for GRANT statement

2023-12-22 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao reopened IMPALA-12554:
--

> Create only one Ranger policy for GRANT statement
> -
>
> Key: IMPALA-12554
> URL: https://issues.apache.org/jira/browse/IMPALA-12554
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>
> Currently Impala would create a Ranger policy for each column specified in a 
> GRANT statement. For instance, after the following query, 3 Ranger policies 
> would be created on the Ranger server. This could result in a lot of policies 
> created when there are many columns specified and it may result in Impala's 
> Ranger plug-in taking a long time to download the policies from the Ranger 
> server. It would be great if Impala only creates one single policy for 
> columns in the same table.
> {code:java}
> [localhost:21050] default> grant select(id, bool_col, tinyint_col) on table 
> functional.alltypes to user non_owner;
> Query: grant select(id, bool_col, tinyint_col) on table functional.alltypes 
> to user non_owner
> Query submitted at: 2023-11-10 09:38:58 (Coordinator: http://fangyu:25000)
> Query progress can be monitored at: 
> http://fangyu:25000/query_plan?query_id=bc4fa1cdefe5881b:413d9a69
> +-+
> | summary |
> +-+
> | Privilege(s) have been granted. |
> +-+
> Fetched 1 row(s) in 0.67s
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-12554) Create only one Ranger policy for GRANT statement

2023-12-22 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao resolved IMPALA-12554.
--
Resolution: Implemented

> Create only one Ranger policy for GRANT statement
> -
>
> Key: IMPALA-12554
> URL: https://issues.apache.org/jira/browse/IMPALA-12554
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>
> Currently Impala would create a Ranger policy for each column specified in a 
> GRANT statement. For instance, after the following query, 3 Ranger policies 
> would be created on the Ranger server. This could result in a lot of policies 
> created when there are many columns specified and it may result in Impala's 
> Ranger plug-in taking a long time to download the policies from the Ranger 
> server. It would be great if Impala only creates one single policy for 
> columns in the same table.
> {code:java}
> [localhost:21050] default> grant select(id, bool_col, tinyint_col) on table 
> functional.alltypes to user non_owner;
> Query: grant select(id, bool_col, tinyint_col) on table functional.alltypes 
> to user non_owner
> Query submitted at: 2023-11-10 09:38:58 (Coordinator: http://fangyu:25000)
> Query progress can be monitored at: 
> http://fangyu:25000/query_plan?query_id=bc4fa1cdefe5881b:413d9a69
> +-+
> | summary |
> +-+
> | Privilege(s) have been granted. |
> +-+
> Fetched 1 row(s) in 0.67s
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-12554) Create only one Ranger policy for GRANT statement

2023-12-22 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao resolved IMPALA-12554.
--
Resolution: Later

After some manual testing, we found that RANGER-4585 has some bugs, e.g., the 
REVOKE REST API call is not able to revoke the privilege on multiple columns 
from a grantee that was granted the SELECT privilege on the same set of 
columns. Until that is fixed, we resolve this ticket for now and will re-open 
it once the issue is addressed in a follow-up RANGER JIRA.
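
For reference, the consolidation this JIRA aims for (one Ranger policy whose column resource lists every granted column, rather than one policy per column) can be sketched as a single policy payload. The JSON shape follows Ranger's public policy model; the service name, policy name, and helper function below are illustrative, not Impala's actual plug-in code:

```python
# Illustrative sketch: one Ranger policy payload covering all columns of a
# GRANT select(col1, col2, col3) statement, instead of one policy per column.
import json

def column_grant_policy(db, table, columns, grantee):
    return {
        "service": "cm_hive",  # example service name
        "name": f"grant-{db}.{table}-{grantee}",
        "resources": {
            "database": {"values": [db]},
            "table": {"values": [table]},
            # All granted columns live in ONE policy.
            "column": {"values": list(columns)},
        },
        "policyItems": [
            {"users": [grantee],
             "accesses": [{"type": "select", "isAllowed": True}]}
        ],
    }

policy = column_grant_policy("functional", "alltypes",
                             ["id", "bool_col", "tinyint_col"], "non_owner")
print(json.dumps(policy, indent=2))
```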

 

> Create only one Ranger policy for GRANT statement
> -
>
> Key: IMPALA-12554
> URL: https://issues.apache.org/jira/browse/IMPALA-12554
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>
> Currently Impala would create a Ranger policy for each column specified in a 
> GRANT statement. For instance, after the following query, 3 Ranger policies 
> would be created on the Ranger server. This could result in a lot of policies 
> created when there are many columns specified and it may result in Impala's 
> Ranger plug-in taking a long time to download the policies from the Ranger 
> server. It would be great if Impala only creates one single policy for 
> columns in the same table.
> {code:java}
> [localhost:21050] default> grant select(id, bool_col, tinyint_col) on table 
> functional.alltypes to user non_owner;
> Query: grant select(id, bool_col, tinyint_col) on table functional.alltypes 
> to user non_owner
> Query submitted at: 2023-11-10 09:38:58 (Coordinator: http://fangyu:25000)
> Query progress can be monitored at: 
> http://fangyu:25000/query_plan?query_id=bc4fa1cdefe5881b:413d9a69
> +-+
> | summary |
> +-+
> | Privilege(s) have been granted. |
> +-+
> Fetched 1 row(s) in 0.67s
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-12578) Pass the owner user to the Ranger plug-in in GRANT and REVOKE statements

2023-11-27 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-12578:
-
Description: 
Starting from RANGER-1200, Ranger supports the notion of the OWNER user, which 
allows each user to perform any operation on the resources owned by it. This 
avoids the need for creating a new policy that grants the OWNER user the 
privileges on every newly created  resource. Refer to 
[apache-ranger-policy-model|https://blogsarchive.apache.org/ranger/entry/apache-ranger-policy-model#:~:text=allow%20each%20user%20to%20access%20all,all].

Currently for the GRANT and REVOKE statements, Impala does not pass the owner 
of the resource to the Ranger plug-in and thus a non-administrative user could 
not grant/revoke privileges on a resource to/from another user even though this 
non-administrative user owns the resource. We should pass the ownership 
information to the Ranger plug-in to make authorization management easier in 
Impala.

  was:
Starting from RANGER-1200, Ranger supports the notion of the OWNER user, which 
allows each user to perform any operation on the resources owned by them. This 
avoids the need for creating a new policy that grants the OWNER user the 
privileges on every newly created  resource. Refer to 
[apache-ranger-policy-model|https://blogsarchive.apache.org/ranger/entry/apache-ranger-policy-model#:~:text=allow%20each%20user%20to%20access%20all,all].

Currently for the GRANT and REVOKE statements, Impala does not pass the owner 
of the resource to the Ranger plug-in and thus a non-administrative user could 
not grant/revoke privileges on a resource to/from another user even though this 
non-administrative user owns the resource. We should pass the ownership 
information to the Ranger plug-in to make authorization management easier in 
Impala.


> Pass the owner user to the Ranger plug-in in GRANT and REVOKE statements
> 
>
> Key: IMPALA-12578
> URL: https://issues.apache.org/jira/browse/IMPALA-12578
> Project: IMPALA
>  Issue Type: New Feature
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>
> Starting from RANGER-1200, Ranger supports the notion of the OWNER user, 
> which allows each user to perform any operation on the resources owned by it. 
> This avoids the need for creating a new policy that grants the OWNER user the 
> privileges on every newly created resource. Refer to 
> [apache-ranger-policy-model|https://blogsarchive.apache.org/ranger/entry/apache-ranger-policy-model#:~:text=allow%20each%20user%20to%20access%20all,all].
> Currently for the GRANT and REVOKE statements, Impala does not pass the owner 
> of the resource to the Ranger plug-in and thus a non-administrative user 
> could not grant/revoke privileges on a resource to/from another user even 
> though this non-administrative user owns the resource. We should pass the 
> ownership information to the Ranger plug-in to make authorization management 
> easier in Impala.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-12578) Pass the owner user to the Ranger plug-in in GRANT and REVOKE statements

2023-11-27 Thread Fang-Yu Rao (Jira)
Fang-Yu Rao created IMPALA-12578:


 Summary: Pass the owner user to the Ranger plug-in in GRANT and 
REVOKE statements
 Key: IMPALA-12578
 URL: https://issues.apache.org/jira/browse/IMPALA-12578
 Project: IMPALA
  Issue Type: New Feature
Reporter: Fang-Yu Rao
Assignee: Fang-Yu Rao


Starting from RANGER-1200, Ranger supports the notion of the OWNER user, which 
allows each user to perform any operation on the resources owned by them. This 
avoids the need for creating a new policy that grants the OWNER user the 
privileges on every newly created resource. Refer to 
[apache-ranger-policy-model|https://blogsarchive.apache.org/ranger/entry/apache-ranger-policy-model#:~:text=allow%20each%20user%20to%20access%20all,all].

Currently for the GRANT and REVOKE statements, Impala does not pass the owner 
of the resource to the Ranger plug-in and thus a non-administrative user could 
not grant/revoke privileges on a resource to/from another user even though this 
non-administrative user owns the resource. We should pass the ownership 
information to the Ranger plug-in to make authorization management easier in 
Impala.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-12554) Create only one Ranger policy for GRANT statement

2023-11-24 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-12554:
-
Description: 
Currently Impala would create a Ranger policy for each column specified in a 
GRANT statement. For instance, after the following query, 3 Ranger policies 
would be created on the Ranger server. This could result in a lot of policies 
created when there are many columns specified and it may result in Impala's 
Ranger plug-in taking a long time to download the policies from the Ranger 
server. It would be great if Impala only creates one single policy for columns 
in the same table.
{code:java}
[localhost:21050] default> grant select(id, bool_col, tinyint_col) on table 
functional.alltypes to user non_owner;
Query: grant select(id, bool_col, tinyint_col) on table functional.alltypes to 
user non_owner
Query submitted at: 2023-11-10 09:38:58 (Coordinator: http://fangyu:25000)
Query progress can be monitored at: 
http://fangyu:25000/query_plan?query_id=bc4fa1cdefe5881b:413d9a69
+-+
| summary |
+-+
| Privilege(s) have been granted. |
+-+
Fetched 1 row(s) in 0.67s
{code}

  was:
Currently Impala would create a Ranger policy for each column specified in a 
GRANT statement. For instance, after the following query, 3 Ranger policies 
would be created on the Ranger server. This could result in a lot of policies 
created when there are many columns specified and it may cause Impala's Ranger 
plug-in a long time to download the policies from the Ranger server. It would 
be great if Impala only creates one single policy for columns in the same table.
{code}
[localhost:21050] default> grant select(id, bool_col, tinyint_col) on table 
functional.alltypes to user non_owner;
Query: grant select(id, bool_col, tinyint_col) on table functional.alltypes to 
user non_owner
Query submitted at: 2023-11-10 09:38:58 (Coordinator: http://fangyu:25000)
Query progress can be monitored at: 
http://fangyu:25000/query_plan?query_id=bc4fa1cdefe5881b:413d9a69
+-+
| summary |
+-+
| Privilege(s) have been granted. |
+-+
Fetched 1 row(s) in 0.67s
{code}


> Create only one Ranger policy for GRANT statement
> -
>
> Key: IMPALA-12554
> URL: https://issues.apache.org/jira/browse/IMPALA-12554
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>
> Currently Impala would create a Ranger policy for each column specified in a 
> GRANT statement. For instance, after the following query, 3 Ranger policies 
> would be created on the Ranger server. This could result in a lot of policies 
> created when there are many columns specified and it may result in Impala's 
> Ranger plug-in taking a long time to download the policies from the Ranger 
> server. It would be great if Impala created only a single policy for 
> columns in the same table.
> {code:java}
> [localhost:21050] default> grant select(id, bool_col, tinyint_col) on table 
> functional.alltypes to user non_owner;
> Query: grant select(id, bool_col, tinyint_col) on table functional.alltypes 
> to user non_owner
> Query submitted at: 2023-11-10 09:38:58 (Coordinator: http://fangyu:25000)
> Query progress can be monitored at: 
> http://fangyu:25000/query_plan?query_id=bc4fa1cdefe5881b:413d9a69
> +---------------------------------+
> | summary                         |
> +---------------------------------+
> | Privilege(s) have been granted. |
> +---------------------------------+
> Fetched 1 row(s) in 0.67s
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-3268) Add command "SHOW VIEWS"

2023-11-22 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-3268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-3268:

Description: 
Currently, to get a list of views, a user has to:
 - SHOW TABLES
 - scan through the output list
 - SHOW CREATE TABLE view_name to confirm view_name is a view

which is tedious.

So I would like to request the following:
 - -SHOW TABLES should only return tables-
 - SHOW VIEWS should only return views
 - -add a flag to either above commands to return all tables and views-

This will help lots of end users.

Edit: Moved the first item and the third item out of the scope of this JIRA to 
IMPALA-12574 since more discussion may be required.

  was:
Currently, to get a list of views, a user has to:

- SHOW TABLES
- scan through the output list
- SHOW CREATE TABLE view_name to confirm view_name is a view

which is tedious.

So I would like to request the following:

- SHOW TABLES should only return tables
- SHOW VIEWS should only return views
- add a flag to either above commands to return all tables and views

This will help lots of end users.


> Add command "SHOW VIEWS"
> 
>
> Key: IMPALA-3268
> URL: https://issues.apache.org/jira/browse/IMPALA-3268
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Catalog
>Affects Versions: Impala 2.2.4, Impala 2.3.0, Impala 2.5.0
>Reporter: Eric Lin
>Assignee: Fang-Yu Rao
>Priority: Minor
>  Labels: usability
>
> Currently, to get a list of views, a user has to:
>  - SHOW TABLES
>  - scan through the output list
>  - SHOW CREATE TABLE view_name to confirm view_name is a view
> which is tedious.
> So I would like to request the following:
>  - -SHOW TABLES should only return tables-
>  - SHOW VIEWS should only return views
>  - -add a flag to either above commands to return all tables and views-
> This will help lots of end users.
> Edit: Moved the first item and the third item out of the scope of this JIRA 
> to IMPALA-12574 since more discussion may be required.






[jira] [Updated] (IMPALA-12574) Consider extending SHOW TABLES statement so it only displays tables

2023-11-22 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-12574:
-
Summary: Consider extending SHOW TABLES statement so it only displays tables 
 (was: Consider extending SHOW TABLES statement so it only displays the tables)

> Consider extending SHOW TABLES statement so it only displays tables
> --
>
> Key: IMPALA-12574
> URL: https://issues.apache.org/jira/browse/IMPALA-12574
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Catalog, Frontend
>Reporter: Fang-Yu Rao
>Priority: Minor
>
> IMPALA-3268 extended the Frontend's GetTableNames() API so that 
> GetTableNames() can return the matching tables whose table type is in a 
> specified set of table types. With this change, it should not be too 
> difficult to extend the SHOW TABLES statement so that SHOW TABLES displays 
> only tables of a specified type (vs. all table types). It 
> would be great to have this functionality.






[jira] [Created] (IMPALA-12574) Consider extending SHOW TABLES statement so it only displays the tables

2023-11-22 Thread Fang-Yu Rao (Jira)
Fang-Yu Rao created IMPALA-12574:


 Summary: Consider extending SHOW TABLES statement so it only 
displays the tables
 Key: IMPALA-12574
 URL: https://issues.apache.org/jira/browse/IMPALA-12574
 Project: IMPALA
  Issue Type: New Feature
  Components: Catalog, Frontend
Reporter: Fang-Yu Rao


IMPALA-3268 extended the Frontend's GetTableNames() API so that 
GetTableNames() can return the matching tables whose table type is in a 
specified set of table types. With this change, it should not be too difficult 
to extend the SHOW TABLES statement so that SHOW TABLES displays only tables 
of a specified type (vs. all table types). It would be great to 
have this functionality.
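The kind of type-aware filtering described here can be sketched in Python; the `get_table_names` helper, the catalog dictionary, and the type names below are illustrative stand-ins, not Impala's actual Java API:

```python
# Hypothetical sketch of filtering catalog entries by table type, analogous
# to the extended GetTableNames() behavior described above. The catalog
# layout and type constants are illustrative, not Impala's real data model.
import fnmatch

CATALOG = {
    "alltypes": "TABLE",
    "alltypes_view": "VIEW",
    "iceberg_tbl": "TABLE",
}

def get_table_names(pattern, table_types=None):
    """Return names matching 'pattern' whose type is in 'table_types'.
    An empty or None 'table_types' means all types, mirroring the default
    SHOW TABLES behavior."""
    names = sorted(n for n in CATALOG if fnmatch.fnmatch(n, pattern))
    if not table_types:
        return names
    return [n for n in names if CATALOG[n] in table_types]

print(get_table_names("*", {"TABLE"}))  # tables only, as SHOW TABLES would
print(get_table_names("*", {"VIEW"}))   # views only, as SHOW VIEWS would
```

With such a helper in place, SHOW TABLES and SHOW VIEWS become the same lookup with a different type set.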






[jira] [Created] (IMPALA-12554) Create only one Ranger policy for GRANT statement

2023-11-10 Thread Fang-Yu Rao (Jira)
Fang-Yu Rao created IMPALA-12554:


 Summary: Create only one Ranger policy for GRANT statement
 Key: IMPALA-12554
 URL: https://issues.apache.org/jira/browse/IMPALA-12554
 Project: IMPALA
  Issue Type: Improvement
Reporter: Fang-Yu Rao
Assignee: Fang-Yu Rao


Currently Impala creates a Ranger policy for each column specified in a 
GRANT statement. For instance, after the following query, 3 Ranger policies 
would be created on the Ranger server. This can result in a large number of 
policies when many columns are specified, and it may cause Impala's Ranger 
plug-in to take a long time to download the policies from the Ranger server. 
It would be great if Impala created only a single policy for all columns in 
the same table.
{code}
[localhost:21050] default> grant select(id, bool_col, tinyint_col) on table 
functional.alltypes to user non_owner;
Query: grant select(id, bool_col, tinyint_col) on table functional.alltypes to 
user non_owner
Query submitted at: 2023-11-10 09:38:58 (Coordinator: http://fangyu:25000)
Query progress can be monitored at: 
http://fangyu:25000/query_plan?query_id=bc4fa1cdefe5881b:413d9a69
+---------------------------------+
| summary                         |
+---------------------------------+
| Privilege(s) have been granted. |
+---------------------------------+
Fetched 1 row(s) in 0.67s
{code}
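The proposed consolidation can be sketched in Python. The `build_policies` helper and the policy dictionary layout below are hypothetical stand-ins, not Ranger's actual REST schema:

```python
# Hypothetical sketch: instead of one Ranger policy per granted column,
# group all columns of the same table (and grantee) into a single policy.
# The dict layout is illustrative, not Ranger's real policy payload.
def build_policies(grants):
    """grants: list of (db, table, column, grantee) tuples."""
    policies = {}
    for db, table, column, grantee in grants:
        key = (db, table, grantee)
        policy = policies.setdefault(key, {
            "resource": {"database": db, "table": table, "columns": []},
            "users": [grantee],
            "access": "select",
        })
        policy["resource"]["columns"].append(column)
    return list(policies.values())

# The GRANT above names three columns of one table for one user.
grants = [("functional", "alltypes", c, "non_owner")
          for c in ("id", "bool_col", "tinyint_col")]
policies = build_policies(grants)
print(len(policies))  # one consolidated policy instead of three
```

Grouping by (database, table, grantee) is what keeps the policy count independent of the number of columns in the GRANT.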






[jira] [Assigned] (IMPALA-3268) Add command "SHOW VIEWS"

2023-11-06 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-3268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao reassigned IMPALA-3268:
---

Assignee: Fang-Yu Rao

> Add command "SHOW VIEWS"
> 
>
> Key: IMPALA-3268
> URL: https://issues.apache.org/jira/browse/IMPALA-3268
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Catalog
>Affects Versions: Impala 2.2.4, Impala 2.3.0, Impala 2.5.0
>Reporter: Eric Lin
>Assignee: Fang-Yu Rao
>Priority: Minor
>  Labels: usability
>
> Currently, to get a list of views, a user has to:
> - SHOW TABLES
> - scan through the output list
> - SHOW CREATE TABLE view_name to confirm view_name is a view
> which is tedious.
> So I would like to request the following:
> - SHOW TABLES should only return tables
> - SHOW VIEWS should only return views
> - add a flag to either above commands to return all tables and views
> This will help lots of end users.






[jira] [Commented] (IMPALA-12528) test_hdfs_scanner_thread_non_reserved_bytes could occasionally fail

2023-10-29 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780764#comment-17780764
 ] 

Fang-Yu Rao commented on IMPALA-12528:
--

Hi [~rizaon], assigned this JIRA to you since you are more familiar with the 
corresponding test. Please re-assign the ticket as you see fit. Thanks!

> test_hdfs_scanner_thread_non_reserved_bytes could occasionally fail
> ---
>
> Key: IMPALA-12528
> URL: https://issues.apache.org/jira/browse/IMPALA-12528
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Fang-Yu Rao
>Assignee: Riza Suminto
>Priority: Major
>  Labels: broken-build, flaky-test
>
> [test_hdfs_scanner_thread_non_reserved_bytes()|https://github.com/apache/impala/blob/master/tests/query_test/test_mem_usage_scaling.py#L379]
>  could occasionally fail with the following error.
> *+Stacktrace+*
> {code:java}
> E   AssertionError: Aggregation of SUM over NumScannerThreadsStarted did not 
> match expected results.
> E   EXPECTED VALUE:
> E   3
> E   
> E   
> E   ACTUAL VALUE:
> E   1
> {code}
> The corresponding test file 
> [hdfs-scanner-thread-non-reserved-bytes.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/hdfs-scanner-thread-non-reserved-bytes.test]
>  was recently added in IMPALA-12499.






[jira] [Created] (IMPALA-12528) test_hdfs_scanner_thread_non_reserved_bytes could occasionally fail

2023-10-29 Thread Fang-Yu Rao (Jira)
Fang-Yu Rao created IMPALA-12528:


 Summary: test_hdfs_scanner_thread_non_reserved_bytes could 
occasionally fail
 Key: IMPALA-12528
 URL: https://issues.apache.org/jira/browse/IMPALA-12528
 Project: IMPALA
  Issue Type: Bug
Reporter: Fang-Yu Rao
Assignee: Riza Suminto


[test_hdfs_scanner_thread_non_reserved_bytes()|https://github.com/apache/impala/blob/master/tests/query_test/test_mem_usage_scaling.py#L379]
 could occasionally fail with the following error.

*+Stacktrace+*
{code:java}
E   AssertionError: Aggregation of SUM over NumScannerThreadsStarted did not 
match expected results.
E   EXPECTED VALUE:
E   3
E   
E   
E   ACTUAL VALUE:
E   1
{code}
The corresponding test file 
[hdfs-scanner-thread-non-reserved-bytes.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/hdfs-scanner-thread-non-reserved-bytes.test]
 was recently added in IMPALA-12499.






[jira] [Commented] (IMPALA-12527) test_metadata_tables could occasionally fail in the s3 build

2023-10-27 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780556#comment-17780556
 ] 

Fang-Yu Rao commented on IMPALA-12527:
--

Hi [~tmate], assigned the JIRA to you since you recently revised the failed 
test in IMPALA-11996, so you are more familiar with this area. Please re-assign 
the ticket as you see fit. Thanks!


> test_metadata_tables could occasionally fail in the s3 build
> 
>
> Key: IMPALA-12527
> URL: https://issues.apache.org/jira/browse/IMPALA-12527
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Fang-Yu Rao
>Assignee: Tamas Mate
>Priority: Major
>  Labels: broken-build, flaky-test
>
> We found that 
> [test_metadata_tables()|https://github.infra.cloudera.com/CDH/Impala/blame/cdw-master-staging/tests/query_test/test_iceberg.py#L1219]
>  that runs 
> [iceberg-metadata-tables.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test]
>  could occasionally fail with the following error message.
> It looks like the actual result does not match the expected result for some 
> columns.
> Stacktrace
> {code}
> query_test/test_iceberg.py:1226: in test_metadata_tables
> '$OVERWRITE_SNAPSHOT_TS': str(overwrite_snapshot_ts.data[0])})
> common/impala_test_suite.py:751: in run_test_case
> self.__verify_results_and_errors(vector, test_section, result, use_db)
> common/impala_test_suite.py:587: in __verify_results_and_errors
> replace_filenames_with_placeholder)
> common/test_result_verifier.py:487: in verify_raw_results
> VERIFIER_MAP[verifier](expected, actual)
> common/test_result_verifier.py:296: in verify_query_result_is_equal
> assert expected_results == actual_results
> E   assert Comparing QueryTestResults (expected vs actual):
> E 
> row_regex:0,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'',0
>  != 
> 0,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/7d479ffb82bfffd3-7ce667e5_544607964_data.0.parq','PARQUET',0,1,351,'NULL',0
> E 
> row_regex:0,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'',0
>  != 
> 0,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/ab4ffd0d75a5a68d-13da0831_1541521750_data.0.parq','PARQUET',0,1,351,'NULL',0
> E 
> row_regex:0,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'',0
>  != 
> 0,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/b04d1095845359f5-f0799bd0_1209897284_data.0.parq','PARQUET',0,1,351,'NULL',0
> E 
> row_regex:1,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'NULL',NULL
>  != 
> 1,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/delete-1b45db885b2bdd56-4023218d0002_1697110314_data.0.parq','PARQUET',0,1,1531,'NULL',NULL
> {code}
> Specifically, it seems the values of the second-to-last column differ from 
> the expected values in some rows.
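The row_regex comparison that fails here can be sketched in Python. This is a simplified stand-in for the test verifier's behavior, not the real implementation in test_result_verifier.py:

```python
import re

ROW_REGEX_PREFIX = "row_regex:"

def row_matches(expected, actual):
    """Compare one expected row with one actual row, the way a test
    verifier might: rows starting with 'row_regex:' are treated as a
    full-row regular expression; anything else must match literally."""
    if expected.startswith(ROW_REGEX_PREFIX):
        pattern = expected[len(ROW_REGEX_PREFIX):]
        return re.fullmatch(pattern, actual) is not None
    return expected == actual

# The failing comparison above, reduced: the pattern expects '' in the
# second-to-last column while the actual row carries 'NULL'.
expected = r"row_regex:0,'.*\.parq','PARQUET',0,1,\d+,'',0"
actual = "0,'/test-warehouse/data.0.parq','PARQUET',0,1,351,'NULL',0"
print(row_matches(expected, actual))  # the rows do not match
```

Because the whole row is matched as one regular expression, a mismatch in any single column (here the second-to-last) fails the entire row.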






[jira] [Updated] (IMPALA-12527) test_metadata_tables could occasionally fail in the s3 build

2023-10-27 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-12527:
-
Description: 
We found that 
[test_metadata_tables()|https://github.infra.cloudera.com/CDH/Impala/blame/cdw-master-staging/tests/query_test/test_iceberg.py#L1219]
 that runs 
[iceberg-metadata-tables.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test]
 could occasionally fail with the following error message.

It looks like the actual result does not match the expected result for some 
columns.

Stacktrace
{code}
query_test/test_iceberg.py:1226: in test_metadata_tables
'$OVERWRITE_SNAPSHOT_TS': str(overwrite_snapshot_ts.data[0])})
common/impala_test_suite.py:751: in run_test_case
self.__verify_results_and_errors(vector, test_section, result, use_db)
common/impala_test_suite.py:587: in __verify_results_and_errors
replace_filenames_with_placeholder)
common/test_result_verifier.py:487: in verify_raw_results
VERIFIER_MAP[verifier](expected, actual)
common/test_result_verifier.py:296: in verify_query_result_is_equal
assert expected_results == actual_results
E   assert Comparing QueryTestResults (expected vs actual):
E 
row_regex:0,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'',0
 != 
0,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/7d479ffb82bfffd3-7ce667e5_544607964_data.0.parq','PARQUET',0,1,351,'NULL',0
E 
row_regex:0,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'',0
 != 
0,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/ab4ffd0d75a5a68d-13da0831_1541521750_data.0.parq','PARQUET',0,1,351,'NULL',0
E 
row_regex:0,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'',0
 != 
0,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/b04d1095845359f5-f0799bd0_1209897284_data.0.parq','PARQUET',0,1,351,'NULL',0
E 
row_regex:1,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'NULL',NULL
 != 
1,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/delete-1b45db885b2bdd56-4023218d0002_1697110314_data.0.parq','PARQUET',0,1,1531,'NULL',NULL
{code}

Specifically, it seems the values of the second-to-last column differ from 
the expected values in some rows.

  was:
We found that 
[test_metadata_tables()|https://github.infra.cloudera.com/CDH/Impala/blame/cdw-master-staging/tests/query_test/test_iceberg.py#L1219]
 that runs 
[iceberg-metadata-tables.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test]
 could occasionally fail with the following error message.

It looks like the actual result do not match the expected result for some 
columns.

Stacktrace
{code}
query_test/test_iceberg.py:1226: in test_metadata_tables
'$OVERWRITE_SNAPSHOT_TS': str(overwrite_snapshot_ts.data[0])})
common/impala_test_suite.py:751: in run_test_case
self.__verify_results_and_errors(vector, test_section, result, use_db)
common/impala_test_suite.py:587: in __verify_results_and_errors
replace_filenames_with_placeholder)
common/test_result_verifier.py:487: in verify_raw_results
VERIFIER_MAP[verifier](expected, actual)
common/test_result_verifier.py:296: in verify_query_result_is_equal
assert expected_results == actual_results
E   assert Comparing QueryTestResults (expected vs actual):
E 
row_regex:0,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'',0
 != 
0,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/7d479ffb82bfffd3-7ce667e5_544607964_data.0.parq','PARQUET',0,1,351,'NULL',0
E 
row_regex:0,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'',0
 != 
0,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/ab4ffd0d75a5a68d-13da0831_1541521750_data.0.parq','PARQUET',0,1,351,'NULL',0
E 
row_regex:0,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'',0
 != 
0,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/b04d1095845359f5-f0799bd0_1209897284_data.0.parq','PARQUET',0,1,351,'NULL',0
E 
row_regex:1,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ic

[jira] [Created] (IMPALA-12527) test_metadata_tables could occasionally fail in the s3 build

2023-10-27 Thread Fang-Yu Rao (Jira)
Fang-Yu Rao created IMPALA-12527:


 Summary: test_metadata_tables could occasionally fail in the s3 
build
 Key: IMPALA-12527
 URL: https://issues.apache.org/jira/browse/IMPALA-12527
 Project: IMPALA
  Issue Type: Bug
Reporter: Fang-Yu Rao
Assignee: Tamas Mate


We found that 
[test_metadata_tables()|https://github.infra.cloudera.com/CDH/Impala/blame/cdw-master-staging/tests/query_test/test_iceberg.py#L1219]
 that runs 
[iceberg-metadata-tables.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test]
 could occasionally fail with the following error message.

It looks like the actual result does not match the expected result for some 
columns.

Stacktrace
{code}
query_test/test_iceberg.py:1226: in test_metadata_tables
'$OVERWRITE_SNAPSHOT_TS': str(overwrite_snapshot_ts.data[0])})
common/impala_test_suite.py:751: in run_test_case
self.__verify_results_and_errors(vector, test_section, result, use_db)
common/impala_test_suite.py:587: in __verify_results_and_errors
replace_filenames_with_placeholder)
common/test_result_verifier.py:487: in verify_raw_results
VERIFIER_MAP[verifier](expected, actual)
common/test_result_verifier.py:296: in verify_query_result_is_equal
assert expected_results == actual_results
E   assert Comparing QueryTestResults (expected vs actual):
E 
row_regex:0,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'',0
 != 
0,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/7d479ffb82bfffd3-7ce667e5_544607964_data.0.parq','PARQUET',0,1,351,'NULL',0
E 
row_regex:0,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'',0
 != 
0,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/ab4ffd0d75a5a68d-13da0831_1541521750_data.0.parq','PARQUET',0,1,351,'NULL',0
E 
row_regex:0,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'',0
 != 
0,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/b04d1095845359f5-f0799bd0_1209897284_data.0.parq','PARQUET',0,1,351,'NULL',0
E 
row_regex:1,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'NULL',NULL
 != 
1,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/delete-1b45db885b2bdd56-4023218d0002_1697110314_data.0.parq','PARQUET',0,1,1531,'NULL',NULL
{code}

Specifically, it seems the values of the second-to-last column differ from 
the expected values in some rows.






[jira] [Commented] (IMPALA-12526) BackendConfig.INSTANCE could be null in the frontend test testResetMetadataDesc

2023-10-27 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780524#comment-17780524
 ] 

Fang-Yu Rao commented on IMPALA-12526:
--

Hi [~stigahuang], assigned this JIRA to you since you are more familiar with 
the failed frontend test. Please re-assign the ticket as you see fit. 
Thanks!

> BackendConfig.INSTANCE could be null in the frontend test 
> testResetMetadataDesc
> ---
>
> Key: IMPALA-12526
> URL: https://issues.apache.org/jira/browse/IMPALA-12526
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Fang-Yu Rao
>Assignee: Quanlong Huang
>Priority: Major
>  Labels: broken-build, flaky-test
>
> We found that 
> [BackendConfig.INSTANCE|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java#L265]
>  could be null in the frontend test 
> [testResetMetadataDesc()|https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/util/CatalogOpUtilTest.java#L65]
>  and thus 
> [ResetMetadataStmt#toThrift()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java#L265]
>  could fail with the following error.
> {code}
> Cannot invoke "org.apache.impala.service.BackendConfig.getHostname()" because 
> "org.apache.impala.service.BackendConfig.INSTANCE" is null
> {code}






[jira] [Commented] (IMPALA-12526) BackendConfig.INSTANCE could be null in the frontend test testResetMetadataDesc

2023-10-27 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780523#comment-17780523
 ] 

Fang-Yu Rao commented on IMPALA-12526:
--

This issue seems to be the same as IMPALA-11699, but I cannot be completely 
sure.

> BackendConfig.INSTANCE could be null in the frontend test 
> testResetMetadataDesc
> ---
>
> Key: IMPALA-12526
> URL: https://issues.apache.org/jira/browse/IMPALA-12526
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Fang-Yu Rao
>Assignee: Quanlong Huang
>Priority: Major
>  Labels: broken-build, flaky-test
>
> We found that 
> [BackendConfig.INSTANCE|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java#L265]
>  could be null in the frontend test 
> [testResetMetadataDesc()|https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/util/CatalogOpUtilTest.java#L65]
>  and thus 
> [ResetMetadataStmt#toThrift()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java#L265]
>  could fail with the following error.
> {code}
> Cannot invoke "org.apache.impala.service.BackendConfig.getHostname()" because 
> "org.apache.impala.service.BackendConfig.INSTANCE" is null
> {code}






[jira] [Created] (IMPALA-12526) BackendConfig.INSTANCE could be null in the frontend test testResetMetadataDesc

2023-10-27 Thread Fang-Yu Rao (Jira)
Fang-Yu Rao created IMPALA-12526:


 Summary: BackendConfig.INSTANCE could be null in the frontend test 
testResetMetadataDesc
 Key: IMPALA-12526
 URL: https://issues.apache.org/jira/browse/IMPALA-12526
 Project: IMPALA
  Issue Type: Bug
Reporter: Fang-Yu Rao
Assignee: Quanlong Huang


We found that 
[BackendConfig.INSTANCE|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java#L265]
 could be null in the frontend test 
[testResetMetadataDesc()|https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/util/CatalogOpUtilTest.java#L65]
 and thus 
[ResetMetadataStmt#toThrift()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java#L265]
 could fail with the following error.

{code}
Cannot invoke "org.apache.impala.service.BackendConfig.getHostname()" because 
"org.apache.impala.service.BackendConfig.INSTANCE" is null
{code}
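The failure mode here, code dereferencing a process-wide singleton that a unit test never initialized, can be sketched in Python. The class, field, and method names below are illustrative stand-ins for Impala's Java code, not the actual implementation:

```python
class BackendConfig:
    """Illustrative stand-in for a process-wide config singleton that
    production startup initializes but a unit test may not."""
    INSTANCE = None

    def __init__(self, hostname):
        self.hostname = hostname

    @classmethod
    def create(cls, hostname):
        cls.INSTANCE = cls(hostname)

def to_thrift_hostname():
    # Mirrors the failure above: dereferencing INSTANCE without a guard.
    return BackendConfig.INSTANCE.hostname

def to_thrift_hostname_safe(default="localhost"):
    # A defensive variant a test-friendly implementation might use.
    cfg = BackendConfig.INSTANCE
    return cfg.hostname if cfg is not None else default

print(to_thrift_hostname_safe())  # falls back before initialization
BackendConfig.create("fangyu")
print(to_thrift_hostname())       # works once the singleton exists
```

Either initializing the singleton in test setup or guarding the dereference avoids the crash; which fix is right depends on whether the test is expected to exercise that code path.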






[jira] [Created] (IMPALA-12525) statestore.active-status did not reach value True in 120s

2023-10-27 Thread Fang-Yu Rao (Jira)
Fang-Yu Rao created IMPALA-12525:


 Summary: statestore.active-status did not reach value True in 120s
 Key: IMPALA-12525
 URL: https://issues.apache.org/jira/browse/IMPALA-12525
 Project: IMPALA
  Issue Type: Bug
Reporter: Fang-Yu Rao
Assignee: Wenzhe Zhou


We found that 
[statestore.active-status|https://github.com/apache/impala/blob/master/tests/custom_cluster/test_statestored_ha.py#L452]
 may fail to reach the value True within 120 seconds.

*+Error Message+*
{code:java}
AssertionError: Metric statestore.active-status did not reach value True in 
120s. Dumping debug webpages in JSON format... Dumped memz JSON to 
$IMPALA_HOME/logs/metric_timeout_diags_20231026_01:53:53/json/memz.json Dumped 
metrics JSON to 
$IMPALA_HOME/logs/metric_timeout_diags_20231026_01:53:53/json/metrics.json 
Dumped queries JSON to 
$IMPALA_HOME/logs/metric_timeout_diags_20231026_01:53:53/json/queries.json 
Dumped sessions JSON to 
$IMPALA_HOME/logs/metric_timeout_diags_20231026_01:53:53/json/sessions.json 
Dumped threadz JSON to 
$IMPALA_HOME/logs/metric_timeout_diags_20231026_01:53:53/json/threadz.json 
Dumped rpcz JSON to 
$IMPALA_HOME/logs/metric_timeout_diags_20231026_01:53:53/json/rpcz.json Dumping 
minidumps for impalads/catalogds... Dumped minidump for Impalad PID 32539 
Dumped minidump for Impalad PID 32543 Dumped minidump for Impalad PID 32550 
Dumped minidump for Catalogd PID 32460
{code}
*+Stacktrace+*
{code:java}
custom_cluster/test_statestored_ha.py:500: in test_statestored_manual_failover
self.__test_statestored_manual_failover(second_failover=True)
custom_cluster/test_statestored_ha.py:452: in __test_statestored_manual_failover
"statestore.active-status", expected_value=True, timeout=120)
common/impala_service.py:144: in wait_for_metric_value
self.__metric_timeout_assert(metric_name, expected_value, timeout)
common/impala_service.py:213: in __metric_timeout_assert
assert 0, assert_string
E   AssertionError: Metric statestore.active-status did not reach value True in 
120s.
E   Dumping debug webpages in JSON format...
E   Dumped memz JSON to 
$IMPALA_HOME/logs/metric_timeout_diags_20231026_01:53:53/json/memz.json
E   Dumped metrics JSON to 
$IMPALA_HOME/logs/metric_timeout_diags_20231026_01:53:53/json/metrics.json
E   Dumped queries JSON to 
$IMPALA_HOME/logs/metric_timeout_diags_20231026_01:53:53/json/queries.json
E   Dumped sessions JSON to 
$IMPALA_HOME/logs/metric_timeout_diags_20231026_01:53:53/json/sessions.json
E   Dumped threadz JSON to 
$IMPALA_HOME/logs/metric_timeout_diags_20231026_01:53:53/json/threadz.json
E   Dumped rpcz JSON to 
$IMPALA_HOME/logs/metric_timeout_diags_20231026_01:53:53/json/rpcz.json
E   Dumping minidumps for impalads/catalogds...
E   Dumped minidump for Impalad PID 32539
E   Dumped minidump for Impalad PID 32543
E   Dumped minidump for Impalad PID 32550
E   Dumped minidump for Catalogd PID 32460
{code}
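The wait_for_metric_value pattern behind this assertion can be sketched in Python; the `get_metric` callback, the simulated statestore, and the timings are hypothetical, not Impala's actual test framework:

```python
import time

def wait_for_metric_value(get_metric, name, expected, timeout_s=120.0,
                          interval_s=1.0):
    """Poll get_metric(name) until it returns 'expected' or the timeout
    elapses, then raise an AssertionError like the one above."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if get_metric(name) == expected:
            return expected
        time.sleep(interval_s)
    raise AssertionError("Metric %s did not reach value %s in %ss"
                         % (name, expected, timeout_s))

# Simulated statestore whose active-status flips to True on the third poll.
state = {"polls": 0}
def fake_metric(name):
    state["polls"] += 1
    return state["polls"] >= 3

print(wait_for_metric_value(fake_metric, "statestore.active-status", True,
                            timeout_s=5.0, interval_s=0.01))
```

A failover that genuinely takes longer than the timeout produces exactly the AssertionError in the stacktrace, which is why such tests are sensitive to slow builds.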






[jira] [Updated] (IMPALA-12522) test_alter_table_recover could finish in less than 10 seconds with JDK 17 when enable_async_ddl_execution is False

2023-10-26 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-12522:
-
Priority: Critical  (was: Major)

> test_alter_table_recover could finish in less than 10 seconds with JDK 17 
> when enable_async_ddl_execution is False
> ---
>
> Key: IMPALA-12522
> URL: https://issues.apache.org/jira/browse/IMPALA-12522
> Project: IMPALA
>  Issue Type: Test
>Reporter: Fang-Yu Rao
>Assignee: Joe McDonnell
>Priority: Critical
>  Labels: broken-build, flaky-test
>
> We found that 
> [test_alter_table_recover()|https://github.com/apache/impala/blame/master/tests/metadata/test_ddl.py#L1026]
>  could finish execution in less than 10 seconds with JDK 17 when 
> enable_async_ddl_execution is False, and thus the check in the [else 
> branch|https://github.com/apache/impala/blame/master/tests/metadata/test_ddl.py#L1079C12-L1079C12]
>  could fail. We don't know whether this is related to the JDK, but maybe we 
> could reduce the expected execution time a bit to make the test less flaky.
> {code}
>   # In sync mode:
>   #  The entire DDL is processed in the exec step with delay. exec_time
>   #  should be more than 10 seconds.
>   #
>   # In async mode:
>   #  The compilation of DDL is processed in the exec step without delay.
>   #  And the processing of the DDL plan is in wait step with delay. The
>   #  wait time should definitely take more time than 10 seconds.
>   if enable_async_ddl:
>     assert(wait_time >= 10)
>   else:
>     assert(exec_time >= 10)
> {code}
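The mitigation suggested above — lowering the 10-second bound slightly — could be sketched as follows. This is a hypothetical helper for illustration; `DELAY_SECONDS`, `MARGIN_SECONDS`, and `check_ddl_timing` are assumed names, not Impala test code.

```python
# Sketch: relax the 10-second lower bound with a small margin so the check
# tolerates runs that finish early under JDK 17.
DELAY_SECONDS = 10   # delay the test injects into DDL processing
MARGIN_SECONDS = 2   # slack for environments that finish early

def check_ddl_timing(enable_async_ddl, exec_time, wait_time):
    """Return True if the measured phase time is consistent with the delay."""
    threshold = DELAY_SECONDS - MARGIN_SECONDS
    if enable_async_ddl:
        # Async mode: the injected delay lands in the wait step.
        return wait_time >= threshold
    # Sync mode: the injected delay lands in the exec step.
    return exec_time >= threshold

print(check_ddl_timing(False, 8.5, 0.0))  # True: 8.5s passes with the margin
```

With a 2-second margin, a sync-mode run that completes in 8.5 seconds still passes, while a run well under the injected delay still fails.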






[jira] [Commented] (IMPALA-12522) test_alter_table_recover could finish less than 10 seconds with JDK 17 when enable_async_ddl_execution is False

2023-10-26 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780105#comment-17780105
 ] 

Fang-Yu Rao commented on IMPALA-12522:
--

Hi [~joemcdonnell], I assigned this JIRA to you since you helped review 
[IMPALA-10811|https://gerrit.cloudera.org/c/17872/38/tests/metadata/test_ddl.py#1012], 
which added this test. Please reassign the JIRA as you see appropriate. Thanks!

> test_alter_table_recover could finish less than 10 seconds with JDK 17 when 
> enable_async_ddl_execution is False
> ---
>
> Key: IMPALA-12522
> URL: https://issues.apache.org/jira/browse/IMPALA-12522
> Project: IMPALA
>  Issue Type: Test
>Reporter: Fang-Yu Rao
>Assignee: Joe McDonnell
>Priority: Major
>  Labels: broken-build, flaky-test
>
> We found that 
> [test_alter_table_recover()|https://github.com/apache/impala/blame/master/tests/metadata/test_ddl.py#L1026]
>  could finish within 10 seconds with JDK 17 when enable_async_ddl_execution 
> is False, so the check in the [else 
> branch|https://github.com/apache/impala/blame/master/tests/metadata/test_ddl.py#L1079C12-L1079C12]
>  could fail. It is unclear whether this is related to the JDK, but we could 
> reduce the expected execution time a little to make the test less flaky.
> {code}
>   # In sync mode:
>   #  The entire DDL is processed in the exec step with delay. exec_time
>   #  should be more than 10 seconds.
>   #
>   # In async mode:
>   #  The compilation of DDL is processed in the exec step without delay.
>   #  And the processing of the DDL plan is in wait step with delay. The
>   #  wait time should definitely take more time than 10 seconds.
>   if enable_async_ddl:
>     assert(wait_time >= 10)
>   else:
>     assert(exec_time >= 10)
> {code}






[jira] [Created] (IMPALA-12522) test_alter_table_recover could finish less than 10 seconds with JDK 17 when enable_async_ddl_execution is False

2023-10-26 Thread Fang-Yu Rao (Jira)
Fang-Yu Rao created IMPALA-12522:


 Summary: test_alter_table_recover could finish less than 10 
seconds with JDK 17 when enable_async_ddl_execution is False
 Key: IMPALA-12522
 URL: https://issues.apache.org/jira/browse/IMPALA-12522
 Project: IMPALA
  Issue Type: Test
Reporter: Fang-Yu Rao
Assignee: Joe McDonnell


We found that 
[test_alter_table_recover()|https://github.com/apache/impala/blame/master/tests/metadata/test_ddl.py#L1026]
 could finish within 10 seconds with JDK 17 when enable_async_ddl_execution 
is False, so the check in the [else 
branch|https://github.com/apache/impala/blame/master/tests/metadata/test_ddl.py#L1079C12-L1079C12]
 could fail. It is unclear whether this is related to the JDK, but we could 
reduce the expected execution time a little to make the test less flaky.
{code}
  # In sync mode:
  #  The entire DDL is processed in the exec step with delay. exec_time
  #  should be more than 10 seconds.
  #
  # In async mode:
  #  The compilation of DDL is processed in the exec step without delay.
  #  And the processing of the DDL plan is in wait step with delay. The
  #  wait time should definitely take more time than 10 seconds.
  if enable_async_ddl:
    assert(wait_time >= 10)
  else:
    assert(exec_time >= 10)
{code}






[jira] [Commented] (IMPALA-12500) TestObservability.test_global_exchange_counters is flaky

2023-10-23 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17778839#comment-17778839
 ] 

Fang-Yu Rao commented on IMPALA-12500:
--

Hi [~csringhofer], I assigned this JIRA to you since you recently revised the 
test in 
[IMPALA-12430|https://github.com/apache/impala/commit/fb2d2b27641a95f51b6789639fab73b60abd7bc5#diff-a317a4067b5728a2d0af9839c1dce94710e7bd50825ceffc0a3c88aca3e27de3R553]
 and thus may be more familiar with it. Please feel free to reassign the 
JIRA as you see fit. Thanks!

> TestObservability.test_global_exchange_counters is flaky
> 
>
> Key: IMPALA-12500
> URL: https://issues.apache.org/jira/browse/IMPALA-12500
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.4.0
>Reporter: Joe McDonnell
>Assignee: Csaba Ringhofer
>Priority: Critical
>  Labels: broken-build, flaky
>
> There have been intermittent failures on this test with the following symptom:
> {noformat}
> query_test/test_observability.py:564: in test_global_exchange_counters
> assert "ExchangeScanRatio: 4.63" in profile
> E   assert 'ExchangeScanRatio: 4.63' in 'Query 
> (id=c04b974db37e7046:b5fe4dea):\n  DEBUG MODE WARNING: Query profile 
> created while running a DEBUG buil...: 0.000ns\n - WriteIoBytes: 
> 0\n - WriteIoOps: 0 (0)\n - WriteIoWaitTime: 
> 0.000ns\n'
> -- executing against localhost:21000
> select count(*), sleep(50) from tpch_parquet.orders o
> inner join tpch_parquet.lineitem l on o.o_orderkey = l.l_orderkey
> group by o.o_clerk limit 10;
> -- 2023-10-05 19:47:29,817 INFO MainThread: Started query 
> c04b974db37e7046:b5fe4dea{noformat}
>  
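One way to make an assertion like `assert "ExchangeScanRatio: 4.63" in profile` less brittle is to parse the counter numerically and compare within a tolerance, instead of matching an exact formatted substring. A sketch under that assumption — this is not the actual test code, and the helper name is hypothetical:

```python
import re

def exchange_scan_ratio_ok(profile, expected, tol=0.05):
    """Parse ExchangeScanRatio out of a query profile and compare it
    numerically, rather than asserting on an exact formatted substring."""
    match = re.search(r"ExchangeScanRatio:\s*([0-9]+(?:\.[0-9]+)?)", profile)
    if match is None:
        return False
    return abs(float(match.group(1)) - expected) <= tol

print(exchange_scan_ratio_ok("...\nExchangeScanRatio: 4.63\n...", 4.63))  # True
```

A small tolerance absorbs rounding differences in how the profile formats the ratio, which is one common source of this kind of flakiness.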






[jira] [Assigned] (IMPALA-12500) TestObservability.test_global_exchange_counters is flaky

2023-10-23 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao reassigned IMPALA-12500:


Assignee: Fang-Yu Rao

> TestObservability.test_global_exchange_counters is flaky
> 
>
> Key: IMPALA-12500
> URL: https://issues.apache.org/jira/browse/IMPALA-12500
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.4.0
>Reporter: Joe McDonnell
>Assignee: Fang-Yu Rao
>Priority: Critical
>  Labels: broken-build, flaky
>
> There have been intermittent failures on this test with the following symptom:
> {noformat}
> query_test/test_observability.py:564: in test_global_exchange_counters
> assert "ExchangeScanRatio: 4.63" in profile
> E   assert 'ExchangeScanRatio: 4.63' in 'Query 
> (id=c04b974db37e7046:b5fe4dea):\n  DEBUG MODE WARNING: Query profile 
> created while running a DEBUG buil...: 0.000ns\n - WriteIoBytes: 
> 0\n - WriteIoOps: 0 (0)\n - WriteIoWaitTime: 
> 0.000ns\n'
> -- executing against localhost:21000
> select count(*), sleep(50) from tpch_parquet.orders o
> inner join tpch_parquet.lineitem l on o.o_orderkey = l.l_orderkey
> group by o.o_clerk limit 10;
> -- 2023-10-05 19:47:29,817 INFO MainThread: Started query 
> c04b974db37e7046:b5fe4dea{noformat}
>  






[jira] [Assigned] (IMPALA-12500) TestObservability.test_global_exchange_counters is flaky

2023-10-23 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao reassigned IMPALA-12500:


Assignee: Csaba Ringhofer  (was: Fang-Yu Rao)

> TestObservability.test_global_exchange_counters is flaky
> 
>
> Key: IMPALA-12500
> URL: https://issues.apache.org/jira/browse/IMPALA-12500
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.4.0
>Reporter: Joe McDonnell
>Assignee: Csaba Ringhofer
>Priority: Critical
>  Labels: broken-build, flaky
>
> There have been intermittent failures on this test with the following symptom:
> {noformat}
> query_test/test_observability.py:564: in test_global_exchange_counters
> assert "ExchangeScanRatio: 4.63" in profile
> E   assert 'ExchangeScanRatio: 4.63' in 'Query 
> (id=c04b974db37e7046:b5fe4dea):\n  DEBUG MODE WARNING: Query profile 
> created while running a DEBUG buil...: 0.000ns\n - WriteIoBytes: 
> 0\n - WriteIoOps: 0 (0)\n - WriteIoWaitTime: 
> 0.000ns\n'
> -- executing against localhost:21000
> select count(*), sleep(50) from tpch_parquet.orders o
> inner join tpch_parquet.lineitem l on o.o_orderkey = l.l_orderkey
> group by o.o_clerk limit 10;
> -- 2023-10-05 19:47:29,817 INFO MainThread: Started query 
> c04b974db37e7046:b5fe4dea{noformat}
>  






[jira] [Commented] (IMPALA-10712) SET OWNER ROLE of a database/table/view is not supported when Ranger is the authorization provider

2023-10-13 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17775048#comment-17775048
 ] 

Fang-Yu Rao commented on IMPALA-10712:
--

It looks like I created a JIRA more than 2 years ago for the same issue.

> SET OWNER ROLE  of a database/table/view is not supported when 
> Ranger is the authorization provider
> --
>
> Key: IMPALA-10712
> URL: https://issues.apache.org/jira/browse/IMPALA-10712
> Project: IMPALA
>  Issue Type: Improvement
>Affects Versions: Impala 4.0.0
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>
> We found that {{SET OWNER ROLE}} of a database, table, or a view is not 
> supported when Ranger is the authorization provider.
> In the case of setting the owner of a database to a given role when Ranger 
> is the authorization provider, we found that after executing {{ALTER DATABASE 
>  SET OWNER ROLE }}, we hit the non-null check for the given role at 
> [https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/AlterDbSetOwnerStmt.java#L59]
>  because the {{AuthorizationPolicy}} returned from {{getAuthPolicy()}} does 
> not cache any policy-related information when the authorization provider is 
> Ranger, unlike the case when Sentry was the authorization provider.
> When Ranger is the authorization provider, the currently existing roles are 
> cached by {{RangerImpalaPlugin}}. Therefore to address the issue above, we 
> could probably invoke {{getRoles().getRangerRoles()}} provided by the 
> {{RangerImpalaPlugin}} to retrieve the set of existing roles, similar to what 
> is done at 
> [https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/authorization/ranger/RangerImpaladAuthorizationManager.java#L135].
> Tagged [~joemcdonnell] and [~shajini] since I realized this when reviewing 
> Joe's comment at 
> [https://gerrit.cloudera.org/c/17469/1/docs/topics/impala_alter_database.xml#b68].
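Conceptually, the proposed fix swaps the source of truth for the role lookup from the cached {{AuthorizationPolicy}} (which is empty under Ranger) to the role set cached by the Ranger plugin. A language-neutral Python sketch of the validation step — all names here are hypothetical stand-ins, not Impala code:

```python
def resolve_owner_role(role_name, existing_roles):
    """Validate the target role against the roles the authorization provider
    actually knows about. existing_roles stands in for the role set the
    Ranger plugin caches; every name in this sketch is hypothetical."""
    if role_name not in existing_roles:
        # Mirrors the intent of the non-null check in AlterDbSetOwnerStmt:
        # an unknown role should yield an analysis error, not a crash.
        raise ValueError("Role '{0}' does not exist.".format(role_name))
    return role_name

print(resolve_owner_role("analyst_role", {"admin_role", "analyst_role"}))
# prints: analyst_role
```

The key point is that the lookup must consult whichever store the active authorization provider actually populates.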






[jira] [Resolved] (IMPALA-11466) Add jetty-server as an allowed dependency

2023-10-13 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao resolved IMPALA-11466.
--
Fix Version/s: Impala 4.3.0
   Resolution: Fixed

Resolving this JIRA since the fix has been merged, thanks to [~rizaon].

> Add jetty-server as an allowed dependency
> -
>
> Key: IMPALA-11466
> URL: https://issues.apache.org/jira/browse/IMPALA-11466
> Project: IMPALA
>  Issue Type: Task
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
> Fix For: Impala 4.3.0
>
>
> We found that after HIVE-21456, instantiating HiveMetaStoreClient requires 
> the class org.eclipse.jetty.server.Connector, which is a banned dependency 
> of impala-frontend. This caused the FE test testTestCaseImport() to fail 
> since it needs to instantiate a HiveMetaStoreClient.
> We should add the required dependency so that the test can run.





