[jira] [Resolved] (IMPALA-13262) Predicate pushdown causes incorrect results in join condition
[ https://issues.apache.org/jira/browse/IMPALA-13262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fang-Yu Rao resolved IMPALA-13262.
----------------------------------
    Resolution: Fixed

The fix has been merged to master.

> Predicate pushdown causes incorrect results in join condition
> -------------------------------------------------------------
>
>                 Key: IMPALA-13262
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13262
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Fang-Yu Rao
>            Assignee: Fang-Yu Rao
>            Priority: Major
>              Labels: correctness
>             Fix For: Impala 4.5.0
>
> We found that in some scenarios Apache Impala
> ([https://github.com/apache/impala/commit/c539874]) could incorrectly push
> predicates down to scan nodes, which in turn produces wrong results. The
> following is a concrete example that reproduces the issue.
> {code:sql}
> create database impala_13262;
> use impala_13262;
> create table department (dept_no integer, dept_rank integer,
>   start_date timestamp, end_date timestamp);
> insert into department values (1, 1, '2024-01-01', '2024-01-02');
> insert into department values (1, 2, '2024-01-02', '2024-01-03');
> insert into department values (1, 3, '2024-01-03', '2024-01-03');
> create table employee (employee_no integer, depart_no integer);
> insert into employee values (1, 1);
>
> -- The following query should return 0 rows. However, Apache Impala produces
> -- one row.
> select * from employee t1
> inner join (
>   select * from
>   (
>     select dept_no, dept_rank, start_date, end_date,
>       row_number() over (partition by dept_no order by dept_rank) rn
>     from department
>   ) t2
>   where rn = 1
> ) t2
> on t1.depart_no = t2.dept_no
> where t2.start_date = t2.end_date;
>
> set explain_level=2;
> -- In the output of the EXPLAIN statement, we found that the predicate
> -- "start_date = end_date" was pushed down to the scan node, which is wrong.
> | 01:SCAN HDFS [impala_13262.department, RANDOM]
> |    HDFS partitions=1/1 files=3 size=132B
> |    predicates: start_date = end_date
> |    stored statistics:
> |      table: rows=unavailable size=unavailable
> |      columns: unavailable
> |    extrapolated-rows=disabled max-scan-range-rows=unavailable
> |    mem-estimate=32.00MB mem-reservation=8.00KB thread-reservation=1
> |    tuple-ids=1 row-size=40B cardinality=1
> |    in pipelines: 01(GETNEXT)
> {code}
>
> +*Edit:*+
> The following is a smaller case that reproduces the issue. The correct result
> should be 0 rows, but Impala returns 1 row as above.
> {code:sql}
> select * from
> (
>   select dept_no, dept_rank, start_date, end_date,
>     row_number() over (partition by dept_no order by dept_rank) rn
>   from department
> ) t2
> where rn = 1 and t2.start_date = t2.end_date;
> {code}
>
> Recall that the contents of the inline view '{*}t2{*}' above are as follows.
> {code:java}
> +---------+-----------+---------------------+---------------------+----+
> | dept_no | dept_rank | start_date          | end_date            | rn |
> +---------+-----------+---------------------+---------------------+----+
> | 1       | 1         | 2024-01-01 00:00:00 | 2024-01-02 00:00:00 | 1  |
> | 1       | 2         | 2024-01-02 00:00:00 | 2024-01-03 00:00:00 | 2  |
> | 1       | 3         | 2024-01-03 00:00:00 | 2024-01-03 00:00:00 | 3  |
> +---------+-----------+---------------------+---------------------+----+
> {code}
>
> On the other hand, the following query without the conjunct '{*}rn=1{*}'
> returns the correct result, which is the row with '{*}rn{*}' equal to *3*
> above. It almost looks like adding the '{*}rn=1{*}' predicate triggers the
> incorrect pushdown of '{*}t2.start_date=t2.end_date{*}' to the scan node of
> the table '{*}department{*}'.
> {code:sql}
> select * from
> (
>   select dept_no, dept_rank, start_date, end_date,
>     row_number() over (partition by dept_no order by dept_rank) rn
>   from department
> ) t2
> where t2.start_date = t2.end_date;
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
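The semantics at stake above can be illustrated outside Impala. The sketch below (plain Python with hypothetical helper names, not Impala code) simulates the three 'department' rows and shows that evaluating 'start_date = end_date' below the analytic function leaves one row with rn = 1, while the correct order (number first, then filter) leaves no rows:

```python
# Simulate the 'department' rows: (dept_no, dept_rank, start_date, end_date).
rows = [
    (1, 1, "2024-01-01", "2024-01-02"),
    (1, 2, "2024-01-02", "2024-01-03"),
    (1, 3, "2024-01-03", "2024-01-03"),
]

def number(rows):
    """Assign rn = row_number() over (partition by dept_no order by dept_rank)."""
    out = []
    last_dept, rn = None, 0
    for r in sorted(rows, key=lambda r: (r[0], r[1])):
        rn = rn + 1 if r[0] == last_dept else 1
        last_dept = r[0]
        out.append(r + (rn,))
    return out

# Correct plan: number all rows first, then apply both predicates.
correct = [r for r in number(rows) if r[4] == 1 and r[2] == r[3]]

# Buggy plan: push start_date = end_date below the analytic function,
# so row_number() only sees the surviving third row.
buggy = [r for r in number([r for r in rows if r[2] == r[3]]) if r[4] == 1]

print(len(correct))  # 0 rows, the expected result
print(len(buggy))    # 1 row, matching the wrong Impala result
```

This is why a predicate referencing non-partitioning columns must stay above the analytic node: filtering the input changes which row receives each row number.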
[jira] [Updated] (IMPALA-13262) Predicate pushdown causes incorrect results in join condition
[ https://issues.apache.org/jira/browse/IMPALA-13262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fang-Yu Rao updated IMPALA-13262:
---------------------------------
    Fix Version/s: Impala 4.5.0
[jira] [Comment Edited] (IMPALA-13262) Predicate pushdown causes incorrect results in join condition
[ https://issues.apache.org/jira/browse/IMPALA-13262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17875012#comment-17875012 ]

Fang-Yu Rao edited comment on IMPALA-13262 at 8/19/24 11:01 PM:
---------------------------------------------------------------

Thanks [~MikaelSmith]!
{quote}Wouldn't that limit the ScanNode to return only the 3rd row?
{quote}
This is correct. If we push the predicate '{*}start_date = end_date{*}' down to the scan node of the table '{*}department{*}', we will get the 3rd row, as shown below.
{code:java}
+---------+-----------+---------------------+---------------------+
| dept_no | dept_rank | start_date          | end_date            |
+---------+-----------+---------------------+---------------------+
| 1       | 3         | 2024-01-03 00:00:00 | 2024-01-03 00:00:00 |
+---------+-----------+---------------------+---------------------+
{code}
Later on, when we apply the analytic function '{*}row_number(){*}' to this returned row, the row above would be associated with row number 1, thus satisfying the (analytic) conjunct '{*}rn = 1{*}', and we would get this row as a result.

However, if we do not push the predicate '{*}start_date = end_date{*}' down to the scan node of the table '{*}department{*}', we will get all 3 rows, to which we then apply the analytic function '{*}row_number(){*}'. This time the row associated with row number 1 is different, and that row does not satisfy '{*}start_date = end_date{*}', so no row would be returned.
{code:java}
+---------+-----------+---------------------+---------------------+----+
| dept_no | dept_rank | start_date          | end_date            | rn |
+---------+-----------+---------------------+---------------------+----+
| 1       | 1         | 2024-01-01 00:00:00 | 2024-01-02 00:00:00 | 1  |
| 1       | 2         | 2024-01-02 00:00:00 | 2024-01-03 00:00:00 | 2  |
| 1       | 3         | 2024-01-03 00:00:00 | 2024-01-03 00:00:00 | 3  |
+---------+-----------+---------------------+---------------------+----+
{code}
[jira] [Commented] (IMPALA-13262) Predicate pushdown causes incorrect results in join condition
[ https://issues.apache.org/jira/browse/IMPALA-13262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17875012#comment-17875012 ]

Fang-Yu Rao commented on IMPALA-13262:
--------------------------------------
[jira] [Commented] (IMPALA-13262) Predicate pushdown causes incorrect results in join condition
[ https://issues.apache.org/jira/browse/IMPALA-13262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17872740#comment-17872740 ]

Fang-Yu Rao commented on IMPALA-13262:
--------------------------------------

I seem to have found a workaround that requires rewriting the conjunct '{*}start_date=end_date{*}'. In short, we could try rewriting this conjunct as '{*}not start_date > end_date and not start_date < end_date{*}', assuming the values in these 2 columns are all non-null. I verified on [IMPALA-13252|https://github.com/apache/impala/commit/5b7ed40d52bb63a5dda0f12f83370b0fbcaaca26] that after the query rewriting, the unwanted conjunct is no longer pushed down to the scan node of the table '{*}department{*}'.
{code:sql}
Query: select * from employee t1
inner join (
  select * from
  (
    select dept_no, dept_rank, start_date, end_date,
      row_number() over (partition by dept_no order by dept_rank) rn
    from department
  ) t2
  where rn = 1
) t2
on t1.depart_no = t2.dept_no
where not t2.start_date > t2.end_date and not t2.start_date < t2.end_date
Query submitted at: 2024-08-11 15:00:52 (Coordinator: http://fangyu-upstream-dev.gce.cloudera.com:25000)
Query state can be monitored at: http://fangyu-upstream-dev.gce.cloudera.com:25000/query_plan?query_id=aa4fb0f36398286a:05a46633
Fetched 0 row(s) in 0.13s
{code}
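The workaround above relies on the identity a = b <=> not (a > b) and not (a < b), which holds for non-null, totally ordered values such as timestamps; the rewritten form apparently does not participate in the equivalence-class propagation that triggers the bad pushdown. A quick sanity check of the identity itself (plain Python, not Impala code):

```python
from datetime import date
from itertools import product

# Exhaustively check the identity over a small set of non-null dates.
vals = [date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 3)]
for a, b in product(vals, vals):
    rewritten = not (a > b) and not (a < b)
    assert rewritten == (a == b)
print("rewrite is equivalent for all non-null pairs")
```

Note the non-null caveat matters: under SQL three-valued logic, both forms evaluate to unknown when either operand is NULL, so the rewrite stays safe there too, but the check above only exercises the non-null case the comment assumes.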
[jira] [Commented] (IMPALA-13262) Predicate pushdown causes incorrect results in join condition
[ https://issues.apache.org/jira/browse/IMPALA-13262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17872721#comment-17872721 ]

Fang-Yu Rao commented on IMPALA-13262:
--------------------------------------

Attaching a debugger to a running Impala frontend (on [IMPALA-13252|https://github.com/apache/impala/commit/5b7ed40d52bb63a5dda0f12f83370b0fbcaaca26]) and using the smaller test case to reproduce the issue, I found the place where we add the conjunct '{*}start_date=end_date{*}' that in turn is pushed down to the HDFS scan node of the table '{*}department{*}' within the inline view.
# Within [SingleNodePlanner#createInlineViewPlan()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java#L1198], we call [migrateConjunctsToInlineView(analyzer, inlineViewRef)|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java#L1208].
# Within [SingleNodePlanner#migrateConjunctsToInlineView()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java#L1374], we call [migrateOrCopyConjunctsToInlineView(analyzer, inlineViewRef, tids, analyticPreds, unassignedConjuncts)|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java#L1391] when there is an analytic predicate ('{*}rn=1{*}') to migrate into the inline view.
# Within [SingleNodePlanner#migrateOrCopyConjunctsToInlineView()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java#L1411], we call [addConjunctsIntoInlineView(analyzer, inlineViewRef, evalInInlineViewPreds)|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java#L1431]. It is within this call that '{*}start_date=end_date{*}' is added to the bound predicates for the underlying table '{*}department{*}', so that the predicate is applied at the scan node.

More specifically, in [SingleNodePlanner#addConjunctsIntoInlineView()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java#L1476], when we call [analyzer.createEquivConjuncts(inlineViewRef.getId(), preds)|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java#L1481], an additional predicate '{*}start_date=end_date{*}' is added to the last input argument '{*}preds{*}'. Later on in the same method, [inlineViewRef.getAnalyzer().registerConjuncts(viewPredicates)|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java#L1529] registers the conjunct '{*}start_date=end_date{*}', so that '{*}analyzer.getBoundPredicates(new TupleId(0)){*}' contains '{*}start_date=end_date{*}', which is later applied as a conjunct when the HDFS scan node for the table '{*}department{*}' is created.

For easy reference, the smaller test case that reproduces the issue, and does not involve a join, is given below.
{code:sql}
select * from
(
  select dept_no, dept_rank, start_date, end_date,
    row_number() over (partition by dept_no order by dept_rank) rn
  from department
) t2
where rn = 1 and t2.start_date = t2.end_date;
{code}
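For intuition about the createEquivConjuncts() step described above, the analyzer groups slots connected by equality predicates into value-transfer equivalence classes and may materialize extra binary predicates from them. The toy model below (plain Python with hypothetical names, not the actual Impala implementation) shows how the single WHERE-clause equality can spawn a derived conjunct over the view's base columns, which is what ends up bound to the scan tuple:

```python
# Toy model of deriving an extra conjunct from an equivalence class.
# The WHERE clause t2.start_date = t2.end_date puts the two view slots in
# one equivalence class; materializing that class inside the inline view
# yields a new base-column predicate, which then becomes a bound predicate
# on the 'department' tuple (the bug is that this skips the analytic node).
equiv_classes = [{"t2.start_date", "t2.end_date"}]
slot_to_base_column = {"t2.start_date": "start_date", "t2.end_date": "end_date"}

derived = []
for ec in equiv_classes:
    # Chain the members of each class pairwise into equality conjuncts.
    cols = sorted(slot_to_base_column[s] for s in ec)
    for left, right in zip(cols, cols[1:]):
        derived.append(f"{left} = {right}")

print(derived)  # ['end_date = start_date']
```

The fix direction implied by the Jira is not to suppress equivalence classes in general, but to avoid registering such derived conjuncts below an analytic function that they do not commute with.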
[jira] [Resolved] (IMPALA-13276) Revise the documentation of query option 'RUNTIME_FILTER_WAIT_TIME_MS'
[ https://issues.apache.org/jira/browse/IMPALA-13276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fang-Yu Rao resolved IMPALA-13276.
----------------------------------
    Resolution: Fixed

> Revise the documentation of query option 'RUNTIME_FILTER_WAIT_TIME_MS'
> ----------------------------------------------------------------------
>
>                 Key: IMPALA-13276
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13276
>             Project: IMPALA
>          Issue Type: Documentation
>          Components: Docs
>            Reporter: Fang-Yu Rao
>            Assignee: Fang-Yu Rao
>            Priority: Major
>
> The documentation of the query option 'RUNTIME_FILTER_WAIT_TIME_MS' at
> [https://github.com/apache/impala/blob/master/docs/topics/impala_runtime_filter_wait_time_ms.xml#L37-L43],
> quoted below, describes the meaning of this query option.
> {code:java}
> The RUNTIME_FILTER_WAIT_TIME_MS query option
> adjusts the settings for the runtime filtering feature.
> It specifies a time in milliseconds that each scan node waits for
> runtime filters to be produced by other plan fragments.
> {code}
>
> However, the description above is not entirely accurate, in that the wait
> time is measured from the time when a runtime filter was registered (within
> [QueryState::InitFilterBank()|https://github.com/apache/impala/blob/master/be/src/runtime/query-state.cc#L381])
> rather than from the time when a scan node calls
> [ScanNode::WaitForRuntimeFilters()|https://github.com/apache/impala/blob/master/be/src/exec/scan-node.cc#L212].
> For instance, if a scan node started so late that, by the time
> ScanNode::WaitForRuntimeFilters() was called, the amount of time elapsed
> since the registration of the runtime filter was already greater than the
> value of 'RUNTIME_FILTER_WAIT_TIME_MS', the scan node would not wait for the
> runtime filter at all. Refer to
> [https://github.com/apache/impala/blob/master/be/src/runtime/runtime-filter.cc#L86-L87]
> for further details.
> {code:java}
> bool RuntimeFilter::WaitForArrival(int32_t timeout_ms) const {
>   unique_lock l(arrival_mutex_);
>   while (arrival_time_.Load() == 0) {
>     int64_t ms_since_registration = MonotonicMillis() - registration_time_;
>     int64_t ms_remaining = timeout_ms - ms_since_registration;
>     if (ms_remaining <= 0) break;
>     if (injection_delay_ > 0) SleepForMs(injection_delay_);
>     arrival_cv_.WaitFor(l, ms_remaining * MICROS_PER_MILLI);
>   }
>   return arrival_time_.Load() != 0;
> }
> {code}
> We should revise the documentation to make this a bit clearer.
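The key point in WaitForArrival() above is that timeout_ms is charged against the filter's registration time, not against the moment the scan node starts waiting. A minimal model of just that remaining-budget computation (plain Python with hypothetical names, not the Impala implementation):

```python
def remaining_wait_ms(timeout_ms, registration_time_ms, now_ms):
    """How long a scan node will actually block, mirroring the arithmetic in
    RuntimeFilter::WaitForArrival(): the budget starts at registration,
    so time already elapsed before the wait call is deducted up front."""
    ms_since_registration = now_ms - registration_time_ms
    return max(0, timeout_ms - ms_since_registration)

# Scan node starts waiting promptly: most of the budget is still available.
print(remaining_wait_ms(1000, registration_time_ms=0, now_ms=100))   # 900

# Scan node starts late: the budget is already exhausted, so no waiting at all.
print(remaining_wait_ms(1000, registration_time_ms=0, now_ms=1500))  # 0
```

This is exactly the behavior the Jira argues the RUNTIME_FILTER_WAIT_TIME_MS documentation should spell out.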
[jira] [Updated] (IMPALA-13276) Revise the documentation of query option 'RUNTIME_FILTER_WAIT_TIME_MS'
[ https://issues.apache.org/jira/browse/IMPALA-13276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fang-Yu Rao updated IMPALA-13276:
---------------------------------
    Target Version: Impala 4.4.1
[jira] [Updated] (IMPALA-13276) Revise the documentation of query option 'RUNTIME_FILTER_WAIT_TIME_MS'
[ https://issues.apache.org/jira/browse/IMPALA-13276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fang-Yu Rao updated IMPALA-13276:
---------------------------------
    Labels: 4.4.1  (was: )
[jira] [Updated] (IMPALA-13276) Revise the documentation of query option 'RUNTIME_FILTER_WAIT_TIME_MS'
[ https://issues.apache.org/jira/browse/IMPALA-13276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao updated IMPALA-13276: - Labels: (was: 4.4.1)
[jira] [Updated] (IMPALA-13276) Revise the documentation of query option 'RUNTIME_FILTER_WAIT_TIME_MS'
[ https://issues.apache.org/jira/browse/IMPALA-13276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao updated IMPALA-13276: - Epic Color: ghx-label-10 (was: ghx-label-11)
[jira] [Updated] (IMPALA-13262) Predicate pushdown causes incorrect results in join condition
[ https://issues.apache.org/jira/browse/IMPALA-13262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao updated IMPALA-13262: - Description: We found that in some scenarios Apache Impala ([https://github.com/apache/impala/commit/c539874]) could incorrectly push predicates down to scan nodes, which in turn produces wrong results. The following is a concrete example to reproduce the issue. {code:sql} create database impala_13262; use impala_13262; create table department (dept_no integer, dept_rank integer, start_date timestamp, end_date timestamp); insert into department values(1,1,'2024-01-01','2024-01-02'); insert into department values(1,2,'2024-01-02','2024-01-03'); insert into department values(1,3,'2024-01-03','2024-01-03'); create table employee (employee_no integer, depart_no integer); insert into employee values (1,1); -- The following query should return 0 rows. However, Apache Impala produces one row. select * from employee t1 inner join ( select * from ( select dept_no,dept_rank,start_date,end_date ,row_number() over(partition by dept_no order by dept_rank) rn from department ) t2 where rn=1 ) t2 on t1.depart_no=t2.dept_no where t2.start_date=t2.end_date; set explain_level=2; -- In the output of the EXPLAIN statement, the predicate "start_date = end_date" was pushed -- down to the scan node, which is wrong. | 01:SCAN HDFS [impala_13262.department, RANDOM] | | HDFS partitions=1/1 files=3 size=132B | | predicates: start_date = end_date | | stored statistics: | | table: rows=unavailable size=unavailable | | columns: unavailable | | extrapolated-rows=disabled max-scan-range-rows=unavailable | | mem-estimate=32.00MB mem-reservation=8.00KB thread-reservation=1 | | tuple-ids=1 row-size=40B cardinality=1 | | in pipelines: 01(GETNEXT) | +---+ {code} +*Edit:*+ The following is a smaller case that reproduces the issue. The correct result should be 0 rows but Impala returns 1 row as above. 
{code:sql} select * from ( select dept_no,dept_rank,start_date,end_date ,row_number() over(partition by dept_no order by dept_rank) rn from department ) t2 where rn=1 and t2.start_date=t2.end_date; {code} Recall that the contents of the inline view '{*}t2{*}' above are as follows. {code:java} +-+---+-+-++ | dept_no | dept_rank | start_date | end_date | rn | +-+---+-+-++ | 1 | 1 | 2024-01-01 00:00:00 | 2024-01-02 00:00:00 | 1 | | 1 | 2 | 2024-01-02 00:00:00 | 2024-01-03 00:00:00 | 2 | | 1 | 3 | 2024-01-03 00:00:00 | 2024-01-03 00:00:00 | 3 | +-+---+-+-++ {code} On the other hand, the following query without the conjunct '{*}rn=1{*}' returns the correct result, which is the row with '{*}rn{*}' equal to *3* above. It almost looks like adding the '{*}rn=1{*}' predicate triggers the incorrect pushdown of '{*}t2.start_date=t2.end_date{*}' to the scan node of the table '{*}department{*}'. {code:sql} select * from ( select dept_no,dept_rank,start_date,end_date ,row_number() over(partition by dept_no order by dept_rank) rn from department ) t2 where t2.start_date=t2.end_date; {code}
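Why the pushdown changes the result can be modeled outside Impala. The following is a minimal Python sketch (not Impala code; the helper `with_row_number` is an assumption made for illustration) contrasting the correct plan, which computes row_number() before filtering, with the buggy plan, which filters on start_date = end_date below the analytic so the surviving row gets renumbered.

```python
# The three rows of the 'department' table from the repro above.
rows = [
    (1, 1, "2024-01-01", "2024-01-02"),
    (1, 2, "2024-01-02", "2024-01-03"),
    (1, 3, "2024-01-03", "2024-01-03"),
]

def with_row_number(rs):
    """row_number() over (partition by dept_no order by dept_rank)."""
    numbered, counts = [], {}
    for dept_no, rank, start, end in sorted(rs, key=lambda r: (r[0], r[1])):
        counts[dept_no] = counts.get(dept_no, 0) + 1
        numbered.append((dept_no, rank, start, end, counts[dept_no]))
    return numbered

# Correct plan: number all rows first, then apply both predicates.
correct = [r for r in with_row_number(rows) if r[4] == 1 and r[2] == r[3]]

# Buggy plan: start_date = end_date pushed below the analytic; the single
# surviving row is renumbered and wrongly receives rn = 1.
pushed = [r for r in with_row_number([r for r in rows if r[2] == r[3]])
          if r[4] == 1]

print(correct)  # [] -- the query should return 0 rows
print(pushed)   # [(1, 3, '2024-01-03', '2024-01-03', 1)] -- the spurious row
```

This matches the observed behavior: the rn=1 row of the unfiltered view has start_date != end_date, so the correct answer is empty, while the pushed-down plan fabricates one row.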
[jira] [Comment Edited] (IMPALA-13262) Predicate pushdown causes incorrect results in join condition
[ https://issues.apache.org/jira/browse/IMPALA-13262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17872477#comment-17872477 ] Fang-Yu Rao edited comment on IMPALA-13262 at 8/9/24 10:15 PM: --- I started git bisecting from [IMPALA-9132: Explain statements should not cause nullptr in LogLineageRecord()|https://github.com/apache/impala/commit/f49f8d8a32] (which is not affected by the bug) and it told us that the culprit is IMPALA-9979: part 2. In addition, setting '{*}ANALYTIC_RANK_PUSHDOWN_THRESHOLD{*}' to *0* could not work around this issue. {code:java} fangyurao@fangyu:~/Impala_for_FE$ git bisect bad b42c64993d46893488a667fb9c425548fdf964ab is the first bad commit commit b42c64993d46893488a667fb9c425548fdf964ab Author: Tim Armstrong Date: Tue Feb 2 14:02:12 2021 -0800 IMPALA-9979: part 2: partitioned top-n {code}
[jira] [Commented] (IMPALA-13262) Predicate pushdown causes incorrect results in join condition
[ https://issues.apache.org/jira/browse/IMPALA-13262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17872477#comment-17872477 ] Fang-Yu Rao commented on IMPALA-13262: -- I started git bisecting from [IMPALA-9132: Explain statements should not cause nullptr in LogLineageRecord()|https://github.com/apache/impala/commit/f49f8d8a32] (which is not affected by the bug) and it told us that the culprit is IMPALA-9979: part 2. {code:java} fangyurao@fangyu:~/Impala_for_FE$ git bisect bad b42c64993d46893488a667fb9c425548fdf964ab is the first bad commit commit b42c64993d46893488a667fb9c425548fdf964ab Author: Tim Armstrong Date: Tue Feb 2 14:02:12 2021 -0800 IMPALA-9979: part 2: partitioned top-n {code}
[jira] [Resolved] (IMPALA-13250) Document ENABLED_RUNTIME_FILTER_TYPES query option
[ https://issues.apache.org/jira/browse/IMPALA-13250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao resolved IMPALA-13250. -- Resolution: Fixed The documentation has been added. > Document ENABLED_RUNTIME_FILTER_TYPES query option > -- > > Key: IMPALA-13250 > URL: https://issues.apache.org/jira/browse/IMPALA-13250 > Project: IMPALA > Issue Type: Documentation >Affects Versions: Impala 4.0.0 >Reporter: Michael Smith >Assignee: Fang-Yu Rao >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-13276) Revise the documentation of query option 'RUNTIME_FILTER_WAIT_TIME_MS'
[ https://issues.apache.org/jira/browse/IMPALA-13276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao updated IMPALA-13276: - Issue Type: Documentation (was: Task)
[jira] [Updated] (IMPALA-13276) Revise the documentation of query option 'RUNTIME_FILTER_WAIT_TIME_MS'
[ https://issues.apache.org/jira/browse/IMPALA-13276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao updated IMPALA-13276: - Summary: Revise the documentation of query option 'RUNTIME_FILTER_WAIT_TIME_MS' (was: Revise the description of the query option of 'RUNTIME_FILTER_WAIT_TIME_MS')
[jira] [Created] (IMPALA-13276) Revise the description of the query option of 'RUNTIME_FILTER_WAIT_TIME_MS'
Fang-Yu Rao created IMPALA-13276: Summary: Revise the description of the query option of 'RUNTIME_FILTER_WAIT_TIME_MS' Key: IMPALA-13276 URL: https://issues.apache.org/jira/browse/IMPALA-13276 Project: IMPALA Issue Type: Task Components: Docs Reporter: Fang-Yu Rao Assignee: Fang-Yu Rao
[jira] [Updated] (IMPALA-13262) Predicate pushdown causes incorrect results in join condition
[ https://issues.apache.org/jira/browse/IMPALA-13262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao updated IMPALA-13262: - Labels: correctness (was: )
[jira] [Updated] (IMPALA-13262) Predicate pushdown causes incorrect results in join condition
[ https://issues.apache.org/jira/browse/IMPALA-13262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao updated IMPALA-13262: - Description: We found that in some scenario Apache Impala (https://github.com/apache/impala/commit/c539874) could incorrectly push predicates to scan nodes, which in turn produces the wrong result.
[jira] [Updated] (IMPALA-13262) Predicate pushdown causes incorrect results in join condition
[ https://issues.apache.org/jira/browse/IMPALA-13262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao updated IMPALA-13262: - Description: We found that in some scenario Apache Impala could incorrectly push predicates to scan nodes, which in turn produces the wrong result.
[jira] [Created] (IMPALA-13262) Predicate pushdown causes incorrect results in join condition
Fang-Yu Rao created IMPALA-13262:
Summary: Predicate pushdown causes incorrect results in join condition
Key: IMPALA-13262
URL: https://issues.apache.org/jira/browse/IMPALA-13262
Project: IMPALA
Issue Type: Bug
Reporter: Fang-Yu Rao
Assignee: Fang-Yu Rao

We found that in some scenarios Apache Impala could incorrectly push predicates down to scan nodes, which in turn produces wrong results. The following is a concrete example to reproduce the issue.
{code:sql}
create table department (dept_no integer, dept_rank integer, start_date timestamp, end_date timestamp);
insert into department values (1, 1, '2024-01-01', '2024-01-02');
insert into department values (1, 2, '2024-01-02', '2024-01-03');
insert into department values (1, 3, '2024-01-03', '2024-01-03');
create table employee (employee_no integer, depart_no integer);
insert into employee values (1, 1);

-- The following should return 0 rows. However, Apache Impala produces one row.
select * from employee t1
inner join (
  select * from
  (
    select dept_no, dept_rank, start_date, end_date,
           row_number() over (partition by dept_no order by dept_rank) rn
    from department
  ) t2
  where rn = 1
) t2
on t1.depart_no = t2.dept_no
where t2.start_date = t2.end_date;
{code}
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
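The wrong-result semantics can be reproduced outside Impala. Below is a minimal sketch using Python's sqlite3 module (SQLite 3.25+ supports row_number()); it is not Impala code, just an illustration that filtering below the window function changes the answer, which is why this pushdown is illegal:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept_no INT, dept_rank INT, start_date TEXT, end_date TEXT);
    INSERT INTO department VALUES
        (1, 1, '2024-01-01', '2024-01-02'),
        (1, 2, '2024-01-02', '2024-01-03'),
        (1, 3, '2024-01-03', '2024-01-03');
""")

# Correct plan: compute row_number() over all rows first, then filter the view.
correct = conn.execute("""
    SELECT * FROM (
        SELECT dept_no, dept_rank, start_date, end_date,
               row_number() OVER (PARTITION BY dept_no ORDER BY dept_rank) AS rn
        FROM department
    ) t2
    WHERE rn = 1 AND start_date = end_date
""").fetchall()

# Buggy plan: push start_date = end_date below the window function; the single
# surviving row is then re-numbered as rn = 1 and wrongly passes the filter.
pushed_down = conn.execute("""
    SELECT * FROM (
        SELECT dept_no, dept_rank, start_date, end_date,
               row_number() OVER (PARTITION BY dept_no ORDER BY dept_rank) AS rn
        FROM department
        WHERE start_date = end_date
    ) t2
    WHERE rn = 1
""").fetchall()

print(len(correct))      # 0: the rn = 1 row has start_date <> end_date
print(len(pushed_down))  # 1: the pushdown changed the result
```

This matches the bug report: the correct answer is 0 rows, while the plan with the pushed-down predicate returns 1 row.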
[jira] [Updated] (IMPALA-13169) Specify cluster id before starting HiveServer2 after HIVE-28324
[ https://issues.apache.org/jira/browse/IMPALA-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao updated IMPALA-13169:
- Description:
After HIVE-28324, the cluster id must be passed to HiveServer2 before it can start, either via an environment variable or via a command-line Java property. We should provide HiveServer2 with the cluster id before we bump CDP_BUILD_NUMBER to a CDP Hive dependency that includes this Hive change.
> Specify cluster id before starting HiveServer2 after HIVE-28324
> ---
>
> Key: IMPALA-13169
> URL: https://issues.apache.org/jira/browse/IMPALA-13169
> Project: IMPALA
> Issue Type: Task
> Reporter: Fang-Yu Rao
> Assignee: Fang-Yu Rao
> Priority: Major
>
[jira] [Created] (IMPALA-13169) Specify cluster id before starting HiveServer2 after HIVE-28324
Fang-Yu Rao created IMPALA-13169:
Summary: Specify cluster id before starting HiveServer2 after HIVE-28324
Key: IMPALA-13169
URL: https://issues.apache.org/jira/browse/IMPALA-13169
Project: IMPALA
Issue Type: Task
Reporter: Fang-Yu Rao
Assignee: Fang-Yu Rao

After HIVE-28324, the cluster id must be passed to HiveServer2 before it can start, either via an environment variable or via a command-line Java property. We should provide HiveServer2 with the cluster id before we bump CDP_BUILD_NUMBER to a build that includes this Hive change.
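The two mechanisms mentioned above can be sketched in shell. Note this is a hypothetical illustration: the exact environment-variable and property names are defined by HIVE-28324, so HIVE_CLUSTER_ID and hive.cluster.id below are placeholders, not confirmed names:

```shell
# Option 1: export an environment variable before launching HiveServer2
# (HIVE_CLUSTER_ID is a placeholder name; see HIVE-28324 for the real one).
export HIVE_CLUSTER_ID="impala-minicluster"

# Option 2: pass a Java system property on the command line
# (hive.cluster.id is likewise a placeholder property name).
HIVESERVER2_OPTS="-Dhive.cluster.id=${HIVE_CLUSTER_ID}"

echo "${HIVESERVER2_OPTS}"
```

Either way, the id must be in place before the HiveServer2 start script runs, which is why the minicluster scripts need updating before the CDP_BUILD_NUMBER bump.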
[jira] [Updated] (IMPALA-13167) Impala's coordinator could not be connected after a restart in custom cluster test in the ASAN build on ARM
[ https://issues.apache.org/jira/browse/IMPALA-13167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao updated IMPALA-13167:
- Description:
In an internal Jenkins run, we found that Impala's coordinator may fail to accept connections after a restart that follows the coordinator hitting a DCHECK during a custom cluster test in the ASAN build on ARM.
Specifically, in that Jenkins run the coordinator hit the DCHECK in [RuntimeProfile::EventSequence::Start(int64_t start_time_ns)|https://github.com/apache/impala/blob/master/be/src/util/runtime-profile-counters.h#L656] while running a query in [ranger_column_masking_complex_types.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking_complex_types.test#L724-L732] that was run by [test_column_masking()|https://github.com/apache/impala/blob/master/tests/authorization/test_ranger.py#L1916]. This is a known issue, described in IMPALA-4631.
Since the Impala daemons and the catalog server are restarted for each test in test_ranger.py, the next test after test_column_masking() should most likely pass. However, this was not the case: for the following few tests (e.g., test_block_metadata_update()) in test_ranger.py, Impala's pytest framework was not able to connect to the coordinator, failing with the error below, and hence those tests failed.
{code:java}
-- 2024-06-18 08:49:43,350 INFO MainThread: Starting cluster with command: /data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/bin/start-impala-cluster.py '--state_store_args=--statestore_update_frequency_ms=50 --statestore_priority_update_frequency_ms=50 --statestore_heartbeat_frequency_ms=50' --cluster_size=3 --num_coordinators=3 --log_dir=/data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/logs/custom_cluster_tests --log_level=1 '--impalad_args=--server-name=server1 --ranger_service_type=hive --ranger_app_id=impala --authorization_provider=ranger ' '--state_store_args=None ' '--catalogd_args=--server-name=server1 --ranger_service_type=hive --ranger_app_id=impala --authorization_provider=ranger ' --impalad_args=--default_query_options=
08:49:43 MainThread: Found 0 impalad/0 statestored/0 catalogd process(es)
08:49:43 MainThread: Starting State Store logging to /data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/logs/custom_cluster_tests/statestored.INFO
08:49:43 MainThread: Starting Catalog Service logging to /data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/logs/custom_cluster_tests/catalogd.INFO
08:49:44 MainThread: Starting Impala Daemon logging to /data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/logs/custom_cluster_tests/impalad.INFO
08:49:44 MainThread: Starting Impala Daemon logging to /data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/logs/custom_cluster_tests/impalad_node1.INFO
08:49:44 MainThread: Starting Impala Daemon logging to /data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/logs/custom_cluster_tests/impalad_node2.INFO
08:49:47 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
08:49:47 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
08:49:47 MainThread: Getting num_known_live_backends from impala-ec2-rhel88-m7g-4xlarge-ondemand-1d18.vpc.cloudera.com:25000
08:49:47 MainThread: Debug webpage not yet available: HTTPConnectionPool(host='impala-ec2-rhel88-m7g-4xlarge-ondemand-1d18.vpc.cloudera.com', port=25000): Max retries exceeded with url: /backends?json (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 111] Connection refused',))
08:49:49 MainThread: Debug webpage did not become available in expected time.
08:49:49 MainThread: Waiting for num_known_live_backends=3. Current value: None
08:49:50 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
08:49:50 MainThread: Getting num_known_live_backends from impala-ec2-rhel88-m7g-4xlarge-ondemand-1d18.vpc.cloudera.com:25000
08:49:50 MainThread: Waiting for num_known_live_backends=3. Current value: 0
08:49:51 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
08:49:51 MainThread: Getting num_known_live_backends from impala-ec2-rhel88-m7g-4xlarge-ondemand-1d18.vpc.cloudera.com:25000
08:49:51 MainThread: num_known_live_backends has reached value: 3
08:49:51 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
08:49:51 MainThread: Getting num_known_live_backends from impala-ec2-rhel88-m7g-4xlarge-ondemand-1d18.vpc.cloudera.com:25001
08:49:51 MainThread: num_known_live_backends has reached value: 3
08:49:52 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
08:49:52 MainThread: Getting num_known_live_backends from impala-ec2-rhel88-m7g-4xlarge-ondemand-1d18.vpc.cloudera.com:25002
{code}
[jira] [Created] (IMPALA-13167) Impala's coordinator could not be connected after a restart in custom cluster test in the ASAN build on ARM
Fang-Yu Rao created IMPALA-13167:
Summary: Impala's coordinator could not be connected after a restart in custom cluster test in the ASAN build on ARM
Key: IMPALA-13167
URL: https://issues.apache.org/jira/browse/IMPALA-13167
Project: IMPALA
Issue Type: Bug
Reporter: Fang-Yu Rao
Assignee: Fang-Yu Rao
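The "Waiting for num_known_live_backends=3" lines in the log above come from a poll-until-ready loop in the test framework. The following is a minimal, hypothetical sketch of that pattern (names are invented for illustration; this is not Impala's actual start-impala-cluster.py code), with the clock and sleep injectable so it can be exercised without a real cluster:

```python
import time

def wait_for(get_value, expected, timeout_s=60.0, interval_s=1.0,
             clock=time.monotonic, sleep=time.sleep):
    """Poll get_value() until it returns `expected` or the timeout elapses.

    Returns True on success, False if the deadline passes first, mirroring
    the 'Waiting for num_known_live_backends=3. Current value: ...' loop.
    """
    deadline = clock() + timeout_s
    while True:
        current = get_value()
        if current == expected:
            return True
        if clock() >= deadline:
            return False
        print("Waiting for value=%s. Current value: %s" % (expected, current))
        sleep(interval_s)

# Simulated backend count that becomes 3 after a few polls, like the log:
# None (webpage not up yet), then 0, 0, and finally 3.
readings = iter([None, 0, 0, 3])
ok = wait_for(lambda: next(readings), 3, timeout_s=5.0,
              interval_s=0.0, sleep=lambda s: None)
print(ok)  # True
```

When the coordinator never comes back after the DCHECK-induced restart, a loop like this exhausts its deadline and returns False, which is how the subsequent tests ended up failing.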
[jira] [Updated] (IMPALA-13165) Impala daemon crashed with OMException in Ozone build
[ https://issues.apache.org/jira/browse/IMPALA-13165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao updated IMPALA-13165:
- Description:
We found from an internal build that the Impala daemon crashed with many OMExceptions in an Ozone build.
For instance, the backend test [Multi8RandomSpillToRemoteMix()|https://github.com/apache/impala/blob/master/be/src/runtime/bufferpool/buffer-pool-test.cc#L2065C24-L2070] failed with the following stack trace collected from the generated minidump, which is also provided in [^generate_junitxml.finalize.minidumps.20240616_21_41_14.xml].
{code}
Thread 502 (crashed)
 0  libc.so.6 + 0x36387
    rax = 0x rdx = 0x0006 rcx = 0x rbx = 0x0607d920
    rsi = 0x0cfa rdi = 0x28ec rbp = 0x7fd6662f06e0 rsp = 0x7fd6662f0428
    r8 = 0x r9 = 0x7fd6662f02e0 r10 = 0x0008 r11 = 0x0202
    r12 = 0x0607d920 r13 = 0x0607d980 r14 = 0x0152 r15 = 0x0223
    rip = 0x7fd77dbd1387
    Found by: given as instruction pointer in context
 1  libc.so.6 + 0x37a78
    rbp = 0x7fd6662f06e0 rsp = 0x7fd6662f0430 rip = 0x7fd77dbd2a78
    Found by: stack scanning
 2  buffer-pool-test!google_breakpad::ExceptionHandler::HandleSignal(int, siginfo_t*, void*) + 0x1a0
    rbp = 0x7fd6662f06e0 rsp = 0x7fd6662f04b8 rip = 0x03a29e40
    Found by: stack scanning
 3  buffer-pool-test!tcmalloc::ThreadCache::FetchFromCentralCache(unsigned int, int, void* (*)(unsigned long)) + 0x68
    rbp = 0x7fd6662f06e0 rsp = 0x7fd6662f04f0 rip = 0x03b6f858
    Found by: stack scanning
 4  buffer-pool-test!tcmalloc::malloc_oom(unsigned long) + 0xc0
    rbp = 0x7fd6662f06e0 rsp = 0x7fd6662f0500 rip = 0x03d07f20
    Found by: stack scanning
 5  buffer-pool-test!google::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) [clone .part.0] + 0xad0
    rbp = 0x7fd6662f06e0 rsp = 0x7fd6662f0558 rip = 0x039faa00
    Found by: stack scanning
 6  buffer-pool-test!google::DumpStackTraceAndExit() [clone .cold] + 0x5
    rbp = 0x7fd6662f06e0 rsp = 0x7fd6662f0560 rip = 0x00f00e4f
    Found by: stack scanning
 7  libstdc++.so.6 + 0x13aa48
    rbp = 0x7fd6662f06e0 rsp = 0x7fd6662f0570 rip = 0x7fd78132ea48
    Found by: stack scanning
 8  libstdc++.so.6 + 0x13aa48
    rbp = 0x7fd6662f06e0 rsp = 0x7fd6662f0580 rip = 0x7fd78132ea48
    Found by: stack scanning
 9  libstdc++.so.6 + 0x11f8e2
    rbp = 0x7fd6662f06e0 rsp = 0x7fd6662f05b0 rip = 0x7fd7813138e2
    Found by: stack scanning
10  buffer-pool-test!google::LogDestination::WaitForSinks(google::LogMessage::LogMessageData*) + 0x110
    rbp = 0x7fd6662f06e0 rsp = 0x7fd6662f05e0 rip = 0x039f6460
    Found by: stack scanning
11  buffer-pool-test!google::LogMessage::Fail() + 0xd
    rbp = 0x7fd6662f06e0 rsp = 0x7fd6662f0610 rip = 0x039ef6bd
    Found by: stack scanning
12  buffer-pool-test!google::LogMessage::SendToLog() + 0x244
    rbp = 0x7fd6662f06e0 rsp = 0x7fd6662f0620 rip = 0x039f15f4
    Found by: stack scanning
13  libstdc++.so.6 + 0x12cae4
    rbp = 0x7fd6662f06e0 rsp = 0x7fd6662f0640 rip = 0x7fd781320ae4
    Found by: stack scanning
14  buffer-pool-test!_fini + 0x19b3
    rbp = 0x7fd6662f06e0 rsp = 0x7fd6662f0648 rip = 0x03d0cb03
    Found by: stack scanning
15  buffer-pool-test!_fini + 0xa7c14
    rbp = 0x7fd6662f06e0 rsp = 0x7fd6662f0658 rip = 0x03db2d64
    Found by: stack scanning
16  buffer-pool-test!google::LogMessage::Flush() + 0x1ec
    rsp = 0x7fd6662f06f0 rip = 0x039ef09c
    Found by: stack scanning
17  libstdc++.so.6 + 0x12cae4
    rsp = 0x7fd6662f0730 rip = 0x7fd781320ae4
    Found by: stack scanning
18  buffer-pool-test!google::LogMessageFatal::~LogMessageFatal() + 0x9
    rsp = 0x7fd6662f0790 rip = 0x039f1b19
    Found by: stack scanning
19  buffer-pool-test!impala::BufferPoolTest::TestRandomInternalImpl(impala::BufferPool*, impala::TmpFileGroup*, impala::MemTracker*, std::mersenne_twister_engine*, int, bool) [buffer-pool.h : 338 + 0x8]
    rsp = 0x7fd6662f07a0 rip = 0x00f8721f
    Found by: stack scanning
{code}
During the crash we also saw quite a few OMExceptions in the console output.
{code}
08:46:11 hdfsOpenFile(ofs://localhost:9862/impala/tmp/impala-scratch/a44cc3c871369491_8dcaa671747530a3__/impala-scratch-ae339172-59d6-41ef-9a6a-249c4d9ff537):
{code}
[jira] [Updated] (IMPALA-13165) Impala daemon crashed with OMException in Ozone build
[ https://issues.apache.org/jira/browse/IMPALA-13165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao updated IMPALA-13165:
- Attachment: generate_junitxml.finalize.minidumps.20240616_21_41_14.xml
> Impala daemon crashed with OMException in Ozone build
> -
>
> Key: IMPALA-13165
> URL: https://issues.apache.org/jira/browse/IMPALA-13165
> Project: IMPALA
> Issue Type: Bug
> Reporter: Fang-Yu Rao
> Assignee: Yida Wu
> Priority: Major
> Labels: broken-build
> Attachments: generate_junitxml.finalize.minidumps.20240616_21_41_14.xml
>
[jira] [Created] (IMPALA-13165) Impala daemon crashed with OMException in Ozone build
Fang-Yu Rao created IMPALA-13165: Summary: Impala daemon crashed with OMException in Ozone build Key: IMPALA-13165 URL: https://issues.apache.org/jira/browse/IMPALA-13165 Project: IMPALA Issue Type: Bug Reporter: Fang-Yu Rao Assignee: Yida Wu We found from an internal build that Impala daemon crashed with a lot of OMException in an Ozone build. For instance, the backend test [Multi8RandomSpillToRemoteMix()|https://github.com/apache/impala/blob/master/be/src/runtime/bufferpool/buffer-pool-test.cc#L2065C24-L2070] failed with the following stack trace collected from the generated minidump. {code} Thread 502 (crashed) 0 libc.so.6 + 0x36387 rax = 0x rdx = 0x0006 rcx = 0x rbx = 0x0607d920 rsi = 0x0cfa rdi = 0x28ec rbp = 0x7fd6662f06e0 rsp = 0x7fd6662f0428 r8 = 0xr9 = 0x7fd6662f02e0 r10 = 0x0008 r11 = 0x0202 r12 = 0x0607d920 r13 = 0x0607d980 r14 = 0x0152 r15 = 0x0223 rip = 0x7fd77dbd1387 Found by: given as instruction pointer in context 1 libc.so.6 + 0x37a78 rbp = 0x7fd6662f06e0 rsp = 0x7fd6662f0430 rip = 0x7fd77dbd2a78 Found by: stack scanning 2 buffer-pool-test!google_breakpad::ExceptionHandler::HandleSignal(int, siginfo_t*, void*) + 0x1a0 rbp = 0x7fd6662f06e0 rsp = 0x7fd6662f04b8 rip = 0x03a29e40 Found by: stack scanning 3 buffer-pool-test!tcmalloc::ThreadCache::FetchFromCentralCache(unsigned int, int, void* (*)(unsigned long)) + 0x68 rbp = 0x7fd6662f06e0 rsp = 0x7fd6662f04f0 rip = 0x03b6f858 Found by: stack scanning 4 buffer-pool-test!tcmalloc::malloc_oom(unsigned long) + 0xc0 rbp = 0x7fd6662f06e0 rsp = 0x7fd6662f0500 rip = 0x03d07f20 Found by: stack scanning 5 buffer-pool-test!google::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) [clone .part.0] + 0xad0 rbp = 0x7fd6662f06e0 rsp = 0x7fd6662f0558 rip = 0x039faa00 Found by: stack scanning 6 buffer-pool-test!google::DumpStackTraceAndExit() [clone .cold] + 0x5 rbp = 0x7fd6662f06e0 rsp = 0x7fd6662f0560 rip = 0x00f00e4f Found by: stack scanning 7 libstdc++.so.6 + 0x13aa48 rbp = 0x7fd6662f06e0 rsp = 
0x7fd6662f0570 rip = 0x7fd78132ea48 Found by: stack scanning 8 libstdc++.so.6 + 0x13aa48 rbp = 0x7fd6662f06e0 rsp = 0x7fd6662f0580 rip = 0x7fd78132ea48 Found by: stack scanning 9 libstdc++.so.6 + 0x11f8e2 rbp = 0x7fd6662f06e0 rsp = 0x7fd6662f05b0 rip = 0x7fd7813138e2 Found by: stack scanning 10 buffer-pool-test!google::LogDestination::WaitForSinks(google::LogMessage::LogMessageData*) + 0x110 rbp = 0x7fd6662f06e0 rsp = 0x7fd6662f05e0 rip = 0x039f6460 Found by: stack scanning 11 buffer-pool-test!google::LogMessage::Fail() + 0xd rbp = 0x7fd6662f06e0 rsp = 0x7fd6662f0610 rip = 0x039ef6bd Found by: stack scanning 12 buffer-pool-test!google::LogMessage::SendToLog() + 0x244 rbp = 0x7fd6662f06e0 rsp = 0x7fd6662f0620 rip = 0x039f15f4 Found by: stack scanning 13 libstdc++.so.6 + 0x12cae4 rbp = 0x7fd6662f06e0 rsp = 0x7fd6662f0640 rip = 0x7fd781320ae4 Found by: stack scanning 14 buffer-pool-test!_fini + 0x19b3 rbp = 0x7fd6662f06e0 rsp = 0x7fd6662f0648 rip = 0x03d0cb03 Found by: stack scanning 15 buffer-pool-test!_fini + 0xa7c14 rbp = 0x7fd6662f06e0 rsp = 0x7fd6662f0658 rip = 0x03db2d64 Found by: stack scanning 16 buffer-pool-test!google::LogMessage::Flush() + 0x1ec rsp = 0x7fd6662f06f0 rip = 0x039ef09c Found by: stack scanning 17 libstdc++.so.6 + 0x12cae4 rsp = 0x7fd6662f0730 rip = 0x7fd781320ae4 Found by: stack scanning 18 buffer-pool-test!google::LogMessageFatal::~LogMessageFatal() + 0x9 rsp = 0x7fd6662f0790 rip = 0x039f1b19 Found by: stack scanning 19 buffer-pool-test!impala::BufferPoolTest::TestRandomInternalImpl(impala::BufferPool*, impala::TmpFileGroup*, impala::MemTracker*, std::mersenne_twister_engine*, int, bool) [buffer-pool.h : 338 + 0x8] rsp = 0x7fd6662f07a0 rip = 0x00f8721f Found by: stack scanning {code} During the crash we also saw quite a few OMException from the console output. {code} 08:46:11 hdfsOpenFile(ofs://localhost:9862/impala/tmp/impala-scratch/a44cc3c871369491_8dcaa671747530a3__0
[jira] [Updated] (IMPALA-12616) test_restart_catalogd_while_handling_rpc_response* tests fail not reaching expected states
[ https://issues.apache.org/jira/browse/IMPALA-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao updated IMPALA-12616: - Labels: broken-build (was: ) > test_restart_catalogd_while_handling_rpc_response* tests fail not reaching > expected states > -- > > Key: IMPALA-12616 > URL: https://issues.apache.org/jira/browse/IMPALA-12616 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 1.4.2 >Reporter: Andrew Sherman >Assignee: Daniel Becker >Priority: Critical > Labels: broken-build > Fix For: Impala 4.5.0 > > > There are failures in both > custom_cluster.test_restart_services.TestRestart.test_restart_catalogd_while_handling_rpc_response_with_timeout > and > custom_cluster.test_restart_services.TestRestart.test_restart_catalogd_while_handling_rpc_response_with_max_iters, > both look the same: > {code:java} > custom_cluster/test_restart_services.py:232: in > test_restart_catalogd_while_handling_rpc_response_with_timeout > self.wait_for_state(handle, self.client.QUERY_STATES["FINISHED"], > max_wait_time) > common/impala_test_suite.py:1181: in wait_for_state > self.wait_for_any_state(handle, [expected_state], timeout, client) > common/impala_test_suite.py:1199: in wait_for_any_state > raise Timeout(timeout_msg) > E Timeout: query '6a4e0bad9b511ccf:bf93de68' did not reach one of > the expected states [4], last known state 5 > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
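The timeout above comes from a wait helper that polls the query state until it matches an expected state or the deadline passes. A minimal Java sketch of that polling pattern follows; the class name, method name, and state values are illustrative stand-ins, not Impala's actual Python helpers in impala_test_suite.py.

```java
import java.util.List;
import java.util.function.Supplier;

public class WaitForState {
    // Polls currentState until it returns one of `expected` or `timeoutMs` elapses,
    // mirroring the wait_for_any_state/Timeout behavior described in the failure above.
    static int waitForAnyState(Supplier<Integer> currentState, List<Integer> expected,
                               long timeoutMs) throws Exception {
        long deadline = System.currentTimeMillis() + timeoutMs;
        int last = currentState.get();
        while (!expected.contains(last)) {
            if (System.currentTimeMillis() >= deadline) {
                throw new Exception("query did not reach one of the expected states "
                    + expected + ", last known state " + last);
            }
            Thread.sleep(10);
            last = currentState.get();
        }
        return last;
    }

    public static void main(String[] args) throws Exception {
        // Simulated query that passes through states 3 and 5 before reaching 4.
        int[] states = {3, 3, 5, 4};
        int[] idx = {0};
        int reached = waitForAnyState(
            () -> states[Math.min(idx[0]++, states.length - 1)],
            List.of(4), 1000);
        System.out.println("reached state " + reached);
    }
}
```

In the failing tests the query stayed in state 5 past the deadline, so the helper raised the Timeout shown in the stack trace.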
[jira] [Created] (IMPALA-13162) test_load_data and test_drop_partition_encrypt could fail because Hadoop kms could be not connected
Fang-Yu Rao created IMPALA-13162: Summary: test_load_data and test_drop_partition_encrypt could fail because Hadoop kms could be not connected Key: IMPALA-13162 URL: https://issues.apache.org/jira/browse/IMPALA-13162 Project: IMPALA Issue Type: Bug Reporter: Fang-Yu Rao Assignee: Fang-Yu Rao We found that [test_load_data()|https://github.com/apache/impala/blob/master/tests/metadata/test_hdfs_encryption.py#L110] and [test_drop_partition_encrypt()|https://github.com/apache/impala/blob/master/tests/metadata/test_hdfs_encryption.py#L148] could fail because the Hadoop KMS server could not be reached. It does not occur very often, but it is worth a ticket to keep track of this. +*Error Message*+ {code:java} AssertionError: Error executing hdfs crypto: Picked up JAVA_TOOL_OPTIONS: -javaagent:/data/jenkins/workspace/impala-asf-master-core/repos/Impala/fe/target/dependency/jamm-0.4.0.jar RemoteException: Failed to connect to: http://localhost:9600/kms/v1/key/testkey1/_metadataassert 2 == 0 {code} +*Stacktrace*+ {code:java} /data/jenkins/workspace/impala-asf-master-core/repos/Impala/tests/metadata/test_hdfs_encryption.py:124: in test_load_data assert rc == 0, 'Error executing hdfs crypto: %s %s' % (stdout, stderr) E AssertionError: Error executing hdfs crypto: Picked up JAVA_TOOL_OPTIONS: -javaagent:/data/jenkins/workspace/impala-asf-master-core/repos/Impala/fe/target/dependency/jamm-0.4.0.jar E RemoteException: Failed to connect to: http://localhost:9600/kms/v1/key/testkey1/_metadata E E assert 2 == 0 {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-12921) Consider adding support for locally built Ranger
[ https://issues.apache.org/jira/browse/IMPALA-12921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao resolved IMPALA-12921. -- Resolution: Fixed Resolve the issue since the fix has been merged. > Consider adding support for locally built Ranger > > > Key: IMPALA-12921 > URL: https://issues.apache.org/jira/browse/IMPALA-12921 > Project: IMPALA > Issue Type: Task >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Major > Fix For: Impala 4.5.0 > > > It would be nice to be able to support locally built Ranger in Impala's > minicluster because it would facilitate the testing of features that require > changes to both components. > *+Edit:+* > Making the current Apache Impala on *master* (tip is > {*}IMPALA-12925{*}: Fix decimal data type for external JDBC table) support > Ranger on *master* (tip is > {*}RANGER-4745{*}: Enhance handling of subAccess authorization in Ranger HDFS > plugin) may be too ambitious. > The signatures of some classes are already incompatible. For instance, on the > Impala side, Impala instantiates *RangerAccessRequestImpl* > via the following code, which passes 4 input arguments. > {code:java} > RangerAccessRequest req = new RangerAccessRequestImpl(resource, > SELECT_ACCESS_TYPE, user.getShortName(), getUserGroups(user)); > {code} > However, the current signature of RangerAccessRequestImpl's constructor on > the master of Apache Ranger is the following. It can be seen that 5 input > arguments are needed instead. > {code:java} > public RangerAccessRequestImpl(RangerAccessResource resource, String > accessType, String user, Set userGroups, Set userRoles) > {code} > It may be more practical to support Ranger on an earlier version, e.g., > [https://github.com/apache/ranger/blob/release-ranger-2.4.0]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
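The overload mismatch described above can be reproduced in isolation. The stub classes below are minimal stand-ins for the real Ranger types (only the constructor shapes are taken from the report, and the user/group values are made up); the updated call site simply passes an explicit, possibly empty, userRoles set.

```java
import java.util.Set;

public class RangerCtorSketch {
    public static void main(String[] args) {
        // Impala's existing 4-argument call compiles only while that overload exists;
        // against a Ranger build that kept only the 5-argument form, the fix is to
        // pass the roles set explicitly (empty when roles are unused):
        RangerAccessRequestImpl req = new RangerAccessRequestImpl(
            new RangerAccessResource(), "select", "alice",
            Set.of("analysts"), Set.of());  // hypothetical user/group values
        System.out.println("constructor arity used: " + req.arity);
    }
}

// Minimal stand-ins for the Ranger classes named above; the real ones live in
// the Ranger plugin jars. Only the constructor signatures matter here.
class RangerAccessResource {}

class RangerAccessRequestImpl {
    final int arity;

    // Old 4-argument form, matching Impala's current call site.
    RangerAccessRequestImpl(RangerAccessResource r, String accessType, String user,
                            Set<String> userGroups) {
        this.arity = 4;
    }

    // New 5-argument form on Ranger master, with the extra userRoles parameter.
    RangerAccessRequestImpl(RangerAccessResource r, String accessType, String user,
                            Set<String> userGroups, Set<String> userRoles) {
        this.arity = 5;
    }
}
```

Because Java overload resolution picks the most specific applicable constructor, adding the fifth argument at the call site is a source-compatible change wherever both overloads exist, which is why switching Impala to the new signature (IMPALA-12985) was the practical way forward.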
[jira] [Updated] (IMPALA-12921) Consider adding support for locally built Ranger
[ https://issues.apache.org/jira/browse/IMPALA-12921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao updated IMPALA-12921: - Fix Version/s: Impala 4.5.0 > Consider adding support for locally built Ranger > > > Key: IMPALA-12921 > URL: https://issues.apache.org/jira/browse/IMPALA-12921 > Project: IMPALA > Issue Type: Task >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Major > Fix For: Impala 4.5.0 > > > It would be nice to be able to support locally built Ranger in Impala's > minicluster in that it would facilitate the testing of features that require > changes to both components. > *+Edit:+* > Making the current Apache Impala on *master* (tip is > {*}IMPALA-12925{*}: Fix decimal data type for external JDBC table) to support > Ranger on *master* (tip is > {*}RANGER-4745{*}: Enhance handling of subAccess authorization in Ranger HDFS > plugin) may be too ambitious. > The signatures of some classes are already incompatible. For instance, on the > Impala side, Impala instantiates the instance of *RangerAccessRequestImpl* > via the following code. 4 input arguments are needed. > {code:java} > RangerAccessRequest req = new RangerAccessRequestImpl(resource, > SELECT_ACCESS_TYPE, user.getShortName(), getUserGroups(user)); > {code} > However, the current signature of RangerAccessRequestImpl's constructor on > the master of Apache Ranger is the following. It can be seen we need 5 input > arguments instead. > {code:java} > public RangerAccessRequestImpl(RangerAccessResource resource, String > accessType, String user, Set userGroups, Set userRoles) > {code} > It may be more practical to support Ranger on an earlier version, e.g., > [https://github.com/apache/ranger/blob/release-ranger-2.4.0]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-12985) Use the new constructor when instantiating RangerAccessRequestImpl
[ https://issues.apache.org/jira/browse/IMPALA-12985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao updated IMPALA-12985: - Fix Version/s: Impala 4.5.0 > Use the new constructor when instantiating RangerAccessRequestImpl > -- > > Key: IMPALA-12985 > URL: https://issues.apache.org/jira/browse/IMPALA-12985 > Project: IMPALA > Issue Type: Task > Components: Frontend >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Major > Fix For: Impala 4.5.0 > > > After RANGER-2763, we changed the signature of the class > RangerAccessRequestImpl by adding an additional input argument 'userRoles', > as shown in the following. > {code:java} > public RangerAccessRequestImpl(RangerAccessResource resource, String > accessType, String user, Set userGroups, Set userRoles) { > ... > {code} > The new signature is also provided in CDP Ranger. Thus, to unblock > IMPALA-12921 or to be able to build Apache Impala with locally built Apache > Ranger, it may be faster to switch to the new signature on the Impala side > than to wait for RANGER-4770 to be resolved on the Ranger side. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-12985) Use the new constructor when instantiating RangerAccessRequestImpl
[ https://issues.apache.org/jira/browse/IMPALA-12985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao resolved IMPALA-12985. -- Resolution: Fixed Resolve the issue since the fix has been merged. > Use the new constructor when instantiating RangerAccessRequestImpl > -- > > Key: IMPALA-12985 > URL: https://issues.apache.org/jira/browse/IMPALA-12985 > Project: IMPALA > Issue Type: Task > Components: Frontend >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Major > Fix For: Impala 4.5.0 > > > After RANGER-2763, we changed the signature of the class > RangerAccessRequestImpl by adding an additional input argument 'userRoles', > as shown in the following. > {code:java} > public RangerAccessRequestImpl(RangerAccessResource resource, String > accessType, String user, Set userGroups, Set userRoles) { > ... > {code} > The new signature is also provided in CDP Ranger. Thus, to unblock > IMPALA-12921 or to be able to build Apache Impala with locally built Apache > Ranger, it may be faster to switch to the new signature on the Impala side > than to wait for RANGER-4770 to be resolved on the Ranger side. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-11871) INSERT statement does not respect Ranger policies for HDFS
[ https://issues.apache.org/jira/browse/IMPALA-11871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao resolved IMPALA-11871. -- Resolution: Fixed Resolve the issue since the fix has been merged. > INSERT statement does not respect Ranger policies for HDFS > -- > > Key: IMPALA-11871 > URL: https://issues.apache.org/jira/browse/IMPALA-11871 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Major > > In a cluster with Ranger auth (and with legacy catalog mode), even if you > provide RWX to cm_hdfs -> all-path for the user impala, inserting into a > table whose HDFS POSIX permissions happen to exclude impala access will > result in an > {noformat} > "AnalysisException: Unable to INSERT into target table (default.t1) because > Impala does not have WRITE access to HDFS location: > hdfs://nightly-71x-vx-2.nightly-71x-vx.root.hwx.site:8020/warehouse/tablespace/external/hive/t1"{noformat} > > {noformat} > [root@nightly-71x-vx-3 ~]# hdfs dfs -getfacl > /warehouse/tablespace/external/hive/t1 > file: /warehouse/tablespace/external/hive/t1 > owner: hive > group: supergroup > user::rwx > user:impala:rwx #effective:r-x > group::rwx #effective:r-x > mask::r-x > other::--- > default:user::rwx > default:user:impala:rwx > default:group::rwx > default:mask::rwx > default:other::--- {noformat} > ~~ > ANALYSIS > Stack trace from a version of Cloudera's distribution of Impala (impalad > version 3.4.0-SNAPSHOT RELEASE (build > {*}db20b59a093c17ea4699117155d58fe874f7d68f{*})): > {noformat} > at > org.apache.impala.catalog.FeFsTable$Utils.checkWriteAccess(FeFsTable.java:585) > at > org.apache.impala.analysis.InsertStmt.analyzeWriteAccess(InsertStmt.java:545) > at org.apache.impala.analysis.InsertStmt.analyze(InsertStmt.java:391) > at > org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:463) > at > 
org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:426) > at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:1570) > at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1536) > at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1506) > at > org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:155){noformat} > The exception occurs at analysis time, so I tested and succeeded in writing > directly into the said directory. > {noformat} > [root@nightly-71x-vx-3 ~]# hdfs dfs -touchz > /warehouse/tablespace/external/hive/t1/test > [root@nightly-71x-vx-3 ~]# hdfs dfs -ls > /warehouse/tablespace/external/hive/t1/ > Found 8 items > rw-rw---+ 3 hive supergroup 417 2023-01-27 17:37 > /warehouse/tablespace/external/hive/t1/00_0 > rw-rw---+ 3 hive supergroup 417 2023-01-27 17:44 > /warehouse/tablespace/external/hive/t1/00_0_copy_1 > rw-rw---+ 3 hive supergroup 417 2023-01-27 17:49 > /warehouse/tablespace/external/hive/t1/00_0_copy_2 > rw-rw---+ 3 hive supergroup 417 2023-01-27 17:53 > /warehouse/tablespace/external/hive/t1/00_0_copy_3 > rw-rw---+ 3 impala hive 355 2023-01-27 17:17 > /warehouse/tablespace/external/hive/t1/4c4477c12c51ad96-3126b52d_2029811630_data.0.parq > rw-rw---+ 3 impala hive 355 2023-01-27 17:39 > /warehouse/tablespace/external/hive/t1/9945b25bb37d1ff2-473c1478_574471191_data.0.parq > drwxrwx---+ - impala hive 0 2023-01-27 17:39 > /warehouse/tablespace/external/hive/t1/_impala_insert_staging > rw-rw---+ 3 impala supergroup 0 2023-01-27 18:01 > /warehouse/tablespace/external/hive/t1/test{noformat} > Reviewing the code[1], I traced the {{TAccessLevel}} to the catalogd. And if > I add user impala to group supergroup on the catalogd host, this query will > succeed past the authorization. 
> Additionally, this query does not trip up during analysis when catalog v2 is > enabled because the method {{getFirstLocationWithoutWriteAccess()}} is not > implemented there yet and always returns null[2]. > [1] > [https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L494-L504] > [2] > [https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java#L295-L298] > ~~ > Ideally, when Ranger authorization is in place, we should: > 1) Not check access level during analysis > 2) Incorporate Ranger ACLs during analysis -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
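The "#effective:r-x" annotations in the getfacl output above come from the POSIX ACL mask: for a named-user or named-group entry, the effective permissions are the bitwise AND of that entry and the mask entry. A minimal sketch of that computation, using the values from the listing (user:impala:rwx, mask::r-x):

```java
public class AclMask {
    // POSIX.1e ACL rule: effective permission of a named-user/named-group entry
    // = entry permissions AND mask. Permission bits: r=4, w=2, x=1.
    static int effective(int entryPerms, int mask) {
        return entryPerms & mask;
    }

    static String rwx(int p) {
        return ((p & 4) != 0 ? "r" : "-") + ((p & 2) != 0 ? "w" : "-")
             + ((p & 1) != 0 ? "x" : "-");
    }

    public static void main(String[] args) {
        // user:impala:rwx with mask::r-x yields effective r-x, matching the
        // "#effective:r-x" annotation above -- which is why Impala's analysis
        // concluded it lacked WRITE access despite the rwx entry.
        int impalaEntry = 7;  // rwx
        int mask = 5;         // r-x
        System.out.println("effective: " + rwx(effective(impalaEntry, mask)));
    }
}
```

This also explains the observed behavior: the analysis-time check reads only the HDFS ACLs (where the mask strips the write bit), while the actual write succeeded because Ranger's HDFS plugin authorized it independently of the POSIX permissions.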
[jira] [Commented] (IMPALA-12266) Sporadic failure after migrating a table to Iceberg
[ https://issues.apache.org/jira/browse/IMPALA-12266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17853512#comment-17853512 ] Fang-Yu Rao commented on IMPALA-12266: -- Encountered this failure again at [https://jenkins.impala.io/job/ubuntu-20.04-dockerised-tests/1873/testReport/junit/query_test.test_iceberg/TestIcebergTable/test_convert_table_protocol__beeswax___exec_optiontest_replan___1___batch_size___0___num_nodes___0___disable_codegen_rows_threshold___0___disable_codegen___False___abort_on_error___1___exec_single_node_rows_threshold___0table_format__parquet_none_/] in a Jenkins job against [https://gerrit.cloudera.org/c/21160/], which did not change Impala's behavior in this area. > Sporadic failure after migrating a table to Iceberg > --- > > Key: IMPALA-12266 > URL: https://issues.apache.org/jira/browse/IMPALA-12266 > Project: IMPALA > Issue Type: Bug > Components: fe >Affects Versions: Impala 4.2.0 >Reporter: Tamas Mate >Assignee: Gabor Kaszab >Priority: Critical > Labels: impala-iceberg > Attachments: > catalogd.bd40020df22b.invalid-user.log.INFO.20230704-181939.1, > impalad.6c0f48d9ce66.invalid-user.log.INFO.20230704-181940.1 > > > TestIcebergTable.test_convert_table test failed in a recent verify job's > dockerised tests: > https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/7629 > {code:none} > E ImpalaBeeswaxException: ImpalaBeeswaxException: > EINNER EXCEPTION: > EMESSAGE: AnalysisException: Failed to load metadata for table: > 'parquet_nopartitioned' > E CAUSED BY: TableLoadingException: Could not load table > test_convert_table_cdba7383.parquet_nopartitioned from catalog > E CAUSED BY: TException: > TGetPartialCatalogObjectResponse(status:TStatus(status_code:GENERAL, > error_msgs:[NullPointerException: null]), lookup_status:OK) > {code} > {code:none} > E0704 19:09:22.980131 833 JniUtil.java:183] > 7145c21173f2c47b:2579db55] Error in Getting partial catalog object of > 
TABLE:test_convert_table_cdba7383.parquet_nopartitioned. Time spent: 49ms > I0704 19:09:22.980309 833 jni-util.cc:288] > 7145c21173f2c47b:2579db55] java.lang.NullPointerException > at > org.apache.impala.catalog.CatalogServiceCatalog.replaceTableIfUnchanged(CatalogServiceCatalog.java:2357) > at > org.apache.impala.catalog.CatalogServiceCatalog.getOrLoadTable(CatalogServiceCatalog.java:2300) > at > org.apache.impala.catalog.CatalogServiceCatalog.doGetPartialCatalogObject(CatalogServiceCatalog.java:3587) > at > org.apache.impala.catalog.CatalogServiceCatalog.getPartialCatalogObject(CatalogServiceCatalog.java:3513) > at > org.apache.impala.catalog.CatalogServiceCatalog.getPartialCatalogObject(CatalogServiceCatalog.java:3480) > at > org.apache.impala.service.JniCatalog.lambda$getPartialCatalogObject$11(JniCatalog.java:397) > at > org.apache.impala.service.JniCatalogOp.lambda$execAndSerialize$1(JniCatalogOp.java:90) > at org.apache.impala.service.JniCatalogOp.execOp(JniCatalogOp.java:58) > at > org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:89) > at > org.apache.impala.service.JniCatalogOp.execAndSerializeSilentStartAndFinish(JniCatalogOp.java:109) > at > org.apache.impala.service.JniCatalog.execAndSerializeSilentStartAndFinish(JniCatalog.java:238) > at > org.apache.impala.service.JniCatalog.getPartialCatalogObject(JniCatalog.java:396) > I0704 19:09:22.980324 833 status.cc:129] 7145c21173f2c47b:2579db55] > NullPointerException: null > @ 0x1012f9f impala::Status::Status() > @ 0x187f964 impala::JniUtil::GetJniExceptionMsg() > @ 0xfee920 impala::JniCall::Call<>() > @ 0xfccd0f impala::Catalog::GetPartialCatalogObject() > @ 0xfb55a5 > impala::CatalogServiceThriftIf::GetPartialCatalogObject() > @ 0xf7a691 > impala::CatalogServiceProcessorT<>::process_GetPartialCatalogObject() > @ 0xf82151 impala::CatalogServiceProcessorT<>::dispatchCall() > @ 0xee330f apache::thrift::TDispatchProcessor::process() > @ 0x1329246 > 
apache::thrift::server::TAcceptQueueServer::Task::run() > @ 0x1315a89 impala::ThriftThread::RunRunnable() > @ 0x131773d > boost::detail::function::void_function_obj_invoker0<>::invoke() > @ 0x195ba8c impala::Thread::SuperviseThread() > @ 0x195c895 boost::detail::thread_data<>::run() > @ 0x23a03a7 thread_proxy > @ 0x7faaad2a66ba start_thread > @ 0x7f2c151d clone > E0704 19:09:23.006968 833 catalog-server.cc:278] > 7145c21173f2c47b:2579db55] NullPointerExcepti
[jira] [Comment Edited] (IMPALA-12190) Renaming table will cause losing privileges for non-admin users
[ https://issues.apache.org/jira/browse/IMPALA-12190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848439#comment-17848439 ] Fang-Yu Rao edited comment on IMPALA-12190 at 5/22/24 6:39 AM: --- This JIRA does not seem to be straightforward to resolve on the Impala side alone because the error handling could be tricky. I think we may need Apache Ranger to provide an API that could take care of this for us (Apache Impala). Specifically, it would be great if there is a Ranger API that is able to modify the policies accordingly when the catalog server alters the name of a table. For instance, when the catalog server is executing ALTER TABLE RENAME, the catalog server also sends to the Ranger server via Impala's Ranger plug-in a request to change the name of the table in Ranger's policy repository if there is a policy matching this table. Ranger stores its policies in its backend database, so it would be much easier for Ranger to manage this operation, especially when there is an error/exception that occurs during the execution of the operation. If we'd like to resolve this from Apache Impala alone, then we have to be able to do the following properly. # Retrieve the policy matching the name of the table whose name is going to be altered. # For each grantee principal (which could be a user, group, or a role) in the policy retrieved above, invoke the REVOKE API to revoke this grantee's privileges on the old table (the table before the renaming) and then invoke the GRANT API to grant those previously revoked privileges to this grantee on the new table (the table with the new name). A grantee could have multiple privileges on the table so multiple REVOKE/GRANT API calls could be required. It seems a bit tricky to handle the errors that occur during the 2nd step described above. 
For instance, assume that a grantee has only one privilege granted on the old table: what should the catalog server do when the GRANT API call fails after its corresponding REVOKE API call? Should we roll back the REVOKE API call? Or should we retry the GRANT API call? The policy for a table could also involve multiple principals. What should we do when the operation corresponding to one grantee principal fails? On the other hand, there does not seem to be a Ranger API that allows us to retrieve the exact policy matching a given table name. There is a Ranger API that could return an access control list (ACL) given the name of a resource, e.g., the table "functional.alltypes". A place where we call this is within RangerImpaladAuthorizationManager#getPrivileges() ([plugin_.get().getResourceACLs(request)|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/authorization/ranger/RangerImpaladAuthorizationManager.java#L367]), which could be triggered by a statement like "SHOW GRANT USER non_owner ON TABLE functional.alltypes". For instance, given the table name "functional.alltypes", we could get a HashMap called "userACLs", and the contents of this map could look like the following. Note that in the following, only the first map corresponds to the policy in which the resource is exactly the table "functional.alltypes". This policy was created by an administrative user via "GRANT SELECT ON TABLE functional.alltypes to USER non_owner". The rest of the maps were inferred from other policies. Take the 2nd map: the user "hdfs" has privileges on the table "functional.alltypes" through the policy that grants "hdfs" the ALL privilege on all databases, tables, and columns. 
# "non_owner" -> \{"select" -> "ALLOWED"} # "hdfs" -> \{"all" -> "ALLOWED", "drop" -> "ALLOWED", ...} # "admin" -> \{"drop" -> "ALLOWED", "all" -> "ALLOWED", ...} # "\{OWNER}" -> \{"all" -> "ALLOWED", "drop" -> "ALLOWED", ...} Tagged [~stigahuang] and [~csringhofer] here since they are also experts in this area on the Impala side. Tagged [~rmani] and [~abhayk] here too since they are the experts on the Ranger side. was (Author: fangyurao): This JIRA does not seem to be straightforward to resolve on the Impala side alone because the error handling could be tricky. I think we may need Apache Ranger to provide an API that could take care of this for us (Apache Impala). Specifically, it would be great if there is a Ranger API that is able to modify the policies accordingly when the catalog server alters the name of a table. For instance, when the catalog server is executing ALTER TABLE RENAME, the catalog server also sends to the Ranger server via Impala's Ranger plug-in a request to change the name of the table in Ranger's policy repository if there is a policy matching this table. Ranger stores its policies in its backend database, so it would be much easier for Ranger to manage this operation, especially when there is an error/exception that occurs during the execution
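The two-step REVOKE/GRANT migration described in steps 1-2 above can be sketched as follows. The PolicyClient interface is hypothetical (it does not correspond to real Ranger REST endpoints), and the in-memory implementation only illustrates the control flow; the unresolved question of what to do when a GRANT fails after its REVOKE succeeded is left as a plain propagated exception here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RenameMigrationSketch {
    // For each grantee on the old table, revoke each privilege there and re-grant
    // it on the new table. If grant() throws after revoke() succeeded, the
    // privilege is lost -- exactly the rollback-vs-retry ambiguity raised above.
    static void migrate(PolicyClient c, String oldTable, String newTable) {
        for (Map.Entry<String, List<String>> e : c.granteesFor(oldTable).entrySet()) {
            for (String priv : e.getValue()) {
                c.revoke(e.getKey(), priv, oldTable);
                c.grant(e.getKey(), priv, newTable);
            }
        }
    }

    public static void main(String[] args) {
        // In-memory stand-in for the policy store: table -> (grantee -> privileges).
        Map<String, Map<String, List<String>>> store = new HashMap<>();
        store.put("functional.alltypes", new HashMap<>(
            Map.of("non_owner", new ArrayList<>(List.of("select")))));
        PolicyClient c = new PolicyClient() {
            public Map<String, List<String>> granteesFor(String t) {
                Map<String, List<String>> copy = new HashMap<>();
                store.getOrDefault(t, Map.of())
                     .forEach((g, ps) -> copy.put(g, new ArrayList<>(ps)));
                return copy;  // deep copy so migrate() can iterate safely
            }
            public void revoke(String g, String p, String t) {
                store.getOrDefault(t, Map.of())
                     .getOrDefault(g, new ArrayList<>()).remove(p);
            }
            public void grant(String g, String p, String t) {
                store.computeIfAbsent(t, k -> new HashMap<>())
                     .computeIfAbsent(g, k -> new ArrayList<>()).add(p);
            }
        };
        migrate(c, "functional.alltypes", "functional.alltypes_renamed");
        System.out.println("renamed grants: "
            + store.get("functional.alltypes_renamed"));
    }
}

// Hypothetical facade over Ranger's policy APIs; these method names are NOT
// real Ranger calls, they only mirror the two-step migration described above.
interface PolicyClient {
    Map<String, List<String>> granteesFor(String table);  // grantee -> privileges
    void revoke(String grantee, String priv, String table);
    void grant(String grantee, String priv, String table);
}
```

Doing this inside Ranger instead, as the comment suggests, would let the rename be a single transactional update against Ranger's backend database rather than a sequence of non-atomic REST calls.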
[jira] [Comment Edited] (IMPALA-12190) Renaming table will cause losing privileges for non-admin users
[ https://issues.apache.org/jira/browse/IMPALA-12190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848439#comment-17848439 ] Fang-Yu Rao edited comment on IMPALA-12190 at 5/22/24 6:36 AM: --- This JIRA does not seem to be straightforward to resolve on the Impala side alone because the error handling could be tricky. I think we may need Apache Ranger to provide an API that could take care of this for us (Apache Impala). Specifically, it would be great if there is a Ranger API that is able to modify the policies accordingly when the catalog server alters the name of a table. For instance, when the catalog server is executing ALTER TABLE RENAME, the catalog server also sends to the Ranger server via Impala's Ranger plug-in a request to change the name of the table in Ranger's policy repository if there is a policy matching this table. Ranger stores its policies in its backend database, so it would be much easier for Ranger to manage this operation, especially when there is an error/exception that occurs during the execution of the operation. If we'd like to resolve this from Apache Impala alone, then we have to be able to do the following properly. # Retrieve the policy matching the name of the table whose name is going to be altered. # For each grantee principal (which could be a user, group, or a role) in the policy retrieved above, invoke the REVOKE API to revoke this grantee's privileges on the old table (the table before the renaming) and then invoke the GRANT API to grant those previously revoked privileges to this grantee on the new table (the table with the new name). A grantee could have multiple privileges on the table so multiple REVOKE/GRANT API calls could be required. It seems a bit tricky to handle the errors that occur during the 2nd step described above. 
For instance, assume that a grantee has only one privilege granted on the old table: what should the catalog server do when the GRANT API call fails after its corresponding REVOKE API call? Should we roll back the REVOKE API call? Or should we retry the GRANT API call? The policy for a table could also involve multiple principals. What should we do when the operation corresponding to a grantee principal fails? On the other hand, there does not seem to be a Ranger API that allows us to retrieve the exact policy matching a given table name. There is a Ranger API that could return an access control list (ACL) given the name of a resource, e.g., the table "functional.alltypes". A place where we call this is within RangerImpaladAuthorizationManager#getPrivileges() ([plugin_.get().getResourceACLs(request)|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/authorization/ranger/RangerImpaladAuthorizationManager.java#L367]), which could be triggered by a statement like "SHOW GRANT USER non_owner ON TABLE functional.alltypes". For instance, given the table name "functional.alltypes", we could get a HashMap called "userACLs", and the contents of this map could look like the following. Note that in the following, only the first map corresponds to the policy in which the resource is exactly the table "functional.alltypes". This policy was created by an administrative user via "GRANT SELECT ON TABLE functional.alltypes TO USER non_owner". The rest of the maps were inferred from other policies. Take the 2nd map: the user "hdfs" has its privileges on the table "functional.alltypes" through the policy that grants "hdfs" the ALL privilege on all databases, tables, and columns.
# "non_owner" -> \{"select" -> "ALLOWED"} # "hdfs" -> \{"all" -> "ALLOWED", "drop" -> "ALLOWED", ...} # "admin" -> \{"drop" -> "ALLOWED", "all" -> "ALLOWED", ...} # "\{OWNER}" -> \{"all" -> "ALLOWED", "drop" -> "ALLOWED", ...} was (Author: fangyurao): This JIRA does not seem to be straightforward to resolve on the Impala side alone because the error handling could be tricky. I think we may need Apache Ranger to provide an API that could take care of this for us (Apache Impala). Specifically, it would be great if there is a Ranger API that is able to modify the policies accordingly when the catalog server alters the name of a table. For instance, when the catalog server is executing ALTER TABLE RENAME, the catalog server also sends to the Ranger server via Impala's Ranger plug-in a request to change the name of the table in Ranger's policy repository if there is a policy matching this table. Ranger stores its policies in its backend database, so it would be much easier for Ranger to manage this operation, especially when there is an error/exception that occurs during the execution of the operation. If we'd like to resolve this from Apache Impala alone, then we have to be able to do the following properly. # Retrieve the policy matching the name of the table whose name
[jira] [Commented] (IMPALA-12190) Renaming table will cause losing privileges for non-admin users
[ https://issues.apache.org/jira/browse/IMPALA-12190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848439#comment-17848439 ] Fang-Yu Rao commented on IMPALA-12190: -- This JIRA does not seem to be straightforward to resolve on the Impala side alone because the error handling could be tricky. I think we may need Apache Ranger to provide an API that could take care of this for us (Apache Impala). Specifically, it would be great if there is a Ranger API that is able to modify the policies accordingly when the catalog server alters the name of a table. For instance, when the catalog server is executing ALTER TABLE RENAME, the catalog server also sends to the Ranger server via Impala's Ranger plug-in a request to change the name of the table in Ranger's policy repository if there is a policy matching this table. Ranger stores its policies in its backend database, so it would be much easier for Ranger to manage this operation, especially when there is an error/exception that occurs during the execution of the operation. If we'd like to resolve this from Apache Impala alone, then we have to be able to do the following properly. # Retrieve the policy matching the name of the table whose name is going to be altered. # For each grantee principal in the policy (which could be a user, group, or a role) in the policy retrieved above, issue a REVOKE statement to revoke this grantee's privileges on the old table (the table before the renaming) and then issue a GRANT statement to grant those previously revoked privileges to this grantee on the new table (the table with the new name). A grantee could have multiple privileges on the table so multiple REVOKE/GRANT could be required. It seems a bit tricky to handle the errors that occur during the 2nd step described above. 
For instance, assume that a grantee has only one privilege granted on the old table: what should the catalog server do when the GRANT command fails after its corresponding REVOKE command? Should we roll back the REVOKE command? Or should we retry the GRANT command? The policy for a table could also involve multiple principals. What should we do when the operation corresponding to a grantee principal fails? On the other hand, there does not seem to be a Ranger API that allows us to retrieve the exact policy matching a given table name. There is a Ranger API that could return an access control list (ACL) given the name of a resource, e.g., the table "functional.alltypes". A place where we call this is within RangerImpaladAuthorizationManager#getPrivileges() ([plugin_.get().getResourceACLs(request)|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/authorization/ranger/RangerImpaladAuthorizationManager.java#L367]), which could be triggered by a statement like "SHOW GRANT USER non_owner ON TABLE functional.alltypes". For instance, given the table name "functional.alltypes", we could get a HashMap called "userACLs", and the contents of this map could look like the following. Note that in the following, only the first map corresponds to the policy in which the resource is exactly the table "functional.alltypes". This policy was created by an administrative user via "GRANT SELECT ON TABLE functional.alltypes TO USER non_owner". The rest of the maps were inferred from other policies. Take the 2nd map: the user "hdfs" has its privileges on the table "functional.alltypes" through the policy that grants "hdfs" the ALL privilege on all databases, tables, and columns.
# "non_owner" -> \{"select"-> "ALLOWED"} # "hdfs" -> \{"all" -> "ALLOWED", "drop" -> "ALLOWED", ...} # "admin" -> \{"drop" -> "ALLOWED", "all" -> "ALLOWED", ...} # "\{OWNER}" -> \{"all" -> "ALLOWED", "drop" -> "ALLOWED", ...} > Renaming table will cause losing privileges for non-admin users > --- > > Key: IMPALA-12190 > URL: https://issues.apache.org/jira/browse/IMPALA-12190 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Reporter: Gabor Kaszab >Assignee: Sai Hemanth Gantasala >Priority: Critical > Labels: alter-table, authorization, ranger > > Let's say user 'a' gets some privileges on table 't'. When this table gets > renamed (even by user 'a') then user 'a' loses its privileges on that table. > > Repro steps: > # Start impala with Ranger > # start impala-shell as admin (-u admin) > # create table tmp (i int, s string) stored as parquet; > # grant all on table tmp to user ; > # grant all on table tmp to user ; > {code:java} > Query: show grant user on table tmp > +++--+---++-+--+-+-+---+--+-+ > | principal_type | principal_n
[jira] [Resolved] (IMPALA-11622) Impala load data command fails when the impala user has access on source file through Ranger policy
[ https://issues.apache.org/jira/browse/IMPALA-11622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao resolved IMPALA-11622. -- Resolution: Duplicate This is a duplicate of IMPALA-10272, which has already been resolved. > Impala load data command fails when the impala user has access on source file > through Ranger policy > --- > > Key: IMPALA-11622 > URL: https://issues.apache.org/jira/browse/IMPALA-11622 > Project: IMPALA > Issue Type: Bug >Reporter: Abhishek >Priority: Major > > When trying to run the load data command in Impala, > if the Impala user has access on the source file through a Ranger HDFS policy, > then the load data command fails. > If the impala user has access on the source file through HDFS ACLs, > then the load data command executes successfully. > Steps to reproduce :- > Ranger policy setup > HDFS policies > Policy 1 :- > All access policy for HDFS user > user - hdfs > resources - * , recursive=true > access - all access allowed > Policy 2 :- > Access for impala user on /root_test_dir/test_dir_2 > user - impala > resources - /root_test_dir/test_dir_2 , recursive = true > access - all access allowed > Hadoop SQL policies > Policy 1 : All access policy for hrt_qa, hive and impala user > users - hrt_qa, impala, hive > resources - db - *, table - *, column - * > access - all access allowed > Policy 2 : Url policy for hrt_qa user > users - hrt_qa > resources :- url - * > access - all access allowed > Data setup :- > In HDFS, > create the following directories as the hdfs user > {code:java|bgColor=#f4f5f7} > /root_test_dir > /root_test_dir/test_dir_1 > /root_test_dir/test_dir_2{code} > Create a text file in local machine temp.txt with the any content ( for ex :- > Hello World) > Then copy the temp.txt file to the HDFS dirs /root_test_dir/test_dir_1 and > /root_test_dir/test_dir_2 > Set the ACLs for /root_test_dir/test_dir_1 to 777 recursively > {code:java|bgColor=#f4f5f7} > hdfs dfs -chmod -R 777 
/root_test_dir/test_dir_1 {code} > > Set the ACLs for /root_test_dir/test_dir_2 to 000 recursively > {code:java|bgColor=#f4f5f7} > hdfs dfs -chmod -R 000 /root_test_dir/test_dir_2{code} > (Run all the hdfs commands as the hdfs user) > In Impala-shell, as hrt_qa user > create a test_db and create a test_table under test_db. > {code:java|bgColor=#f4f5f7} > CREATE TABLE test_db.test_table(c0 string) STORED AS TEXTFILE > TBLPROPERTIES('transactional'='false'){code} > > Run the LOAD DATA command as hrt_qa user :- > {code:java|bgColor=#f4f5f7} > test_db> LOAD DATA INPATH '/root_test_dir/test_dir_1/temp.txt' INTO TABLE > test_db.test_table > > ; > Query: LOAD DATA INPATH '/root_test_dir/test_dir_1/temp.txt' INTO TABLE > test_db.test_table > +--+ > | summary | > +--+ > | Loaded 1 file(s). Total files in destination location: 1 | > +--+ > Fetched 1 row(s) in 6.56s {code} > Failing case :- > {code:java} > test_db> LOAD DATA INPATH '/root_test_dir/test_dir_2/temp.txt' INTO TABLE > test_db.test_table; Query: LOAD DATA INPATH > '/root_test_dir/test_dir_2/temp.txt' INTO TABLE test_db.test_table ERROR: > AccessControlException: Permission denied: user=impala, access=READ, > inode="/warehouse/tablespace/external/hive/test_db.db/test_table/.tmp_4b9b3a83-f4f9-4363-81ae-21f5c170c1bd/temp.txt":hdfs:supergroup:-- > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-13009) Possible leak of partition updates when the table has failed DDL and recovered by INVALIDATE METADATA
[ https://issues.apache.org/jira/browse/IMPALA-13009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17838405#comment-17838405 ] Fang-Yu Rao commented on IMPALA-13009: -- Thanks for the detailed steps to reproduce the issue [~stigahuang]! I have tried your latest script at https://issues.apache.org/jira/browse/IMPALA-13009?focusedCommentId=17838211&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17838211 and found that I could also reproduce the issue after restarting only the Impala daemons (via "{*}bin/start-impala-cluster.py -r{*}") even though we don't have the command that removes the HDFS path from outside of Impala. I was using Apache Impala on a recent master where the tip commit is IMPALA-12996 (Add support for DATE in Iceberg metadata tables). {code:java} I0417 16:06:57.716398 16131 ImpaladCatalog.java:232] Adding: TABLE:default.my_part version: 1723 size: 1557 I0417 16:06:57.719789 16131 ImpaladCatalog.java:232] Adding: CATALOG_SERVICE_ID version: 1723 size: 60 I0417 16:06:57.720358 16131 ImpaladCatalog.java:257] Adding 9 partition(s): HDFS_PARTITION:default.my_part:(p=1,p=2,...,p=9), versions=[1706, 1712, 1718], size=(avg=588, min=588, max=588, sum=5292) E0417 16:06:57.917488 16131 ImpaladCatalog.java:264] Error adding catalog object: Received stale partition in a statestore update: THdfsPartition(partitionKeyExprs:[TExpr(nodes:[TExprNode(node_type:INT_LITERAL, type:TColumnType(types:[TTypeNode(type:SCALAR, scalar_type:TScalarType(type:INT))]), num_children:0, is_constant:true, int_literal:TIntLiteral(value:1), is_codegen_disabled:false)])], location:THdfsPartitionLocation(prefix_index:0, suffix:p=1), id:0, file_desc:[THdfsFileDesc(file_desc_data:18 00 00 00 00 00 00 00 00 00 0E 00 1C 00 18 00 10 00 00 00 08 00 04 00 0E 00 00 00 18 00 00 00 A9 E7 4F EE 8E 01 00 00 02 00 00 00 00 00 00 00 0C 00 00 00 01 00 00 00 4C 00 00 00 37 00 00 00 61 61 34 36 34 66 61 66 35 61 31 37 36 65 39 65 2D 
36 63 66 31 63 38 34 61 30 30 30 30 30 30 30 30 5F 31 37 31 31 36 38 30 30 38 32 5F 64 61 74 61 2E 30 2E 74 78 74 00 0C 00 14 00 00 00 0C 00...)], access_level:READ_WRITE, stats:TTableStats(num_rows:-1), is_marked_cached:false, hms_parameters:{transient_lastDdlTime=1713395198, totalSize=2, numFilesErasureCoded=0, numFiles=1}, num_blocks:1, total_file_size_bytes:2, has_incremental_stats:false, write_id:0, db_name:default, tbl_name:my_part, partition_name:p=1, hdfs_storage_descriptor:THdfsStorageDescriptor(lineDelim:10, fieldDelim:1, collectionDelim:1, mapKeyDelim:1, escapeChar:0, quoteChar:1, fileFormat:TEXT, blockSize:0)) Java exception follows: java.lang.IllegalStateException: Received stale partition in a statestore update: THdfsPartition(partitionKeyExprs:[TExpr(nodes:[TExprNode(node_type:INT_LITERAL, type:TColumnType(types:[TTypeNode(type:SCALAR, scalar_type:TScalarType(type:INT))]), num_children:0, is_constant:true, int_literal:TIntLiteral(value:1), is_codegen_disabled:false)])], location:THdfsPartitionLocation(prefix_index:0, suffix:p=1), id:0, file_desc:[THdfsFileDesc(file_desc_data:18 00 00 00 00 00 00 00 00 00 0E 00 1C 00 18 00 10 00 00 00 08 00 04 00 0E 00 00 00 18 00 00 00 A9 E7 4F EE 8E 01 00 00 02 00 00 00 00 00 00 00 0C 00 00 00 01 00 00 00 4C 00 00 00 37 00 00 00 61 61 34 36 34 66 61 66 35 61 31 37 36 65 39 65 2D 36 63 66 31 63 38 34 61 30 30 30 30 30 30 30 30 5F 31 37 31 31 36 38 30 30 38 32 5F 64 61 74 61 2E 30 2E 74 78 74 00 0C 00 14 00 00 00 0C 00...)], access_level:READ_WRITE, stats:TTableStats(num_rows:-1), is_marked_cached:false, hms_parameters:{transient_lastDdlTime=1713395198, totalSize=2, numFilesErasureCoded=0, numFiles=1}, num_blocks:1, total_file_size_bytes:2, has_incremental_stats:false, write_id:0, db_name:default, tbl_name:my_part, partition_name:p=1, hdfs_storage_descriptor:THdfsStorageDescriptor(lineDelim:10, fieldDelim:1, collectionDelim:1, mapKeyDelim:1, escapeChar:0, quoteChar:1, fileFormat:TEXT, blockSize:0)) at 
com.google.common.base.Preconditions.checkState(Preconditions.java:512) at org.apache.impala.catalog.ImpaladCatalog.addTable(ImpaladCatalog.java:523) at org.apache.impala.catalog.ImpaladCatalog.addCatalogObject(ImpaladCatalog.java:334) at org.apache.impala.catalog.ImpaladCatalog.updateCatalog(ImpaladCatalog.java:262) at org.apache.impala.service.FeCatalogManager$CatalogdImpl.updateCatalogCache(FeCatalogManager.java:120) at org.apache.impala.service.Frontend.updateCatalogCache(Frontend.java:565) at org.apache.impala.service.JniFrontend.updateCatalogCache(JniFrontend.java:196) {code} > Possible leak of partition updates when the table has failed DDL and > recovered by INVALIDATE METADATA > - > > Key: IMPALA-13009 >
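The "Received stale partition in a statestore update" error in the log above comes from a precondition check while applying a catalog update. A toy version of that version-gating logic is sketched below; `PartitionCache` and its method are illustrative names only, not Impala's actual `ImpaladCatalog` code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: reject a partition update whose catalog version is
// not newer than what the local cache already holds, mirroring the
// IllegalStateException seen in the log above.
class PartitionCache {
    private final Map<String, Long> versions = new HashMap<>();

    void apply(String partitionName, long catalogVersion) {
        Long current = versions.get(partitionName);
        if (current != null && catalogVersion <= current) {
            // Corresponds to the Preconditions.checkState failure in the log.
            throw new IllegalStateException(
                "Received stale partition in a statestore update: " + partitionName);
        }
        versions.put(partitionName, catalogVersion);
    }
}
```

The bug report is about the catalogd side sending such a stale (already-seen) partition after a failed DDL plus INVALIDATE METADATA, so the check fires even though no data was removed out-of-band.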
[jira] [Created] (IMPALA-12994) Revise the implementation of FsPermissionChecker to take Ranger policies into consideration
Fang-Yu Rao created IMPALA-12994: Summary: Revise the implementation of FsPermissionChecker to take Ranger policies into consideration Key: IMPALA-12994 URL: https://issues.apache.org/jira/browse/IMPALA-12994 Project: IMPALA Issue Type: Task Components: Frontend Reporter: Fang-Yu Rao Assignee: Fang-Yu Rao Impala's current implementation of [FsPermissionChecker|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/util/FsPermissionChecker.java] does not take into consideration the Ranger policies on HDFS or the underlying file system, which could result in an unwanted AnalysisException during the query analysis phase, as reported in IMPALA-11871 and IMPALA-12291. We should consider revising FsPermissionChecker to take the Ranger policies on the storage layer into consideration as well.
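The revision IMPALA-12994 proposes amounts to treating a path as accessible if either the plain file-system permission bits allow it or a Ranger storage policy does. A minimal sketch of that union check follows; the class, interface, and method names are hypothetical, not Impala's actual FsPermissionChecker API.

```java
// Hypothetical sketch: combine POSIX-style permission bits with a Ranger
// policy lookup, so that a Ranger-granted path no longer triggers an
// AnalysisException during analysis.
class PermissionCheck {
    // Stand-in for a Ranger policy lookup on the storage layer.
    interface PolicySource {
        boolean allows(String path, String user, String access);
    }

    static boolean canAccess(boolean posixBitsAllow, String path, String user,
                             String access, PolicySource ranger) {
        if (posixBitsAllow) return true;          // plain HDFS permissions suffice
        return ranger.allows(path, user, access); // otherwise consult Ranger
    }
}
```

With only the first branch (the current behavior), a user covered solely by a Ranger HDFS policy is rejected, which is the failure mode reported in IMPALA-11871 and IMPALA-12291.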
[jira] [Updated] (IMPALA-12985) Use the new constructor when instantiating RangerAccessRequestImpl
[ https://issues.apache.org/jira/browse/IMPALA-12985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao updated IMPALA-12985: - Description: After RANGER-2763, we changed the signature of the class RangerAccessRequestImpl by adding an additional input argument 'userRoles' as shown in the following. {code:java} public RangerAccessRequestImpl(RangerAccessResource resource, String accessType, String user, Set userGroups, Set userRoles) { ... {code} The new signature is also provided in CDP Ranger. Thus, to unblock IMPALA-12921 or to be able to build Apache Impala with locally built Apache Ranger, it may be faster to switch to the new signature on the Impala side than waiting for RANGER-4770 to be resolved on the Ranger side. was: After RANGER-2763, we changed the signature of the class RangerAccessRequestImpl by adding an additional input argument 'userRoles' as shown in the following. {code:java} public RangerAccessRequestImpl(RangerAccessResource resource, String accessType, String user, Set userGroups, Set userRoles) { ... {code} The new signature is also provided in CDP Ranger. Thus, to unblock IMPALA-12921 or to be able to build Apache Impala with Apache Ranger, it may be faster to switch to the new signature on the Impala side. > Use the new constructor when instantiating RangerAccessRequestImpl > -- > > Key: IMPALA-12985 > URL: https://issues.apache.org/jira/browse/IMPALA-12985 > Project: IMPALA > Issue Type: Task > Components: Frontend >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Major > > After RANGER-2763, we changed the signature of the class > RangerAccessRequestImpl by adding an additional input argument 'userRoles' > as shown in the following. > {code:java} > public RangerAccessRequestImpl(RangerAccessResource resource, String > accessType, String user, Set userGroups, Set userRoles) { > ... > {code} > The new signature is also provided in CDP Ranger.
Thus, to unblock > IMPALA-12921 or to be able to build Apache Impala with locally built Apache > Ranger, it may be faster to switch to the new signature on the Impala side > than waiting for RANGER-4770 to be resolved on the Ranger side.
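The constructor change above can be modeled with a small sketch. This is not the real Ranger API — the class and field names below are illustrative stand-ins — but it shows how a 4-argument call site could keep working while the 5-argument form (with 'userRoles') becomes the primary constructor:

```java
import java.util.Collections;
import java.util.Set;

// Illustrative stand-in for RangerAccessRequestImpl (NOT the real Ranger class):
// the post-RANGER-2763 constructor adds a fifth argument, userRoles.
class AccessRequestSketch {
    final String resource;
    final String accessType;
    final String user;
    final Set<String> userGroups;
    final Set<String> userRoles;

    // New 5-argument form: callers pass the user's roles explicitly.
    AccessRequestSketch(String resource, String accessType, String user,
                        Set<String> userGroups, Set<String> userRoles) {
        this.resource = resource;
        this.accessType = accessType;
        this.user = user;
        this.userGroups = userGroups;
        this.userRoles = userRoles;
    }

    // An old 4-argument call site can delegate with an empty role set,
    // which is one way to switch to the new signature without plumbing
    // roles through every caller at once.
    AccessRequestSketch(String resource, String accessType, String user,
                        Set<String> userGroups) {
        this(resource, accessType, user, userGroups, Collections.emptySet());
    }
}
```

Under this sketch, migrating a call of the 4-argument shape amounts to appending a (possibly empty) set of roles as the fifth argument.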
[jira] [Created] (IMPALA-12985) Use the new constructor when instantiating RangerAccessRequestImpl
Fang-Yu Rao created IMPALA-12985: Summary: Use the new constructor when instantiating RangerAccessRequestImpl Key: IMPALA-12985 URL: https://issues.apache.org/jira/browse/IMPALA-12985 Project: IMPALA Issue Type: Task Components: Frontend Reporter: Fang-Yu Rao Assignee: Fang-Yu Rao After RANGER-2763, we changed the signature of the class RangerAccessRequestImpl by adding an additional input argument 'userRoles' as shown in the following. {code:java} public RangerAccessRequestImpl(RangerAccessResource resource, String accessType, String user, Set userGroups, Set userRoles) { ... {code} The new signature is also provided in CDP Ranger. Thus, to unblock IMPALA-12921 or to be able to build Apache Impala with Apache Ranger, it may be faster to switch to the new signature on the Impala side.
[jira] [Updated] (IMPALA-12921) Consider adding support for locally built Ranger
[ https://issues.apache.org/jira/browse/IMPALA-12921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao updated IMPALA-12921: - Description: It would be nice to be able to support locally built Ranger in Impala's minicluster in that it would facilitate the testing of features that require changes to both components. *+Edit:+* Making the current Apache Impala on *master* (tip is {*}IMPALA-12925{*}: Fix decimal data type for external JDBC table) support Ranger on *master* (tip is {*}RANGER-4745{*}: Enhance handling of subAccess authorization in Ranger HDFS plugin) may be too ambitious. The signatures of some classes are already incompatible. For instance, on the Impala side, Impala instantiates an instance of *RangerAccessRequestImpl* via the following code. 4 input arguments are needed. {code:java} RangerAccessRequest req = new RangerAccessRequestImpl(resource, SELECT_ACCESS_TYPE, user.getShortName(), getUserGroups(user)); {code} However, the current signature of RangerAccessRequestImpl's constructor on the master of Apache Ranger is the following. It can be seen that we need 5 input arguments instead. {code:java} public RangerAccessRequestImpl(RangerAccessResource resource, String accessType, String user, Set userGroups, Set userRoles) {code} It may be more practical to support Ranger on an earlier version, e.g., [https://github.com/apache/ranger/blob/release-ranger-2.4.0]. was:It would be nice to be able to support locally built Ranger in Impala's minicluster in that it would facilitate the testing of features that require changes to both components.
> Consider adding support for locally built Ranger > > > Key: IMPALA-12921 > URL: https://issues.apache.org/jira/browse/IMPALA-12921 > Project: IMPALA > Issue Type: Task >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Major > > It would be nice to be able to support locally built Ranger in Impala's > minicluster in that it would facilitate the testing of features that require > changes to both components. > *+Edit:+* > Making the current Apache Impala on *master* (tip is > {*}IMPALA-12925{*}: Fix decimal data type for external JDBC table) support > Ranger on *master* (tip is > {*}RANGER-4745{*}: Enhance handling of subAccess authorization in Ranger HDFS > plugin) may be too ambitious. > The signatures of some classes are already incompatible. For instance, on the > Impala side, Impala instantiates an instance of *RangerAccessRequestImpl* > via the following code. 4 input arguments are needed. > {code:java} > RangerAccessRequest req = new RangerAccessRequestImpl(resource, > SELECT_ACCESS_TYPE, user.getShortName(), getUserGroups(user)); > {code} > However, the current signature of RangerAccessRequestImpl's constructor on > the master of Apache Ranger is the following. It can be seen that we need 5 input > arguments instead. > {code:java} > public RangerAccessRequestImpl(RangerAccessResource resource, String > accessType, String user, Set userGroups, Set userRoles) > {code} > It may be more practical to support Ranger on an earlier version, e.g., > [https://github.com/apache/ranger/blob/release-ranger-2.4.0].
[jira] [Resolved] (IMPALA-12291) Insert statement fails even if hdfs ranger policy allows it
[ https://issues.apache.org/jira/browse/IMPALA-12291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao resolved IMPALA-12291. -- Resolution: Duplicate This seems to be a duplicate of IMPALA-11871. We could probably continue our discussion there. I will also review the patch at https://gerrit.cloudera.org/c/20221/ and see how we could proceed. cc: [~khr9603], [~stigahuang], [~amansinha] > Insert statement fails even if hdfs ranger policy allows it > --- > > Key: IMPALA-12291 > URL: https://issues.apache.org/jira/browse/IMPALA-12291 > Project: IMPALA > Issue Type: Bug > Components: fe, Security > Environment: - Impala Version (4.1.0) > - Ranger admin version (2.0) > - Hive version (3.1.2) >Reporter: halim kim >Assignee: halim kim >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > Apache Ranger is a framework for providing security and authorization on the Hadoop > platform. > Impala can also utilize Apache Ranger via Ranger Hive policies. > The thing is that an INSERT or some other query is not executed even if you > enable the Ranger HDFS plugin and set a proper allow condition for Impala query > execution. > You can see an error log like the one below. > {code:java} > AnalysisException: Unable to INSERT into target table (testdb.testtable) > because Impala does not have WRITE access to HDFS location: > hdfs://testcluster/warehouse/testdb.db/testtable > {code} > This happens when the Ranger HDFS plugin is enabled but Impala doesn't have > the HDFS POSIX permission. > For example, in the case that the DB file owner, group, and permissions are set as > hdfs:hdfs r-xr-xr-- and the Ranger plugin policies (HDFS, Hive, and Impala) allow > Impala to execute the query, the INSERT query will fail. > In my opinion, the main cause is that the Impala FE component doesn't check Ranger > policies but only HDFS POSIX model permissions. > Similar issue: https://issues.apache.org/jira/browse/IMPALA-10272 > I'm working on resolving this issue by adding HDFS Ranger policy checking > code.
[jira] [Commented] (IMPALA-11871) INSERT statement does not respect Ranger policies for HDFS
[ https://issues.apache.org/jira/browse/IMPALA-11871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17832957#comment-17832957 ] Fang-Yu Rao commented on IMPALA-11871: -- After reading some past JIRAs in this area, I think it should be safe to skip {*}analyzeWriteAccess{*}() for the *INSERT* statement (or add a startup flag to disable it). Before the fix is ready, we could add the following to the *core-site.xml* consumed by the catalog server to allow an authorized user (by Ranger via Impala's frontend) to insert values into an HDFS table in the {*}legacy catalog mode{*}. Recall that the catalog server would consider the service user, usually named '{*}impala{*}', as a super user as long as the user '{*}impala{*}' belongs to the super group specified by '{*}dfs.permissions.superusergroup{*}'. {code:xml} <property> <name>dfs.permissions.superusergroup</name> <value>true</value> </property> {code} This is still secure when Ranger is the authorization provider because of the following. # For the INSERT statement, Impala's frontend makes sure the logged-in user (not necessarily the service user '{*}impala{*}') is granted the necessary privilege on the target table. The respective audit log entry is also produced whether or not the query is authorized even though we skip {*}analyzeWriteAccess{*}(). # For a query that has been authorized by Impala's frontend and sent to the backend for execution, if Impala's backend interacts with the underlying services, e.g., HDFS, as the service user '{*}impala{*}', then this service user should always be considered as a super user or a user in a super group. +*Detailed Analysis*+ We started performing such permissions checking in [IMPALA-1279: Check ACLs for INSERT and LOAD statements|https://github.com/cloudera/Impala/commit/0b32bbd899d988f1cd5c526597932b67f4c35cce] when we were using Sentry as the authorization provider. The reason to implement IMPALA-1279 was also mentioned in the description of the JIRA and is excerpted below for easy reference.
In short, we would like to fail a query as early as possible if there could be a permissions-related issue. {quote}Impala checks permissions for LOAD and INSERT statements before executing them to allow for early-exit if the query would not succeed. However, it does not take extended ACLs in CDH5 into account. When a directory has restrictive Posix permissions (e.g. 000), but has an ACL allowing writes, Impala should allow INSERTs and LOADs to happen to that directory. Instead, the early check will disallow them. If the checks were disabled, the queries would execute (or not!) correctly, because we delegate to libhdfs or the DistributedFileSystem API to actually perform the operations we need. {quote} We hand-crafted the permissions checker within Impala. Specifically, in our [implementation|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/util/FsPermissionChecker.java#L206-L222], Hadoop ACL entries take precedence over the POSIX permissions, and we did *not* take into consideration the policies that could be defined on the HDFS path when the authorization provider is Ranger. Due to how we implemented [FsPermissionChecker|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/util/FsPermissionChecker.java], it's possible that even though a logged-in user has been authorized to execute an INSERT statement into a table via a policy added to Ranger's SQL policy repository, the query could fail during the analysis, simply because the service user, usually named '{*}impala{*}', could not pass the permissions checker. For instance, this could occur if the table to insert into was created by another query engine, e.g., Hive Server2 (HS2), and thus the table is owned by another service user, e.g., '{*}hive{*}'. In addition, we have an ACL entry of "{*}group::r-x{*}" by default when the table was created.
The current implementation of Impala's permissions checker would deny the service user '{*}impala{*}' write access to the table even though the user '{*}impala{*}' is in the group '{*}hive{*}', as shown in the following.
{code:java}
[r...@ccycloud-4.engesc24485d02.root.comops.site ~]# hdfs dfs -getfacl
# file:
# owner: hive
# group: hive
user::rwx
group::r-x
other::r-x
[r...@ccycloud-4.engesc24485d02.root.comops.site impalad]# groups impala
impala : impala hive
{code}
In [IMPALA-3143|https://github.com/apache/impala/commit/a0ad1868bda902fd914bc2be39eb9629a6eceb76], we allowed an administrator to specify the name of the super group (from the catalog server's perspective). Once the *current user* belongs to the super group denoted via '{*}DFS_PERMISSIONS_SUPERUSERGROUP_KEY{*}' ("{*}dfs.permissions.superusergroup{*}"), which defaulted to '{*}DFS_PERMISSIONS_SUPERUSERGROUP_DEFAULT{*}' ("{*}supergroup{*}"), the catalog server would grant the WRITE request against the corresponding table from the current user. Refer t
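The precedence just described can be sketched as a toy model. The class and parameter names below are illustrative, not Impala's actual FsPermissionChecker code: a named-user ACL entry overrides the POSIX group bits, and membership in the configured super group (IMPALA-3143) grants the WRITE request outright. Note that Ranger's HDFS policies appear nowhere in this decision, which is exactly the gap this JIRA describes.

```java
import java.util.Map;
import java.util.Set;

// Toy model of the hand-crafted write check (illustrative names only).
class WriteCheckSketch {
    static boolean canWrite(String user, Set<String> userGroups, String superGroup,
                            Map<String, Boolean> namedUserAclWrite,
                            String fileGroup, boolean groupWriteBit) {
        if (userGroups.contains(superGroup)) return true;       // super-group short-circuit (IMPALA-3143)
        Boolean aclBit = namedUserAclWrite.get(user);
        if (aclBit != null) return aclBit;                      // named-user ACL entry takes precedence
        return userGroups.contains(fileGroup) && groupWriteBit; // fall back to POSIX group bits
    }
}
```

In the getfacl scenario above, 'impala' is in groups {impala, hive}, the file's group is 'hive', and 'group::r-x' carries no write bit, so the check denies the write unless the administrator configures a super group containing 'impala'.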
[jira] [Comment Edited] (IMPALA-11871) INSERT statement does not respect Ranger policies for HDFS
[ https://issues.apache.org/jira/browse/IMPALA-11871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830738#comment-17830738 ] Fang-Yu Rao edited comment on IMPALA-11871 at 3/26/24 5:17 AM: --- Hi [~MikaelSmith], my current understanding is that this is not a regression from earlier releases. It's more like a feature request for usability. The method that is performing the permissions checking ({*}analyzeWriteAccess{*}()) was added in IMPALA-7311. The purpose, I guess, was to make sure the Impala service has the necessary write permissions as early as possible, i.e., during the query analysis phase (v.s. in the query execution phase). After Impala started supporting Ranger as its authorization provider, ideally, a cluster administrator should be able to manage the permissions on HDFS via either a) Ranger's policy repository for HDFS, or b) the HDFS Access Control Lists (HDFS ACLs). But at the moment, Impala's coordinator unconditionally performs the permissions-checking without checking Ranger's policy repository of HDFS. IMPALA-10272 resolved a similar issue for the LOAD DATA statement. We could resolve this JIRA using the same approach there, where Impala's frontend calls *hadoop.fs.FileSystem.access(Path path, FsAction mode)* to check the actual access permissions, which could also reflect the permissions managed via Ranger's HDFS policy repository. was (Author: fangyurao): Hi [~MikaelSmith], my current understanding is that this is not a regression from earlier releases. It's more like a feature request for usability. The method that is performing the permissions checking ({*}analyzeWriteAccess{*}()) was added in IMPALA-7311. The purpose, I guess, was to make sure the Impala service has the necessary write permissions as early as possible, i.e., during the query analysis phase (v.s. in the query execution phase). 
After Impala started supporting Ranger as its authorization provider, ideally, a cluster administrator should be able to manage the permissions on HDFS via either a) Ranger's policy repository for HDFS, or b) the HDFS Access Control Lists (HDFS ACLs). But at the moment, Impala's coordinator unconditionally performs the permissions-checking without checking Ranger's policy repository of HDFS. IMPALA-10272 resolved a similar issue for the LOAD DATA statement. We could resolve this JIRA using the same approach there, where Impala's frontend calls *hadoop.fs.FileSystem.access(Path path, FsAction mode)* to check the actual access permissions, which could also reflect the permissions manged via Ranger's HDFS policy repository. > INSERT statement does not respect Ranger policies for HDFS > -- > > Key: IMPALA-11871 > URL: https://issues.apache.org/jira/browse/IMPALA-11871 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Major > > In a cluster with Ranger auth (and with legacy catalog mode), even if you > provide RWX to cm_hdfs -> all-path for the user impala, inserting into a > table whose HDFS POSIX permissions happen to exclude impala access will > result in an > {noformat} > "AnalysisException: Unable to INSERT into target table (default.t1) because > Impala does not have WRITE access to HDFS location: > hdfs://nightly-71x-vx-2.nightly-71x-vx.root.hwx.site:8020/warehouse/tablespace/external/hive/t1"{noformat} > > {noformat} > [root@nightly-71x-vx-3 ~]# hdfs dfs -getfacl > /warehouse/tablespace/external/hive/t1 > file: /warehouse/tablespace/external/hive/t1 > owner: hive > group: supergroup > user::rwx > user:impala:rwx #effective:r-x > group::rwx #effective:r-x > mask::r-x > other::--- > default:user::rwx > default:user:impala:rwx > default:group::rwx > default:mask::rwx > default:other::--- {noformat} > ~~ > ANALYSIS > Stack trace from a version of Cloudera's distribution of Impala (impalad > version 
3.4.0-SNAPSHOT RELEASE (build > {*}db20b59a093c17ea4699117155d58fe874f7d68f{*})): > {noformat} > at > org.apache.impala.catalog.FeFsTable$Utils.checkWriteAccess(FeFsTable.java:585) > at > org.apache.impala.analysis.InsertStmt.analyzeWriteAccess(InsertStmt.java:545) > at org.apache.impala.analysis.InsertStmt.analyze(InsertStmt.java:391) > at > org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:463) > at > org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:426) > at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:1570) > at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1536) > at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1506) > at > org.apache.impala.service.JniFrontend.createEx
[jira] [Commented] (IMPALA-11871) INSERT statement does not respect Ranger policies for HDFS
[ https://issues.apache.org/jira/browse/IMPALA-11871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830738#comment-17830738 ] Fang-Yu Rao commented on IMPALA-11871: -- Hi [~MikaelSmith], my current understanding is that this is not a regression from earlier releases. It's more like a feature request for usability. The method that is performing the permissions checking ({*}analyzeWriteAccess{*}()) was added in IMPALA-7311. The purpose, I guess, was to make sure the Impala service has the necessary write permissions as early as possible, i.e., during the query analysis phase (vs. in the query execution phase). After Impala started supporting Ranger as its authorization provider, ideally, a cluster administrator should be able to manage the permissions on HDFS via either a) Ranger's policy repository for HDFS, or b) the HDFS Access Control Lists (HDFS ACLs). But at the moment, Impala's coordinator unconditionally performs the permissions-checking without checking Ranger's policy repository of HDFS. IMPALA-10272 resolved a similar issue for the LOAD DATA statement. We could resolve this JIRA using the same approach used there, where Impala's frontend calls *hadoop.fs.FileSystem.access(Path path, FsAction mode)* to check the actual access permissions, which could also reflect the permissions managed via Ranger's HDFS policy repository.
> INSERT statement does not respect Ranger policies for HDFS > -- > > Key: IMPALA-11871 > URL: https://issues.apache.org/jira/browse/IMPALA-11871 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Major > > In a cluster with Ranger auth (and with legacy catalog mode), even if you > provide RWX to cm_hdfs -> all-path for the user impala, inserting into a > table whose HDFS POSIX permissions happen to exclude impala access will > result in an > {noformat} > "AnalysisException: Unable to INSERT into target table (default.t1) because > Impala does not have WRITE access to HDFS location: > hdfs://nightly-71x-vx-2.nightly-71x-vx.root.hwx.site:8020/warehouse/tablespace/external/hive/t1"{noformat} > > {noformat} > [root@nightly-71x-vx-3 ~]# hdfs dfs -getfacl > /warehouse/tablespace/external/hive/t1 > file: /warehouse/tablespace/external/hive/t1 > owner: hive > group: supergroup > user::rwx > user:impala:rwx #effective:r-x > group::rwx #effective:r-x > mask::r-x > other::--- > default:user::rwx > default:user:impala:rwx > default:group::rwx > default:mask::rwx > default:other::--- {noformat} > ~~ > ANALYSIS > Stack trace from a version of Cloudera's distribution of Impala (impalad > version 3.4.0-SNAPSHOT RELEASE (build > {*}db20b59a093c17ea4699117155d58fe874f7d68f{*})): > {noformat} > at > org.apache.impala.catalog.FeFsTable$Utils.checkWriteAccess(FeFsTable.java:585) > at > org.apache.impala.analysis.InsertStmt.analyzeWriteAccess(InsertStmt.java:545) > at org.apache.impala.analysis.InsertStmt.analyze(InsertStmt.java:391) > at > org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:463) > at > org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:426) > at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:1570) > at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1536) > at 
org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1506) > at > org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:155){noformat} > The exception occurs at analysis time, so I tested and succeeded in writing > directly into the said directory. > {noformat} > [root@nightly-71x-vx-3 ~]# hdfs dfs -touchz > /warehouse/tablespace/external/hive/t1/test > [root@nightly-71x-vx-3 ~]# hdfs dfs -ls > /warehouse/tablespace/external/hive/t1/ > Found 8 items > rw-rw---+ 3 hive supergroup 417 2023-01-27 17:37 > /warehouse/tablespace/external/hive/t1/00_0 > rw-rw---+ 3 hive supergroup 417 2023-01-27 17:44 > /warehouse/tablespace/external/hive/t1/00_0_copy_1 > rw-rw---+ 3 hive supergroup 417 2023-01-27 17:49 > /warehouse/tablespace/external/hive/t1/00_0_copy_2 > rw-rw---+ 3 hive supergroup 417 2023-01-27 17:53 > /warehouse/tablespace/external/hive/t1/00_0_copy_3 > rw-rw---+ 3 impala hive 355 2023-01-27 17:17 > /warehouse/tablespace/external/hive/t1/4c4477c12c51ad96-3126b52d_2029811630_data.0.parq > rw-rw---+ 3 impala hive 355 2023-01-27 17:39 > /warehouse/tablespace/external/hive/t1/9945b25bb37d1ff2-473c1478_574471191_data.0.parq > drwxrwx---+ - impala hive 0 2023-01-27 17:39 > /warehouse/tablespace/external/hive/t1/_impala_insert_staging > rw-rw---+ 3 impala supergroup 0 2023-01-27 18:01 > /warehouse/tablespace/ex
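The fix direction mentioned in the comments above (following IMPALA-10272) is to ask the filesystem itself whether an operation is permitted, instead of re-deriving the answer from owner/group/mode bits; on HDFS that call is *hadoop.fs.FileSystem.access(Path, FsAction)*, whose answer also reflects Ranger-managed permissions. The sketch below shows the same delegation idea using only the JDK's java.nio API on a local path, since the Hadoop classes are not assumed here:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class DelegatedAccessCheck {
    // Delegate the write-permission decision to the filesystem, mirroring the
    // FileSystem.access(path, FsAction.WRITE) approach described above rather
    // than hand-evaluating permission bits the way FsPermissionChecker does.
    static boolean canWrite(Path p) {
        return Files.isWritable(p);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("write-check", ".tmp");
        System.out.println(canWrite(tmp)); // a freshly created temp file is writable by its owner
        Files.delete(tmp);
    }
}
```

The design point is the same in both APIs: the engine never needs to replicate POSIX, ACL, or Ranger semantics, because the authoritative permission evaluator is the filesystem layer itself.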
[jira] [Created] (IMPALA-12921) Consider adding support for locally built Ranger
Fang-Yu Rao created IMPALA-12921: Summary: Consider adding support for locally built Ranger Key: IMPALA-12921 URL: https://issues.apache.org/jira/browse/IMPALA-12921 Project: IMPALA Issue Type: Task Reporter: Fang-Yu Rao Assignee: Fang-Yu Rao It would be nice to be able to support locally built Ranger in Impala's minicluster in that it would facilitate the testing of features that require changes to both components.
[jira] [Comment Edited] (IMPALA-12830) test_webserver_hide_logs_link() could fail in the exhaustive build
[ https://issues.apache.org/jira/browse/IMPALA-12830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819426#comment-17819426 ] Fang-Yu Rao edited comment on IMPALA-12830 at 2/22/24 12:43 AM: This issue seems to be similar to IMPALA-12170. cc: [~stigahuang] was (Author: fangyurao): This issue seems to be similar to IMPALA-12170. > test_webserver_hide_logs_link() could fail in the exhaustive build > -- > > Key: IMPALA-12830 > URL: https://issues.apache.org/jira/browse/IMPALA-12830 > Project: IMPALA > Issue Type: Bug >Reporter: Fang-Yu Rao >Assignee: Saurabh Katiyal >Priority: Major > Labels: broken-build > > We found in an internal Jenkins run that test_webserver_hide_logs_link() > could fail in the exhaustive build with the following error. > +*Error Message*+ > {code:java} > AssertionError: bad links from webui port 25020 assert ['/', > '/catal...g_level', ...] == ['/', '/catalo...g_level', ...] At index 2 > diff: u'/events' != '/hadoop-varz' Full diff: - [u'/', ? - + ['/', > - u'/catalog', ? - + '/catalog', - u'/events', - > u'/hadoop-varz', ? - + '/hadoop-varz', + '/events', - u'/jmx', > ? - + '/jmx', - u'/log_level', ? - + '/log_level', - > u'/memz', ? - + '/memz', - u'/metrics', ? - + '/metrics', - > u'/operations', ? - + '/operations', - u'/profile_docs', ? - + > '/profile_docs', - u'/rpcz', ? - + '/rpcz', - u'/threadz', ? - > + '/threadz', - u'/varz'] ? - + '/varz'] > {code} > +*Stacktrace*+ > {code:java} > custom_cluster/test_web_pages.py:248: in test_webserver_hide_logs_link > assert found_links == expected_catalog_links, msg > E AssertionError: bad links from webui port 25020 > E assert ['/', '/catal...g_level', ...] == ['/', '/catalo...g_level', ...] > E At index 2 diff: u'/events' != '/hadoop-varz' > E Full diff: > E - [u'/', > E ? - > E + ['/', > E - u'/catalog', > E ? - > E + '/catalog', > E - u'/events', > E - u'/hadoop-varz', > E ? - > E + '/hadoop-varz', > E + '/events', > E - u'/jmx', > E ? 
- > E + '/jmx', > E - u'/log_level', > E ? - > E + '/log_level', > E - u'/memz', > E ? - > E + '/memz', > E - u'/metrics', > E ? - > E + '/metrics', > E - u'/operations', > E ? - > E + '/operations', > E - u'/profile_docs', > E ? - > E + '/profile_docs', > E - u'/rpcz', > E ? - > E + '/rpcz', > E - u'/threadz', > E ? - > E + '/threadz', > E - u'/varz'] > E ? - > E + '/varz'] > {code}
[jira] [Commented] (IMPALA-12830) test_webserver_hide_logs_link() could fail in the exhaustive build
[ https://issues.apache.org/jira/browse/IMPALA-12830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819426#comment-17819426 ] Fang-Yu Rao commented on IMPALA-12830: -- This issue seems to be similar to IMPALA-12170. > test_webserver_hide_logs_link() could fail in the exhaustive build > -- > > Key: IMPALA-12830 > URL: https://issues.apache.org/jira/browse/IMPALA-12830 > Project: IMPALA > Issue Type: Bug >Reporter: Fang-Yu Rao >Assignee: Saurabh Katiyal >Priority: Major > Labels: broken-build > > We found in an internal Jenkins run that test_webserver_hide_logs_link() > could fail in the exhaustive build with the following error. > +*Error Message*+ > {code:java} > AssertionError: bad links from webui port 25020 assert ['/', > '/catal...g_level', ...] == ['/', '/catalo...g_level', ...] At index 2 > diff: u'/events' != '/hadoop-varz' Full diff: - [u'/', ? - + ['/', > - u'/catalog', ? - + '/catalog', - u'/events', - > u'/hadoop-varz', ? - + '/hadoop-varz', + '/events', - u'/jmx', > ? - + '/jmx', - u'/log_level', ? - + '/log_level', - > u'/memz', ? - + '/memz', - u'/metrics', ? - + '/metrics', - > u'/operations', ? - + '/operations', - u'/profile_docs', ? - + > '/profile_docs', - u'/rpcz', ? - + '/rpcz', - u'/threadz', ? - > + '/threadz', - u'/varz'] ? - + '/varz'] > {code} > +*Stacktrace*+ > {code:java} > custom_cluster/test_web_pages.py:248: in test_webserver_hide_logs_link > assert found_links == expected_catalog_links, msg > E AssertionError: bad links from webui port 25020 > E assert ['/', '/catal...g_level', ...] == ['/', '/catalo...g_level', ...] > E At index 2 diff: u'/events' != '/hadoop-varz' > E Full diff: > E - [u'/', > E ? - > E + ['/', > E - u'/catalog', > E ? - > E + '/catalog', > E - u'/events', > E - u'/hadoop-varz', > E ? - > E + '/hadoop-varz', > E + '/events', > E - u'/jmx', > E ? - > E + '/jmx', > E - u'/log_level', > E ? - > E + '/log_level', > E - u'/memz', > E ? - > E + '/memz', > E - u'/metrics', > E ? 
- > E + '/metrics', > E - u'/operations', > E ? - > E + '/operations', > E - u'/profile_docs', > E ? - > E + '/profile_docs', > E - u'/rpcz', > E ? - > E + '/rpcz', > E - u'/threadz', > E ? - > E + '/threadz', > E - u'/varz'] > E ? - > E + '/varz'] > {code}
[jira] [Commented] (IMPALA-12830) test_webserver_hide_logs_link() could fail in the exhaustive build
[ https://issues.apache.org/jira/browse/IMPALA-12830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819425#comment-17819425 ] Fang-Yu Rao commented on IMPALA-12830: -- Hi [~skatiyal], assigned the JIRA to you since you revised the test case in IMPALA-9086 (Show Hive configurations in /hadoop-varz page) and thus may be more familiar with the context. Please feel free to re-assign as you see appropriate. Thanks! > test_webserver_hide_logs_link() could fail in the exhaustive build > -- > > Key: IMPALA-12830 > URL: https://issues.apache.org/jira/browse/IMPALA-12830 > Project: IMPALA > Issue Type: Bug >Reporter: Fang-Yu Rao >Assignee: Saurabh Katiyal >Priority: Major > Labels: broken-build > > We found in an internal Jenkins run that test_webserver_hide_logs_link() > could fail in the exhaustive build with the following error. > +*Error Message*+ > {code:java} > AssertionError: bad links from webui port 25020 assert ['/', > '/catal...g_level', ...] == ['/', '/catalo...g_level', ...] At index 2 > diff: u'/events' != '/hadoop-varz' Full diff: - [u'/', ? - + ['/', > - u'/catalog', ? - + '/catalog', - u'/events', - > u'/hadoop-varz', ? - + '/hadoop-varz', + '/events', - u'/jmx', > ? - + '/jmx', - u'/log_level', ? - + '/log_level', - > u'/memz', ? - + '/memz', - u'/metrics', ? - + '/metrics', - > u'/operations', ? - + '/operations', - u'/profile_docs', ? - + > '/profile_docs', - u'/rpcz', ? - + '/rpcz', - u'/threadz', ? - > + '/threadz', - u'/varz'] ? - + '/varz'] > {code} > +*Stacktrace*+ > {code:java} > custom_cluster/test_web_pages.py:248: in test_webserver_hide_logs_link > assert found_links == expected_catalog_links, msg > E AssertionError: bad links from webui port 25020 > E assert ['/', '/catal...g_level', ...] == ['/', '/catalo...g_level', ...] > E At index 2 diff: u'/events' != '/hadoop-varz' > E Full diff: > E - [u'/', > E ? - > E + ['/', > E - u'/catalog', > E ? 
- > E + '/catalog', > E - u'/events', > E - u'/hadoop-varz', > E ? - > E + '/hadoop-varz', > E + '/events', > E - u'/jmx', > E ? - > E + '/jmx', > E - u'/log_level', > E ? - > E + '/log_level', > E - u'/memz', > E ? - > E + '/memz', > E - u'/metrics', > E ? - > E + '/metrics', > E - u'/operations', > E ? - > E + '/operations', > E - u'/profile_docs', > E ? - > E + '/profile_docs', > E - u'/rpcz', > E ? - > E + '/rpcz', > E - u'/threadz', > E ? - > E + '/threadz', > E - u'/varz'] > E ? - > E + '/varz'] {code}
[jira] [Created] (IMPALA-12830) test_web_pages() could fail in the exhaustive build
Fang-Yu Rao created IMPALA-12830: Summary: test_web_pages() could fail in the exhaustive build Key: IMPALA-12830 URL: https://issues.apache.org/jira/browse/IMPALA-12830 Project: IMPALA Issue Type: Bug Reporter: Fang-Yu Rao Assignee: Saurabh Katiyal We found in an internal Jenkins run that test_web_pages() could fail in the exhaustive build with the following error. +*Error Message*+ {code} AssertionError: bad links from webui port 25020 assert ['/', '/catal...g_level', ...] == ['/', '/catalo...g_level', ...] At index 2 diff: u'/events' != '/hadoop-varz' Full diff: - [u'/', ? - + ['/', - u'/catalog', ? - + '/catalog', - u'/events', - u'/hadoop-varz', ? - + '/hadoop-varz', + '/events', - u'/jmx', ? - + '/jmx', - u'/log_level', ? - + '/log_level', - u'/memz', ? - + '/memz', - u'/metrics', ? - + '/metrics', - u'/operations', ? - + '/operations', - u'/profile_docs', ? - + '/profile_docs', - u'/rpcz', ? - + '/rpcz', - u'/threadz', ? - + '/threadz', - u'/varz'] ? - + '/varz'] {code} +*Stacktrace*+ {code} custom_cluster/test_web_pages.py:248: in test_webserver_hide_logs_link assert found_links == expected_catalog_links, msg E AssertionError: bad links from webui port 25020 E assert ['/', '/catal...g_level', ...] == ['/', '/catalo...g_level', ...] E At index 2 diff: u'/events' != '/hadoop-varz' E Full diff: E - [u'/', E ? - E + ['/', E - u'/catalog', E ? - E + '/catalog', E - u'/events', E - u'/hadoop-varz', E ? - E + '/hadoop-varz', E + '/events', E - u'/jmx', E ? - E + '/jmx', E - u'/log_level', E ? - E + '/log_level', E - u'/memz', E ? - E + '/memz', E - u'/metrics', E ? - E + '/metrics', E - u'/operations', E ? - E + '/operations', E - u'/profile_docs', E ? - E + '/profile_docs', E - u'/rpcz', E ? - E + '/rpcz', E - u'/threadz', E ? - E + '/threadz', E - u'/varz'] E ? 
- E + '/varz'] {code}
[jira] [Updated] (IMPALA-12830) test_webserver_hide_logs_link() could fail in the exhaustive build
[ https://issues.apache.org/jira/browse/IMPALA-12830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao updated IMPALA-12830: - Summary: test_webserver_hide_logs_link() could fail in the exhaustive build (was: test_web_pages() could fail in the exhaustive build) > test_webserver_hide_logs_link() could fail in the exhaustive build > -- > > Key: IMPALA-12830 > URL: https://issues.apache.org/jira/browse/IMPALA-12830 > Project: IMPALA > Issue Type: Bug >Reporter: Fang-Yu Rao >Assignee: Saurabh Katiyal >Priority: Major > Labels: broken-build > > We found in an internal Jenkins run that test_web_pages() could fail in the > exhaustive build with the following error. > +*Error Message*+ > {code} > AssertionError: bad links from webui port 25020 assert ['/', > '/catal...g_level', ...] == ['/', '/catalo...g_level', ...] At index 2 > diff: u'/events' != '/hadoop-varz' Full diff: - [u'/', ? - + ['/', > - u'/catalog', ? - + '/catalog', - u'/events', - > u'/hadoop-varz', ? - + '/hadoop-varz', + '/events', - u'/jmx', > ? - + '/jmx', - u'/log_level', ? - + '/log_level', - > u'/memz', ? - + '/memz', - u'/metrics', ? - + '/metrics', - > u'/operations', ? - + '/operations', - u'/profile_docs', ? - + > '/profile_docs', - u'/rpcz', ? - + '/rpcz', - u'/threadz', ? - > + '/threadz', - u'/varz'] ? - + '/varz'] > {code} > +*Stacktrace*+ > {code} > custom_cluster/test_web_pages.py:248: in test_webserver_hide_logs_link > assert found_links == expected_catalog_links, msg > E AssertionError: bad links from webui port 25020 > E assert ['/', '/catal...g_level', ...] == ['/', '/catalo...g_level', ...] > E At index 2 diff: u'/events' != '/hadoop-varz' > E Full diff: > E - [u'/', > E ? - > E + ['/', > E - u'/catalog', > E ? - > E + '/catalog', > E - u'/events', > E - u'/hadoop-varz', > E ? - > E + '/hadoop-varz', > E + '/events', > E - u'/jmx', > E ? - > E + '/jmx', > E - u'/log_level', > E ? - > E + '/log_level', > E - u'/memz', > E ? 
- > E + '/memz', > E - u'/metrics', > E ? - > E + '/metrics', > E - u'/operations', > E ? - > E + '/operations', > E - u'/profile_docs', > E ? - > E + '/profile_docs', > E - u'/rpcz', > E ? - > E + '/rpcz', > E - u'/threadz', > E ? - > E + '/threadz', > E - u'/varz'] > E ? - > E + '/varz'] > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
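The assertion failure above is an ordering difference, not a content difference: both lists contain the same paths, but '/events' and '/hadoop-varz' swap positions at index 2 (the u'' prefixes are only Python 2 repr noise). A minimal sketch of an order-insensitive check follows; the function and variable names are illustrative, not the actual helpers in test_web_pages.py:

```python
# Sketch: compare web-UI links while ignoring order. Names below are
# illustrative placeholders, not the real test's identifiers.
def assert_same_links(found_links, expected_links, port):
    msg = "bad links from webui port %s" % port
    # Sorting both sides makes the check robust to link registration order.
    assert sorted(found_links) == sorted(expected_links), msg

# The reported failure is exactly this kind of ordering difference:
found = ['/', '/catalog', '/events', '/hadoop-varz', '/varz']
expected = ['/', '/catalog', '/hadoop-varz', '/events', '/varz']
assert_same_links(found, expected, 25020)  # raises no AssertionError
```

Whether the web UI should itself emit links in a stable order is a separate question; if it should, the fix belongs on the server side rather than in the test.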
[jira] [Updated] (IMPALA-12830) test_webserver_hide_logs_link() could fail in the exhaustive build
[ https://issues.apache.org/jira/browse/IMPALA-12830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao updated IMPALA-12830: - Description: We found in an internal Jenkins run that test_webserver_hide_logs_link() could fail in the exhaustive build with the following error. +*Error Message*+ {code:java} AssertionError: bad links from webui port 25020 assert ['/', '/catal...g_level', ...] == ['/', '/catalo...g_level', ...] At index 2 diff: u'/events' != '/hadoop-varz' Full diff: - [u'/', ? - + ['/', - u'/catalog', ? - + '/catalog', - u'/events', - u'/hadoop-varz', ? - + '/hadoop-varz', + '/events', - u'/jmx', ? - + '/jmx', - u'/log_level', ? - + '/log_level', - u'/memz', ? - + '/memz', - u'/metrics', ? - + '/metrics', - u'/operations', ? - + '/operations', - u'/profile_docs', ? - + '/profile_docs', - u'/rpcz', ? - + '/rpcz', - u'/threadz', ? - + '/threadz', - u'/varz'] ? - + '/varz'] {code} +*Stacktrace*+ {code:java} custom_cluster/test_web_pages.py:248: in test_webserver_hide_logs_link assert found_links == expected_catalog_links, msg E AssertionError: bad links from webui port 25020 E assert ['/', '/catal...g_level', ...] == ['/', '/catalo...g_level', ...] E At index 2 diff: u'/events' != '/hadoop-varz' E Full diff: E - [u'/', E ? - E + ['/', E - u'/catalog', E ? - E + '/catalog', E - u'/events', E - u'/hadoop-varz', E ? - E + '/hadoop-varz', E + '/events', E - u'/jmx', E ? - E + '/jmx', E - u'/log_level', E ? - E + '/log_level', E - u'/memz', E ? - E + '/memz', E - u'/metrics', E ? - E + '/metrics', E - u'/operations', E ? - E + '/operations', E - u'/profile_docs', E ? - E + '/profile_docs', E - u'/rpcz', E ? - E + '/rpcz', E - u'/threadz', E ? - E + '/threadz', E - u'/varz'] E ? - E + '/varz'] {code} was: We found in an internal Jenkins run that test_web_pages() could fail in the exhaustive build with the following error. +*Error Message*+ {code} AssertionError: bad links from webui port 25020 assert ['/', '/catal...g_level', ...] 
== ['/', '/catalo...g_level', ...] At index 2 diff: u'/events' != '/hadoop-varz' Full diff: - [u'/', ? - + ['/', - u'/catalog', ? - + '/catalog', - u'/events', - u'/hadoop-varz', ? - + '/hadoop-varz', + '/events', - u'/jmx', ? - + '/jmx', - u'/log_level', ? - + '/log_level', - u'/memz', ? - + '/memz', - u'/metrics', ? - + '/metrics', - u'/operations', ? - + '/operations', - u'/profile_docs', ? - + '/profile_docs', - u'/rpcz', ? - + '/rpcz', - u'/threadz', ? - + '/threadz', - u'/varz'] ? - + '/varz'] {code} +*Stacktrace*+ {code} custom_cluster/test_web_pages.py:248: in test_webserver_hide_logs_link assert found_links == expected_catalog_links, msg E AssertionError: bad links from webui port 25020 E assert ['/', '/catal...g_level', ...] == ['/', '/catalo...g_level', ...] E At index 2 diff: u'/events' != '/hadoop-varz' E Full diff: E - [u'/', E ? - E + ['/', E - u'/catalog', E ? - E + '/catalog', E - u'/events', E - u'/hadoop-varz', E ? - E + '/hadoop-varz', E + '/events', E - u'/jmx', E ? - E + '/jmx', E - u'/log_level', E ? - E + '/log_level', E - u'/memz', E ? - E + '/memz', E - u'/metrics', E ? - E + '/metrics', E - u'/operations', E ? - E + '/operations', E - u'/profile_docs', E ? - E + '/profile_docs', E - u'/rpcz', E ? - E + '/rpcz', E - u'/threadz', E ? - E + '/threadz', E - u'/varz'] E ? - E + '/varz'] {code} > test_webserver_hide_logs_link() could fail in the exhaustive build > -- > > Key: IMPALA-12830 > URL: https://issues.apache.org/jira/browse/IMPALA-12830 > Project: IMPALA > Issue Type: Bug >Reporter: Fang-Yu Rao >Assignee: Saurabh Katiyal >Priority: Major > Labels: broken-build > > We found in an internal Jenkins run that test_webserver_hide_logs_link() > could fail in the exhaustive build with the following error. > +*Error Message*+ > {code:java} > AssertionError: bad links from webui port 25020 assert ['/', > '/catal...g_level', ...] == ['/', '/catalo...g_level', ...] At index 2 > diff: u'/events' != '/hadoop-varz' Full diff: - [u'/', ? 
- + ['/', > - u'/catalog', ? - + '/catalog', - u'/events', - > u'/hadoop-varz', ? - + '/hadoop-varz'
[jira] [Commented] (IMPALA-12819) InaccessibleObjectException found during LocalCatalogTest
[ https://issues.apache.org/jira/browse/IMPALA-12819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17818215#comment-17818215 ] Fang-Yu Rao commented on IMPALA-12819: -- Hi [~MikaelSmith], assigned the JIRA to you since you helped with IMPALA-11260 earlier and may be more familiar with the context. Please re-assign the ticket as you see appropriate. Thanks! > InaccessibleObjectException found during LocalCatalogTest > - > > Key: IMPALA-12819 > URL: https://issues.apache.org/jira/browse/IMPALA-12819 > Project: IMPALA > Issue Type: Bug > Components: fe >Affects Versions: Impala 4.4.0 >Reporter: Fang-Yu Rao >Assignee: Michael Smith >Priority: Major > Labels: broken-build > > We found in an internal build that during LocalCatalogTest we could encounter > InaccessibleObjectException. This was found by the test > [test_no_inaccessible_objects|https://github.com/apache/impala/blob/master/tests/verifiers/test_banned_log_messages.py#L40C7-L40C35] > {code:java} > W0217 01:31:14.108255 18119 ObjectGraphWalker.java:251] The JVM is preventing > Ehcache from accessing the subgraph beneath 'private final > jdk.internal.platform.CgroupV1Metrics > jdk.internal.platform.CgroupV1MetricsImpl.metrics' - cache sizes may be > underestimated as a result > Java exception follows: > java.lang.reflect.InaccessibleObjectException: Unable to make field private > final jdk.internal.platform.CgroupV1Metrics > jdk.internal.platform.CgroupV1MetricsImpl.metrics accessible: module > java.base does not "opens jdk.internal.platform" to unnamed module @2c89cd7f > at > java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:340) > at > java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:280) > at > java.base/java.lang.reflect.Field.checkCanSetAccessible(Field.java:176) > at java.base/java.lang.reflect.Field.setAccessible(Field.java:170) > at > 
org.ehcache.sizeof.ObjectGraphWalker.getAllFields(ObjectGraphWalker.java:245) > at > org.ehcache.sizeof.ObjectGraphWalker.getFilteredFields(ObjectGraphWalker.java:204) > at > org.ehcache.sizeof.ObjectGraphWalker.walk(ObjectGraphWalker.java:159) > at org.ehcache.sizeof.SizeOf.deepSizeOf(SizeOf.java:74) > at > org.apache.impala.catalog.local.CatalogdMetaProvider$SizeOfWeigher.weigh(CatalogdMetaProvider.java:2234) > at > com.google.common.cache.LocalCache$Segment.setValue(LocalCache.java:2043) > at > com.google.common.cache.LocalCache$Segment.replace(LocalCache.java:2990) > at com.google.common.cache.LocalCache.replace(LocalCache.java:4324) > at > org.apache.impala.catalog.local.CatalogdMetaProvider.loadWithCaching(CatalogdMetaProvider.java:569) > at > org.apache.impala.catalog.local.CatalogdMetaProvider.loadIcebergApiTable(CatalogdMetaProvider.java:1160) > at > org.apache.impala.catalog.local.LocalIcebergTable.loadIcebergTableViaMetaProvider(LocalIcebergTable.java:96) > at > org.apache.impala.catalog.local.LocalTable.load(LocalTable.java:131) > at > org.apache.impala.catalog.local.LocalTable.load(LocalTable.java:114) > at org.apache.impala.catalog.local.LocalDb.getTable(LocalDb.java:148) > at > org.apache.impala.catalog.local.LocalCatalog.getTable(LocalCatalog.java:139) > at > org.apache.impala.catalog.local.LocalCatalogTest.testLoadIcebergFileDescriptors(LocalCatalogTest.java:280) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) >
[jira] [Created] (IMPALA-12819) InaccessibleObjectException found during LocalCatalogTest
Fang-Yu Rao created IMPALA-12819: Summary: InaccessibleObjectException found during LocalCatalogTest Key: IMPALA-12819 URL: https://issues.apache.org/jira/browse/IMPALA-12819 Project: IMPALA Issue Type: Bug Components: fe Affects Versions: Impala 4.4.0 Reporter: Fang-Yu Rao Assignee: Michael Smith We found in an internal build that during LocalCatalogTest we could encounter InaccessibleObjectException. This was found by the test [test_no_inaccessible_objects|https://github.com/apache/impala/blob/master/tests/verifiers/test_banned_log_messages.py#L40C7-L40C35] {code:java} W0217 01:31:14.108255 18119 ObjectGraphWalker.java:251] The JVM is preventing Ehcache from accessing the subgraph beneath 'private final jdk.internal.platform.CgroupV1Metrics jdk.internal.platform.CgroupV1MetricsImpl.metrics' - cache sizes may be underestimated as a result Java exception follows: java.lang.reflect.InaccessibleObjectException: Unable to make field private final jdk.internal.platform.CgroupV1Metrics jdk.internal.platform.CgroupV1MetricsImpl.metrics accessible: module java.base does not "opens jdk.internal.platform" to unnamed module @2c89cd7f at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:340) at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:280) at java.base/java.lang.reflect.Field.checkCanSetAccessible(Field.java:176) at java.base/java.lang.reflect.Field.setAccessible(Field.java:170) at org.ehcache.sizeof.ObjectGraphWalker.getAllFields(ObjectGraphWalker.java:245) at org.ehcache.sizeof.ObjectGraphWalker.getFilteredFields(ObjectGraphWalker.java:204) at org.ehcache.sizeof.ObjectGraphWalker.walk(ObjectGraphWalker.java:159) at org.ehcache.sizeof.SizeOf.deepSizeOf(SizeOf.java:74) at org.apache.impala.catalog.local.CatalogdMetaProvider$SizeOfWeigher.weigh(CatalogdMetaProvider.java:2234) at com.google.common.cache.LocalCache$Segment.setValue(LocalCache.java:2043) at 
com.google.common.cache.LocalCache$Segment.replace(LocalCache.java:2990) at com.google.common.cache.LocalCache.replace(LocalCache.java:4324) at org.apache.impala.catalog.local.CatalogdMetaProvider.loadWithCaching(CatalogdMetaProvider.java:569) at org.apache.impala.catalog.local.CatalogdMetaProvider.loadIcebergApiTable(CatalogdMetaProvider.java:1160) at org.apache.impala.catalog.local.LocalIcebergTable.loadIcebergTableViaMetaProvider(LocalIcebergTable.java:96) at org.apache.impala.catalog.local.LocalTable.load(LocalTable.java:131) at org.apache.impala.catalog.local.LocalTable.load(LocalTable.java:114) at org.apache.impala.catalog.local.LocalDb.getTable(LocalDb.java:148) at org.apache.impala.catalog.local.LocalCatalog.getTable(LocalCatalog.java:139) at org.apache.impala.catalog.local.LocalCatalogTest.testLoadIcebergFileDescriptors(LocalCatalogTest.java:280) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:566) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) at 
org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) at org.junit.runners.ParentRunner.run(ParentRunner.java:363) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:316) at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:240) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4
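The exception message itself names the likely mitigation: module java.base does not "opens jdk.internal.platform" to the unnamed module. One hedged option is to open that package to unnamed modules via a JVM option on the process running the tests. The flag below is standard JDK 9+ syntax and the package name comes straight from the error; where exactly it would be wired in (e.g. a Maven Surefire argLine or the test JVM's arguments) is an assumption, not something this report states:

```
--add-opens java.base/jdk.internal.platform=ALL-UNNAMED
```

Alternatively, the Ehcache SizeOf walker could be configured to skip that field; per the warning itself, the only consequence of leaving it inaccessible is that cache sizes may be underestimated.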
[jira] [Updated] (IMPALA-11743) Support the OWNER privilege for UDFs in Impala
[ https://issues.apache.org/jira/browse/IMPALA-11743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao updated IMPALA-11743: - Summary: Support the OWNER privilege for UDFs in Impala (was: Investigate how to support the OWNER privilege for UDFs in Impala) > Support the OWNER privilege for UDFs in Impala > -- > > Key: IMPALA-11743 > URL: https://issues.apache.org/jira/browse/IMPALA-11743 > Project: IMPALA > Issue Type: New Feature > Components: Frontend >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Major > > Currently in Impala a user allowed to create a UDF in a database still has to > be explicitly granted the necessary privileges to execute the UDF later in a > SELECT query. It would be more convenient if the ownership information of a > UDF could also be retrieved during the query analysis of such SELECT queries > so that the owner/creator of a UDF will be allowed to execute the UDF without > being explicitly granted the necessary privileges on the UDF. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-12578) Pass the owner user to Ranger plug-in in GRANT and REVOKE statements for databases, tables, and columns
[ https://issues.apache.org/jira/browse/IMPALA-12578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17803733#comment-17803733 ] Fang-Yu Rao commented on IMPALA-12578: -- I separated the case of UDFs from this JIRA because Impala currently does not have the concept of an owner for UDFs. Based on what was found in IMPALA-11743, the changes needed to support UDF ownership will be substantial, so it is better to track the UDF case in a separate JIRA. > Pass the owner user to Ranger plug-in in GRANT and REVOKE statements for > databases, tables, and columns > --- > > Key: IMPALA-12578 > URL: https://issues.apache.org/jira/browse/IMPALA-12578 > Project: IMPALA > Issue Type: New Feature >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Major > > Starting from RANGER-1200, Ranger supports the notion of the OWNER user, > which allows each user to perform any operation on the resources owned by it. > This avoids the need for creating a new policy that grants the OWNER user the > privileges on every newly created resource. Refer to > [apache-ranger-policy-model|https://blogsarchive.apache.org/ranger/entry/apache-ranger-policy-model#:~:text=allow%20each%20user%20to%20access%20all,all]. > Currently for the GRANT and REVOKE statements, Impala does not pass the owner > of the resource to the Ranger plug-in and thus a non-administrative user > could not grant/revoke privileges on a resource to/from another user even > though this non-administrative user owns the resource. We should pass the > ownership information to the Ranger plug-in to make authorization management > easier in Impala.
[jira] [Updated] (IMPALA-12685) Pass the owner user to Ranger plug-in in GRANT and REVOKE statements for UDFs
[ https://issues.apache.org/jira/browse/IMPALA-12685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao updated IMPALA-12685: - Summary: Pass the owner user to Ranger plug-in in GRANT and REVOKE statements for UDFs (was: Pass the owner user to Ranger plug-in in GRANT and REVOKE statements for UDF) > Pass the owner user to Ranger plug-in in GRANT and REVOKE statements for UDFs > - > > Key: IMPALA-12685 > URL: https://issues.apache.org/jira/browse/IMPALA-12685 > Project: IMPALA > Issue Type: New Feature >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Major > > This is the follow-up to IMPALA-12578, where we tackle the cases of > databases, tables, and columns. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-12685) Pass the owner user to Ranger plug-in in GRANT and REVOKE statements for UDF
Fang-Yu Rao created IMPALA-12685: Summary: Pass the owner user to Ranger plug-in in GRANT and REVOKE statements for UDF Key: IMPALA-12685 URL: https://issues.apache.org/jira/browse/IMPALA-12685 Project: IMPALA Issue Type: New Feature Reporter: Fang-Yu Rao Assignee: Fang-Yu Rao This is the follow-up to IMPALA-12578, where we tackle the cases of databases, tables, and columns.
[jira] [Updated] (IMPALA-12578) Pass the owner user to Ranger plug-in in GRANT and REVOKE statements for databases, tables, and columns
[ https://issues.apache.org/jira/browse/IMPALA-12578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao updated IMPALA-12578: - Summary: Pass the owner user to Ranger plug-in in GRANT and REVOKE statements for databases, tables, and columns (was: Pass the owner user to the Ranger plug-in in GRANT and REVOKE statements) > Pass the owner user to Ranger plug-in in GRANT and REVOKE statements for > databases, tables, and columns > --- > > Key: IMPALA-12578 > URL: https://issues.apache.org/jira/browse/IMPALA-12578 > Project: IMPALA > Issue Type: New Feature >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Major > > Starting from RANGER-1200, Ranger supports the notion of the OWNER user, > which allows each user to perform any operation on the resources owned by it. > This avoids the need for creating a new policy that grants the OWNER user the > privileges on every newly created resource. Refer to > [apache-ranger-policy-model|https://blogsarchive.apache.org/ranger/entry/apache-ranger-policy-model#:~:text=allow%20each%20user%20to%20access%20all,all]. > Currently for the GRANT and REVOKE statements, Impala does not pass the owner > of the resource to the Ranger plug-in and thus a non-administrative user > could not grant/revoke privileges on a resource to/from another user even > though this non-administrative user owns the resource. We should pass the > ownership information to the Ranger plug-in to make authorization management > easier in Impala. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Comment Edited] (IMPALA-11743) Investigate how to support the OWNER privilege for UDFs in Impala
[ https://issues.apache.org/jira/browse/IMPALA-11743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17803730#comment-17803730 ] Fang-Yu Rao edited comment on IMPALA-11743 at 1/6/24 12:16 AM: --- This JIRA is related to IMPALA-12578 where we would like to pass to the Ranger plug-in the owner of a resource involved in a GRANT/REVOKE statement. Specifically, in the case when the resource is a user-defined function (UDF), Impala has to load this piece of information when instantiating user-defined functions in [CatalogServiceCatalog.java#loadJavaFunctions()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L1812C16-L1836] so that the owner of a UDF will be available in Impala's internal representation of it, i.e., [Function.java|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/Function.java]. On a related note, in [hive_metastore.thrift|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift], Hive already has an 'ownerName' field for a user-defined function.
{code:java}
struct Function {
  1: string functionName,
  2: string dbName,
  3: string className,
  4: string ownerName,
  5: PrincipalType ownerType,
  6: i32 createTime,
  7: FunctionType functionType,
  8: list<ResourceUri> resourceUris,
  9: optional string catName
}
{code}
On the other hand, when an authorized user creates a persistent UDF via Impala, Impala should also pass the requesting user as the owner of the UDF to the Hive MetaStore. This way Impala will be able to load the owner of a UDF in CatalogServiceCatalog.java#loadJavaFunctions() as mentioned above. was (Author: fangyurao): This JIRA is related to IMPALA-12578 where we would like to pass to the Ranger plug-in the owner of a resource involved in a GRANT/REVOKE statement.
Specifically, in the case when the resource is a user-defined function (UDF), Impala has to load this piece of information when instantiating user-defined functions in [CatalogServiceCatalog.java#loadJavaFunctions()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L1812C16-L1836] so that the owner of a UDF will be available in Impala's internal representation of it, i.e., [Function.java|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/Function.java]. On a related note, in [hive_metastore.thrift|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift], Hive already has a field of 'ownerName' for a user-defined function.
{code:java}
struct Function {
  1: string functionName,
  2: string dbName,
  3: string className,
  4: string ownerName,
  5: PrincipalType ownerType,
  6: i32 createTime,
  7: FunctionType functionType,
  8: list<ResourceUri> resourceUris,
  9: optional string catName
}
{code}
> Investigate how to support the OWNER privilege for UDFs in Impala > - > > Key: IMPALA-11743 > URL: https://issues.apache.org/jira/browse/IMPALA-11743 > Project: IMPALA > Issue Type: New Feature > Components: Frontend >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Major > > Currently in Impala a user allowed to create a UDF in a database still has to > be explicitly granted the necessary privileges to execute the UDF later in a > SELECT query. It would be more convenient if the ownership information of a > UDF could also be retrieved during the query analysis of such SELECT queries > so that the owner/creator of a UDF will be allowed to execute the UDF without > being explicitly granted the necessary privileges on the UDF.
[jira] [Commented] (IMPALA-11743) Investigate how to support the OWNER privilege for UDFs in Impala
[ https://issues.apache.org/jira/browse/IMPALA-11743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17803730#comment-17803730 ] Fang-Yu Rao commented on IMPALA-11743: -- This JIRA is related to IMPALA-12578 where we would like to pass to the Ranger plug-in the owner of a resource involved in a GRANT/REVOKE statement. Specifically, in the case when the resource is a user-defined function (UDF), Impala has to load this piece of information when instantiating user-defined functions in [CatalogServiceCatalog.java#loadJavaFunctions()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L1812C16-L1836] so that the owner of a UDF will be available in Impala's internal representation of it, i.e., [Function.java|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/Function.java]. On a related note, in [hive_metastore.thrift|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift], Hive already has a field of 'ownerName' for a user-defined function.
{code:java}
struct Function {
  1: string functionName,
  2: string dbName,
  3: string className,
  4: string ownerName,
  5: PrincipalType ownerType,
  6: i32 createTime,
  7: FunctionType functionType,
  8: list<ResourceUri> resourceUris,
  9: optional string catName
}
{code}
> Investigate how to support the OWNER privilege for UDFs in Impala > - > > Key: IMPALA-11743 > URL: https://issues.apache.org/jira/browse/IMPALA-11743 > Project: IMPALA > Issue Type: New Feature > Components: Frontend >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Major > > Currently in Impala a user allowed to create a UDF in a database still has to > be explicitly granted the necessary privileges to execute the UDF later in a > SELECT query.
It would be more convenient if the ownership information of a > UDF could also be retrieved during the query analysis of such SELECT queries > so that the owner/creator of a UDF will be allowed to execute the UDF without > being explicitly granted the necessary privileges on the UDF.
[jira] [Reopened] (IMPALA-12554) Create only one Ranger policy for GRANT statement
[ https://issues.apache.org/jira/browse/IMPALA-12554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao reopened IMPALA-12554: -- > Create only one Ranger policy for GRANT statement > - > > Key: IMPALA-12554 > URL: https://issues.apache.org/jira/browse/IMPALA-12554 > Project: IMPALA > Issue Type: Improvement >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Major > > Currently Impala would create a Ranger policy for each column specified in a > GRANT statement. For instance, after the following query, 3 Ranger policies > would be created on the Ranger server. This could result in a lot of policies > created when there are many columns specified and it may result in Impala's > Ranger plug-in taking a long time to download the policies from the Ranger > server. It would be great if Impala only creates one single policy for > columns in the same table. > {code:java} > [localhost:21050] default> grant select(id, bool_col, tinyint_col) on table > functional.alltypes to user non_owner; > Query: grant select(id, bool_col, tinyint_col) on table functional.alltypes > to user non_owner > Query submitted at: 2023-11-10 09:38:58 (Coordinator: http://fangyu:25000) > Query progress can be monitored at: > http://fangyu:25000/query_plan?query_id=bc4fa1cdefe5881b:413d9a69 > +-+ > | summary | > +-+ > | Privilege(s) have been granted. | > +-+ > Fetched 1 row(s) in 0.67s > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-12554) Create only one Ranger policy for GRANT statement
[ https://issues.apache.org/jira/browse/IMPALA-12554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao resolved IMPALA-12554. -- Resolution: Implemented > Create only one Ranger policy for GRANT statement > - > > Key: IMPALA-12554 > URL: https://issues.apache.org/jira/browse/IMPALA-12554 > Project: IMPALA > Issue Type: Improvement >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Major > > Currently Impala would create a Ranger policy for each column specified in a > GRANT statement. For instance, after the following query, 3 Ranger policies > would be created on the Ranger server. This could result in a lot of policies > created when there are many columns specified and it may result in Impala's > Ranger plug-in taking a long time to download the policies from the Ranger > server. It would be great if Impala only creates one single policy for > columns in the same table. > {code:java} > [localhost:21050] default> grant select(id, bool_col, tinyint_col) on table > functional.alltypes to user non_owner; > Query: grant select(id, bool_col, tinyint_col) on table functional.alltypes > to user non_owner > Query submitted at: 2023-11-10 09:38:58 (Coordinator: http://fangyu:25000) > Query progress can be monitored at: > http://fangyu:25000/query_plan?query_id=bc4fa1cdefe5881b:413d9a69 > +-+ > | summary | > +-+ > | Privilege(s) have been granted. | > +-+ > Fetched 1 row(s) in 0.67s > {code}
[jira] [Resolved] (IMPALA-12554) Create only one Ranger policy for GRANT statement
[ https://issues.apache.org/jira/browse/IMPALA-12554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao resolved IMPALA-12554. -- Resolution: Later After some manual testing, we found that RANGER-4585 has some bugs, e.g., REVOKE REST API call is not able to revoke the privilege on multiple columns from a grantee that was granted the SELECT privilege on the same set of columns. Before this is fixed, we resolve the ticket for now and will re-open the ticket once this issue is fixed in a follow-up RANGER JIRA. > Create only one Ranger policy for GRANT statement > - > > Key: IMPALA-12554 > URL: https://issues.apache.org/jira/browse/IMPALA-12554 > Project: IMPALA > Issue Type: Improvement >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Major > > Currently Impala would create a Ranger policy for each column specified in a > GRANT statement. For instance, after the following query, 3 Ranger policies > would be created on the Ranger server. This could result in a lot of policies > created when there are many columns specified and it may result in Impala's > Ranger plug-in taking a long time to download the policies from the Ranger > server. It would be great if Impala only creates one single policy for > columns in the same table. > {code:java} > [localhost:21050] default> grant select(id, bool_col, tinyint_col) on table > functional.alltypes to user non_owner; > Query: grant select(id, bool_col, tinyint_col) on table functional.alltypes > to user non_owner > Query submitted at: 2023-11-10 09:38:58 (Coordinator: http://fangyu:25000) > Query progress can be monitored at: > http://fangyu:25000/query_plan?query_id=bc4fa1cdefe5881b:413d9a69 > +-+ > | summary | > +-+ > | Privilege(s) have been granted. | > +-+ > Fetched 1 row(s) in 0.67s > {code}
[jira] [Updated] (IMPALA-12578) Pass the owner user to the Ranger plug-in in GRANT and REVOKE statements
[ https://issues.apache.org/jira/browse/IMPALA-12578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao updated IMPALA-12578: - Description: Starting from RANGER-1200, Ranger supports the notion of the OWNER user, which allows each user to perform any operation on the resources owned by it. This avoids the need for creating a new policy that grants the OWNER user the privileges on every newly created resource. Refer to [apache-ranger-policy-model|https://blogsarchive.apache.org/ranger/entry/apache-ranger-policy-model#:~:text=allow%20each%20user%20to%20access%20all,all]. Currently for the GRANT and REVOKE statements, Impala does not pass the owner of the resource to the Ranger plug-in and thus a non-administrative user could not grant/revoke privileges on a resource to/from another user even though this non-administrative user owns the resource. We should pass the ownership information to the Ranger plug-in to make authorization management easier in Impala. was: Starting from RANGER-1200, Ranger supports the notion of the OWNER user, which allows each user to perform any operation on the resources owned by them. This avoids the need for creating a new policy that grants the OWNER user the privileges on every newly created resource. Refer to [apache-ranger-policy-model|https://blogsarchive.apache.org/ranger/entry/apache-ranger-policy-model#:~:text=allow%20each%20user%20to%20access%20all,all]. Currently for the GRANT and REVOKE statements, Impala does not pass the owner of the resource to the Ranger plug-in and thus a non-administrative user could not grant/revoke privileges on a resource to/from another user even though this non-administrative user owns the resource. We should pass the ownership information to the Ranger plug-in to make authorization management easier in Impala. 
> Pass the owner user to the Ranger plug-in in GRANT and REVOKE statements > > > Key: IMPALA-12578 > URL: https://issues.apache.org/jira/browse/IMPALA-12578 > Project: IMPALA > Issue Type: New Feature >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Major > > Starting from RANGER-1200, Ranger supports the notion of the OWNER user, > which allows each user to perform any operation on the resources owned by it. > This avoids the need for creating a new policy that grants the OWNER user the > privileges on every newly created resource. Refer to > [apache-ranger-policy-model|https://blogsarchive.apache.org/ranger/entry/apache-ranger-policy-model#:~:text=allow%20each%20user%20to%20access%20all,all]. > Currently for the GRANT and REVOKE statements, Impala does not pass the owner > of the resource to the Ranger plug-in and thus a non-administrative user > could not grant/revoke privileges on a resource to/from another user even > though this non-administrative user owns the resource. We should pass the > ownership information to the Ranger plug-in to make authorization management > easier in Impala.
[jira] [Created] (IMPALA-12578) Pass the owner user to the Ranger plug-in in GRANT and REVOKE statements
Fang-Yu Rao created IMPALA-12578: Summary: Pass the owner user to the Ranger plug-in in GRANT and REVOKE statements Key: IMPALA-12578 URL: https://issues.apache.org/jira/browse/IMPALA-12578 Project: IMPALA Issue Type: New Feature Reporter: Fang-Yu Rao Assignee: Fang-Yu Rao Starting from RANGER-1200, Ranger supports the notion of the OWNER user, which allows each user to perform any operation on the resources owned by them. This avoids the need for creating a new policy that grants the OWNER user the privileges on every newly created resource. Refer to [apache-ranger-policy-model|https://blogsarchive.apache.org/ranger/entry/apache-ranger-policy-model#:~:text=allow%20each%20user%20to%20access%20all,all]. Currently for the GRANT and REVOKE statements, Impala does not pass the owner of the resource to the Ranger plug-in and thus a non-administrative user could not grant/revoke privileges on a resource to/from another user even though this non-administrative user owns the resource. We should pass the ownership information to the Ranger plug-in to make authorization management easier in Impala.
[jira] [Updated] (IMPALA-12554) Create only one Ranger policy for GRANT statement
[ https://issues.apache.org/jira/browse/IMPALA-12554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao updated IMPALA-12554: - Description: Currently Impala would create a Ranger policy for each column specified in a GRANT statement. For instance, after the following query, 3 Ranger policies would be created on the Ranger server. This could result in a lot of policies created when there are many columns specified and it may result in Impala's Ranger plug-in taking a long time to download the policies from the Ranger server. It would be great if Impala only creates one single policy for columns in the same table. {code:java} [localhost:21050] default> grant select(id, bool_col, tinyint_col) on table functional.alltypes to user non_owner; Query: grant select(id, bool_col, tinyint_col) on table functional.alltypes to user non_owner Query submitted at: 2023-11-10 09:38:58 (Coordinator: http://fangyu:25000) Query progress can be monitored at: http://fangyu:25000/query_plan?query_id=bc4fa1cdefe5881b:413d9a69 +-+ | summary | +-+ | Privilege(s) have been granted. | +-+ Fetched 1 row(s) in 0.67s {code} was: Currently Impala would create a Ranger policy for each column specified in a GRANT statement. For instance, after the following query, 3 Ranger policies would be created on the Ranger server. This could result in a lot of policies created when there are many columns specified and it may cause Impala's Ranger plug-in a long time to download the policies from the Ranger server. It would be great if Impala only creates one single policy for columns in the same table. 
{code} [localhost:21050] default> grant select(id, bool_col, tinyint_col) on table functional.alltypes to user non_owner; Query: grant select(id, bool_col, tinyint_col) on table functional.alltypes to user non_owner Query submitted at: 2023-11-10 09:38:58 (Coordinator: http://fangyu:25000) Query progress can be monitored at: http://fangyu:25000/query_plan?query_id=bc4fa1cdefe5881b:413d9a69 +-+ | summary | +-+ | Privilege(s) have been granted. | +-+ Fetched 1 row(s) in 0.67s {code} > Create only one Ranger policy for GRANT statement > - > > Key: IMPALA-12554 > URL: https://issues.apache.org/jira/browse/IMPALA-12554 > Project: IMPALA > Issue Type: Improvement >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Major > > Currently Impala would create a Ranger policy for each column specified in a > GRANT statement. For instance, after the following query, 3 Ranger policies > would be created on the Ranger server. This could result in a lot of policies > created when there are many columns specified and it may result in Impala's > Ranger plug-in taking a long time to download the policies from the Ranger > server. It would be great if Impala only creates one single policy for > columns in the same table. > {code:java} > [localhost:21050] default> grant select(id, bool_col, tinyint_col) on table > functional.alltypes to user non_owner; > Query: grant select(id, bool_col, tinyint_col) on table functional.alltypes > to user non_owner > Query submitted at: 2023-11-10 09:38:58 (Coordinator: http://fangyu:25000) > Query progress can be monitored at: > http://fangyu:25000/query_plan?query_id=bc4fa1cdefe5881b:413d9a69 > +-+ > | summary | > +-+ > | Privilege(s) have been granted. | > +-+ > Fetched 1 row(s) in 0.67s > {code}
[jira] [Updated] (IMPALA-3268) Add command "SHOW VIEWS"
[ https://issues.apache.org/jira/browse/IMPALA-3268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao updated IMPALA-3268: Description: Currently to get a list of views, user has to: - SHOW TABLES - scan through the output list - SHOW CREATE TABLE view_name to confirm view_name is a view which is tedious. So I would like to request the following: - -SHOW TABLES should only return tables- - SHOW VIEWS should only return views - -add a flag to either above commands to return all tables and views- This will help lots of end users. Edit: Moved the first item and the third item out of the scope of this JIRA to IMPALA-12574 since more discussion may be required. was: Currently to get a list of views, user has to: - SHOW TABLES - scan through the output list - SHOW CREATE TABLE view_name to confirm view_name is a view which is tedious. So I would like to request the following: - SHOW TABLES should only return tables - SHOW VIEWS should only return views - add a flag to either above commands to return all tables and views This will help lots of end users. > Add command "SHOW VIEWS" > > > Key: IMPALA-3268 > URL: https://issues.apache.org/jira/browse/IMPALA-3268 > Project: IMPALA > Issue Type: New Feature > Components: Catalog >Affects Versions: Impala 2.2.4, Impala 2.3.0, Impala 2.5.0 >Reporter: Eric Lin >Assignee: Fang-Yu Rao >Priority: Minor > Labels: usability > > Currently to get a list of views, user has to: > - SHOW TABLES > - scan through the output list > - SHOW CREATE TABLE view_name to confirm view_name is a view > which is tedious. > So I would like to request the following: > - -SHOW TABLES should only return tables- > - SHOW VIEWS should only return views > - -add a flag to either above commands to return all tables and views- > This will help lots of end users. > Edit: Moved the first item and the third item out of the scope of this JIRA > to IMPALA-12574 since more discussion may be required. 
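The tedious pre-SHOW VIEWS workflow the ticket describes (list everything, then probe each name with SHOW CREATE TABLE) can be sketched as follows. `run_query` is a hypothetical stand-in for any Impala client call, stubbed here with canned DDL purely for illustration.

```python
def find_views(run_query):
    """Emulate the workaround from IMPALA-3268: list all objects with
    SHOW TABLES, then probe each one with SHOW CREATE TABLE and keep
    the names whose DDL starts with CREATE VIEW."""
    views = []
    for name in run_query("SHOW TABLES"):
        ddl = run_query("SHOW CREATE TABLE %s" % name)
        if ddl.lstrip().upper().startswith("CREATE VIEW"):
            views.append(name)
    return views

# Stubbed client standing in for a real Impala connection.
def fake_run_query(sql):
    catalog = {
        "alltypes": "CREATE TABLE alltypes (...)",
        "alltypes_view": "CREATE VIEW alltypes_view AS SELECT ...",
    }
    if sql == "SHOW TABLES":
        return list(catalog)
    return catalog[sql.split()[-1]]

print(find_views(fake_run_query))  # ['alltypes_view']
```

Note the cost: one extra round trip per object in the database, which is what makes a dedicated SHOW VIEWS statement attractive.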
[jira] [Updated] (IMPALA-12574) Consider extending SHOW TABLES statement so it only display tables
[ https://issues.apache.org/jira/browse/IMPALA-12574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao updated IMPALA-12574: - Summary: Consider extending SHOW TABLES statement so it only display tables (was: Consider extending SHOW TABLES statement so it only display the tables) > Consider extending SHOW TABLES statement so it only display tables > -- > > Key: IMPALA-12574 > URL: https://issues.apache.org/jira/browse/IMPALA-12574 > Project: IMPALA > Issue Type: New Feature > Components: Catalog, Frontend >Reporter: Fang-Yu Rao >Priority: Minor > > IMPALA-3268 extended Frontend's API of GetTableNames() such that > GetTableNames() could return the matching tables whose table type is in the > specified set of table types. With this change, it should not be too > difficult to extend the SHOW TABLES statement such that SHOW TABLES could > display only the tables of a specified type (v.s. all types of tables). It > would be great to have this functionality.
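The extension the ticket sketches, returning only the names whose table type falls in a requested set, is easy to model. The snippet below is a simplified illustration of that filtering idea, not Impala's actual GetTableNames() signature, and the type strings are illustrative rather than Impala's real metadata constants.

```python
def get_table_names(catalog, pattern_match, table_types=None):
    """Return matching object names, optionally restricted to a set of
    table types (mirroring the GetTableNames() extension described in
    the ticket).

    `catalog` maps name -> type string; table_types=None keeps the old
    behavior of returning every kind of object.
    """
    names = []
    for name, ttype in sorted(catalog.items()):
        if not pattern_match(name):
            continue
        if table_types is not None and ttype not in table_types:
            continue
        names.append(name)
    return names

catalog = {"t1": "TABLE", "v1": "VIEW", "t2": "TABLE"}
everything = get_table_names(catalog, lambda n: True)
tables_only = get_table_names(catalog, lambda n: True, {"TABLE"})
print(everything)   # ['t1', 't2', 'v1']
print(tables_only)  # ['t1', 't2']
```

With such a filter in place, SHOW TABLES could pass {"TABLE"} while the existing behavior corresponds to passing no filter at all.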
[jira] [Created] (IMPALA-12574) Consider extending SHOW TABLES statement so it only display the tables
Fang-Yu Rao created IMPALA-12574: Summary: Consider extending SHOW TABLES statement so it only display the tables Key: IMPALA-12574 URL: https://issues.apache.org/jira/browse/IMPALA-12574 Project: IMPALA Issue Type: New Feature Components: Catalog, Frontend Reporter: Fang-Yu Rao IMPALA-3268 extended Frontend's API of GetTableNames() such that GetTableNames() could return the matching tables whose table type is in the specified set of table types. With this change, it should not be too difficult to extend the SHOW TABLES statement such that SHOW TABLES could display only the tables of a specified type (v.s. all types of tables). It would be great to have this functionality.
[jira] [Created] (IMPALA-12554) Create only one Ranger policy for GRANT statement
Fang-Yu Rao created IMPALA-12554: Summary: Create only one Ranger policy for GRANT statement Key: IMPALA-12554 URL: https://issues.apache.org/jira/browse/IMPALA-12554 Project: IMPALA Issue Type: Improvement Reporter: Fang-Yu Rao Assignee: Fang-Yu Rao Currently Impala would create a Ranger policy for each column specified in a GRANT statement. For instance, after the following query, 3 Ranger policies would be created on the Ranger server. This could result in a lot of policies created when there are many columns specified and it may cause Impala's Ranger plug-in a long time to download the policies from the Ranger server. It would be great if Impala only creates one single policy for columns in the same table. {code} [localhost:21050] default> grant select(id, bool_col, tinyint_col) on table functional.alltypes to user non_owner; Query: grant select(id, bool_col, tinyint_col) on table functional.alltypes to user non_owner Query submitted at: 2023-11-10 09:38:58 (Coordinator: http://fangyu:25000) Query progress can be monitored at: http://fangyu:25000/query_plan?query_id=bc4fa1cdefe5881b:413d9a69 +-+ | summary | +-+ | Privilege(s) have been granted. | +-+ Fetched 1 row(s) in 0.67s {code}
[jira] [Assigned] (IMPALA-3268) Add command "SHOW VIEWS"
[ https://issues.apache.org/jira/browse/IMPALA-3268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao reassigned IMPALA-3268: --- Assignee: Fang-Yu Rao > Add command "SHOW VIEWS" > > > Key: IMPALA-3268 > URL: https://issues.apache.org/jira/browse/IMPALA-3268 > Project: IMPALA > Issue Type: New Feature > Components: Catalog >Affects Versions: Impala 2.2.4, Impala 2.3.0, Impala 2.5.0 >Reporter: Eric Lin >Assignee: Fang-Yu Rao >Priority: Minor > Labels: usability > > Currently to get a list of views, user has to: > - SHOW TABLES > - scan through the output list > - SHOW CREATE TABLE view_name to confirm view_name is a view > which is tedious. > So I would like to request the following: > - SHOW TABLES should only return tables > - SHOW VIEWS should only return views > - add a flag to either above commands to return all tables and views > This will help lots of end users.
[jira] [Commented] (IMPALA-12528) test_hdfs_scanner_thread_non_reserved_bytes could occasionally fail
[ https://issues.apache.org/jira/browse/IMPALA-12528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780764#comment-17780764 ] Fang-Yu Rao commented on IMPALA-12528: -- Hi [~rizaon], assigned this JIRA to you since you are more familiar with the corresponding test. Please re-assign the ticket as you see appropriate. Thanks! > test_hdfs_scanner_thread_non_reserved_bytes could occasionally fail > --- > > Key: IMPALA-12528 > URL: https://issues.apache.org/jira/browse/IMPALA-12528 > Project: IMPALA > Issue Type: Bug >Reporter: Fang-Yu Rao >Assignee: Riza Suminto >Priority: Major > Labels: broken-build, flaky-test > > [test_hdfs_scanner_thread_non_reserved_bytes()|https://github.com/apache/impala/blob/master/tests/query_test/test_mem_usage_scaling.py#L379] > could occasionally fail with the following error. > *+Stacktrace+* > {code:java} > E AssertionError: Aggregation of SUM over NumScannerThreadsStarted did not > match expected results. > E EXPECTED VALUE: > E 3 > E > E > E ACTUAL VALUE: > E 1 > {code} > The corresponding test file > [hdfs-scanner-thread-non-reserved-bytes.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/hdfs-scanner-thread-non-reserved-bytes.test] > was recently added in IMPALA-12499.
[jira] [Created] (IMPALA-12528) test_hdfs_scanner_thread_non_reserved_bytes could occasionally fail
Fang-Yu Rao created IMPALA-12528: Summary: test_hdfs_scanner_thread_non_reserved_bytes could occasionally fail Key: IMPALA-12528 URL: https://issues.apache.org/jira/browse/IMPALA-12528 Project: IMPALA Issue Type: Bug Reporter: Fang-Yu Rao Assignee: Riza Suminto [test_hdfs_scanner_thread_non_reserved_bytes()|https://github.com/apache/impala/blob/master/tests/query_test/test_mem_usage_scaling.py#L379] could occasionally fail with the following error. *+Stacktrace+* {code:java} E AssertionError: Aggregation of SUM over NumScannerThreadsStarted did not match expected results. E EXPECTED VALUE: E 3 E E E ACTUAL VALUE: E 1 {code} The corresponding test file [hdfs-scanner-thread-non-reserved-bytes.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/hdfs-scanner-thread-non-reserved-bytes.test] was recently added in IMPALA-12499.
[jira] [Commented] (IMPALA-12527) test_metadata_tables could occasionally fail in the s3 build
[ https://issues.apache.org/jira/browse/IMPALA-12527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780556#comment-17780556 ] Fang-Yu Rao commented on IMPALA-12527: -- Hi [~tmate], assigned the JIRA to you since you recently revised the failed test in IMPALA-11996 so you are more familiar with this area. Please re-assign the ticket as you see appropriate. Thanks! > test_metadata_tables could occasionally fail in the s3 build > > > Key: IMPALA-12527 > URL: https://issues.apache.org/jira/browse/IMPALA-12527 > Project: IMPALA > Issue Type: Bug >Reporter: Fang-Yu Rao >Assignee: Tamas Mate >Priority: Major > Labels: broken-build, flaky-test > > We found that > [test_metadata_tables()|https://github.infra.cloudera.com/CDH/Impala/blame/cdw-master-staging/tests/query_test/test_iceberg.py#L1219] > that runs > [iceberg-metadata-tables.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test] > could occasionally fail with the following error message. > It looks like the actual result does not match the expected result for some > columns. 
> Stacktrace > {code} > query_test/test_iceberg.py:1226: in test_metadata_tables > '$OVERWRITE_SNAPSHOT_TS': str(overwrite_snapshot_ts.data[0])}) > common/impala_test_suite.py:751: in run_test_case > self.__verify_results_and_errors(vector, test_section, result, use_db) > common/impala_test_suite.py:587: in __verify_results_and_errors > replace_filenames_with_placeholder) > common/test_result_verifier.py:487: in verify_raw_results > VERIFIER_MAP[verifier](expected, actual) > common/test_result_verifier.py:296: in verify_query_result_is_equal > assert expected_results == actual_results > E assert Comparing QueryTestResults (expected vs actual): > E > row_regex:0,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'',0 > != > 0,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/7d479ffb82bfffd3-7ce667e5_544607964_data.0.parq','PARQUET',0,1,351,'NULL',0 > E > row_regex:0,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'',0 > != > 0,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/ab4ffd0d75a5a68d-13da0831_1541521750_data.0.parq','PARQUET',0,1,351,'NULL',0 > E > row_regex:0,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'',0 > != > 0,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/b04d1095845359f5-f0799bd0_1209897284_data.0.parq','PARQUET',0,1,351,'NULL',0 > E > row_regex:1,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'NULL',NULL > != > 1,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/delete-1b45db885b2bdd56-4023218d0002_1697110314_data.0.parq','PARQUET',0,1,1531,'NULL',NULL > {code} > Specifically, it 
seems the value of the second-last column is different from > the expected value in some rows.
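The failures quoted above come from regex-based row verification: an expected row prefixed with row_regex: must fully match the actual row. A minimal sketch of that style of check (a simplification, not the actual test_result_verifier code; the bucket path below is made up) shows how an unexpected path prefix makes the match fail, just as the s3a:// prefix in the expected pattern fails to match the bare /test-warehouse/... paths in the actual rows.

```python
import re

def row_matches(expected, actual):
    """If the expected row carries a 'row_regex:' prefix, treat the
    rest as a regular expression that must match the whole actual row;
    otherwise require exact equality (simplified verifier behavior)."""
    if expected.startswith("row_regex:"):
        return re.fullmatch(expected[len("row_regex:"):], actual) is not None
    return expected == actual

expected = r"row_regex:0,'s3a://bucket/warehouse/data/.*\.parq','PARQUET'"
print(row_matches(expected, "0,'s3a://bucket/warehouse/data/x.parq','PARQUET'"))  # True
print(row_matches(expected, "0,'/warehouse/data/x.parq','PARQUET'"))              # False
```

The second call mirrors the bug report: the row content is otherwise fine, but the scheme-and-bucket prefix baked into the expected pattern never appears in the actual output.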
[jira] [Updated] (IMPALA-12527) test_metadata_tables could occasionally fail in the s3 build
[ https://issues.apache.org/jira/browse/IMPALA-12527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao updated IMPALA-12527: - Description: We found that [test_metadata_tables()|https://github.infra.cloudera.com/CDH/Impala/blame/cdw-master-staging/tests/query_test/test_iceberg.py#L1219] that runs [iceberg-metadata-tables.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test] could occasionally fail with the following error message. It looks like the actual result does not match the expected result for some columns. Stacktrace {code} query_test/test_iceberg.py:1226: in test_metadata_tables '$OVERWRITE_SNAPSHOT_TS': str(overwrite_snapshot_ts.data[0])}) common/impala_test_suite.py:751: in run_test_case self.__verify_results_and_errors(vector, test_section, result, use_db) common/impala_test_suite.py:587: in __verify_results_and_errors replace_filenames_with_placeholder) common/test_result_verifier.py:487: in verify_raw_results VERIFIER_MAP[verifier](expected, actual) common/test_result_verifier.py:296: in verify_query_result_is_equal assert expected_results == actual_results E assert Comparing QueryTestResults (expected vs actual): E row_regex:0,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'',0 != 0,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/7d479ffb82bfffd3-7ce667e5_544607964_data.0.parq','PARQUET',0,1,351,'NULL',0 E row_regex:0,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'',0 != 0,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/ab4ffd0d75a5a68d-13da0831_1541521750_data.0.parq','PARQUET',0,1,351,'NULL',0 E 
row_regex:0,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'',0 != 0,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/b04d1095845359f5-f0799bd0_1209897284_data.0.parq','PARQUET',0,1,351,'NULL',0 E row_regex:1,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'NULL',NULL != 1,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/delete-1b45db885b2bdd56-4023218d0002_1697110314_data.0.parq','PARQUET',0,1,1531,'NULL',NULL {code} Specifically, it seems the value of the second last column are different from the expected value in some rows. was: We found that [test_metadata_tables()|https://github.infra.cloudera.com/CDH/Impala/blame/cdw-master-staging/tests/query_test/test_iceberg.py#L1219] that runs [iceberg-metadata-tables.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test] could occasionally fail with the following error message. It looks like the actual result do not match the expected result for some columns. 
Stacktrace {code} query_test/test_iceberg.py:1226: in test_metadata_tables '$OVERWRITE_SNAPSHOT_TS': str(overwrite_snapshot_ts.data[0])}) common/impala_test_suite.py:751: in run_test_case self.__verify_results_and_errors(vector, test_section, result, use_db) common/impala_test_suite.py:587: in __verify_results_and_errors replace_filenames_with_placeholder) common/test_result_verifier.py:487: in verify_raw_results VERIFIER_MAP[verifier](expected, actual) common/test_result_verifier.py:296: in verify_query_result_is_equal assert expected_results == actual_results E assert Comparing QueryTestResults (expected vs actual): E row_regex:0,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'',0 != 0,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/7d479ffb82bfffd3-7ce667e5_544607964_data.0.parq','PARQUET',0,1,351,'NULL',0 E row_regex:0,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'',0 != 0,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/ab4ffd0d75a5a68d-13da0831_1541521750_data.0.parq','PARQUET',0,1,351,'NULL',0 E row_regex:0,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'',0 != 0,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/b04d1095845359f5-f0799bd0_1209897284_data.0.parq','PARQUET',0,1,351,'NULL',0 E row_regex:1,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ic
[jira] [Created] (IMPALA-12527) test_metadata_tables could occasionally fail in the s3 build
Fang-Yu Rao created IMPALA-12527: Summary: test_metadata_tables could occasionally fail in the s3 build Key: IMPALA-12527 URL: https://issues.apache.org/jira/browse/IMPALA-12527 Project: IMPALA Issue Type: Bug Reporter: Fang-Yu Rao Assignee: Tamas Mate We found that [test_metadata_tables()|https://github.infra.cloudera.com/CDH/Impala/blame/cdw-master-staging/tests/query_test/test_iceberg.py#L1219] that runs [iceberg-metadata-tables.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test] could occasionally fail with the following error message. It looks like the actual result do not match the expected result for some columns. Stacktrace {code} query_test/test_iceberg.py:1226: in test_metadata_tables '$OVERWRITE_SNAPSHOT_TS': str(overwrite_snapshot_ts.data[0])}) common/impala_test_suite.py:751: in run_test_case self.__verify_results_and_errors(vector, test_section, result, use_db) common/impala_test_suite.py:587: in __verify_results_and_errors replace_filenames_with_placeholder) common/test_result_verifier.py:487: in verify_raw_results VERIFIER_MAP[verifier](expected, actual) common/test_result_verifier.py:296: in verify_query_result_is_equal assert expected_results == actual_results E assert Comparing QueryTestResults (expected vs actual): E row_regex:0,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'',0 != 0,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/7d479ffb82bfffd3-7ce667e5_544607964_data.0.parq','PARQUET',0,1,351,'NULL',0 E row_regex:0,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'',0 != 0,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/ab4ffd0d75a5a68d-13da0831_1541521750_data.0.parq','PARQUET',0,1,351,'NULL',0 E 
row_regex:0,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'',0 != 0,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/b04d1095845359f5-f0799bd0_1209897284_data.0.parq','PARQUET',0,1,351,'NULL',0 E row_regex:1,'s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq','PARQUET',0,1,[1-9]\d*|0,'NULL',NULL != 1,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/delete-1b45db885b2bdd56-4023218d0002_1697110314_data.0.parq','PARQUET',0,1,1531,'NULL',NULL {code} Specifically, it seems the value of the second-to-last column differs from the expected value in some rows. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
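As an aside, the expected rows above embed an ungrouped alternation (`[1-9]\d*|0`) inside a larger row pattern. Whether this contributes to the failure is not established here, but in Python regexes an ungrouped `|` splits the entire pattern, which is a classic source of surprising row_regex behavior. A minimal illustration with a hypothetical, simplified pattern and row (not the test's actual code):

```python
import re

# Ungrouped alternation: the '|' splits the ENTIRE pattern, so this matches
# either "0,'...parq','PARQUET',0,1,<digits>" or the literal "0,'',0" --
# not "<digits> or 0" in a single column, as was likely intended.
ungrouped = r"0,'.*\.parq','PARQUET',0,1,[1-9]\d*|0,'',0"

# Grouping with (?:...) keeps the alternatives local to one column.
grouped = r"0,'.*\.parq','PARQUET',0,1,(?:[1-9]\d*|0),'',0"

actual_row = "0,'/data/x.parq','PARQUET',0,1,351,'',0"

print(bool(re.fullmatch(ungrouped, actual_row)))  # False: '|' split the whole pattern
print(bool(re.fullmatch(grouped, actual_row)))    # True
```

Note the behavior also depends on whether the verifier anchors the pattern (`fullmatch` vs `search`); with an unanchored search, the ungrouped form can instead match spuriously.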
[jira] [Commented] (IMPALA-12526) BackendConfig.INSTANCE could be null in the frontend test testResetMetadataDesc
[ https://issues.apache.org/jira/browse/IMPALA-12526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780524#comment-17780524 ] Fang-Yu Rao commented on IMPALA-12526: -- Hi [~stigahuang], assigned this JIRA to you since you are more familiar with the failed frontend test. Please re-assign the ticket as you see fit. Thanks! > BackendConfig.INSTANCE could be null in the frontend test > testResetMetadataDesc > --- > > Key: IMPALA-12526 > URL: https://issues.apache.org/jira/browse/IMPALA-12526 > Project: IMPALA > Issue Type: Bug >Reporter: Fang-Yu Rao >Assignee: Quanlong Huang >Priority: Major > Labels: broken-build, flaky-test > > We found that > [BackendConfig.INSTANCE|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java#L265] > could be null in the frontend test > [testResetMetadataDesc()|https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/util/CatalogOpUtilTest.java#L65] > and thus > [ResetMetadataStmt#toThrift()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java#L265] > could fail with the following error. > {code} > Cannot invoke "org.apache.impala.service.BackendConfig.getHostname()" because > "org.apache.impala.service.BackendConfig.INSTANCE" is null > {code}
[jira] [Commented] (IMPALA-12526) BackendConfig.INSTANCE could be null in the frontend test testResetMetadataDesc
[ https://issues.apache.org/jira/browse/IMPALA-12526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780523#comment-17780523 ] Fang-Yu Rao commented on IMPALA-12526: -- This issue seems to be the same as IMPALA-11699, but I cannot be completely sure. > BackendConfig.INSTANCE could be null in the frontend test > testResetMetadataDesc > --- > > Key: IMPALA-12526 > URL: https://issues.apache.org/jira/browse/IMPALA-12526 > Project: IMPALA > Issue Type: Bug >Reporter: Fang-Yu Rao >Assignee: Quanlong Huang >Priority: Major > Labels: broken-build, flaky-test > > We found that > [BackendConfig.INSTANCE|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java#L265] > could be null in the frontend test > [testResetMetadataDesc()|https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/util/CatalogOpUtilTest.java#L65] > and thus > [ResetMetadataStmt#toThrift()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java#L265] > could fail with the following error. > {code} > Cannot invoke "org.apache.impala.service.BackendConfig.getHostname()" because > "org.apache.impala.service.BackendConfig.INSTANCE" is null > {code}
[jira] [Created] (IMPALA-12526) BackendConfig.INSTANCE could be null in the frontend test testResetMetadataDesc
Fang-Yu Rao created IMPALA-12526: Summary: BackendConfig.INSTANCE could be null in the frontend test testResetMetadataDesc Key: IMPALA-12526 URL: https://issues.apache.org/jira/browse/IMPALA-12526 Project: IMPALA Issue Type: Bug Reporter: Fang-Yu Rao Assignee: Quanlong Huang We found that [BackendConfig.INSTANCE|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java#L265] could be null in the frontend test [testResetMetadataDesc()|https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/util/CatalogOpUtilTest.java#L65] and thus [ResetMetadataStmt#toThrift()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java#L265] could fail with the following error. {code} Cannot invoke "org.apache.impala.service.BackendConfig.getHostname()" because "org.apache.impala.service.BackendConfig.INSTANCE" is null {code}
[jira] [Created] (IMPALA-12525) statestore.active-status did not reach value True in 120s
Fang-Yu Rao created IMPALA-12525: Summary: statestore.active-status did not reach value True in 120s Key: IMPALA-12525 URL: https://issues.apache.org/jira/browse/IMPALA-12525 Project: IMPALA Issue Type: Bug Reporter: Fang-Yu Rao Assignee: Wenzhe Zhou We found that the metric [statestore.active-status|https://github.com/apache/impala/blob/master/tests/custom_cluster/test_statestored_ha.py#L452] may fail to reach the value True within 120s. *+Error Message+* {code:java} AssertionError: Metric statestore.active-status did not reach value True in 120s. Dumping debug webpages in JSON format... Dumped memz JSON to $IMPALA_HOME/logs/metric_timeout_diags_20231026_01:53:53/json/memz.json Dumped metrics JSON to $IMPALA_HOME/logs/metric_timeout_diags_20231026_01:53:53/json/metrics.json Dumped queries JSON to $IMPALA_HOME/logs/metric_timeout_diags_20231026_01:53:53/json/queries.json Dumped sessions JSON to $IMPALA_HOME/logs/metric_timeout_diags_20231026_01:53:53/json/sessions.json Dumped threadz JSON to $IMPALA_HOME/logs/metric_timeout_diags_20231026_01:53:53/json/threadz.json Dumped rpcz JSON to $IMPALA_HOME/logs/metric_timeout_diags_20231026_01:53:53/json/rpcz.json Dumping minidumps for impalads/catalogds... Dumped minidump for Impalad PID 32539 Dumped minidump for Impalad PID 32543 Dumped minidump for Impalad PID 32550 Dumped minidump for Catalogd PID 32460 {code} *+Stacktrace+* {code:java} custom_cluster/test_statestored_ha.py:500: in test_statestored_manual_failover self.__test_statestored_manual_failover(second_failover=True) custom_cluster/test_statestored_ha.py:452: in __test_statestored_manual_failover "statestore.active-status", expected_value=True, timeout=120) common/impala_service.py:144: in wait_for_metric_value self.__metric_timeout_assert(metric_name, expected_value, timeout) common/impala_service.py:213: in __metric_timeout_assert assert 0, assert_string E AssertionError: Metric statestore.active-status did not reach value True in 120s.
E Dumping debug webpages in JSON format... E Dumped memz JSON to $IMPALA_HOME/logs/metric_timeout_diags_20231026_01:53:53/json/memz.json E Dumped metrics JSON to $IMPALA_HOME/logs/metric_timeout_diags_20231026_01:53:53/json/metrics.json E Dumped queries JSON to $IMPALA_HOME/logs/metric_timeout_diags_20231026_01:53:53/json/queries.json E Dumped sessions JSON to $IMPALA_HOME/logs/metric_timeout_diags_20231026_01:53:53/json/sessions.json E Dumped threadz JSON to $IMPALA_HOME/logs/metric_timeout_diags_20231026_01:53:53/json/threadz.json E Dumped rpcz JSON to $IMPALA_HOME/logs/metric_timeout_diags_20231026_01:53:53/json/rpcz.json E Dumping minidumps for impalads/catalogds... E Dumped minidump for Impalad PID 32539 E Dumped minidump for Impalad PID 32543 E Dumped minidump for Impalad PID 32550 E Dumped minidump for Catalogd PID 32460 {code}
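The wait_for_metric_value() helper in the stacktrace polls a metric until it reaches the expected value or a timeout expires. A minimal sketch of that polling pattern (an illustration only, not Impala's actual impala_service.py implementation; the names and defaults are assumptions):

```python
import time

def wait_for_metric_value(get_metric, expected, timeout_s=120, interval_s=1):
    """Poll get_metric() until it returns `expected` or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        value = get_metric()
        if value == expected:
            return value
        time.sleep(interval_s)
    raise AssertionError(
        "Metric did not reach value %s in %ss" % (expected, timeout_s))

# Toy usage: a fake metric that flips to True on the third poll.
state = {"calls": 0}
def fake_metric():
    state["calls"] += 1
    return state["calls"] >= 3

print(wait_for_metric_value(fake_metric, True, timeout_s=5, interval_s=0))
```

In the failure above the metric never flipped within 120s, so the harness raised the AssertionError and dumped diagnostics.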
[jira] [Updated] (IMPALA-12522) test_alter_table_recover could finish less than 10 seconds with JDK 17 when enable_async_ddl_execution is False
[ https://issues.apache.org/jira/browse/IMPALA-12522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao updated IMPALA-12522: - Priority: Critical (was: Major) > test_alter_table_recover could finish less than 10 seconds with JDK 17 when > enable_async_ddl_execution is False > --- > > Key: IMPALA-12522 > URL: https://issues.apache.org/jira/browse/IMPALA-12522 > Project: IMPALA > Issue Type: Test >Reporter: Fang-Yu Rao >Assignee: Joe McDonnell >Priority: Critical > Labels: broken-build, flaky-test > > We found that > [test_alter_table_recover()|https://github.com/apache/impala/blame/master/tests/metadata/test_ddl.py#L1026] > could finish execution within 10 seconds with JDK 17 when > enable_async_ddl_execution is False and thus the check in the [else > branch|https://github.com/apache/impala/blame/master/tests/metadata/test_ddl.py#L1079C12-L1079C12] > could fail. We don't know whether it has something to do with the JDK, but maybe we could > reduce the expected execution time a little to make the test less flaky. > {code} > # In sync mode: > # The entire DDL is processed in the exec step with delay. exec_time > should be > # more than 10 seconds. > # > # In async mode: > # The compilation of DDL is processed in the exec step without delay. > And the > # processing of the DDL plan is in wait step with delay. The wait time > should > # definitely take more time than 10 seconds. > if enable_async_ddl: > assert(wait_time >= 10) > else: > assert(exec_time >= 10) > {code}
[jira] [Commented] (IMPALA-12522) test_alter_table_recover could finish less than 10 seconds with JDK 17 when enable_async_ddl_execution is False
[ https://issues.apache.org/jira/browse/IMPALA-12522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780105#comment-17780105 ] Fang-Yu Rao commented on IMPALA-12522: -- Hi [~joemcdonnell], assigned this JIRA to you since you helped review [IMPALA-10811|https://gerrit.cloudera.org/c/17872/38/tests/metadata/test_ddl.py#1012], which added this test. Please reassign the JIRA as you see fit. Thanks! > test_alter_table_recover could finish less than 10 seconds with JDK 17 when > enable_async_ddl_execution is False > --- > > Key: IMPALA-12522 > URL: https://issues.apache.org/jira/browse/IMPALA-12522 > Project: IMPALA > Issue Type: Test >Reporter: Fang-Yu Rao >Assignee: Joe McDonnell >Priority: Major > Labels: broken-build, flaky-test > > We found that > [test_alter_table_recover()|https://github.com/apache/impala/blame/master/tests/metadata/test_ddl.py#L1026] > could finish execution within 10 seconds with JDK 17 when > enable_async_ddl_execution is False and thus the check in the [else > branch|https://github.com/apache/impala/blame/master/tests/metadata/test_ddl.py#L1079C12-L1079C12] > could fail. We don't know whether it has something to do with the JDK, but maybe we could > reduce the expected execution time a little to make the test less flaky. > {code} > # In sync mode: > # The entire DDL is processed in the exec step with delay. exec_time > should be > # more than 10 seconds. > # > # In async mode: > # The compilation of DDL is processed in the exec step without delay. > And the > # processing of the DDL plan is in wait step with delay. The wait time > should > # definitely take more time than 10 seconds. > if enable_async_ddl: > assert(wait_time >= 10) > else: > assert(exec_time >= 10) > {code}
[jira] [Created] (IMPALA-12522) test_alter_table_recover could finish less than 10 seconds with JDK 17 when enable_async_ddl_execution is False
Fang-Yu Rao created IMPALA-12522: Summary: test_alter_table_recover could finish less than 10 seconds with JDK 17 when enable_async_ddl_execution is False Key: IMPALA-12522 URL: https://issues.apache.org/jira/browse/IMPALA-12522 Project: IMPALA Issue Type: Test Reporter: Fang-Yu Rao Assignee: Joe McDonnell We found that [test_alter_table_recover()|https://github.com/apache/impala/blame/master/tests/metadata/test_ddl.py#L1026] could finish execution within 10 seconds with JDK 17 when enable_async_ddl_execution is False and thus the check in the [else branch|https://github.com/apache/impala/blame/master/tests/metadata/test_ddl.py#L1079C12-L1079C12] could fail. We don't know whether it has something to do with the JDK, but maybe we could reduce the expected execution time a little to make the test less flaky. {code} # In sync mode: # The entire DDL is processed in the exec step with delay. exec_time should be # more than 10 seconds. # # In async mode: # The compilation of DDL is processed in the exec step without delay. And the # processing of the DDL plan is in wait step with delay. The wait time should # definitely take more time than 10 seconds. if enable_async_ddl: assert(wait_time >= 10) else: assert(exec_time >= 10) {code}
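One common way to make wall-clock assertions like `assert exec_time >= 10` less brittle is to time the step with a monotonic clock and allow a small tolerance below the nominal injected delay. A generic sketch (the 5% slack and the scaled-down delay are assumptions for illustration, not Impala's actual fix):

```python
import time

def timed(fn):
    """Return (result, elapsed seconds) for fn(), using a monotonic clock."""
    start = time.monotonic()
    result = fn()
    return result, time.monotonic() - start

# The test injects a 10s delay and asserts exec_time >= 10. If the delay
# mechanism and the timer use different clocks (or different resolutions),
# elapsed time can come out just under the nominal delay, so comparing
# against a slightly smaller bound reduces flakiness.
DELAY_S = 0.2  # scaled down from the test's 10s so this sketch runs quickly
_, elapsed = timed(lambda: time.sleep(DELAY_S))
assert elapsed >= DELAY_S * 0.95, elapsed
print("ok")
```

Note `time.sleep()` is documented to sleep for at least the requested duration, so the tolerance mainly guards against mismatched clocks on the measuring side.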
[jira] [Commented] (IMPALA-12500) TestObservability.test_global_exchange_counters is flaky
[ https://issues.apache.org/jira/browse/IMPALA-12500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17778839#comment-17778839 ] Fang-Yu Rao commented on IMPALA-12500: -- Hi [~csringhofer], assigned this JIRA to you since you recently revised the test at [IMPALA-12430|https://github.com/apache/impala/commit/fb2d2b27641a95f51b6789639fab73b60abd7bc5#diff-a317a4067b5728a2d0af9839c1dce94710e7bd50825ceffc0a3c88aca3e27de3R553] and thus may be more familiar with the test. Please feel free to reassign the JIRA as you see fit. Thanks! > TestObservability.test_global_exchange_counters is flaky > > > Key: IMPALA-12500 > URL: https://issues.apache.org/jira/browse/IMPALA-12500 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.4.0 >Reporter: Joe McDonnell >Assignee: Csaba Ringhofer >Priority: Critical > Labels: broken-build, flaky > > There have been intermittent failures on this test with the following symptom: > {noformat} > query_test/test_observability.py:564: in test_global_exchange_counters > assert "ExchangeScanRatio: 4.63" in profile > E assert 'ExchangeScanRatio: 4.63' in 'Query > (id=c04b974db37e7046:b5fe4dea):\n DEBUG MODE WARNING: Query profile > created while running a DEBUG buil...: 0.000ns\n - WriteIoBytes: > 0\n - WriteIoOps: 0 (0)\n - WriteIoWaitTime: > 0.000ns\n' > -- executing against localhost:21000 > select count(*), sleep(50) from tpch_parquet.orders o > inner join tpch_parquet.lineitem l on o.o_orderkey = l.l_orderkey > group by o.o_clerk limit 10; > -- 2023-10-05 19:47:29,817 INFO MainThread: Started query > c04b974db37e7046:b5fe4dea{noformat} >
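Asserting the exact formatted substring "ExchangeScanRatio: 4.63" makes the test sensitive to rounding and to any legitimate shift in the ratio. One way to harden such a check is to parse the value out of the profile and compare it within a tolerance (a sketch; the regex, sample profile string, and tolerance are assumptions, not the actual test code):

```python
import re

def exchange_scan_ratio(profile):
    """Extract the ExchangeScanRatio value from a query profile string."""
    m = re.search(r"ExchangeScanRatio:\s*([0-9]+(?:\.[0-9]+)?)", profile)
    if m is None:
        raise AssertionError("ExchangeScanRatio not found in profile")
    return float(m.group(1))

# Hypothetical profile fragment shaped like the one in the failure above.
profile = "Query (id=abc):\n  ExchangeScanRatio: 4.63\n  - WriteIoBytes: 0\n"
ratio = exchange_scan_ratio(profile)

# Compare numerically with a tolerance instead of exact string equality.
assert abs(ratio - 4.63) < 0.01
print(ratio)  # 4.63
```

This also gives a clearer failure message: the parsed value is reported instead of a wall of profile text.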
[jira] [Assigned] (IMPALA-12500) TestObservability.test_global_exchange_counters is flaky
[ https://issues.apache.org/jira/browse/IMPALA-12500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao reassigned IMPALA-12500: Assignee: Fang-Yu Rao > TestObservability.test_global_exchange_counters is flaky > > > Key: IMPALA-12500 > URL: https://issues.apache.org/jira/browse/IMPALA-12500 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.4.0 >Reporter: Joe McDonnell >Assignee: Fang-Yu Rao >Priority: Critical > Labels: broken-build, flaky > > There have been intermittent failures on this test with the following symptom: > {noformat} > query_test/test_observability.py:564: in test_global_exchange_counters > assert "ExchangeScanRatio: 4.63" in profile > E assert 'ExchangeScanRatio: 4.63' in 'Query > (id=c04b974db37e7046:b5fe4dea):\n DEBUG MODE WARNING: Query profile > created while running a DEBUG buil...: 0.000ns\n - WriteIoBytes: > 0\n - WriteIoOps: 0 (0)\n - WriteIoWaitTime: > 0.000ns\n' > -- executing against localhost:21000 > select count(*), sleep(50) from tpch_parquet.orders o > inner join tpch_parquet.lineitem l on o.o_orderkey = l.l_orderkey > group by o.o_clerk limit 10; > -- 2023-10-05 19:47:29,817 INFO MainThread: Started query > c04b974db37e7046:b5fe4dea{noformat} >
[jira] [Assigned] (IMPALA-12500) TestObservability.test_global_exchange_counters is flaky
[ https://issues.apache.org/jira/browse/IMPALA-12500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao reassigned IMPALA-12500: Assignee: Csaba Ringhofer (was: Fang-Yu Rao) > TestObservability.test_global_exchange_counters is flaky > > > Key: IMPALA-12500 > URL: https://issues.apache.org/jira/browse/IMPALA-12500 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.4.0 >Reporter: Joe McDonnell >Assignee: Csaba Ringhofer >Priority: Critical > Labels: broken-build, flaky > > There have been intermittent failures on this test with the following symptom: > {noformat} > query_test/test_observability.py:564: in test_global_exchange_counters > assert "ExchangeScanRatio: 4.63" in profile > E assert 'ExchangeScanRatio: 4.63' in 'Query > (id=c04b974db37e7046:b5fe4dea):\n DEBUG MODE WARNING: Query profile > created while running a DEBUG buil...: 0.000ns\n - WriteIoBytes: > 0\n - WriteIoOps: 0 (0)\n - WriteIoWaitTime: > 0.000ns\n' > -- executing against localhost:21000 > select count(*), sleep(50) from tpch_parquet.orders o > inner join tpch_parquet.lineitem l on o.o_orderkey = l.l_orderkey > group by o.o_clerk limit 10; > -- 2023-10-05 19:47:29,817 INFO MainThread: Started query > c04b974db37e7046:b5fe4dea{noformat} >
[jira] [Commented] (IMPALA-10712) SET OWNER ROLE of a database/table/view is not supported when Ranger is the authorization provider
[ https://issues.apache.org/jira/browse/IMPALA-10712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17775048#comment-17775048 ] Fang-Yu Rao commented on IMPALA-10712: -- It looks like I created a JIRA more than 2 years ago for the same issue. > SET OWNER ROLE of a database/table/view is not supported when > Ranger is the authorization provider > -- > > Key: IMPALA-10712 > URL: https://issues.apache.org/jira/browse/IMPALA-10712 > Project: IMPALA > Issue Type: Improvement >Affects Versions: Impala 4.0.0 >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Major > > We found that {{SET OWNER ROLE}} of a database, table, or view is not > supported when Ranger is the authorization provider. > In the case of setting the owner of a database to a given role, when Ranger is > the authorization provider, we found that after executing {{ALTER DATABASE > SET OWNER ROLE }}, we will hit the non-null check > for the given role at > [https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/AlterDbSetOwnerStmt.java#L59] > because the {{AuthorizationPolicy}} returned from > {{getAuthPolicy()}} does not cache any policy-related information if the > authorization provider is Ranger, which is different from the case when > Sentry was the authorization provider. > When Ranger is the authorization provider, the currently existing roles are > cached by {{RangerImpalaPlugin}}. Therefore, to address the issue above, we > could probably invoke {{getRoles().getRangerRoles()}} provided by the > {{RangerImpalaPlugin}} to retrieve the set of existing roles, similar to what > is done at > [https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/authorization/ranger/RangerImpaladAuthorizationManager.java#L135]. > Tagged [~joemcdonnell] and [~shajini] since I realized this when reviewing > Joe's comment at > [https://gerrit.cloudera.org/c/17469/1/docs/topics/impala_alter_database.xml#b68].
[jira] [Resolved] (IMPALA-11466) Add jetty-server as an allowed dependency
[ https://issues.apache.org/jira/browse/IMPALA-11466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao resolved IMPALA-11466. -- Fix Version/s: Impala 4.3.0 Resolution: Fixed Resolving this JIRA since the fix has been merged, thanks to [~rizaon]. > Add jetty-server as an allowed dependency > - > > Key: IMPALA-11466 > URL: https://issues.apache.org/jira/browse/IMPALA-11466 > Project: IMPALA > Issue Type: Task >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Major > Fix For: Impala 4.3.0 > > > We found that after HIVE-21456, the instantiation of HiveMetaStoreClient requires > the class org.eclipse.jetty.server.Connector, which is a banned dependency > of impala-frontend. This resulted in the failure of the FE test > testTestCaseImport() since it needs to instantiate a > HiveMetaStoreClient. > We should add the required dependency so that the test can run.