[Impala-ASF-CR] IMPALA-12371: Add better cardinality estimation for Iceberg V2 tables with deletes

2023-09-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20460 )

Change subject: IMPALA-12371: Add better cardinality estimation for Iceberg V2 
tables with deletes
..


Patch Set 6: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/20460
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I988dc8d7e1074932c460b3702d3381341e5b23c5
Gerrit-Change-Number: 20460
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 27 Sep 2023 17:00:38 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12371: Add better cardinality estimation for Iceberg V2 tables with deletes

2023-09-27 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20460 )

Change subject: IMPALA-12371: Add better cardinality estimation for Iceberg V2 
tables with deletes
..


Patch Set 5: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/20460
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I988dc8d7e1074932c460b3702d3381341e5b23c5
Gerrit-Change-Number: 20460
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 27 Sep 2023 12:34:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12371: Add better cardinality estimation for Iceberg V2 tables with deletes

2023-09-27 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20460 )

Change subject: IMPALA-12371: Add better cardinality estimation for Iceberg V2 
tables with deletes
..


Patch Set 5:

(1 comment)

Thanks for the comments!

http://gerrit.cloudera.org:8080/#/c/20460/4/fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java
File fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java:

http://gerrit.cloudera.org:8080/#/c/20460/4/fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java@94
PS4, Line 94: Please note that left side's cardinality already takes the 
selectivity into
: // account
> We could mention this also in the commit message.
Done



--
To view, visit http://gerrit.cloudera.org:8080/20460
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I988dc8d7e1074932c460b3702d3381341e5b23c5
Gerrit-Change-Number: 20460
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 27 Sep 2023 12:32:03 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12371: Add better cardinality estimation for Iceberg V2 tables with deletes

2023-09-27 Thread Zoltan Borok-Nagy (Code Review)
Hello Tamas Mate, Daniel Becker, Noemi Pap-Takacs, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20460

to look at the new patch set (#5).

Change subject: IMPALA-12371: Add better cardinality estimation for Iceberg V2 
tables with deletes
..

IMPALA-12371: Add better cardinality estimation for Iceberg V2 tables with 
deletes

Currently IcebergDeleteNode's cardinality is the same as the LHS's
cardinality, i.e. we don't take the RHS into account. The RHS contains
the position delete records, so it is a fair assumption that all records
at RHS remove a record from LHS (duplicated delete records should be
extremely rare).

If there are conjuncts on the Iceberg table we can assume that they have
the same selectivity on the data records and on the delete records.

With the above assumptions this change updates the cardinality of the
IcebergDeleteNode with basically the following formula:

 Card(IcebergDeleteNode) = Card(LHS) - Selectivity(LHS) * Card(RHS);

Please note that left side's cardinality already takes the selectivity
into account.

To deal with edge cases when there are lots of duplicated delete
records (shouldn't happen in normal usage), we return at least 1
cardinality, so the actual formula is:

 Card(IcebergDeleteNode) =
   Max(
 1,
 Card(LHS) - Selectivity(LHS) * Card(RHS)
   );

Testing:
 * updated the planner tests

Change-Id: I988dc8d7e1074932c460b3702d3381341e5b23c5
---
M fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-delete.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
3 files changed, 97 insertions(+), 79 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/20460/5
--
To view, visit http://gerrit.cloudera.org:8080/20460
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I988dc8d7e1074932c460b3702d3381341e5b23c5
Gerrit-Change-Number: 20460
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12371: Add better cardinality estimation for Iceberg V2 tables with deletes

2023-09-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20460 )

Change subject: IMPALA-12371: Add better cardinality estimation for Iceberg V2 
tables with deletes
..


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9764/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/20460
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I988dc8d7e1074932c460b3702d3381341e5b23c5
Gerrit-Change-Number: 20460
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 27 Sep 2023 12:34:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12371: Add better cardinality estimation for Iceberg V2 tables with deletes

2023-09-27 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20460 )

Change subject: IMPALA-12371: Add better cardinality estimation for Iceberg V2 
tables with deletes
..


Patch Set 4:

(1 comment)

Just a small thing, I can +2 after this.

http://gerrit.cloudera.org:8080/#/c/20460/4/fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java
File fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java:

http://gerrit.cloudera.org:8080/#/c/20460/4/fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java@94
PS4, Line 94: Please note that left side's cardinality already takes the 
selectivity into
: // account
We could mention this also in the commit message.



--
To view, visit http://gerrit.cloudera.org:8080/20460
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I988dc8d7e1074932c460b3702d3381341e5b23c5
Gerrit-Change-Number: 20460
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 27 Sep 2023 12:27:43 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12371: Add better cardinality estimation for Iceberg V2 tables with deletes

2023-09-27 Thread Tamas Mate (Code Review)
Tamas Mate has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20460 )

Change subject: IMPALA-12371: Add better cardinality estimation for Iceberg V2 
tables with deletes
..


Patch Set 4: Code-Review+1

LGTM! Thanks Zoltan!


--
To view, visit http://gerrit.cloudera.org:8080/20460
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I988dc8d7e1074932c460b3702d3381341e5b23c5
Gerrit-Change-Number: 20460
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 27 Sep 2023 12:16:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12371: Add better cardinality estimation for Iceberg V2 tables with deletes

2023-09-26 Thread Noemi Pap-Takacs (Code Review)
Noemi Pap-Takacs has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20460 )

Change subject: IMPALA-12371: Add better cardinality estimation for Iceberg V2 
tables with deletes
..


Patch Set 4: Code-Review+1

Nice change, looks good to me.


--
To view, visit http://gerrit.cloudera.org:8080/20460
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I988dc8d7e1074932c460b3702d3381341e5b23c5
Gerrit-Change-Number: 20460
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 26 Sep 2023 12:30:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12371: Add better cardinality estimation for Iceberg V2 tables with deletes

2023-09-26 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20460 )

Change subject: IMPALA-12371: Add better cardinality estimation for Iceberg V2 
tables with deletes
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20460/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20460/3//COMMIT_MSG@12
PS3, Line 12: LHS
> Is it LHS? Every position delete record removes a data record from the LHS,
Yes, thank you!



--
To view, visit http://gerrit.cloudera.org:8080/20460
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I988dc8d7e1074932c460b3702d3381341e5b23c5
Gerrit-Change-Number: 20460
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 26 Sep 2023 09:54:50 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12371: Add better cardinality estimation for Iceberg V2 tables with deletes

2023-09-26 Thread Zoltan Borok-Nagy (Code Review)
Hello Tamas Mate, Daniel Becker, Noemi Pap-Takacs, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20460

to look at the new patch set (#4).

Change subject: IMPALA-12371: Add better cardinality estimation for Iceberg V2 
tables with deletes
..

IMPALA-12371: Add better cardinality estimation for Iceberg V2 tables with 
deletes

Currently IcebergDeleteNode's cardinality is the same as the LHS's
cardinality, i.e. we don't take the RHS into account. The RHS contains
the position delete records, so it is a fair assumption that all records
at RHS remove a record from LHS (duplicated delete records should be
extremely rare).

If there are conjuncts on the Iceberg table we can assume that they have
the same selectivity on the data records and on the delete records.

With the above assumptions this change updates the cardinality of the
IcebergDeleteNode with basically the following formula:

 Card(IcebergDeleteNode) = Card(LHS) - Selectivity(LHS) * Card(RHS);

To deal with edge cases when there are lots of duplicated delete
records (shouldn't happen in normal usage), we return at least 1
cardinality, so the actual formula is:

 Card(IcebergDeleteNode) =
   Max(
 1,
 Card(LHS) - Selectivity(LHS) * Card(RHS)
   );

Testing:
 * updated the planner tests

Change-Id: I988dc8d7e1074932c460b3702d3381341e5b23c5
---
M fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-delete.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
3 files changed, 97 insertions(+), 79 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/20460/4
--
To view, visit http://gerrit.cloudera.org:8080/20460
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I988dc8d7e1074932c460b3702d3381341e5b23c5
Gerrit-Change-Number: 20460
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12371: Add better cardinality estimation for Iceberg V2 tables with deletes

2023-09-25 Thread Noemi Pap-Takacs (Code Review)
Noemi Pap-Takacs has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20460 )

Change subject: IMPALA-12371: Add better cardinality estimation for Iceberg V2 
tables with deletes
..


Patch Set 3: Code-Review+1

(1 comment)

LGTM

http://gerrit.cloudera.org:8080/#/c/20460/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20460/3//COMMIT_MSG@12
PS3, Line 12: RHS
Is it LHS? Every position delete record removes a data record from the LHS, 
right?



--
To view, visit http://gerrit.cloudera.org:8080/20460
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I988dc8d7e1074932c460b3702d3381341e5b23c5
Gerrit-Change-Number: 20460
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 25 Sep 2023 14:03:43 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12371: Add better cardinality estimation for Iceberg V2 tables with deletes

2023-09-14 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20460 )

Change subject: IMPALA-12371: Add better cardinality estimation for Iceberg V2 
tables with deletes
..


Patch Set 3: Code-Review+1

Thanks, looks good to me. Someone more knowledgeable about Iceberg should also 
take a look.


--
To view, visit http://gerrit.cloudera.org:8080/20460
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I988dc8d7e1074932c460b3702d3381341e5b23c5
Gerrit-Change-Number: 20460
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 14 Sep 2023 15:24:19 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12371: Add better cardinality estimation for Iceberg V2 tables with deletes

2023-09-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20460 )

Change subject: IMPALA-12371: Add better cardinality estimation for Iceberg V2 
tables with deletes
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/13988/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20460
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I988dc8d7e1074932c460b3702d3381341e5b23c5
Gerrit-Change-Number: 20460
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 12 Sep 2023 16:44:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12371: Add better cardinality estimation for Iceberg V2 tables with deletes

2023-09-12 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20460 )

Change subject: IMPALA-12371: Add better cardinality estimation for Iceberg V2 
tables with deletes
..


Patch Set 3:

(2 comments)

Thanks for the comments!

http://gerrit.cloudera.org:8080/#/c/20460/2/fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java
File fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java:

http://gerrit.cloudera.org:8080/#/c/20460/2/fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java@98
PS2, Line 98: have non-ze
> Thanks, I like the idea about a precondition check and using 1 in the Math.
Done


http://gerrit.cloudera.org:8080/#/c/20460/2/fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java@100
PS2, Line 100: ondition
> Thanks. It may be worth mentioning it either here or in the commit message.
Done



--
To view, visit http://gerrit.cloudera.org:8080/20460
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I988dc8d7e1074932c460b3702d3381341e5b23c5
Gerrit-Change-Number: 20460
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 12 Sep 2023 16:17:55 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12371: Add better cardinality estimation for Iceberg V2 tables with deletes

2023-09-12 Thread Zoltan Borok-Nagy (Code Review)
Hello Daniel Becker, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20460

to look at the new patch set (#3).

Change subject: IMPALA-12371: Add better cardinality estimation for Iceberg V2 
tables with deletes
..

IMPALA-12371: Add better cardinality estimation for Iceberg V2 tables with 
deletes

Currently IcebergDeleteNode's cardinality is the same as the LHS's
cardinality, i.e. we don't take the RHS into account. The RHS contains
the position delete records, so it is a fair assumption that all records
at RHS remove a record from RHS (duplicated delete records should be
extremely rare).

If there are conjuncts on the Iceberg table we can assume that they have
the same selectivity on the data records and on the delete records.

With the above assumptions this change updates the cardinality of the
IcebergDeleteNode with basically the following formula:

 Card(IcebergDeleteNode) = Card(LHS) - Selectivity(LHS) * Card(RHS);

To deal with edge cases when there are lots of duplicated delete
records (shouldn't happen in normal usage), we return at least 1
cardinality, so the actual formula is:

 Card(IcebergDeleteNode) =
   Max(
 1,
 Card(LHS) - Selectivity(LHS) * Card(RHS)
   );

Testing:
 * updated the planner tests

Change-Id: I988dc8d7e1074932c460b3702d3381341e5b23c5
---
M fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-delete.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
3 files changed, 97 insertions(+), 79 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/20460/3
--
To view, visit http://gerrit.cloudera.org:8080/20460
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I988dc8d7e1074932c460b3702d3381341e5b23c5
Gerrit-Change-Number: 20460
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12371: Add better cardinality estimation for Iceberg V2 tables with deletes

2023-09-11 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20460 )

Change subject: IMPALA-12371: Add better cardinality estimation for Iceberg V2 
tables with deletes
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/20460/2/fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java
File fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java:

http://gerrit.cloudera.org:8080/#/c/20460/2/fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java@98
PS2, Line 98: is not zero
> If there are input files to scan, leftCard will always be greater than 0, s
Thanks, I like the idea about a precondition check and using 1 in the 
Math.math() call.


http://gerrit.cloudera.org:8080/#/c/20460/2/fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java@100
PS2, Line 100: leftCard
> Yes.
Thanks. It may be worth mentioning it either here or in the commit message.



--
To view, visit http://gerrit.cloudera.org:8080/20460
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I988dc8d7e1074932c460b3702d3381341e5b23c5
Gerrit-Change-Number: 20460
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 11 Sep 2023 11:19:54 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12371: Add better cardinality estimation for Iceberg V2 tables with deletes

2023-09-08 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20460 )

Change subject: IMPALA-12371: Add better cardinality estimation for Iceberg V2 
tables with deletes
..


Patch Set 2:

(2 comments)

Thanks for the comments!

http://gerrit.cloudera.org:8080/#/c/20460/2/fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java
File fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java:

http://gerrit.cloudera.org:8080/#/c/20460/2/fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java@98
PS2, Line 98: is not zero
> What happens if it is zero? We don't reach this code then?
If there are input files to scan, leftCard will always be greater than 0, so we 
always report at least one cardinality.

This is in line with: 
https://github.com/apache/impala/blob/0f55e551bc98843c79a9ec82582ddca237aa4fe9/fe/src/main/java/org/apache/impala/planner/PlanNode.java#L747

Actually, if the leftCard is 0 (no input files), then the planner shouldn't 
even create this node. So alternatively we could probably use a 
Preconditions.checkState() and a bit simpler formula:

cardinality_ = Math.max(
1,
leftCard - (long)(leftSelectivity * rightCard));


http://gerrit.cloudera.org:8080/#/c/20460/2/fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java@100
PS2, Line 100: leftCard
> For 'leftCard' selectivity has already been taken into account, that's why
Yes.



--
To view, visit http://gerrit.cloudera.org:8080/20460
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I988dc8d7e1074932c460b3702d3381341e5b23c5
Gerrit-Change-Number: 20460
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 08 Sep 2023 14:28:31 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12371: Add better cardinality estimation for Iceberg V2 tables with deletes

2023-09-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20460 )

Change subject: IMPALA-12371: Add better cardinality estimation for Iceberg V2 
tables with deletes
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/13929/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20460
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I988dc8d7e1074932c460b3702d3381341e5b23c5
Gerrit-Change-Number: 20460
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 06 Sep 2023 14:32:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12371: Add better cardinality estimation for Iceberg V2 tables with deletes

2023-09-06 Thread Zoltan Borok-Nagy (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20460

to look at the new patch set (#2).

Change subject: IMPALA-12371: Add better cardinality estimation for Iceberg V2 
tables with deletes
..

IMPALA-12371: Add better cardinality estimation for Iceberg V2 tables with 
deletes

Currently IcebergDeleteNode's cardinality is the same as the LHS's
cardinality, i.e. we don't take the RHS into account. The RHS contains
the position delete records, so it is a fair assumption that all records
at RHS remove a record from RHS (duplicated delete records should be
extremely rare).

If there are conjuncts on the Iceberg table we can assume that they have
the same selectivity on the data records and on the delete records.

With the above assumptions this change updates the cardinality of the
IcebergDeleteNode with basically the following formula:

 Card(IcebergDeleteNode) = Card(LHS) - Selectivity(LHS) * Card(RHS);

To deal with edge cases when there are lots of duplicated delete
records (shouldn't happen in normal usage), we actually use a slightly
more complex formula:

 Card(IcebergDeleteNode) =
   Max(
 Min(1, Card(LHS))),
 Card(LHS) - Selectivity(LHS) * Card(RHS)
   );

Testing:
 * updated the planner tests

Change-Id: I988dc8d7e1074932c460b3702d3381341e5b23c5
---
M fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-delete.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
3 files changed, 94 insertions(+), 79 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/20460/2
--
To view, visit http://gerrit.cloudera.org:8080/20460
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I988dc8d7e1074932c460b3702d3381341e5b23c5
Gerrit-Change-Number: 20460
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-12371: Add better cardinality estimation for Iceberg V2 tables with deletes

2023-09-06 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/20460


Change subject: IMPALA-12371: Add better cardinality estimation for Iceberg V2 
tables with deletes
..

IMPALA-12371: Add better cardinality estimation for Iceberg V2 tables with 
deletes

Currently IcebergDeleteNode's cardinality is the same as the LHS's
cardinality, i.e. we don't take the RHS into account. The RHS contains
the position delete records, so it is a fair assumption that all records
at RHS remove a record from RHS (duplicated delete records should be
extremely rare).

If there are conjuncts on the Iceberg table we can assume that they have
the same selectivity on the data records and on the delete records.

With the above assumptions this change updates the cardinality of the
IcebergDeleteNode with the basically the following formula:

 Card(IcebergDeleteNode) = Card(LHS) - Selectivity(LHS) * Card(RHS);

To deal with edge cases when there are lots of duplicated delete
records, we actually use a slightly more complex formula:

 Card(IcebergDeleteNode) =
   Max(
 Min(1, Card(LHS))),
 Card(LHS) - Selectivity(LHS) * Card(RHS)
   );

Testing:
 * updated the planner tests

Change-Id: I988dc8d7e1074932c460b3702d3381341e5b23c5
---
M fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-delete.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
3 files changed, 94 insertions(+), 79 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/20460/1
--
To view, visit http://gerrit.cloudera.org:8080/20460
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I988dc8d7e1074932c460b3702d3381341e5b23c5
Gerrit-Change-Number: 20460
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy