[jira] [Updated] (IGNITE-16443) Calcite engine. Incorrect nulls in search row processing by hash/sorted spools

Aleksey Plekhanov (Jira) Wed, 02 Feb 2022 12:41:43 -0800


     [ 
https://issues.apache.org/jira/browse/IGNITE-16443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Aleksey Plekhanov updated IGNITE-16443:
---------------------------------------
    Description: 
Currently, the sorted spool comparator treats nulls in the search row as 
regular values and applies nulls ordering to them, but these nulls have another 
meaning: it means that any value matches the bound.

For example, if we have the condition for spool {{a > 0}} it will be 
transformed to bounds: {{{}lower [0]{}}}, {{{}upper [null]{}}}, after comparing 
rows with lower bound we will get the correct result, but comparing with upper 
bound always give us {{1}} (value > null) with {{nulls first}} collation and no 
rows will be returned by index spool.

For hash spool there is the reverse problem, we should not find rows if nulls 
are present in the search row, since condition NULL=NULL should not satisfy the 
filter in SQL.

For index scan, there are no problems that lead to data inconsistency, but in 
the case of nulls in the search row, any value matches the bound and index scan 
becomes very ineffective. For example, if we have the index by field {{a}} and 
filter {{{}a = $cor0.b{}}}, for {{$cor0.b = null }}there will be the full index 
scan, all rows will be passed to the predicate and 0 rows will be produced 
after the predicate. 

Another problem with comparators: there is a dead code in 
{{ExpressionFactoryImpl#comparator(RelFieldCollation)}} which compare object by 
hash or bytes:
 * This code never executes since we support now only comparable types by 
Calcite-based SQL engine
 * Depends on H2
 * Serialize object by standard java serialization
 * Works only for ASC ordering
 * Works for sort/spool/sort aggregate nodes, but doesn't work for merge join

I think we can get rid of this code at least until we don't support Object 
types and revert corresponding changes to ignite-indexing to avoid merge 
conflicts.

  was:
Currently, the sorted spool comparator treats nulls in the search row as 
regular values and applies nulls ordering to them, but these nulls have another 
meaning: it means that any value matches the bound.

For example, if we have the condition for spool {{a > 0}} it will be 
transformed to bounds: {{lower [0]}}, {{upper [null]}}, after comparing rows 
with lower bound we will get the correct result, but comparing with upper bound 
always give us {{1}} (value > null) with {{nulls first}} collation and no rows 
will be returned by index spool.

For hash spool there is the reverse problem, we should not find rows if nulls 
are present in the search row, since condition NULL=NULL should not satisfy the 
filter in SQL.

Another problem with comparators: there is a dead code in 
{{ExpressionFactoryImpl#comparator(RelFieldCollation)}} which compare object by 
hash or bytes:
 * This code never executes since we support now only comparable types by 
Calcite-based SQL engine
 * Depends on H2
 * Serialize object by standard java serialization
 * Works only for ASC ordering
 * Works for sort/spool/sort aggregate nodes, but doesn't work for merge join

I think we can get rid of this code at least until we don't support Object 
types and revert corresponding changes to ignite-indexing to avoid merge 
conflicts.


> Calcite engine. Incorrect nulls in search row processing by hash/sorted spools
> ------------------------------------------------------------------------------
>
>                 Key: IGNITE-16443
>                 URL: https://issues.apache.org/jira/browse/IGNITE-16443
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Aleksey Plekhanov
>            Assignee: Aleksey Plekhanov
>            Priority: Major
>              Labels: calcite2-required, calcite3-required
>
> Currently, the sorted spool comparator treats nulls in the search row as 
> regular values and applies nulls ordering to them, but these nulls have 
> another meaning: it means that any value matches the bound.
> For example, if we have the condition for spool {{a > 0}} it will be 
> transformed to bounds: {{{}lower [0]{}}}, {{{}upper [null]{}}}, after 
> comparing rows with lower bound we will get the correct result, but comparing 
> with upper bound always give us {{1}} (value > null) with {{nulls first}} 
> collation and no rows will be returned by index spool.
> For hash spool there is the reverse problem, we should not find rows if nulls 
> are present in the search row, since condition NULL=NULL should not satisfy 
> the filter in SQL.
> For index scan, there are no problems that lead to data inconsistency, but in 
> the case of nulls in the search row, any value matches the bound and index 
> scan becomes very ineffective. For example, if we have the index by field 
> {{a}} and filter {{{}a = $cor0.b{}}}, for {{$cor0.b = null }}there will be 
> the full index scan, all rows will be passed to the predicate and 0 rows will 
> be produced after the predicate. 
> Another problem with comparators: there is a dead code in 
> {{ExpressionFactoryImpl#comparator(RelFieldCollation)}} which compare object 
> by hash or bytes:
>  * This code never executes since we support now only comparable types by 
> Calcite-based SQL engine
>  * Depends on H2
>  * Serialize object by standard java serialization
>  * Works only for ASC ordering
>  * Works for sort/spool/sort aggregate nodes, but doesn't work for merge join
> I think we can get rid of this code at least until we don't support Object 
> types and revert corresponding changes to ignite-indexing to avoid merge 
> conflicts.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Updated] (IGNITE-16443) Calcite engine. Incorrect nulls in search row processing by hash/sorted spools

Reply via email to