cristianpop opened a new issue, #18726:
URL: https://github.com/apache/pinot/issues/18726
Queries that look for rows with either nulls or specific values for a column
don't return the rows with null values.
Table schema excerpt:
`{
"schemaName": "test",
"enableColumnBasedNullHandling": true,
"dimensionFieldSpecs": [
...
{
"name": "prop",
"dataType": "STRING",
"fieldType": "DIMENSION"
},
...
]
}`
Using SSE the following query returns the correct count (102493):
`SET enableNullHandling = true;
SELECT COUNT(*)
FROM test
WHERE prop IS NULL
OR prop = 'value'`
On the other hand, using MSE the same query doesn't return the correct count
(80):
`SET useMultistageEngine=true;
SET enableNullHandling = true;
SELECT COUNT(*)
FROM test
WHERE prop IS NULL
OR prop = 'value'`
If I query for rows with the given value for prop using MSE I get the
correct count (80):
`SET useMultistageEngine=true;
SET enableNullHandling = true;
SELECT COUNT(*)
FROM test
WHERE prop = 'value'`
If I query for rows with null values for prop using MSE I get the correct
count (102413):
`SET useMultistageEngine=true;
SET enableNullHandling = true;
SELECT COUNT(*)
FROM test
WHERE prop IS NULL`
As can be seen from the queries above, even if I ask for rows with prop as
null or with given value, MSE only returns the rows that have the given value,
ignoring those with nulls.
I looked at the execution plan and it looks fine since it has NULL AS TRUE
set:
`Execution Plan
PinotLogicalAggregate(group=[{}], agg#0=[COUNT($0)], aggType=[FINAL])
PinotLogicalExchange(distribution=[hash])
PinotLogicalAggregate(group=[{}], agg#0=[COUNT()], aggType=[LEAF])
LogicalFilter(condition=[SEARCH($5, Sarg[_UTF-8'value':VARCHAR
CHARACTER SET "UTF-8"; NULL AS TRUE]:VARCHAR CHARACTER SET "UTF-8")])
PinotLogicalTableScan(table=[[default, test]])`
It seems like the null as true flag gets lost in
RexExpressionUtils::handleSearch because it doesn't take it into account at
all. I already have a commit right
[here](https://github.com/apache/pinot/compare/master...cristianpop:pinot:wip-fix-mse-is-null-or-value-query-handling)
that fixes this issue so I can open a PR for it.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]