pmmag opened a new issue, #17528:
URL: https://github.com/apache/druid/issues/17528
### Affected Version
31.0.0
### Description
Hello,
We use string-based schemaless ingestion for a datasource in which a
particular column may or may not be present. We were querying that datasource
with a filter meant to keep only rows where that column is NOT equal to a
particular value (`{"type": "not", "field": {"type":
"selector", "dimension": "our-column", "value": "our-value"}}`). However,
against our expectations, this filter did not return rows that were missing
this column entirely.
We were able to reproduce this by simply querying on a completely bogus
column:
```
{
  "queryType": "scan",
  "dataSource": {
    "type": "table",
    "name": "some-table"
  },
  "filter": {
    "type": "not",
    "field": {
      "type": "selector",
      "dimension": "kfjhdskgjshdk-this-column-obviously-does-not-exist",
      "value": "should-not-matter"
    }
  },
  "intervals": [
    "0/2025"
  ]
}
```
Expectation: this filter should be equivalent to no filter at all, because
no row has this column, and consequently every row should match the
condition that this column is not equal to some value.
Reality: no rows are returned. The same happens when the inner filter is an
`in` filter or even a regex filter.
We can now work around this by filtering on a virtual column instead
(with `nvl(our-column, 'default_value')`), but we are concerned that this
behavior may silently filter out data in other cases. If this behavior
is by design, we would at least expect to find it in the documentation, because
it is very surprising.
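
For reference, the workaround looks roughly like the query below. This is a sketch rather than our exact production query: the virtual column name `our-column-or-default` is a placeholder chosen for illustration, the `outputType` is our assumption, and the column name is double-quoted inside the expression on the assumption that identifiers containing a hyphen need quoting there.

```
{
  "queryType": "scan",
  "dataSource": {
    "type": "table",
    "name": "some-table"
  },
  "virtualColumns": [
    {
      "type": "expression",
      "name": "our-column-or-default",
      "expression": "nvl(\"our-column\", 'default_value')",
      "outputType": "STRING"
    }
  ],
  "filter": {
    "type": "not",
    "field": {
      "type": "selector",
      "dimension": "our-column-or-default",
      "value": "our-value"
    }
  },
  "intervals": [
    "0/2025"
  ]
}
```

With this form, rows that lack `our-column` are compared as `default_value` and therefore pass the `not` filter, provided `default_value` differs from `our-value`.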