pmmag opened a new issue, #17528:
URL: https://github.com/apache/druid/issues/17528

   ### Affected Version
   
   31.0.0
   
   ### Description
   
   Hello,
   
   We have a use case where we use string-based schemaless ingestion for a 
datasource in which a particular column may or may not be present. We were 
querying that datasource with a filter that keeps only rows where that column 
is NOT equal to a particular value (`{"type": "not", "field": {"type": 
"selector", "dimension": "our-column", "value": "our-value"}}`). However, 
against our expectations, this filter did not return rows that were missing 
this column entirely.
   
   We were able to reproduce this by simply querying on a completely bogus 
column:
   
   ```
   {
     "queryType": "scan",
     "dataSource": {
       "type": "table",
       "name": "some-table"
     },
     "filter": {
       "type": "not",
       "field": {
         "type": "selector",
         "dimension": "kfjhdskgjshdk-this-column-obviously-does-not-exist",
         "value": "should-not-matter"
       }
     },
     "intervals": [
       "0/2025"
     ]
   }
   ```
   
   Expectation: this filter should be equivalent to no filter at all, because 
no row has this column, and consequently every row should match the 
condition that this column is not equal to some value.
   
   Reality: no rows are returned. The same happens with an in-filter and even 
with a regex filter.
   
   We are now able to work around this by instead filtering on a virtual column 
(with `nvl(our-column, 'default_value')`), but we are concerned that this 
behavior may silently filter out data in some other cases. Or if this behavior 
is by design, we would at least expect to find it in the documentation because 
it is very surprising.
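   
   For reference, a minimal sketch of the workaround query, assuming an 
expression virtual column. The virtual column name `v0`, the `STRING` output 
type, and the double quotes around the hyphenated column name inside the 
expression are assumptions made for illustration, not copied from our 
production query:
   
   ```
   {
     "queryType": "scan",
     "dataSource": {
       "type": "table",
       "name": "some-table"
     },
     "virtualColumns": [
       {
         "type": "expression",
         "name": "v0",
         "expression": "nvl(\"our-column\", 'default_value')",
         "outputType": "STRING"
       }
     ],
     "filter": {
       "type": "not",
       "field": {
         "type": "selector",
         "dimension": "v0",
         "value": "our-value"
       }
     },
     "intervals": [
       "0/2025"
     ]
   }
   ```
   
   Because `v0` is never null, rows that lack `our-column` are compared as 
`default_value` and pass the `not` filter, provided `default_value` differs 
from `our-value`.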
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

