yashmayya opened a new pull request, #13139:
URL: https://github.com/apache/pinot/pull/13139
- The JSON index currently doesn't support nested (within `AND` / `OR`)
exclusive predicates (`NOT IN`, `!=`, `IS NULL`) with the `JSON_MATCH` filter
operator.
- The reason is that the existing semantics for exclusive predicates are
inconsistent and confusing.
- A document like:
```
[
{
"key1": "value1",
"key2": 1
},
{
"key1": "value2",
"key2": 2
}
]
```
matches `"$[*].key2" = 1`, but not `"$[*].key2" != 2` for example.
- Essentially, the wildcard array access for inclusive predicates means that
we match the array when ANY value matches, but for exclusive predicates, we
match the array when ALL values match.
- Furthermore, a document like:
```
[
{
"key1": "value1"
}
]
```
matches `"$[0].key2" != 2` which is counterintuitive since the object
doesn’t contain `key2` at all.
- Both inconsistencies stem from the way exclusive predicates are
implemented. They are currently handled by finding the [flattened doc
IDs](https://docs.pinot.apache.org/basics/indexing/json-index#example) of the
corresponding inclusive predicate, mapping them back to the corresponding
unflattened doc IDs, and then inverting the resulting bitmap. This is also
why nested exclusive predicates aren’t currently supported: the intermediate
flattened doc ID results can’t be combined with flattened doc ID results
from other child predicates while maintaining the same semantics.
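To make the old behavior concrete, here is a minimal Python sketch of the approach described above. It is a simplified model and not Pinot's actual code: the names `flat_to_doc`, `postings`, `match_eq`, and `match_neq_old` are hypothetical, each array element is assumed to flatten into exactly one record, and array indices are ignored so every path behaves like `$[*]`.

```python
# Simplified model of the pre-patch behavior; not Pinot's actual code.
# Assumption: each top-level array element becomes one flattened record.
docs = [
    [{"key1": "value1", "key2": 1}, {"key1": "value2", "key2": 2}],  # doc 0
    [{"key1": "value1"}],                                            # doc 1
]

flat_to_doc = []  # flattened doc ID -> unflattened doc ID
postings = {}     # (key, value) -> set of flattened doc IDs

for doc_id, doc in enumerate(docs):
    for element in doc:
        flat_id = len(flat_to_doc)
        flat_to_doc.append(doc_id)
        for key, value in element.items():
            postings.setdefault((key, value), set()).add(flat_id)


def match_eq(key, value):
    """Inclusive predicate: docs where ANY flattened record matches."""
    return {flat_to_doc[f] for f in postings.get((key, value), set())}


def match_neq_old(key, value):
    """Old exclusive predicate: invert the unflattened result of `=`."""
    return set(range(len(docs))) - match_eq(key, value)


eq_result = match_eq("key2", 1)        # docs matching "$[*].key2" = 1
neq_result = match_neq_old("key2", 2)  # docs matching "$[*].key2" != 2
```

In this model `eq_result` contains only document 0, while `neq_result` contains only document 1. Document 0 is rejected by `!= 2` even though one of its elements has `key2` equal to 1, and document 1 is accepted even though it has no `key2` at all.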
- This patch fixes the above inconsistencies so that `[*]` now means the
same for both exclusive and inclusive predicates, and documents where a key
doesn't exist aren't matched by the exclusive predicates `NOT IN` and `!=`. It
also adds support for nested exclusive predicates.
- The new implementation for `NOT IN` and `!=` makes use of logic similar to
the implementation for the regex and range predicates
(https://github.com/apache/pinot/pull/12568).
- Essentially, we first get all the values for the corresponding key. Then, we
iterate through the values, and for each value that matches the predicate, we
add the list of corresponding flattened doc IDs to the result.
- The `IS NULL` predicate is handled by inverting the flattened doc ID
bitmap of the corresponding `IS NOT NULL` predicate.
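The new approach can be sketched in Python as follows. This is a simplified model under stated assumptions, not the code in this patch: the names `postings`, `flat_matching`, `flat_neq`, `flat_not_in`, `flat_is_null`, and `to_docs` are hypothetical, each array element is assumed to flatten into exactly one record, and plain Python sets stand in for bitmaps.

```python
# Simplified model of the post-patch approach; not Pinot's actual code.
# Assumption: each top-level array element becomes one flattened record.
docs = [
    [{"key1": "value1", "key2": 1}, {"key1": "value2", "key2": 2}],  # doc 0
    [{"key1": "value1"}],                                            # doc 1
]

flat_to_doc = []  # flattened doc ID -> unflattened doc ID
postings = {}     # key -> {value -> set of flattened doc IDs}

for doc_id, doc in enumerate(docs):
    for element in doc:
        flat_id = len(flat_to_doc)
        flat_to_doc.append(doc_id)
        for key, value in element.items():
            postings.setdefault(key, {}).setdefault(value, set()).add(flat_id)

all_flat_ids = set(range(len(flat_to_doc)))


def flat_matching(key, predicate):
    """Union the flattened doc IDs of every value of `key` passing `predicate`."""
    result = set()
    for value, flat_ids in postings.get(key, {}).items():
        if predicate(value):
            result |= flat_ids
    return result


def flat_neq(key, value):
    return flat_matching(key, lambda v: v != value)


def flat_not_in(key, values):
    return flat_matching(key, lambda v: v not in values)


def flat_is_null(key):
    # Invert the flattened result of IS NOT NULL (any value present for key).
    return all_flat_ids - flat_matching(key, lambda v: True)


def to_docs(flat_ids):
    """Map flattened doc IDs back to unflattened doc IDs."""
    return {flat_to_doc[f] for f in flat_ids}


neq_docs = to_docs(flat_neq("key2", 2))
not_in_docs = to_docs(flat_not_in("key2", {2, 3}))
is_null_docs = to_docs(flat_is_null("key2"))
# Nested predicate: key1 = 'value1' AND key2 != 2, combined on flattened IDs.
nested_docs = to_docs(
    flat_matching("key1", lambda v: v == "value1") & flat_neq("key2", 2)
)
```

Because every predicate in this model produces flattened doc IDs, exclusive predicates can be intersected or unioned with other child predicates before mapping back to unflattened doc IDs. Here `!= 2` matches document 0 (one element has `key2` equal to 1) and no longer matches document 1, which has no `key2`.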
- Suggested labels: `feature`, `backward-incompat`, `release-notes`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]