eldenmoon commented on code in PR #63118:
URL: https://github.com/apache/doris/pull/63118#discussion_r3219017533
##########
be/src/storage/segment/segment.h:
##########
@@ -182,7 +184,23 @@ class Segment : public
std::enable_shared_from_this<Segment>, public MetadataAdd
std::shared_ptr<const IDataType> get_data_type_of(const TabletColumn&
column,
const
StorageReadOptions& read_options);
- // If column in segment is the same type in schema, then it is safe to
apply predicate.
+ static bool _is_variant_predicate_type_compatible(const DataTypePtr&
storage_type,
+ const DataTypePtr&
target_type) {
+ auto normalized_storage_type = remove_nullable(storage_type);
+ auto normalized_target_type = remove_nullable(target_type);
+ if (normalized_storage_type->equals(*normalized_target_type)) {
+ return true;
+ }
+
+ auto storage_primitive_type =
normalized_storage_type->get_primitive_type();
+ auto target_primitive_type =
normalized_target_type->get_primitive_type();
+ return (is_int(storage_primitive_type) &&
is_int(target_primitive_type)) ||
Review Comment:
This family-level compatibility is not safe for the inverted-index predicate
path. Column predicates are still instantiated with the cast target type, so
`InvertedIndexQueryParamFactory` stores the query value using that target type,
while `SegmentIterator` passes the segment storage type to the index reader.
The BKD reader then encodes the same pointer with the storage type's
`KeyCoder`. For example, if the segment stored a variant path as `BIGINT` and
the query casts it to `INT`/`TINYINT`, the BIGINT key coder reads past the
narrower query value; in the other direction, an `INT` query value against a
`TINYINT` index can be truncated/wrapped and hit unrelated rows. Since a
successful index evaluation removes the predicate from the remaining filters,
this can become a wrong result rather than just a missed optimization. Please
keep this to exact storage field type after nullable removal, or explicitly
convert the predicate value to the segment storage type with range/overflow chec
ks before using the index, and add cross-width cast regression coverage.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]