Copilot commented on code in PR #566:
URL: https://github.com/apache/hudi-rs/pull/566#discussion_r3006385701
##########
crates/core/src/expr/filter.rs:
##########
@@ -68,6 +99,7 @@ impl TryFrom<(&str, &str, &str)> for Filter {
field_name,
operator,
field_value,
+ field_values: Vec::new(),
})
Review Comment:
`Filter::try_from((&str, &str, &str))` always leaves `field_values` empty.
When the operator is `IN`/`NOT IN` this will later fail in
`SchemableFilter::try_from` (it requires non-empty `field_values`), which
breaks the existing `from_str_tuples`/`get_file_slices_*` APIs and the new
DataFusion pushdown path (which passes a single string value). Consider either
(a) teaching this parsing path to decode the 3rd tuple element into
`field_values` when operator is `IN`/`NOT IN` (and validate non-empty), or (b)
changing the cross-crate API so DataFusion can pass structured `Filter` values
instead of flattening to `(String, String, String)`.
##########
crates/core/src/keygen/timestamp_based.rs:
##########
@@ -467,6 +486,7 @@ impl KeyGeneratorFilterTransformer for
TimestampBasedKeyGenerator {
field_name: field_name.clone(),
operator: ExprOperator::Eq,
field_value: value.clone(),
+ field_values: Vec::new(),
});
Review Comment:
The added `field_values: Vec::new(),` lines are misindented inside `Filter {
... }` initializers (they appear to be flush-left compared to the other
fields). Please run `rustfmt` or adjust indentation to match surrounding style.
##########
crates/core/src/table/partition.rs:
##########
@@ -393,7 +393,7 @@ mod tests {
field_name: "date".to_string(),
operator: ExprOperator::Eq,
field_value: "2023-01-01".to_string(),
- };
+ field_values: Vec::new(), };
Review Comment:
Several test struct literals have `field_values: Vec::new(),` and the
closing `};` on the same line with odd spacing. This should be reformatted
(e.g., run `rustfmt`) to keep the test code readable and consistent.
##########
crates/datafusion/src/util/expr.rs:
##########
@@ -176,6 +178,45 @@ fn between_to_filters(between: &Between) ->
Vec<HudiFilter> {
]
}
+/// Converts an IN list expression into a HudiFilter with IN or NOT IN
operator.
+///
+/// Returns None if the expression is not supported (non-column expression or
non-literal values).
+fn inlist_expr_to_filter(in_list: &InList) -> Option<HudiFilter> {
+ // Extract column name
+ let column = match in_list.expr.as_ref() {
+ Expr::Column(col) => col,
+ _ => {
+ debug!("IN list with non-column expression cannot be pushed down");
+ return None;
+ }
+ };
+
+ // Extract literal values from the list
+ let values: Vec<String> = in_list
+ .list
+ .iter()
+ .filter_map(|expr| match expr {
+ Expr::Literal(lit, _) => Some(lit.to_string()),
+ _ => None,
+ })
+ .collect();
+
+ // If not all values are literals, skip pushdown
+ if values.len() != in_list.list.len() {
+ debug!("IN list contains non-literal values, cannot be pushed down");
+ return None;
+ }
Review Comment:
`inlist_expr_to_filter` can return a filter even when `in_list.list` is
empty (e.g., `col IN ()`). That creates an IN/NOT IN filter with an empty value
list, which later errors in core (`SchemableFilter::try_from` requires
non-empty values) and can break query planning/pruning. Consider explicitly
skipping pushdown for empty lists (return `None`) or translating the empty-list
case to a safe no-op/constant expression rather than emitting an invalid filter.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]