alamb commented on PR #12606: URL: https://github.com/apache/datafusion/pull/12606#issuecomment-2374840656
> I now want to add an index for point lookups (I plan on implementing it as a column with distinct array values, but that's a bit of an implementation detail).
>
> The point is that when PruningPredicate encounters this column (for which there are no stats, and which it doesn't recognize because I only pass in Fields for which there are stats) it currently returns true such that `a_column_with_stats = 123 and a_point_lookup_column = 'abc'` becomes `a_column_with_stats_min <= 123 and a_column_with_stats_max >= 123 and true` (ignoring nulls, maybe simplifying other bits) but I want it to become `a_column_with_stats_min <= 123 and a_column_with_stats_max >= 123 and a_point_lookup_column @> '{abc}'::text[]` or something like that.

Perhaps you can rewrite the predicate before passing it to the parquet exec or the `PruningPredicate`?

I don't fully understand what `a_point_lookup_column @> '{abc}'::text[]` means but it seems like you could easily do that rewrite / substitution before PruningPredicate. I don't understand the benefit that is obtained by doing the rewrite during the pruning predicate rewrite 🤔
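To make the "rewrite before pruning" idea concrete, here is a rough sketch of what such a substitution pass could look like. It deliberately uses a toy `Expr` enum rather than DataFusion's real `Expr` or `PruningPredicate` types, and the names `rewrite_point_lookups`, `ArrayContains`, `col`, and `lit` are hypothetical, chosen only for illustration. The assumption is that equality on a known point-lookup column is replaced by an array-containment check, and everything else is left for the normal min/max rewrite.

```rust
// Toy expression tree. This is NOT DataFusion's `Expr`; it is a stand-in
// so the example is self-contained.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(String),
    Eq(Box<Expr>, Box<Expr>),
    // Hypothetical node standing in for something like `col @> '{value}'`.
    ArrayContains(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

// Small hypothetical helpers to keep construction readable.
fn col(name: &str) -> Expr {
    Expr::Column(name.to_string())
}

fn lit(value: &str) -> Expr {
    Expr::Literal(value.to_string())
}

// Replace `column = literal` with `ArrayContains(column, literal)` when the
// column is one of the point-lookup index columns. All other expressions are
// returned unchanged, so they can still go through the usual statistics-based
// rewrite afterwards.
fn rewrite_point_lookups(expr: Expr, index_columns: &[&str]) -> Expr {
    match expr {
        Expr::And(left, right) => Expr::And(
            Box::new(rewrite_point_lookups(*left, index_columns)),
            Box::new(rewrite_point_lookups(*right, index_columns)),
        ),
        Expr::Eq(left, right) => match (*left, *right) {
            (Expr::Column(name), Expr::Literal(value))
                if index_columns.contains(&name.as_str()) =>
            {
                Expr::ArrayContains(
                    Box::new(Expr::Column(name)),
                    Box::new(Expr::Literal(value)),
                )
            }
            // Not a point-lookup equality: rebuild the node as it was.
            (l, r) => Expr::Eq(Box::new(l), Box::new(r)),
        },
        other => other,
    }
}
```

With this in place, `a_column_with_stats = 123 and a_point_lookup_column = 'abc'` would keep its first conjunct as a plain equality (ready for the min/max rewrite) while the second becomes the containment check, all before the predicate reaches the pruning step.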