alamb commented on PR #12606:
URL: https://github.com/apache/datafusion/pull/12606#issuecomment-2374840656

   > I now want to add an index for point lookups (I plan on implementing it as
   > a column with distinct array values, but that's a bit of an implementation
   > detail).
   
   > The point is that when `PruningPredicate` encounters this column (for which
   > there are no stats, and which it doesn't recognize because I only pass in
   > Fields for which there are stats) it currently returns `true`, such that
   > `a_column_with_stats = 123 and a_point_lookup_column = 'abc'` becomes
   > `a_column_with_stats_min <= 123 and a_column_with_stats_max >= 123 and true`
   > (ignoring nulls, maybe simplifying other bits), but I want it to become
   > `a_column_with_stats_min <= 123 and a_column_with_stats_max >= 123 and a_point_lookup_column @> '{abc}'::text[]`
   > or something like that.
   
   Perhaps you can rewrite the predicate before passing it to the parquet exec
   or the `PruningPredicate`? I don't fully understand what
   `a_point_lookup_column @> '{abc}'::text[]` means, but it seems like you could
   easily do that rewrite / substitution before `PruningPredicate` sees the
   expression.
   
   I don't understand what benefit is gained by doing this rewrite inside the
   pruning predicate rewrite itself 🤔
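
To make the "rewrite before `PruningPredicate`" suggestion concrete, here is a minimal sketch. It deliberately uses a toy `Expr` enum rather than DataFusion's real `Expr` type, and the names `rewrite_point_lookups`, `ArrayContains`, and `to_sql` are hypothetical, introduced only for illustration. It assumes `@>` means "the array column contains this value", as in the quoted example, and it only handles the `column = 'literal'` orientation.

```rust
use std::collections::HashSet;

/// Toy expression tree (hypothetical; a stand-in for a real query-engine `Expr`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Int(i64),
    Text(String),
    Eq(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
    /// `column @> '{value}'::text[]`: the array column contains `value`.
    ArrayContains(String, String),
}

/// Rewrite `col = 'text'` into an array-containment check when `col` is one
/// of the point-lookup columns; every other expression is returned unchanged.
fn rewrite_point_lookups(expr: Expr, lookup_cols: &HashSet<String>) -> Expr {
    match expr {
        // Recurse through conjunctions so each side is rewritten on its own.
        Expr::And(l, r) => Expr::And(
            Box::new(rewrite_point_lookups(*l, lookup_cols)),
            Box::new(rewrite_point_lookups(*r, lookup_cols)),
        ),
        Expr::Eq(l, r) => match (*l, *r) {
            // Only equality against a text literal on a lookup column is rewritten.
            (Expr::Column(c), Expr::Text(v)) if lookup_cols.contains(&c) => {
                Expr::ArrayContains(c, v)
            }
            // Anything else (e.g. columns that have min/max stats) is left alone.
            (l, r) => Expr::Eq(Box::new(l), Box::new(r)),
        },
        other => other,
    }
}

/// Render the toy expression as SQL-like text for inspection.
fn to_sql(expr: &Expr) -> String {
    match expr {
        Expr::Column(c) => c.clone(),
        Expr::Int(i) => i.to_string(),
        Expr::Text(s) => format!("'{s}'"),
        Expr::Eq(l, r) => format!("{} = {}", to_sql(l), to_sql(r)),
        Expr::And(l, r) => format!("{} AND {}", to_sql(l), to_sql(r)),
        Expr::ArrayContains(c, v) => format!("{c} @> '{{{v}}}'::text[]"),
    }
}
```

Under these assumptions, the predicate `a_column_with_stats = 123 AND a_point_lookup_column = 'abc'` would be passed through `rewrite_point_lookups` first, and only the result handed on to pruning. Whether the pruning step then preserves the rewritten term rather than replacing it with `true` is exactly the open question in this thread, and the sketch does not settle it.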
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

