theirix commented on PR #16861: URL: https://github.com/apache/datafusion/pull/16861#issuecomment-3110119344
Thank you, @adriangb! I can confirm that it works great with table sampling, since I use the `random` function (matched by name):

```
query TT
EXPLAIN SELECT COUNT(*) from t TABLESAMPLE 42 WHERE a < 10;
----
logical_plan
01)Projection: count(Int64(1)) AS count(*)
02)--Aggregate: groupBy=[[]], aggr=[[count(Int64(1))]]
03)----Projection:
04)------Filter: t.a < Int32(10) AND random() < Float64(0.42)
05)--------TableScan: t projection=[a]
physical_plan
01)ProjectionExec: expr=[count(Int64(1))@0 as count(*)]
02)--AggregateExec: mode=Final, gby=[], aggr=[count(Int64(1))]
03)----CoalescePartitionsExec
04)------AggregateExec: mode=Partial, gby=[], aggr=[count(Int64(1))]
05)--------RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1
06)----------ProjectionExec: expr=[]
07)------------CoalesceBatchesExec: target_batch_size=8192
08)--------------FilterExec: a@0 < 10 AND random() < 0.42
09)----------------DataSourceExec: partitions=1, partition_sizes=[1]
```

The volatile filter is not pushed down to the data source. Without this patch, it looked like `predicate=random() < 0.1`.

I agree it'd be more scalable to have an abstract way to specify UDF volatility.
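To make the "abstract volatility" idea concrete, here is a minimal, language-agnostic sketch in Python. It is not DataFusion code: `Expr`, `Volatility`, `is_volatile`, and `split_conjuncts` are hypothetical names invented for illustration. The point is only that a pushdown rule could consult a declared volatility on each function instead of matching `random` by name. Whether the non-volatile conjunct is actually pushed down still depends on the data source.

```python
from dataclasses import dataclass
from enum import Enum


class Volatility(Enum):
    # Hypothetical volatility classes, for illustration only.
    IMMUTABLE = "immutable"  # same inputs always give the same output
    STABLE = "stable"        # fixed for the duration of one query
    VOLATILE = "volatile"    # may differ on every call, e.g. random()


@dataclass(frozen=True)
class Expr:
    """Hypothetical expression node: a label, child nodes, and a volatility."""
    name: str
    children: tuple = ()
    volatility: Volatility = Volatility.IMMUTABLE


def is_volatile(expr: Expr) -> bool:
    """Return True if this node or any descendant is declared volatile."""
    return expr.volatility is Volatility.VOLATILE or any(
        is_volatile(child) for child in expr.children
    )


def split_conjuncts(conjuncts):
    """Partition AND-ed predicates into (pushdown candidates, kept in the filter)."""
    pushable = [e for e in conjuncts if not is_volatile(e)]
    kept = [e for e in conjuncts if is_volatile(e)]
    return pushable, kept


# The two conjuncts from the plan: a < 10 AND random() < 0.42
a_lt_10 = Expr("a < 10", (Expr("a"), Expr("10")))
random_call = Expr("random()", volatility=Volatility.VOLATILE)
sample = Expr("random() < 0.42", (random_call, Expr("0.42")))

pushable, kept = split_conjuncts([a_lt_10, sample])
print("pushdown candidates:", [e.name for e in pushable])
print("kept in filter:", [e.name for e in kept])
```

In this sketch volatility propagates upward, so any predicate containing a volatile call anywhere in its tree stays in the filter, with no dependence on function names.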