alamb commented on code in PR #17273:
URL: https://github.com/apache/datafusion/pull/17273#discussion_r2304261369
##########
datafusion/catalog/src/table.rs:
##########
@@ -299,6 +328,75 @@ pub trait TableProvider: Debug + Sync + Send {
}
}
+#[derive(Debug, Clone, Default)]
+pub struct ScanArgs {
+ preferred_ordering: Option<Vec<SortExpr>>,
+ filters: Option<Vec<Expr>>,
+ projection: Option<Vec<usize>>,
+ limit: Option<usize>,
+}
+
+impl ScanArgs {
+ pub fn with_projection(mut self, projection: Option<Vec<usize>>) -> Self {
+ self.projection = projection;
+ self
+ }
+
+ pub fn projection(&self) -> Option<Vec<usize>> {
+ self.projection.clone()
+ }
+
+ pub fn with_filters(mut self, filters: Option<Vec<Expr>>) -> Self {
+ self.filters = filters;
+ self
+ }
+
+ pub fn filters(&self) -> Option<&[Expr]> {
+ self.filters.as_deref()
+ }
+
+ pub fn with_limit(mut self, limit: Option<usize>) -> Self {
+ self.limit = limit;
+ self
+ }
+
+ pub fn limit(&self) -> Option<usize> {
+ self.limit
+ }
+
+ pub fn with_preferred_ordering(mut self, ordering: Option<Vec<SortExpr>>)
-> Self {
+ self.preferred_ordering = ordering;
+ self
+ }
+
+ pub fn preferred_ordering(&self) -> Option<&[SortExpr]> {
+ self.preferred_ordering.as_deref()
+ }
+}
+
+#[derive(Debug, Clone)]
+pub struct ScanResult {
+ /// The ExecutionPlan to run.
+ plan: Arc<dyn ExecutionPlan>,
+ // Remaining filters that were not completely evaluated during
`scan_with_args()`.
+ // These were previously referred to as "unsupported filters" or "inexact
filters".
+ filters: Vec<Expr>,
Review Comment:
I think this is confusing as the "can the filters be supported" is currently
a separate API. I think we should have one or the other but not both.
I realize that `scan_with_args` is basically the "create the appropriate
physical plan" so logically it makes sense here.
##########
datafusion/catalog/src/table.rs:
##########
@@ -171,6 +173,35 @@ pub trait TableProvider: Debug + Sync + Send {
limit: Option<usize>,
) -> Result<Arc<dyn ExecutionPlan>>;
+ async fn scan_with_args(
+ &self,
+ state: &dyn Session,
+ args: ScanArgs,
+ ) -> Result<ScanResult> {
+ let ScanArgs {
+ preferred_ordering: _,
+ filters,
+ projection,
+ limit,
+ } = args;
+ let filters = filters.unwrap_or_default();
+ let unsupported_filters = self
+ .supports_filters_pushdown(&filters.iter().collect_vec())?
Review Comment:
I recommend doing this as a follow on PR -- and the key consideration would
be how disruptive it would be to existing users vs the benefit existing users
and new users would get
##########
datafusion/catalog/src/table.rs:
##########
@@ -171,6 +172,34 @@ pub trait TableProvider: Debug + Sync + Send {
limit: Option<usize>,
) -> Result<Arc<dyn ExecutionPlan>>;
+ async fn scan_with_args(
Review Comment:
I think this is a great new API and makes a lot of sense as it allows us to
iterate / expand the scan API over time with less API disruption
It is also consistent with
[`ScalarUDFImpl::invoke_with_args`](https://docs.rs/datafusion/latest/datafusion/logical_expr/trait.ScalarUDFImpl.html#tymethod.invoke_with_args)
To merge this PR I think we should:
1. document this function (perhaps move the docs from `scan` here and then
direct people to use `scan_with_args` with new TableProviders
2. Migrate all existing code in Datafusion to use `scan_with_args`
3. Consider deprecating `scan`
(maybe we can do this as a follow on PR)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]