Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20387

I dug into the commit history and recalled why I made these decisions:

* having a mutable `DataSourceV2Relation`. This is mostly to avoid adding more and more constructor parameters to `DataSourceV2Relation`, to keep the code easy to maintain. I'm OK with making it immutable if there is a significant benefit.
* not using `PhysicalOperation`. This is because we will add more push-down optimizations (e.g. limit, aggregate, join), and we have a specific push-down order for them. It's hard to extend `PhysicalOperation` to support more operators and a specific push-down order, so I created a new rule. Eventually all data sources will be implemented as data source v2, so `PhysicalOperation` will go away.

> The output of DataSourceV2Relation should be what is returned by the reader, in case the reader can only partially satisfy the requested schema projection

Good catch! Since `DataSourceV2Reader` is mutable, the output can't be fixed at construction time, as it may change when we apply data source optimizations. Using `lazy val output ...` can fix this.

> The requested projection passed to the DataSourceV2Reader should include filter columns

I did this intentionally. If a column is only referred to by pushed filters, Spark doesn't need this column. Even if we requested this column from the data source, we would just read it out and wait for it to be pruned by the next operator.

> The push-down rule may be run more than once if filters are not pushed through projections

This looks weird; do you have a query that reproduces this issue?

> This updates DataFrameReader to parse locations that do not look like paths as table names and pass the result as "database" and "table" keys in v2 options.

Personally I'd suggest using `spark.read.format("iceberg").option("table", "db.table").load()`, as `load` is defined as `def load(paths: String*)`, but I think your usage looks better.
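To illustrate the `lazy val output` point above, here is a minimal, self-contained sketch. The classes `MutableReader`, `EagerRelation`, and `LazyRelation` are hypothetical stand-ins for Spark's `DataSourceV2Reader` and `DataSourceV2Relation`, with schemas simplified to column-name sequences:

```scala
// Hypothetical stand-in for a mutable DataSourceV2Reader: push-down
// optimizations may mutate its schema after the relation is created.
class MutableReader {
  var schema: Seq[String] = Seq("id", "name", "age")
  def pruneColumns(required: Seq[String]): Unit =
    schema = schema.filter(required.contains)
}

// `val output` is computed at construction, so it goes stale once
// column pruning mutates the reader.
class EagerRelation(reader: MutableReader) {
  val output: Seq[String] = reader.schema
}

// `lazy val output` is computed on first access, after push-down has run,
// so it reflects what the reader will actually return.
class LazyRelation(reader: MutableReader) {
  lazy val output: Seq[String] = reader.schema
}

val r1 = new MutableReader
val eager = new EagerRelation(r1)
r1.pruneColumns(Seq("id"))
assert(eager.output == Seq("id", "name", "age")) // stale output

val r2 = new MutableReader
val lazyRel = new LazyRelation(r2)
r2.pruneColumns(Seq("id"))
assert(lazyRel.output == Seq("id")) // reflects the pruned schema
```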
The communication protocol between Spark and data sources is options. I'd suggest that we just propagate the `paths` parameter into options, and data source implementations are free to interpret the path option however they want, e.g. as table and database names.
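For example, a data source implementation that wants table semantics could parse the generic path option itself. The helper below is hypothetical (the option name and the `db.table` syntax are illustrative, not part of any Spark API):

```scala
// Hypothetical data-source-side helper: interpret the "path" option as an
// optionally database-qualified table identifier of the form "db.table".
def parseTableIdentifier(path: String): (Option[String], String) =
  path.split('.') match {
    case Array(db, table) => (Some(db), table)
    case Array(table)     => (None, table)
    case _ =>
      throw new IllegalArgumentException(s"Invalid table identifier: $path")
  }

assert(parseTableIdentifier("db.table") == (Some("db"), "table"))
assert(parseTableIdentifier("table") == (None, "table"))
```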