> So here are my recommendations for moving forward, with DataSourceV2 as a starting point:
>
> 1. Use well-defined logical plan nodes for all high-level operations: insert, create, CTAS, overwrite table, etc.
> 2. Use rules that match on these high-level plan nodes, so that it isn't necessary to create rules to match each eventual code path individually.
> 3. Define Spark's behavior for these logical plan nodes. Physical nodes should implement that behavior, but all CREATE TABLE OVERWRITE should (eventually) make the same guarantees.
> 4. Specialize implementation when creating a physical plan, not logical plans.
>
> I realize this is really long, but I'd like to hear thoughts about this. I'm sure I've left out some additional context, but I think the main idea here is solid: let's standardize logical plans for more consistent behavior and easier maintenance.

Context aside, I really like these rules! I think having query planning be the boundary for specialization makes a lot of sense.
(RunnableCommand might also be my fault though.... sorry! :P)
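To make points 1 and 2 concrete, here's a toy sketch in plain Scala (deliberately not the real Catalyst classes; the node and rule names here are all made up for illustration). The idea: because CTAS and overwrite are first-class logical nodes, one pattern match covers them, instead of a separate rule per eventual code path:

```scala
// Hypothetical stand-ins for Catalyst logical plan nodes.
sealed trait LogicalPlan
case class Relation(name: String) extends LogicalPlan
case class CreateTableAsSelect(table: String, query: LogicalPlan) extends LogicalPlan
case class OverwriteTable(table: String, query: LogicalPlan) extends LogicalPlan

// A "rule" here is just LogicalPlan => LogicalPlan. It matches on the
// high-level nodes directly, so no per-code-path rules are needed.
def qualifyTableNames(plan: LogicalPlan, catalog: String): LogicalPlan = plan match {
  case CreateTableAsSelect(t, q) =>
    CreateTableAsSelect(s"$catalog.$t", qualifyTableNames(q, catalog))
  case OverwriteTable(t, q) =>
    OverwriteTable(s"$catalog.$t", qualifyTableNames(q, catalog))
  case r: Relation => r
}

// Usage: the same rule handles both CTAS and overwrite uniformly.
val ctas = CreateTableAsSelect("events", Relation("staging"))
val resolved = qualifyTableNames(ctas, "prod")
```

Specialization then happens only when the planner turns these logical nodes into physical nodes, so every source that implements, say, overwrite, inherits the same defined semantics from the logical layer.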