> Instead of exploring possible operations ourselves, I think we should
> follow the SQL standard.
Most of these do. We should make conscious decisions with the standard in
mind for the SQL API. But we also have the Scala API (and versions of it in
other languages) and need to consider how these
I think many advanced Spark users already have custom Catalyst rules to
deal with the query plan directly, so it makes a lot of sense to
standardize the logical plan. However, instead of exploring possible
operations ourselves, I think we should follow the SQL standard.
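For context, rules like that have to pattern-match on Catalyst internals
directly. Here is a minimal sketch of such a user-defined rule, assuming
Spark 2.x (the Rule API and the experimental registration hook are real;
the rule itself is a made-up example):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan}
    import org.apache.spark.sql.catalyst.rules.Rule

    // Made-up example: log every Filter condition the optimizer sees.
    // Rules like this must match on whatever nodes the planner happens
    // to produce, which is why a stable, standardized plan matters.
    object LogFilters extends Rule[LogicalPlan] {
      override def apply(plan: LogicalPlan): LogicalPlan = plan.transform {
        case f @ Filter(condition, _) =>
          logInfo(s"filter condition: $condition")
          f
      }
    }

    val spark = SparkSession.builder().master("local").getOrCreate()
    // Experimental hook for injecting extra optimizer rules.
    spark.experimental.extraOptimizations ++= Seq(LogFilters)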
ReplaceTable, RTAS:
Thanks for responding!
I’ve been coming up with a list of the high-level operations that are
needed. I think all of them come down to 5 questions about what’s
happening (a rough sketch of how the answers map to operations follows
the list):
- Does the target table exist?
- If it does exist, should it be dropped?
- If not, should it get created?
- Should
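A hypothetical sketch of how the answers could map to distinct
operations; none of these names are actual Spark classes, they only
illustrate the taxonomy:

    // Each operation is one combination of answers to the questions above.
    sealed trait TableOperation
    case object CreateTable  extends TableOperation // must not exist; create it
    case object CTAS         extends TableOperation // create it, then write the query result
    case object ReplaceTable extends TableOperation // drop if it exists, then create
    case object RTAS         extends TableOperation // drop, recreate, then write the query result
    case object InsertInto   extends TableOperation // must already exist; append rows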
>
> So here are my recommendations for moving forward, with DataSourceV2 as a
> starting point:
>
> 1. Use well-defined logical plan nodes for all high-level operations:
> insert, create, CTAS, overwrite table, etc.
> 2. Use rules that match on these high-level plan nodes, so that it
>
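A minimal sketch of what recommendations 1 and 2 could look like
together, assuming Spark 2.x Catalyst internals; the node and rule here
are hypothetical, not Spark's actual classes:

    import org.apache.spark.sql.catalyst.expressions.Attribute
    import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, UnaryNode}
    import org.apache.spark.sql.catalyst.rules.Rule

    // (1) One well-defined node per high-level operation: the intent is
    // explicit in the plan instead of buried in a generic command.
    case class CreateTableAsSelect(table: String, child: LogicalPlan)
        extends UnaryNode {
      override def output: Seq[Attribute] = Nil
    }

    // (2) A rule can then match the operation itself, rather than
    // reverse-engineering it from lower-level nodes.
    object ValidateCTAS extends Rule[LogicalPlan] {
      override def apply(plan: LogicalPlan): LogicalPlan = plan.transform {
        case ctas @ CreateTableAsSelect(table, query) =>
          require(query.resolved, s"CTAS query for $table is not resolved")
          ctas
      }
    }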
Over the last couple of years, I’ve noticed a trend toward specialized logical
plans and increasing use of RunnableCommand nodes. DataSourceV2 is
currently on the same path, and I’d like to make the case that we should
avoid these practices.
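For reference, the RunnableCommand pattern in question looks roughly
like this (RunnableCommand is real Spark 2.x API; the command itself is
a made-up example). Because all of the behavior lives inside run(), a
rule that inspects the plan learns almost nothing about what the node
will actually do:

    import org.apache.spark.sql.{Row, SparkSession}
    import org.apache.spark.sql.execution.command.RunnableCommand

    // Made-up command: the work happens imperatively inside run(),
    // invisible to analysis and optimization rules.
    case class CreateTableExample(table: String) extends RunnableCommand {
      override def run(sparkSession: SparkSession): Seq[Row] = {
        // side effects (catalog calls, file writes) would go here
        Seq.empty
      }
    }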
I think it’s helpful to consider an example I’ve been