Hi Gabor,

First, a little context... one of the goals of DSv2 is to standardize the behavior of SQL operations in Spark. For example, running CTAS when the target table already exists will always fail, instead of taking whatever action the source chooses -- dropping and re-creating, inserting, or failing.
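To make that concrete, here's a rough sketch of the standardized v2 semantics (the table names are just placeholders; replacing an existing table becomes its own explicit statement rather than source-specific CTAS behavior):

```sql
-- Under DSv2 semantics, CTAS fails if the target already exists:
CREATE TABLE db.target AS SELECT * FROM db.source;

-- Replacing is explicit, not something a source decides to do on its own:
CREATE OR REPLACE TABLE db.target AS SELECT * FROM db.source;
```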
Unfortunately, this means that DSv1 can't be easily replaced, because its behavior differs between sources. In addition, we're not really sure how DSv1 works in all cases -- it really depends on what seemed reasonable to the authors at the time. For example, we don't have a good understanding of how file-based tables (those not backed by a Metastore) behave. There are also changes that we know are breaking and are okay with, like only inserting safe casts when writing with v2.

Because of this, we can't just replace v1 with v2 transparently, so the plan is to allow deployments to migrate to v2 in stages. Here's the plan:

1. Use v1 by default, so all existing queries work as they do today for identifiers like `db.table`.
2. Allow users to add additional v2 catalogs that are used when an identifier explicitly starts with one, like `test_catalog.db.table`.
3. Add a v2 catalog that delegates to the session catalog, so that v2 read/write implementations can be used while tables are stored just like v1 tables in the session catalog.
4. Add a setting to use a v2 catalog as the default. Setting this would use a v2 catalog for all identifiers without a catalog, like `db.table`.
5. Add a way for a v2 catalog to return a table that gets converted to v1. This is what `CatalogTableAsV2` does in #24768 <https://github.com/apache/spark/pull/24768>.

PR #24768 <https://github.com/apache/spark/pull/24768> implements the rest of these changes. Specifically, we initially used the default catalog for v2 sources, but that causes namespace problems, so we need the v2 session catalog (point #3) as the default when there is no default v2 catalog.

I hope that answers your question. If not, I'm happy to answer follow-ups, and we can add this as a topic in the next v2 sync on Wednesday. I'm also planning to talk about metadata columns or function push-down from the Kafka v2 PR at that sync, so you may want to attend.
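As a rough configuration sketch of how points 2 and 4 surface to users (the property names follow the catalog-plugin scheme from this work; the catalog name and implementation class below are placeholders, not real classes):

```properties
# Point 2: register an explicit v2 catalog; identifiers like
# test_catalog.db.table then resolve through it.
spark.sql.catalog.test_catalog=com.example.TestCatalogImpl

# Point 4: make a v2 catalog the default for bare identifiers like db.table.
# If unset, v1 stays the default (point 1), with the v2 session catalog
# (point 3) filling in for v2 sources.
spark.sql.defaultCatalog=test_catalog
```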
rb

On Thu, Jun 20, 2019 at 4:45 AM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
> Hi All,
>
> I've taken a look at the code and docs to find out when DSv1 sources have
> to be removed (in case a DSv2 replacement is implemented). After some
> digging I've found DSv1 sources which are already removed, but in some
> cases v1 and v2 still exist in parallel.
>
> Can somebody please tell me what's the overall plan in this area?
>
> BR,
> G

--
Ryan Blue
Software Engineer
Netflix