Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23086#discussion_r236487464

    --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/Table.java ---
    @@ -0,0 +1,51 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements. See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License. You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.sources.v2;
    --- End diff --

#21306 (TableCatalog support) adds this class as `org.apache.spark.sql.catalog.v2.Table` in the `spark-catalyst` module. I think it needs to be in the catalyst module and should probably be in the `o.a.s.sql.catalog.v2` package as well. The important change is moving this class to the catalyst module.

The analyzer is in catalyst, and all of the v2 logical plans and analysis rules will be in catalyst as well, because we are standardizing behavior. The standard validation rules should live in catalyst, not in a source-specific or Hive-specific package in the sql-core or hive modules. Because the logical plans and validation rules are in the catalyst package, the `TableCatalog` API needs to be there as well.

For example, when a [catalog table identifier](https://github.com/apache/spark/pull/21978) is resolved for a read query, one of the results is a `TableCatalog` instance for the catalog portion of the identifier. That catalog is used to load the v2 table, which is then wrapped in a v2 relation for further analysis. Similarly, the write path should validate during analysis that the catalog exists by loading it, and would then pass the catalog in a v2 logical plan for `CreateTable` or `CreateTableAsSelect`.

I also think it makes sense to use the `org.apache.spark.sql.catalog.v2` package for `Table` because `Table` is more closely tied to the `TableCatalog` API than to the data source API. The only link to DSv2 is that `Table` carries `newScanBuilder`; the rest of the methods exposed by `Table` serve catalog functions, like inspecting a table's partitioning or table properties. Moving this class would make adding `TableCatalog` less intrusive; a rough sketch of that split follows below.
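To make the proposed split concrete, here is a minimal sketch of the division of responsibilities. This is not the API from #21306: the method signatures, the `ScanBuilder` stub, the `partitioning()` return type, and the `ReadPathSketch` helper are all illustrative assumptions, kept just specific enough to show why `Table` leans toward the catalog side.

```java
import java.util.List;
import java.util.Map;

// Placeholder for the DSv2 read-side builder; its real contents are elided here.
interface ScanBuilder {}

// Sketch of Table: nearly every method serves a catalog function, such as
// inspecting partitioning or table properties. newScanBuilder is the sole
// tie to the data source (DSv2) API.
interface Table {
  String name();
  Map<String, String> properties();                         // catalog function
  List<String> partitioning();                              // catalog function (return type is illustrative)
  ScanBuilder newScanBuilder(Map<String, String> options);  // the one DSv2 hook
}

// Sketch of TableCatalog: owns lookup and lifecycle, which is why Table
// belongs next to it in o.a.s.sql.catalog.v2 in the catalyst module.
interface TableCatalog {
  Table loadTable(String identifier);
  Table createTable(String identifier, Map<String, String> properties);
}

// Sketch of the analysis-time read path described above: resolving a catalog
// table identifier yields a TableCatalog for the catalog portion; loading
// through it yields the Table that the analyzer wraps in a v2 relation.
class ReadPathSketch {
  static Table resolveForRead(TableCatalog catalog, String tableIdent) {
    // The write path would similarly load the catalog during analysis, then
    // carry it in a CreateTable / CreateTableAsSelect logical plan.
    return catalog.loadTable(tableIdent);
  }
}
```

With this split, adding `TableCatalog` only touches the catalog side; DSv2 read planning keeps depending on `Table` through `newScanBuilder` alone, which is what makes the move less intrusive.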