Hi Community,

Proposal:

I would like to discuss a proposal to deprecate the following Catalog APIs:

    def listColumns(dbName: String, tableName: String): Dataset[Column]
    def getTable(dbName: String, tableName: String): Table
    def getFunction(dbName: String, functionName: String): Function
    def tableExists(dbName: String, tableName: String): Boolean

Context:

We have been adding support for table identifiers that include a catalog name (aka the 3 layer namespace) to the Catalog API in https://issues.apache.org/jira/browse/SPARK-39235. The basic idea is that if an API accepts:

1. only tableName: String, we allow it to accept "a.b.c" and pass it to the analyzer, which treats a as the catalog name, b as the namespace name, and c as the table name.
2. only dbName: String, we allow it to accept "a.b" and pass it to the analyzer, which treats a as the catalog name and b as the namespace name.

Meanwhile, we still maintain backwards compatibility for these APIs so that past behavior remains the same. For example, if you pass only a table name, it is still recognized by the session catalog.

With this effort ongoing, the four APIs above are not fully compatible with the 3 layer namespace. Take tableExists(dbName: String, tableName: String) as an example: it takes two parameters and leaves no room for the extra catalog name. And if we wanted to reuse the two parameters, which one would accept more than one name part, and how?

So how can we improve these four APIs? There are two options:

a. Expand the four APIs to accept catalog names. For example, tableExists(catalogName: String, dbName: String, tableName: String).
b. Mark those APIs as `deprecated`.

I propose option b, deprecation. Why?

1. It reduces unneeded API surface. The existing single-argument APIs can support the same behavior given SPARK-39235. For example, tableExists(dbName, tableName) can be replaced with tableExists("dbName.tableName").
2. It reduces incomplete APIs. The APIs proposed for deprecation do not support the 3 layer namespace today, and it is hard to make them do so (where would the third name part go?).
3. Deprecation signals to users that they should migrate their usage of these APIs.
4. There is precedent: we deprecated the CreateExternalTable API when adding the CreateTable API: https://github.com/apache/spark/blob/7dcb4bafd02dd43213d3cc4a936c170bda56ddc5/sql/core/src/main/scala/org/apache/spark/sql/catalog/Catalog.scala#L220

What do you think?

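
To make the name-part rule above concrete, here is a minimal sketch in plain Scala of how a single tableName string could be split into catalog, namespace, and table. It is only an illustration and not Spark's analyzer code: ResolvedTable, NameParts, resolveTable, and qualified are hypothetical names, and the naive split on '.' ignores quoted identifiers and multi-level namespaces.

```scala
// Hypothetical illustration only: this is NOT how Spark's analyzer is implemented.
// None for catalog means "fall back to the session catalog", matching the
// backwards-compatible behavior described in this email.
final case class ResolvedTable(
    catalog: Option[String],
    namespace: Option[String],
    table: String)

object NameParts {
  // Naive split on '.'; quoted identifiers and deeper namespaces are not handled.
  def resolveTable(name: String): ResolvedTable =
    name.split('.').toList match {
      case table :: Nil =>
        ResolvedTable(None, None, table)
      case namespace :: table :: Nil =>
        ResolvedTable(None, Some(namespace), table)
      case catalog :: namespace :: table :: Nil =>
        ResolvedTable(Some(catalog), Some(namespace), table)
      case _ =>
        throw new IllegalArgumentException(s"Unsupported number of name parts: $name")
    }

  // How a caller of a two-argument API could build the single-string form,
  // e.g. tableExists(dbName, tableName) becomes tableExists("dbName.tableName").
  def qualified(dbName: String, tableName: String): String =
    s"$dbName.$tableName"
}
```

Under these assumptions, NameParts.resolveTable("a.b.c") yields catalog a, namespace b, and table c, while NameParts.resolveTable(NameParts.qualified("db", "t")) leaves the catalog unset, which is why the two-argument form can be expressed through the single-string API without losing information.
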
Thanks,
Rui Wang