Hi Community,

Proposal:
I would like to discuss a proposal to deprecate the following Catalog APIs:
def listColumns(dbName: String, tableName: String): Dataset[Column]
def getTable(dbName: String, tableName: String): Table
def getFunction(dbName: String, functionName: String): Function
def tableExists(dbName: String, tableName: String): Boolean


Context:
We have been adding support for table identifiers that include a catalog
name (aka the 3-layer namespace) to the Catalog API in
https://issues.apache.org/jira/browse/SPARK-39235.
The basic idea is that if an API accepts:
1. only tableName: String, we allow it to accept "a.b.c" and pass it to the
analyzer, which treats a as the catalog name, b as the namespace name, and c
as the table name.
2. only dbName: String, we allow it to accept "a.b" and pass it to the
analyzer, which treats a as the catalog name and b as the namespace name.
Meanwhile, we still maintain backward compatibility for such APIs to
make sure past behavior remains the same. For example, if you pass only a
table name, it is still recognized by the session catalog.
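
To make the name handling above concrete, here is a minimal sketch in plain
Scala. It is not Spark's analyzer or identifier parser: TableName, NameParts
and parseTableName are hypothetical names used only for illustration, and the
naive split on '.' ignores quoted identifiers and namespaces with more than
one level.

```scala
// Hypothetical model of a parsed table name; not a Spark class.
final case class TableName(
    catalog: Option[String],
    namespace: Option[String],
    table: String)

object NameParts {
  // Assumption: a plain split on '.', with no support for quoted identifiers.
  def parseTableName(name: String): TableName =
    name.split('.').toList match {
      // "c": no catalog or namespace given (the backward-compatible case)
      case table :: Nil =>
        TableName(None, None, table)
      // "b.c": namespace and table, no catalog given
      case namespace :: table :: Nil =>
        TableName(None, Some(namespace), table)
      // "a.b.c": catalog, namespace and table
      case catalog :: namespace :: table :: Nil =>
        TableName(Some(catalog), Some(namespace), table)
      // Simplification: anything longer is rejected in this sketch.
      case _ =>
        throw new IllegalArgumentException(s"Unsupported name: $name")
    }
}
```

With this sketch, NameParts.parseTableName("a.b.c") yields a as the catalog,
b as the namespace and c as the table, while a bare "c" carries no catalog
or namespace.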

With this effort ongoing, the four APIs above are not fully compatible with
the 3-layer namespace.

Take tableExists(dbName: String, tableName: String) as an example: it takes
two parameters and leaves no room for the extra catalog name.
Also, if we wanted to reuse the two parameters, which one would be the one
that takes more than one name part?


How?
So how can we improve the four APIs above? There are two options:
a. Expand the four APIs so that they accept catalog names. For
example, tableExists(catalogName: String, dbName: String, tableName:
String).
b. Mark the four APIs as `deprecated`.

I propose option b, that is, deprecating these APIs.

Why?
1. Reduce unneeded APIs. The existing APIs can already support the same
behavior given SPARK-39235. For example, tableExists(dbName, tableName) can
be replaced with tableExists("dbName.tableName").
2. Reduce incomplete APIs. The APIs proposed for deprecation do not support
the 3-layer namespace now, and it is hard to make them do so (which
parameter would take a 3-part name?).
3. Deprecation signals to users that they should migrate their usage of
these APIs.
4. There is precedent: we deprecated the createExternalTable API when adding
the createTable API:
https://github.com/apache/spark/blob/7dcb4bafd02dd43213d3cc4a936c170bda56ddc5/sql/core/src/main/scala/org/apache/spark/sql/catalog/Catalog.scala#L220
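
As a rough illustration of point 1, the sketch below shows what option b
could look like from a caller's point of view. ToyCatalog is a hypothetical
in-memory stand-in, not Spark's Catalog class, and the "x.y.z" version string
is a placeholder. The two-parameter overload is marked with Scala's standard
@deprecated annotation and simply delegates to the single-string form.

```scala
// Toy in-memory stand-in for the Catalog API; not Spark's implementation.
class ToyCatalog(tables: Set[String]) {

  // Single-string form: takes the full name, e.g. "db.tbl".
  def tableExists(tableName: String): Boolean =
    tables.contains(tableName)

  // Two-parameter form, kept for compatibility but marked as deprecated.
  // "x.y.z" is a placeholder, not an actual release version.
  @deprecated("use tableExists(\"dbName.tableName\") instead", "x.y.z")
  def tableExists(dbName: String, tableName: String): Boolean =
    tableExists(s"$dbName.$tableName")
}
```

Callers of the two-parameter form keep working but get a compiler
deprecation warning, and migrating is a matter of joining the two arguments
into one dotted name.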


What do you think?

Thanks,
Rui Wang
