I want to highlight this in case I missed it in the original email:

The 4 APIs will not be deleted. They will only be marked with deprecation
annotations, and we encourage users to switch to their alternatives.


-Rui

On Thu, Jul 7, 2022 at 2:23 PM Rui Wang <amaliu...@apache.org> wrote:

> Hi Community,
>
> Proposal:
> I want to discuss a proposal to deprecate the following Catalog APIs:
> def listColumns(dbName: String, tableName: String): Dataset[Column]
> def getTable(dbName: String, tableName: String): Table
> def getFunction(dbName: String, functionName: String): Function
> def tableExists(dbName: String, tableName: String): Boolean
>
>
> Context:
> We have been adding support for table identifiers with a catalog name (aka
> 3-layer namespace) to the Catalog API in
> https://issues.apache.org/jira/browse/SPARK-39235.
> The basic idea is as follows. If an API accepts:
> 1. only tableName: String, we allow it to accept "a.b.c", which goes
> through the analyzer and treats a as the catalog name, b as the namespace
> name, and c as the table name.
> 2. only dbName: String, we allow it to accept "a.b", which goes through the
> analyzer and treats a as the catalog name and b as the namespace name.
> Meanwhile, we still maintain backward compatibility for such APIs to make
> sure past behavior remains the same. E.g. if you only use tableName, it is
> still recognized by the session catalog.
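A minimal sketch of the splitting rule above, under simplifying assumptions: names are split naively on `.` (the real analyzer also handles backtick-quoted identifiers and multi-level namespaces), 2-part table names are assumed to keep their legacy `database.table` meaning in the session catalog, and `ResolvedTable`, `NameResolution`, `resolveTableName` and `resolveDbName` are hypothetical names for illustration, not Spark's actual implementation.

```scala
// Hypothetical, simplified model of the naming rule; NOT Spark's analyzer.
// Names are split naively on '.', backtick quoting is ignored, and at most
// three parts are modeled.
final case class ResolvedTable(catalog: String, namespace: Seq[String], table: String)

object NameResolution {
  // Name of the session catalog, used for legacy (non-catalog-qualified) names.
  val SessionCatalog = "spark_catalog"

  // tableName: "tbl" or "db.tbl" (legacy), or "catalog.namespace.tbl" (3-layer).
  def resolveTableName(tableName: String, currentDb: String = "default"): ResolvedTable =
    tableName.split('.').toList match {
      case List(t)          => ResolvedTable(SessionCatalog, Seq(currentDb), t)
      case List(db, t)      => ResolvedTable(SessionCatalog, Seq(db), t) // assumed legacy meaning
      case List(cat, ns, t) => ResolvedTable(cat, Seq(ns), t)
      case _ => throw new IllegalArgumentException(s"Unsupported table name: $tableName")
    }

  // dbName: "db" (legacy) or "catalog.namespace" (3-layer).
  def resolveDbName(dbName: String): (String, Seq[String]) =
    dbName.split('.').toList match {
      case List(db)      => (SessionCatalog, Seq(db))
      case List(cat, ns) => (cat, Seq(ns))
      case _ => throw new IllegalArgumentException(s"Unsupported database name: $dbName")
    }
}
```

Under these assumptions, `resolveTableName("a.b.c")` yields catalog `a`, namespace `b`, table `c`, while a bare `"tbl"` stays in the session catalog.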
>
> With this effort ongoing, the above 4 APIs are not fully compatible
> with the 3-layer namespace.
>
> Take tableExists(dbName: String, tableName: String) as an example: it takes
> two parameters, which leaves no room for the extra catalog name. And if we
> wanted to reuse the two parameters, which one should accept more than one
> name part?
>
>
> How?
> So how should we improve the above 4 APIs? There are two options:
> a. Expand those four APIs so that they accept a catalog name. For
> example, tableExists(catalogName: String, dbName: String, tableName:
> String).
> b. Mark those APIs as `deprecated`.
>
> I am proposing option b, which deprecates the APIs.
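A minimal sketch of what option b could look like, using a toy stand-in rather than Spark's real `Catalog` class: `SimpleCatalog`, `InMemoryCatalog` and the `qualify` helper are hypothetical, the `"3.4.0"` passed to Scala's `@deprecated` is an assumed version, and the legacy resolution of unqualified names to `spark_catalog.default` is also an assumption of the sketch.

```scala
// Hypothetical stand-in for the Catalog API; not Spark's real trait.
trait SimpleCatalog {
  // Surviving API: accepts "tbl", "db.tbl" or "catalog.namespace.tbl".
  def tableExists(tableName: String): Boolean

  // Option b: the two-argument form is kept but marked deprecated, and it
  // just delegates to the single-string form. "3.4.0" is an assumed version.
  @deprecated("Use tableExists(tableName) with a qualified name instead", "3.4.0")
  def tableExists(dbName: String, tableName: String): Boolean =
    tableExists(s"$dbName.$tableName")
}

// Toy in-memory implementation that stores fully qualified table names.
final class InMemoryCatalog(tables: Set[String]) extends SimpleCatalog {
  // Assumed legacy rules: "tbl" -> spark_catalog.default.tbl,
  // "db.tbl" -> spark_catalog.db.tbl; 3-part names are used as-is.
  private def qualify(name: String): String = name.split('.').length match {
    case 1 => s"spark_catalog.default.$name"
    case 2 => s"spark_catalog.$name"
    case _ => name
  }

  override def tableExists(tableName: String): Boolean =
    tables.contains(qualify(tableName))
}
```

Existing callers of the two-argument overload keep compiling (with a deprecation warning), while only the single-string form can address a non-session catalog such as `other.ns.t`.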
>
> Why?
> 1. Reduce unneeded APIs. The existing APIs can support the same behavior
> given SPARK-39235. For example, tableExists(dbName, tableName) can be
> replaced by tableExists("dbName.tableName").
> 2. Reduce incomplete APIs. The APIs proposed for deprecation do not support
> the 3-layer namespace now, and it is hard to make them do so (where would
> the 3-part name go?).
> 3. Deprecation signals to users that they should migrate to the
> alternative APIs.
> 4. There is existing precedent: we deprecated the createExternalTable API
> when adding the createTable API:
> https://github.com/apache/spark/blob/7dcb4bafd02dd43213d3cc4a936c170bda56ddc5/sql/core/src/main/scala/org/apache/spark/sql/catalog/Catalog.scala#L220
>
>
> What do you think?
>
> Thanks,
> Rui Wang
