Re: [DISCUSS][Catalog API] Deprecate 4 Catalog API that takes two parameters which are (dbName, tableName/functionName)

Rui Wang Fri, 08 Jul 2022 11:18:48 -0700

Yes. The current goal is a pure educational deprecation.

So given the proposal:
1. existing users or users who do not care about catalog names in table
identifiers can still use all the API that maintain their past behavior.
2. new users who intend to use table identifiers with catalog names
get warned by the annotation (and maybe a bit more comments on the API
surface) that 4 API will not serve their usage.


I believe this proposal is conservative: do not intend to cause troubles
for existing users; do not intend to force user migration; do not intend to
delete APIs; do not intend to hurt supportability. If there is anything
that we can make this goal clear, I can do it.

Ultimately, the 4 API in this thread has the problem that it is not
compatible with 3 layer namespace thus not the same as other API who is
supporting 3 layer namespace. For people who want to include catalog names,
the problem itself will stand and we probably have to do something about
it.

-Rui

On Fri, Jul 8, 2022 at 7:24 AM Wenchen Fan <cloud0...@gmail.com> wrote:

> It's better to keep all APIs working. But in this case, I really have no
> idea how to make these 4 APIs reasonable. For example, tableExists(dbName:
> String, tableName: String) currently checks if table "dbName.tableName"
> exists in the Hive metastore, and does not work with v2 catalogs at all.
> It's not only a "not needed" API, but also a confusing API. We need a
> mechanism to move users away from confusing APIs.
>
> I agree that we should not abuse deprecation. I think a general principle
> to use deprecation is you have the intention to remove it eventually, which
> is exactly the case here. We should remove these 4 APIs when most users
> have moved away.
>
> Thanks,
> Wenchen
>
> On Fri, Jul 8, 2022 at 2:49 PM Dongjoon Hyun <dongjoon.h...@gmail.com>
> wrote:
>
>> Thank you for starting the official discussion, Rui.
>>
>> 'Unneeded API' doesn't sound like a good frame for this discussion
>> because it ignores the existing users and codes completely.
>> Technically, the above mentioned reasons look irrelevant to any
>> specific existing bugs or future maintenance cost saving. Instead, the
>> deprecation already causes costs to the community (your PR, the future
>> migration guide, and the communication with the customers like Q&A)
>> and to the users for the actual migration to new API and validations.
>> Given that, for now, the goal of this proposal looks like a pure
>> educational purpose to advertise new APIs to Apache Spark 3.4+ users.
>>
>> Can we be more conservative at Apache Spark deprecation and allow
>> users to use both APIs freely without any concern of uncertain
>> insupportability? I simply want to avoid the situation where the pure
>> educational deprecation itself becomes `Unneeded Deprecation` in the
>> community.
>>
>> Dongjoon.
>>
>> On Thu, Jul 7, 2022 at 2:26 PM Rui Wang <amaliu...@apache.org> wrote:
>> >
>> > I want to highlight in case I missed this in the original email:
>> >
>> > The 4 API will not be deleted. They will just be marked as deprecated
>> annotations and we encourage users to use their alternatives.
>> >
>> >
>> > -Rui
>> >
>> > On Thu, Jul 7, 2022 at 2:23 PM Rui Wang <amaliu...@apache.org> wrote:
>> >>
>> >> Hi Community,
>> >>
>> >> Proposal:
>> >> I want to discuss a proposal to deprecate the following Catalog API:
>> >> def listColumns(dbName: String, tableName: String): Dataset[Column]
>> >> def getTable(dbName: String, tableName: String): Table
>> >> def getFunction(dbName: String, functionName: String): Function
>> >> def tableExists(dbName: String, tableName: String): Boolean
>> >>
>> >>
>> >> Context:
>> >> We have been adding table identifier with catalog name (aka 3 layer
>> namespace) support to Catalog API in
>> https://issues.apache.org/jira/browse/SPARK-39235.
>> >> The basic idea is, if an API accepts:
>> >> 1. only tableName:String, we allow it accepts "a.b.c" and goes
>> analyzer which treats a as catalog name, b namespace name and c table name.
>> >> 2. only dbName:String, we allow it accepts "a.b" and goes analyzer
>> which treats a as catalog name, b namespace name.
>> >> Meanwhile we still maintain the backwards compatibility for such API
>> to make sure past behavior remains the same. E.g. If you only use tableName
>> it is still recognized by the session catalog.
>> >>
>> >> With this effort ongoing, the above 4 API becomes not fully compatible
>> with the 3 layer namespace.
>> >>
>> >> use tableExists(dbName: String, tableName: String) as an example,
>> given that it takes two parameters but leaves no room for the extra catalog
>> name. Also if we want to reuse the two parameters, which one will be the
>> one that takes more than one name part?
>> >>
>> >>
>> >> How?
>> >> So how to improve the above 4 API? There are two options:
>> >> a. Expand those four API to let those API accept catalog names. For
>> example, tableExists(catalogName: String, dbName: String, tableName:
>> String).
>> >> b. mark those API as `deprecated`.
>> >>
>> >> I am proposing to follow option B which does API deprecation.
>> >>
>> >> Why?
>> >> 1. Reduce unneeded API. The existing API can support the same behavior
>> given SPARK-39235. For example, tableExists(dbName, tableName) can be
>> replaced to use tableExists("dbName.tableName").
>> >> 2. Reduce incomplete API. The proposed API to deprecate does not
>> support 3 layer namespace now, and it is hard to do so (where to take 3
>> part names)?
>> >> 3. Deprecation suggests users to migrate their usage on API.
>> >> 4. There was existing practice that we deprecated CreateExternalTable
>> API when adding CreateTable API:
>> https://github.com/apache/spark/blob/7dcb4bafd02dd43213d3cc4a936c170bda56ddc5/sql/core/src/main/scala/org/apache/spark/sql/catalog/Catalog.scala#L220
>> >>
>> >>
>> >> What do you think?
>> >>
>> >> Thanks,
>> >> Rui Wang
>> >>
>> >>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>

Re: [DISCUSS][Catalog API] Deprecate 4 Catalog API that takes two parameters which are (dbName, tableName/functionName)

Reply via email to