It's better to keep all APIs working. But in this case, I really have no
idea how to make these 4 APIs reasonable. For example, tableExists(dbName:
String, tableName: String) currently checks if table "dbName.tableName"
exists in the Hive metastore, and does not work with v2 catalogs at all.
It's not only a "not needed" API, but also a confusing API. We need a
mechanism to move users away from confusing APIs.

I agree that we should not abuse deprecation. I think a general principle
to use deprecation is you have the intention to remove it eventually, which
is exactly the case here. We should remove these 4 APIs when most users
have moved away.

Thanks,
Wenchen

On Fri, Jul 8, 2022 at 2:49 PM Dongjoon Hyun <dongjoon.h...@gmail.com>
wrote:

> Thank you for starting the official discussion, Rui.
>
> 'Unneeded API' doesn't sound like a good frame for this discussion
> because it ignores the existing users and codes completely.
> Technically, the above mentioned reasons look irrelevant to any
> specific existing bugs or future maintenance cost saving. Instead, the
> deprecation already causes costs to the community (your PR, the future
> migration guide, and the communication with the customers like Q&A)
> and to the users for the actual migration to new API and validations.
> Given that, for now, the goal of this proposal looks like a pure
> educational purpose to advertise new APIs to Apache Spark 3.4+ users.
>
> Can we be more conservative at Apache Spark deprecation and allow
> users to use both APIs freely without any concern of uncertain
> insupportability? I simply want to avoid the situation where the pure
> educational deprecation itself becomes `Unneeded Deprecation` in the
> community.
>
> Dongjoon.
>
> On Thu, Jul 7, 2022 at 2:26 PM Rui Wang <amaliu...@apache.org> wrote:
> >
> > I want to highlight in case I missed this in the original email:
> >
> > The 4 API will not be deleted. They will just be marked as deprecated
> annotations and we encourage users to use their alternatives.
> >
> >
> > -Rui
> >
> > On Thu, Jul 7, 2022 at 2:23 PM Rui Wang <amaliu...@apache.org> wrote:
> >>
> >> Hi Community,
> >>
> >> Proposal:
> >> I want to discuss a proposal to deprecate the following Catalog API:
> >> def listColumns(dbName: String, tableName: String): Dataset[Column]
> >> def getTable(dbName: String, tableName: String): Table
> >> def getFunction(dbName: String, functionName: String): Function
> >> def tableExists(dbName: String, tableName: String): Boolean
> >>
> >>
> >> Context:
> >> We have been adding table identifier with catalog name (aka 3 layer
> namespace) support to Catalog API in
> https://issues.apache.org/jira/browse/SPARK-39235.
> >> The basic idea is, if an API accepts:
> >> 1. only tableName:String, we allow it accepts "a.b.c" and goes analyzer
> which treats a as catalog name, b namespace name and c table name.
> >> 2. only dbName:String, we allow it accepts "a.b" and goes analyzer
> which treats a as catalog name, b namespace name.
> >> Meanwhile we still maintain the backwards compatibility for such API to
> make sure past behavior remains the same. E.g. If you only use tableName it
> is still recognized by the session catalog.
> >>
> >> With this effort ongoing, the above 4 API becomes not fully compatible
> with the 3 layer namespace.
> >>
> >> use tableExists(dbName: String, tableName: String) as an example, given
> that it takes two parameters but leaves no room for the extra catalog name.
> Also if we want to reuse the two parameters, which one will be the one that
> takes more than one name part?
> >>
> >>
> >> How?
> >> So how to improve the above 4 API? There are two options:
> >> a. Expand those four API to let those API accept catalog names. For
> example, tableExists(catalogName: String, dbName: String, tableName:
> String).
> >> b. mark those API as `deprecated`.
> >>
> >> I am proposing to follow option B which does API deprecation.
> >>
> >> Why?
> >> 1. Reduce unneeded API. The existing API can support the same behavior
> given SPARK-39235. For example, tableExists(dbName, tableName) can be
> replaced to use tableExists("dbName.tableName").
> >> 2. Reduce incomplete API. The proposed API to deprecate does not
> support 3 layer namespace now, and it is hard to do so (where to take 3
> part names)?
> >> 3. Deprecation suggests users to migrate their usage on API.
> >> 4. There was existing practice that we deprecated CreateExternalTable
> API when adding CreateTable API:
> https://github.com/apache/spark/blob/7dcb4bafd02dd43213d3cc4a936c170bda56ddc5/sql/core/src/main/scala/org/apache/spark/sql/catalog/Catalog.scala#L220
> >>
> >>
> >> What do you think?
> >>
> >> Thanks,
> >> Rui Wang
> >>
> >>
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>

Reply via email to