There were some extra discussions that happened at https://github.com/apache/spark/pull/37105.
As of now we have agreed on a "soft deprecation": 1. document the limitations of the four APIs and suggest alternatives in the API doc; 2. do not use the @deprecated annotation. Please let us know if you don't agree.

-Rui

On Fri, Jul 8, 2022 at 11:18 AM Rui Wang <amaliu...@apache.org> wrote:

> Yes. The current goal is a pure educational deprecation.
>
> So given the proposal:
> 1. Existing users, and users who do not care about catalog names in table identifiers, can still use all the APIs, which keep their past behavior.
> 2. New users who intend to use table identifiers with catalog names are warned by the annotation (and maybe a bit more commentary on the API surface) that these 4 APIs will not serve their usage.
>
> I believe this proposal is conservative: it does not intend to cause trouble for existing users, to force user migration, to delete APIs, or to hurt supportability. If there is anything I can do to make this goal clearer, I will do it.
>
> Ultimately, the 4 APIs in this thread have the problem that they are not compatible with the 3-layer namespace, unlike the other APIs, which do support it. For people who want to include catalog names, the problem will remain, and we will probably have to do something about it.
>
> -Rui
>
> On Fri, Jul 8, 2022 at 7:24 AM Wenchen Fan <cloud0...@gmail.com> wrote:
>
>> It's better to keep all APIs working. But in this case, I really have no idea how to make these 4 APIs reasonable. For example, tableExists(dbName: String, tableName: String) currently checks whether the table "dbName.tableName" exists in the Hive metastore, and does not work with v2 catalogs at all. It's not only a "not needed" API, but also a confusing API. We need a mechanism to move users away from confusing APIs.
>>
>> I agree that we should not abuse deprecation.
>> I think a general principle for using deprecation is that you intend to remove the API eventually, which is exactly the case here. We should remove these 4 APIs once most users have moved away.
>>
>> Thanks,
>> Wenchen
>>
>> On Fri, Jul 8, 2022 at 2:49 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>
>>> Thank you for starting the official discussion, Rui.
>>>
>>> 'Unneeded API' doesn't sound like a good frame for this discussion, because it completely ignores the existing users and code. Technically, the reasons mentioned above look irrelevant to any specific existing bugs or future maintenance-cost savings. Instead, the deprecation already imposes costs on the community (your PR, the future migration guide, and communication with customers, such as Q&A) and on the users, who must actually migrate to the new APIs and validate the change. Given that, for now, the goal of this proposal looks like a purely educational one: advertising the new APIs to Apache Spark 3.4+ users.
>>>
>>> Can we be more conservative about deprecation in Apache Spark and allow users to use both APIs freely, without any concern that the old ones might become unsupported? I simply want to avoid the situation where the purely educational deprecation itself becomes an `Unneeded Deprecation` in the community.
>>>
>>> Dongjoon.
>>>
>>> On Thu, Jul 7, 2022 at 2:26 PM Rui Wang <amaliu...@apache.org> wrote:
>>> >
>>> > I want to highlight this in case I missed it in the original email:
>>> >
>>> > The 4 APIs will not be deleted. They will just be marked with deprecation annotations, and we encourage users to use their alternatives.
>>> >
>>> > -Rui
>>> >
>>> > On Thu, Jul 7, 2022 at 2:23 PM Rui Wang <amaliu...@apache.org> wrote:
>>> >>
>>> >> Hi Community,
>>> >>
>>> >> Proposal:
>>> >> I want to discuss a proposal to deprecate the following Catalog APIs:
>>> >> def listColumns(dbName: String, tableName: String): Dataset[Column]
>>> >> def getTable(dbName: String, tableName: String): Table
>>> >> def getFunction(dbName: String, functionName: String): Function
>>> >> def tableExists(dbName: String, tableName: String): Boolean
>>> >>
>>> >> Context:
>>> >> We have been adding support for table identifiers with catalog names (aka the 3-layer namespace) to the Catalog API in https://issues.apache.org/jira/browse/SPARK-39235.
>>> >> The basic idea is, if an API accepts:
>>> >> 1. only tableName: String, we allow it to accept "a.b.c", which goes through the analyzer, treating a as the catalog name, b as the namespace name, and c as the table name.
>>> >> 2. only dbName: String, we allow it to accept "a.b", which goes through the analyzer, treating a as the catalog name and b as the namespace name.
>>> >> Meanwhile, we still maintain backwards compatibility for such APIs to make sure past behavior remains the same. E.g. if you only use tableName, it is still resolved by the session catalog.
>>> >>
>>> >> With this effort ongoing, the above 4 APIs are not fully compatible with the 3-layer namespace.
>>> >>
>>> >> Take tableExists(dbName: String, tableName: String) as an example: it takes two parameters but leaves no room for the extra catalog name. And if we wanted to reuse the two parameters, which one should take more than one name part?
>>> >>
>>> >> How?
>>> >> So how do we improve the above 4 APIs? There are two options:
>>> >> a. Expand those four APIs to accept catalog names. For example, tableExists(catalogName: String, dbName: String, tableName: String).
>>> >> b. Mark those APIs as `deprecated`.
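[Editor's sketch] The name-resolution rules described in the Context section above can be sketched as a small pure-Python function. This is hypothetical illustration only, not Spark's actual analyzer: the `session_catalog` and `default_db` defaults are assumptions, and it ignores backtick-quoted identifier parts that contain dots.

```python
# Hypothetical sketch (not Spark's analyzer) of 3-layer name resolution:
# "a.b.c" -> (catalog, namespace, table), with shorter forms falling back
# to the session catalog for backwards compatibility.

def resolve_table_identifier(name, session_catalog="spark_catalog",
                             default_db="default"):
    """Split a dotted identifier into (catalog, namespace, table)."""
    parts = name.split(".")
    if len(parts) == 1:
        # Past behavior: a bare table name resolves in the session catalog.
        return (session_catalog, default_db, parts[0])
    if len(parts) == 2:
        # "db.table", still in the session catalog, as before.
        return (session_catalog, parts[0], parts[1])
    if len(parts) == 3:
        # New 3-layer form: catalog.namespace.table.
        return tuple(parts)
    raise ValueError(f"cannot resolve table identifier: {name!r}")
```

This also makes the tableExists(dbName, tableName) problem concrete: a two-argument form offers no third slot for the catalog, while the single-string form extends naturally from one to three parts.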
>>> >>
>>> >> I am proposing to follow option B, which does API deprecation.
>>> >>
>>> >> Why?
>>> >> 1. Reduce unneeded APIs. The existing APIs can support the same behavior given SPARK-39235. For example, tableExists(dbName, tableName) can be replaced with tableExists("dbName.tableName").
>>> >> 2. Reduce incomplete APIs. The APIs proposed for deprecation do not support the 3-layer namespace now, and it is hard to make them do so (where would the 3-part names go?).
>>> >> 3. Deprecation nudges users to migrate their usage of the APIs.
>>> >> 4. There is existing precedent: we deprecated the createExternalTable API when adding the createTable API: https://github.com/apache/spark/blob/7dcb4bafd02dd43213d3cc4a936c170bda56ddc5/sql/core/src/main/scala/org/apache/spark/sql/catalog/Catalog.scala#L220
>>> >>
>>> >> What do you think?
>>> >>
>>> >> Thanks,
>>> >> Rui Wang
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
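[Editor's sketch] The migration path in option B (point 1 of the Why section) can be sketched as follows. `ToyCatalog` and its method names are invented stand-ins, not Spark's org.apache.spark.sql.catalog.Catalog: the point is only that the deprecated two-argument form keeps working, warns, and forwards to the single dotted-identifier form.

```python
import warnings

# Toy stand-in illustrating "deprecate but do not delete": the old
# two-argument lookup warns and delegates to the preferred one-string
# form, which can also carry a catalog name ("cat.ns.t").

class ToyCatalog:
    def __init__(self, tables):
        self._tables = set(tables)  # dotted identifiers, e.g. "db.t"

    def table_exists(self, name):
        # Preferred form: one identifier, optionally catalog-qualified.
        return name in self._tables

    def table_exists_in_db(self, db_name, table_name):
        # Deprecated form: warn, then delegate to the preferred form.
        warnings.warn(
            'table_exists_in_db is deprecated; use '
            'table_exists("dbName.tableName") instead',
            DeprecationWarning,
            stacklevel=2,
        )
        return self.table_exists(f"{db_name}.{table_name}")
```

Existing callers of the two-argument form see unchanged results plus a DeprecationWarning, which matches the "not deleted, just deprecated" intent stated earlier in the thread.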