It's better to keep all APIs working. But in this case, I really have no idea how to make these 4 APIs reasonable. For example, tableExists(dbName: String, tableName: String) currently checks if table "dbName.tableName" exists in the Hive metastore, and does not work with v2 catalogs at all. It's not only a "not needed" API, but also a confusing API. We need a mechanism to move users away from confusing APIs.
I agree that we should not abuse deprecation. I think a general principle to use deprecation is you have the intention to remove it eventually, which is exactly the case here. We should remove these 4 APIs when most users have moved away. Thanks, Wenchen On Fri, Jul 8, 2022 at 2:49 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote: > Thank you for starting the official discussion, Rui. > > 'Unneeded API' doesn't sound like a good frame for this discussion > because it ignores the existing users and codes completely. > Technically, the above mentioned reasons look irrelevant to any > specific existing bugs or future maintenance cost saving. Instead, the > deprecation already causes costs to the community (your PR, the future > migration guide, and the communication with the customers like Q&A) > and to the users for the actual migration to new API and validations. > Given that, for now, the goal of this proposal looks like a pure > educational purpose to advertise new APIs to Apache Spark 3.4+ users. > > Can we be more conservative at Apache Spark deprecation and allow > users to use both APIs freely without any concern of uncertain > insupportability? I simply want to avoid the situation where the pure > educational deprecation itself becomes `Unneeded Deprecation` in the > community. > > Dongjoon. > > On Thu, Jul 7, 2022 at 2:26 PM Rui Wang <amaliu...@apache.org> wrote: > > > > I want to highlight in case I missed this in the original email: > > > > The 4 API will not be deleted. They will just be marked as deprecated > annotations and we encourage users to use their alternatives. > > > > > > -Rui > > > > On Thu, Jul 7, 2022 at 2:23 PM Rui Wang <amaliu...@apache.org> wrote: > >> > >> Hi Community, > >> > >> Proposal: > >> I want to discuss a proposal to deprecate the following Catalog API: > >> def listColumns(dbName: String, tableName: String): Dataset[Column] > >> def getTable(dbName: String, tableName: String): Table > >> def getFunction(dbName: String, functionName: String): Function > >> def tableExists(dbName: String, tableName: String): Boolean > >> > >> > >> Context: > >> We have been adding table identifier with catalog name (aka 3 layer > namespace) support to Catalog API in > https://issues.apache.org/jira/browse/SPARK-39235. > >> The basic idea is, if an API accepts: > >> 1. only tableName:String, we allow it accepts "a.b.c" and goes analyzer > which treats a as catalog name, b namespace name and c table name. > >> 2. only dbName:String, we allow it accepts "a.b" and goes analyzer > which treats a as catalog name, b namespace name. > >> Meanwhile we still maintain the backwards compatibility for such API to > make sure past behavior remains the same. E.g. If you only use tableName it > is still recognized by the session catalog. > >> > >> With this effort ongoing, the above 4 API becomes not fully compatible > with the 3 layer namespace. > >> > >> use tableExists(dbName: String, tableName: String) as an example, given > that it takes two parameters but leaves no room for the extra catalog name. > Also if we want to reuse the two parameters, which one will be the one that > takes more than one name part? > >> > >> > >> How? > >> So how to improve the above 4 API? There are two options: > >> a. Expand those four API to let those API accept catalog names. For > example, tableExists(catalogName: String, dbName: String, tableName: > String). > >> b. mark those API as `deprecated`. > >> > >> I am proposing to follow option B which does API deprecation. > >> > >> Why? > >> 1. Reduce unneeded API. The existing API can support the same behavior > given SPARK-39235. For example, tableExists(dbName, tableName) can be > replaced to use tableExists("dbName.tableName"). > >> 2. Reduce incomplete API. The proposed API to deprecate does not > support 3 layer namespace now, and it is hard to do so (where to take 3 > part names)? > >> 3. Deprecation suggests users to migrate their usage on API. > >> 4. There was existing practice that we deprecated CreateExternalTable > API when adding CreateTable API: > https://github.com/apache/spark/blob/7dcb4bafd02dd43213d3cc4a936c170bda56ddc5/sql/core/src/main/scala/org/apache/spark/sql/catalog/Catalog.scala#L220 > >> > >> > >> What do you think? > >> > >> Thanks, > >> Rui Wang > >> > >> > > --------------------------------------------------------------------- > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org > >