Yes, right. We need to choose the best way case by case. The following level of runtime warning also sounds reasonable to me.
> Maybe we should only log the warning once per Spark application.

Bests,
Dongjoon.

On Tue, Feb 18, 2020 at 1:13 AM Wenchen Fan <cloud0...@gmail.com> wrote:

> I don't know what's the best way to deprecate an SQL function. A runtime
> warning can be annoying if it keeps coming out. Maybe we should only log
> the warning once per Spark application.
>
> On Tue, Feb 18, 2020 at 3:45 PM Dongjoon Hyun <dongjoon.h...@gmail.com>
> wrote:
>
>> Thank you for the feedback, Wenchen, Maxim, and Takeshi.
>>
>> 1. "we would also promote the SQL standard TRIM syntax"
>> 2. "we could output a warning with a notice about the order of
>>    parameters"
>> 3. "it looks nice to make these (two parameters) trim functions
>>    unrecommended in future releases"
>>
>> Yes, in case of reverting SPARK-28093, we had better deprecate these
>> two-parameter SQL function invocations, because they are really
>> esoteric and even inconsistent inside Spark (the Scala API also follows
>> the PostgreSQL/Presto style, like the following):
>>
>>     def trim(e: Column, trimString: String)
>>
>> If we keep this situation in the 3.0.0 release (a major release), it
>> means Apache Spark will keep it forever. And it becomes worse and worse
>> because it leads more users to fall into this trap.
>>
>> There are a few ways to deprecate. I believe (3) can be a proper next
>> step in case of reverting, because (2) is infeasible and (1) is
>> considered insufficient because it's silent, as it was when we did
>> SPARK-28093. We need a non-silent (noisy) one in this case.
>> Technically, (3) can be done in `Analyzer.ResolveFunctions`.
>>
>> 1. Documentation-only: remove the examples and add a migration guide.
>> 2. Compile-time warning by annotation: this is not an option for a SQL
>>    function in a SQL string.
>> 3. Runtime warning with a directional guide:
>>    log.warn("... USE TRIM(trimStr FROM str) INSTEAD")
>>
>> What do you think about (3)?
>>
>> Bests,
>> Dongjoon.
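As a sketch of option (3)'s "warn once per Spark application" idea (an illustration, not Spark's actual code: the real check would live in Scala inside `Analyzer.ResolveFunctions`, and the helper names below are hypothetical):

```python
import logging

logger = logging.getLogger("spark.analyzer.deprecation")

# Hypothetical once-per-application registry: remembers which deprecation
# keys have already produced a warning, so each message fires at most once.
_warned_keys = set()

def warn_once(key, message):
    """Emit `message` at WARN level only the first time `key` is seen."""
    if key not in _warned_keys:
        _warned_keys.add(key)
        logger.warning(message)

def on_two_param_trim_resolved():
    """Called each time the analyzer resolves a two-parameter TRIM call."""
    warn_once(
        "trim.two-param",
        "Two-parameter TRIM/LTRIM/RTRIM is deprecated; "
        "use TRIM(trimStr FROM str) instead.",
    )
```

However often two-parameter TRIM is resolved, the user sees the directional guide only once, which addresses the "annoying if it keeps coming out" concern.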
>>
>> On Sun, Feb 16, 2020 at 1:22 AM Takeshi Yamamuro <linguin....@gmail.com>
>> wrote:
>>
>>> The revert looks fine to me for keeping the compatibility.
>>> Also, I think the different orders between the systems easily lead to
>>> mistakes, so, as Wenchen suggested, it looks nice to make these
>>> (two-parameter) trim functions unrecommended in future releases:
>>> https://github.com/apache/spark/pull/27540#discussion_r377682518
>>> Actually, I think the SQL TRIM syntax is enough for trim use cases...
>>>
>>> Bests,
>>> Takeshi
>>>
>>> On Sun, Feb 16, 2020 at 3:02 AM Maxim Gekk <maxim.g...@databricks.com>
>>> wrote:
>>>
>>>> Also, if we look at the possible combinations of trim parameters:
>>>> 1. foldable srcStr + foldable trimStr
>>>> 2. foldable srcStr + non-foldable trimStr
>>>> 3. non-foldable srcStr + foldable trimStr
>>>> 4. non-foldable srcStr + non-foldable trimStr
>>>>
>>>> Case #2 seems to be a rare case, and #3 is probably the most common
>>>> one. Once we see the second case, we could output a warning with a
>>>> notice about the order of parameters.
>>>>
>>>> Maxim Gekk
>>>> Software Engineer
>>>> Databricks, Inc.
>>>>
>>>> On Sat, Feb 15, 2020 at 5:04 PM Wenchen Fan <cloud0...@gmail.com>
>>>> wrote:
>>>>
>>>>> It's unfortunate that we don't have a clear document that talks about
>>>>> breaking changes (I'm working on one, BTW). I believe the general
>>>>> guidance is: *avoid breaking changes unless we have to*. For example,
>>>>> the previous result was so broken that we had to fix it, moving to
>>>>> Scala 2.12 made us break some APIs, etc.
>>>>>
>>>>> For this particular case, do we have to switch the parameter order?
>>>>> It's different from some systems, and the order was not decided
>>>>> explicitly, but I don't think these are strong reasons. People from
>>>>> RDBMSs should use the SQL standard TRIM syntax more often. People
>>>>> using prior Spark versions should have figured out the parameter
>>>>> order of Spark TRIM (there was no document) and adjusted their
>>>>> queries. There is no standard that defines the parameter order of
>>>>> the TRIM function.
>>>>>
>>>>> In the long term, we would also promote the SQL standard TRIM syntax.
>>>>> I don't see a benefit of "fixing" the parameter order that is worth
>>>>> making a breaking change for.
>>>>>
>>>>> Thanks,
>>>>> Wenchen
>>>>>
>>>>> On Sat, Feb 15, 2020 at 3:44 AM Dongjoon Hyun
>>>>> <dongjoon.h...@gmail.com> wrote:
>>>>>
>>>>>> Please note that the context is TRIM/LTRIM/RTRIM with two parameters
>>>>>> and the TRIM(trimStr FROM str) syntax.
>>>>>>
>>>>>> This thread is irrelevant to one-parameter TRIM/LTRIM/RTRIM.
>>>>>>
>>>>>> On Fri, Feb 14, 2020 at 11:35 AM Dongjoon Hyun
>>>>>> <dongjoon.h...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi, All.
>>>>>>>
>>>>>>> I'm sending this email because the Apache Spark committers had
>>>>>>> better have a consistent point of view for the upcoming PRs. And
>>>>>>> the community policy is the way to lead the community members
>>>>>>> transparently and clearly for the long-term good.
>>>>>>>
>>>>>>> First of all, I want to emphasize that, like Apache Spark 2.0.0,
>>>>>>> Apache Spark 3.0.0 is going to achieve many improvements in SQL
>>>>>>> areas.
>>>>>>>
>>>>>>> Especially, we invested lots of effort to improve SQL ANSI
>>>>>>> compliance and related test coverage for the future. One simple
>>>>>>> example is the following.
>>>>>>>
>>>>>>> [SPARK-28126][SQL] Support TRIM(trimStr FROM str) syntax
>>>>>>>
>>>>>>> With this support, we are able to move away from the esoteric
>>>>>>> Spark functions TRIM/LTRIM/RTRIM.
>>>>>>> (Note that the above syntax is the ANSI standard, but we have
>>>>>>> these additional non-standard functions, too.)
>>>>>>>
>>>>>>> Here is the summary of the current status.
>>>>>>>
>>>>>>> +------------------------+-------------+---------------+
>>>>>>> | SQL Processing Engine  | TRIM Syntax | TRIM Function |
>>>>>>> +------------------------+-------------+---------------+
>>>>>>> | Spark 3.0.0-preview/2  | O           | O             |
>>>>>>> | PostgreSQL             | O           | O             |
>>>>>>> | Presto                 | O           | O             |
>>>>>>> | MySQL                  | O(*)        | X             |
>>>>>>> | Oracle                 | O           | X             |
>>>>>>> | MsSQL                  | O           | X             |
>>>>>>> | Teradata               | O           | X             |
>>>>>>> | Hive                   | X           | X             |
>>>>>>> | Spark 1.6 ~ 2.2        | X           | X             |
>>>>>>> | Spark 2.3 ~ 2.4        | X           | O(**)         |
>>>>>>> +------------------------+-------------+---------------+
>>>>>>> (*) MySQL doesn't allow multiple trim characters.
>>>>>>> (**) Spark 2.3 ~ 2.4 have the function, but with a different
>>>>>>> parameter order.
>>>>>>>
>>>>>>> Here is an illustrative example of the problem.
>>>>>>>
>>>>>>> postgres=# SELECT trim('yxTomxx', 'xyz');
>>>>>>>  btrim
>>>>>>> -------
>>>>>>>  Tom
>>>>>>>
>>>>>>> presto:default> SELECT trim('yxTomxx', 'xyz');
>>>>>>>  _col0
>>>>>>> -------
>>>>>>>  Tom
>>>>>>>
>>>>>>> spark-sql> SELECT trim('yxTomxx', 'xyz');
>>>>>>> z
>>>>>>>
>>>>>>> Here is our history of fixing the above issue.
>>>>>>>
>>>>>>> [SPARK-28093][SQL] Fix TRIM/LTRIM/RTRIM function parameter order issue
>>>>>>> 1. https://github.com/apache/spark/pull/24902
>>>>>>>    (Merged 2019-06-18 for v3.0.0; 3.0.0-preview and 3.0.0-preview2
>>>>>>>    were released with it.)
>>>>>>> 2. https://github.com/apache/spark/pull/24907
>>>>>>>    (Merged 2019-06-20 for v2.3.4, but reverted.)
>>>>>>> 3. https://github.com/apache/spark/pull/24908
>>>>>>>    (Merged 2019-06-21 for v2.4.4, but reverted.)
>>>>>>>
>>>>>>> (2) and (3) were reverted before the releases because we didn't
>>>>>>> want to fix this in maintenance releases. Please see the following
>>>>>>> references for the decision.
>>>>>>>
>>>>>>> https://github.com/apache/spark/pull/24908#issuecomment-504799028 (2.3)
>>>>>>> https://github.com/apache/spark/pull/24907#issuecomment-504799021 (2.4)
>>>>>>>
>>>>>>> Now, there are some requests to revert SPARK-28093 and to keep
>>>>>>> these esoteric functions for backward compatibility, with the
>>>>>>> following reasoning.
>>>>>>>
>>>>>>> > Reordering function parameters to match another system,
>>>>>>> > for a method that is otherwise working correctly,
>>>>>>> > sounds exactly like a cosmetic change to me.
>>>>>>>
>>>>>>> > How can we silently change the parameter of an existing SQL
>>>>>>> > function? I don't think this is a correctness issue, as the SQL
>>>>>>> > standard doesn't say that the function signature has to be
>>>>>>> > trim(srcStr, trimStr).
>>>>>>>
>>>>>>> The concerns and points of view make sense.
>>>>>>>
>>>>>>> My concerns are the following.
>>>>>>>
>>>>>>> 1. This kind of esoteric difference is called `vendor lock-in`, in
>>>>>>>    a negative sense.
>>>>>>>    - It's difficult for new users to understand.
>>>>>>>    - It's hard to migrate between Apache Spark and the others.
>>>>>>> 2. Although we did our best, Apache Spark SQL has not always been
>>>>>>>    good enough.
>>>>>>> 3. We need to make improvements in future releases.
>>>>>>>
>>>>>>> In short, we can keep the 3.0.0-preview behavior here, or revert
>>>>>>> SPARK-28093 in order to keep this vendor lock-in behavior for
>>>>>>> backward compatibility.
>>>>>>>
>>>>>>> What I want is to build a consistent point of view in this
>>>>>>> category for the upcoming PR reviews.
>>>>>>>
>>>>>>> If we have a clear policy, we can save future community efforts in
>>>>>>> many ways.
>>>>>>>
>>>>>>> Bests,
>>>>>>> Dongjoon.
>>>>>>>
>>>>>>
>>>
>>> --
>>> ---
>>> Takeshi Yamamuro
>>>
>>
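A quick illustration of the parameter-order difference discussed in this thread (not from the original messages): Python's built-in `str.strip(chars)` treats `chars` as a set of characters to remove from both ends, which matches the PostgreSQL/Presto `trim(srcStr, trimStr)` semantics, so both argument orders can be reproduced without any SQL engine.

```python
def pg_style_trim(src, trim_chars):
    """PostgreSQL/Presto argument order: trim(srcStr, trimStr)."""
    return src.strip(trim_chars)

def spark2_style_trim(trim_chars, src):
    """Spark 2.3 ~ 2.4 argument order: trim(trimStr, srcStr)."""
    return src.strip(trim_chars)

# PostgreSQL/Presto reading of SELECT trim('yxTomxx', 'xyz'):
# strip any of 'x', 'y', 'z' from both ends of 'yxTomxx'.
print(pg_style_trim("yxTomxx", "xyz"))      # Tom

# Spark 2.3 ~ 2.4 reading of the very same call: 'yxTomxx' becomes the
# trim character set, so any of y/x/T/o/m is stripped from both ends of
# 'xyz', leaving only the middle character.
print(spark2_style_trim("yxTomxx", "xyz"))  # z
```

This reproduces the `Tom` vs `z` results from the engine comparison above: the identical SQL text yields different answers purely because of which argument is treated as the trim character set.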