Hi all,
As I see this thread has consensus, here is the issue and PR to disable the
legacy behavior by default:
https://issues.apache.org/jira/browse/FLINK-26551
https://github.com/apache/flink/pull/19020

We target to merge it before the release of 1.15, unless there are any
objections.

FG

On Tue, Mar 1, 2022 at 2:18 PM Marios Trivyzas <mat...@gmail.com> wrote:

> Indeed, if we manage to use the configuration from *flink-conf.yaml* down
> the stack,
> it would be easy for everyone to configure a "system-wide" legacy cast
> behaviour.
>
> Best regards,
> Marios
>
> On Tue, Mar 1, 2022 at 2:52 PM Timo Walther <twal...@apache.org> wrote:
>
> > +1
> >
> > Thanks for bringing up this discussion one more time Marios.
> >
> > I strongly support enabling the new behavior in 1.15. It definitely has
> > implications on existing users, but as Seth said, thinking about the
> > upcoming upgrade story we need to make sure that at least the core/basic
> > operations are correct. Otherwise we will have to maintain multiple
> > versions of functions with broken semantics.
> >
> > I since we also try to fix various issues around configuration, maybe it
> > might still be possible to configure the legacy cast behavior globally
> > via flink-conf.yaml. This should make the transitioning period easier in
> > production.
> >
> > Regards,
> > Timo
> >
> > Am 28.02.22 um 19:04 schrieb Seth Wiesman:
> > > +1
> > >
> > > Especially as SQL upgrades are right around the corner, it makes sense
> to
> > > get our defaults right.
> > >
> > > Seth
> > >
> > > On Mon, Feb 28, 2022 at 7:14 AM Martijn Visser <mart...@ververica.com>
> > > wrote:
> > >
> > >> +1 for setting this option to disabled by default. I believe failures
> > >> should be brought forward as soon as possible, so they can be fixed as
> > fast
> > >> as possible. It will also be less confusing for new users. Last but
> not
> > >> least, I believe the impact on existing users will be minimal (since
> it
> > can
> > >> be changed by changing one flag).
> > >>
> > >> Best regards,
> > >>
> > >> Martijn
> > >>
> > >> On Tue, 22 Feb 2022 at 17:55, Marios Trivyzas <mat...@gmail.com>
> wrote:
> > >>
> > >>> Thanks Francesco,
> > >>>
> > >>> The two arguments you posted, further strengthen the need to make it
> > >>> DISABLED by default.
> > >>>
> > >>> On Tue, Feb 22, 2022 at 12:10 PM Francesco Guardiani <
> > >>> france...@ververica.com> wrote:
> > >>>
> > >>>> Hi all,
> > >>>> I'm +1 with what everything you said Marios.
> > >>>>
> > >>>> I'm gonna add another argument on top of that: the
> > >> "legacy-cast-behavior"
> > >>>> has also a broken type inference, leading to incorrect results or
> > >> further
> > >>>> errors down in the pipeline[1]. For example, take this:
> > >>>>
> > >>>> SELECT COALESCE(CAST('a' AS INT), 0) ...
> > >>>>
> > >>>> With the legacy cast behavior ENABLED, this is going to lead to the
> > >> wrong
> > >>>> result, as 'a' is inferred as VARCHAR NOT NULL, the CAST return
> value
> > >> is
> > >>>> inferred as INT NOT NULL, so the planner will drop COALESCE, and
> will
> > >>> never
> > >>>> return 0. Essentially, CAST will infer the wrong nullability leading
> > to
> > >>>> assigning its result to a NOT NULL type, when its value can
> > effectively
> > >>> be
> > >>>> NULL.
> > >>>>
> > >>>>> You introduce a deprecated flag to help users
> > >>>> using old versions of the software to smoothly transition to the new
> > >>>> version, while the new users experience the new features/behavior,
> > >>>> without the need to set a flag.
> > >>>>
> > >>>> This is IMO the major point on why we should disable the legacy cast
> > >>>> behavior by default. This is even more relevant with 1.15 and the
> > >> upgrade
> > >>>> story, as per the problem described above, because users will now
> end
> > >> up
> > >>>> generating plans with wrong type inference, which will be hard to
> > >> migrate
> > >>>> in the next flink versions.
> > >>>>
> > >>>> FG
> > >>>>
> > >>>> [1] In case you're wondering why it wasn't fixed, the reason is that
> > >>> fixing
> > >>>> it means creating a breaking change, for details
> > >>>> https://github.com/apache/flink/pull/18611#issuecomment-1028174877
> > >>>>
> > >>>>
> > >>>> On Tue, Feb 22, 2022 at 10:07 AM Marios Trivyzas <mat...@gmail.com>
> > >>> wrote:
> > >>>>> Hello all!
> > >>>>>
> > >>>>> I would like to bring back the discussion regarding the
> > >>>>> *table.exec.legacy-cast-behaviour*
> > >>>>> configuration option which we are introducing with Flink *1.15*.
> This
> > >>>>> option provides the users
> > >>>>> with the flexibility to continue using the old (incorrect,
> according
> > >> to
> > >>>> SQL
> > >>>>> standards) behaviour
> > >>>>> of *CAST.*
> > >>>>>
> > >>>>> With Flink *1.15* we have introduced a bunch of fixes, improvements
> > >> and
> > >>>> new
> > >>>>> casting functionality
> > >>>>> between types, see
> https://issues.apache.org/jira/browse/FLINK-24403
> > >> ,
> > >>>> and
> > >>>>> some of them are
> > >>>>> guarded behind the legacy behaviour option:
> > >>>>>
> > >>>>>     - Trimming and padding when casting to CHAR/VARCHAR types to
> > >> respect
> > >>>> the
> > >>>>>     specified length
> > >>>>>     - Changes for casting various types to CHAR/VARCHAR/STRING
> > >>>>>     - Runtime errors for CAST no longer emit *null *as result but
> > >>>> exceptions
> > >>>>>     are thrown with a meaningful message for the cause, that fail
> the
> > >>>>> pipeline. *TRY_CAST
> > >>>>>     *is introduced instead, which emits *null* results instead of
> > >>> throwing
> > >>>>>     exceptions.
> > >>>>>
> > >>>>> Those changes become active if users set the
> > >>>>> *table.exec.legacy-cast-behaviour
> > >>>>> *option to *DISABLED*, otherwise
> > >>>>> they will continue to experience the old, *erroneous*, behaviour of
> > >>>> *CAST*.
> > >>>>> Currently, we have set the *table.exec.legacy-cast-behaviour
> *option
> > >>>>> to be *ENABLED
> > >>>>> *by default, so if users want
> > >>>>> to get the new correct behaviour, they are required to set
> explicitly
> > >>> the
> > >>>>> option to *DISABLED*. Moreover, the option
> > >>>>> itself is marked as deprecated, since the plan is to be removed in
> > >> the
> > >>>>> future, so that the old, erroneous behaviour
> > >>>>> won't be an option, and the *TRY_CAST* would be the way to go if
> > >> users
> > >>>>> don't want to have errors and failed pipelines,
> > >>>>> but have *null*s emitted in case of runtime errors when casting.
> > >>>>>
> > >>>>> I would like to start a discussion and maybe ask for voting, so
> that
> > >> we
> > >>>> set
> > >>>>> the *table.exec.legacy-cast-behaviour* option
> > >>>>> to *DISABLED *by default, so that users that want to keep their old
> > >>>>> pipelines working the same way, without changing their
> > >>>>> SQL/TableAPI code, would need to explicitly set it to *ENABLED.*
> > >>>>>
> > >>>>> My main argument for changing the default value for the option, is
> > >> that
> > >>>> the
> > >>>>> *DISABLED* value is the one that enables
> > >>>>> the *correct* behaviour for CAST which should be the default for
> all
> > >>> new
> > >>>>> users. This way, new FLINK users, or users which
> > >>>>> build new pipelines, from now on would get the correct behaviour by
> > >>>> default
> > >>>>> without the need of changing some flag in their
> > >>>>> configuration. It feels weird to me, especially for people very
> > >>> familiar
> > >>>>> with standard SQL, to be obliged to set some config
> > >>>>> flag, to be able to get the correct behaviour for CAST. On top,
> users
> > >>>> that
> > >>>>> won't read about this option in our docs, will,
> > >>>>> "blindly", experience the old incorrect behaviour for their new
> > >>>> pipelines,
> > >>>>> and issues that could cause the CAST to fail
> > >>>>> will remain hidden from them, since *nulls* would be emitted. IMHO,
> > >>> this
> > >>>>> last part is also very important during the development
> > >>>>> stages of an application/pipeline. Normally, developers would want
> to
> > >>> see
> > >>>>> all possible errors/scenarios during development
> > >>>>> stages , in order to build a robust production system. If errors,
> are
> > >>>>> hidden then, they can easily end up with those errors
> > >>>>> in the production system which would be even harder to discover and
> > >>>> debug,
> > >>>>> since no exception will ever be thrown.
> > >>>>> Imagine that there is a CAST which generates nulls because of
> runtime
> > >>>>> errors, and its result is used in an aggregate function:
> > >>>>>
> > >>>>> SELECT AVG(CAST(col1 AS FLOAT)), ... FROM t GROUP BY ....
> > >>>>>
> > >>>>> If nulls are emitted by the CAST (let's say because some records
> > >> have a
> > >>>>> quoted string value for col1, then simply the AVG result
> > >>>>> would be wrong and in would be extremely difficult to realise and
> fix
> > >>> the
> > >>>>> issue by simply wrapping col1 first with, for example,
> > >>>>> a REGEXP_REPLACE, to get rid of the quotes.
> > >>>>>
> > >>>>> My second argument is that, since we have marked
> > >>>>> *table.exec.legacy-cast-behaviour* as deprecated, and we want to
> > >>>> completely
> > >>>>> remove it in the future, if the default value is *DISABLED*, when
> > >> it's
> > >>>>> removed we also make a breaking change, by changing the
> > >>>>> default behaviour for all users, which is against the common
> software
> > >>>>> practices. You introduce a deprecated flag to help users
> > >>>>> using old versions of the software to smoothly transition to the
> new
> > >>>>> version, while the new users experience the new features/behaviour,
> > >>>>> without the need to set a flag. So, when in the future this flag is
> > >>>>> completely removed, for those "new" users it would be completely
> > >>>>> transparent. If we continue with having the default value
> *ENABLED*,
> > >>>> those
> > >>>>> new user would experience an "unnatural" breaking change
> > >>>>> when this option is completely removed.
> > >>>>>
> > >>>>> Best,
> > >>>>> Marios
> > >>>>>
> > >>>
> > >>> --
> > >>> Marios
> > >>>
> >
> >
>
> --
> Marios
>

Reply via email to