On 2024-12-18 09:41:13 -0500, Shaomei Liu wrote:
> if you happen to have an example to show the new behavior is more
> "correct",

I haven't been on any of the main Perl mailing-lists or newsgroups for a
long time, so this may be outdated, but the general idea is that the
dichotomy between byte strings and character strings was a mistake and
that two strings which compare equal should behave the same whenever
possible. The difference is just too subtle and error-prone.
In particular, the string you created in your test script was a byte
string with three bytes ("\xe2\x80\x9C"). That string has length 3 and
it will compare equal to the string with the three characters
U+00E2 LATIN SMALL LETTER A WITH CIRCUMFLEX, U+0080 PADDING CHARACTER,
U+009C STRING TERMINATOR. So it stands to reason that it should be
treated the same as that 3 character string, and the varchar stored in
the database should also be 3 characters long and not just a single
character, just because it happens to be a byte sequence which happens
to match that character's UTF-8 encoding.
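
To illustrate, here is a minimal sketch in plain Perl (no database
involved; the variable names are made up for illustration and are not
taken from the test script). It only shows that the byte string and the
3 character string have the same length and compare equal, and that you
get the single character U+201C only after decoding explicitly:

```perl
use strict;
use warnings;

# The byte string from the test script: three bytes which happen to be
# the UTF-8 encoding of U+201C LEFT DOUBLE QUOTATION MARK.
my $bytes = "\xe2\x80\x9c";

# The same three code points, forced into Perl's internal UTF-8
# representation (a "character string" in the old terminology).
my $chars = "\xe2\x80\x9c";
utf8::upgrade($chars);

printf "length(bytes) = %d\n", length $bytes;               # 3
printf "length(chars) = %d\n", length $chars;               # 3
printf "equal: %s\n", ($bytes eq $chars ? "yes" : "no");    # yes

# Only an explicit decode turns the three bytes into one character.
my $decoded = $bytes;
utf8::decode($decoded);
printf "length(decoded) = %d, U+%04X\n",
    length $decoded, ord $decoded;                          # 1, U+201C
```

So if a single character is what should end up in the varchar, the
string has to be decoded before it is passed on.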
> On Wed, Dec 18, 2024 at 8:53 AM Felipe Gasper <[email protected]> wrote:
> >
> > Do we know, in fact, why this changed?
> >
> > The new behaviour may be “more correct”, but it’ll still subtly
> > break a bunch of stuff that worked fine before.
True.
But it should probably also be noted that Red Hat Enterprise Linux 7
was released in 2014 and RHEL 8 in 2019. So the "new behaviour" is now
between 5 and 10 years old.
I'm too lazy to track down the release which introduced the change
(especially since there seems to be a huge gap in the history on CPAN),
but I would expect that to be mentioned in the release notes at the
time.
hp
--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | [email protected] | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"
