On 2024-12-18 09:41:13 -0500, Shaomei Liu wrote:
> if you happen to have an example to show the new behavior is more
> "correct",

I haven't been on any of the main Perl mailing lists or newsgroups for a
long time, so this may be outdated, but the general idea is that the
dichotomy between byte strings and character strings was a mistake and
that two strings which compare equal should behave the same whenever
possible. The difference is just too subtle and error-prone.

In particular, the string you created in your test script was a byte
string with three bytes ("\xe2\x80\x9C"). That string has length 3 and
it will compare equal to the string with the three characters U+00E2
LATIN SMALL LETTER A WITH CIRCUMFLEX, U+0080 PADDING CHARACTER, U+009C
STRING TERMINATOR. So it stands to reason that it should be treated the
same as that 3 character string, and the varchar stored in the database
should also be 3 characters long and not just a single character
(U+201C LEFT DOUBLE QUOTATION MARK), just because it happens to be a
byte sequence which matches that character's UTF-8 encoding.

> On Wed, Dec 18, 2024 at 8:53 AM Felipe Gasper <fel...@felipegasper.com> wrote:
> >
> > Do we know, in fact, why this changed?
> >
> > The new behaviour may be “more correct”, but it’ll still subtly
> > break a bunch of stuff that worked fine before.

True. But it should probably also be noted that Redhat 7 was released in
2014 and Redhat 8 in 2019. So the "new behaviour" is now between 5 and
10 years old.

I'm too lazy to track down the release which introduced the change
(especially since there seems to be a huge gap in the history on CPAN),
but I would expect that to be mentioned in the release notes at the
time.

        hp

-- 
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | h...@hjp.at        |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"
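
The claim above about "\xe2\x80\x9C" can be checked with a few lines of plain Perl. This is only a minimal sketch, not the test script from the thread: the variable names are made up for illustration, and it does not involve DBI or any database driver. It only shows that the three-byte string and the three-character string are indistinguishable at the Perl level, and that getting U+201C requires an explicit decode.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The byte string from the discussion: three bytes.
my $bytes = "\xe2\x80\x9c";

# The same three code points (U+00E2, U+0080, U+009C), but forced into
# Perl's internal UTF-8 representation, i.e. a "character string".
my $chars = "\x{e2}\x{80}\x{9c}";
utf8::upgrade($chars);

printf "length(bytes) = %d\n", length $bytes;              # 3
printf "length(chars) = %d\n", length $chars;              # 3
printf "equal: %s\n", ($bytes eq $chars ? "yes" : "no");   # yes

# Only an explicit decode turns the three bytes into one character.
my $quote = decode('UTF-8', $bytes);
printf "length(decoded) = %d, U+%04X\n",
    length $quote, ord $quote;                             # 1, U+201C
```

Since $bytes and $chars compare equal and have the same length, code that stored one of them as three characters and the other as one would be treating equal strings differently, which is the inconsistency described above.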