Do we know, in fact, why this changed? The new behaviour may be “more correct”, but it’ll still subtly break a bunch of stuff that worked fine before.
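To make the breakage concrete, here is a minimal sketch of one plausible mechanism. It assumes that the newer driver treats every element of the string it receives as a character and UTF-8-encodes it on the way to the server; that is an assumption for illustration, not something confirmed against DBD::Pg's source. `encode` stands in for the driver, and `hex_of` is a hypothetical helper.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Without "use utf8", a literal left double quotation mark in a UTF-8
# source file reaches Perl as three separate bytes.
my $bytes = "\xE2\x80\x9C";
my $chars = decode('UTF-8', $bytes);    # one character, U+201C

# Hypothetical helper: hex dump of each element of a string.
sub hex_of { return join ' ', map { sprintf '%02X', ord($_) } split //, $_[0] }

# ASSUMPTION: the driver UTF-8-encodes whatever characters it is handed.
my $wire_from_bytes = encode('UTF-8', $bytes);    # bytes read as U+00E2 U+0080 U+009C
my $wire_from_chars = encode('UTF-8', $chars);

print "undecoded: ", hex_of($wire_from_bytes), "\n";    # C3 A2 C2 80 C2 9C
print "decoded:   ", hex_of($wire_from_chars), "\n";    # E2 80 9C
```

Under that assumption the undecoded string is encoded twice, which matches the â\u0080\u009C that psql shows, while the decoded string produces the three bytes the database expects.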
Encoding bugs in Perl are notoriously hard to track down. DBD::Pg is popular; it would be good to know exactly why this happened so that others could proactively adjust their code accordingly.

Also, I recommend my Unicode/UTF-8 talk on this topic, particularly the “use utf8” section starting at about 9m30s and again around 20m20s: https://www.youtube.com/watch?v=yH5IyYyvWHU

-FG

> On Dec 18, 2024, at 12:12 AM, Dan Book <gri...@gmail.com> wrote:
>
> Indeed, how strings work has not changed, but DBD::Pg's interpretation of
> your strings probably did; the new behavior is more "correct" and now that
> you are sending it decoded Unicode characters you may avoid other mysterious
> issues. (Note that DBI itself does not handle strings, it just provides the
> interface, DBD::Pg defines how strings are sent to and from the database)
>
> -Dan
>
> On Tue, Dec 17, 2024 at 11:09 PM Shaomei Liu <sliu.newjer...@gmail.com> wrote:
> Dear Dan, Mark, Felipe, Alexander,
> Thank you all for your valuable feedback!
> as I replied Dan yesterday, this is my first time to ask for support from a
> mailing list. I was very surprised and happy to get answers so quickly!
> I added "use utf8;" as suggested by Dan and it worked for my test program
> shown in the email, but not for project.
> then I tried decode as suggested by Dan and it worked for both test program
> and project. so issue solved for me!!!
> perl version is 5.26.3 and 5.16.3 on EL8 and EL7 respectively.
> DBI version is 1.641 and 1.627 on EL8 and EL7 respectively.
>
> here is the test program with decode. I also printed length. I thought it is
> a perl thing. but the length is the same on EL8 and EL7. so not sure it is
> perl or DBI change causing the issue.
> xxx.com> cat testutf_decode.pl
> #!/usr/bin/perl
> use strict;
> use warnings;
> use DBI;
> use Encode 'decode';
> print "DBI version: $DBI::VERSION\n";
>
> my $db = "debugutf";
> my $host = "db";
> my $user = "postgres";
> my $pass = "";
> my $dbh = DBI->connect("DBI:Pg:dbname=$db;host=$host",$user,$pass);
> my $sql = 'INSERT INTO table1 (title) VALUES (?)';
> my $query = $dbh->prepare($sql);
> my $bytes = '“';
> my $chars = decode('UTF-8', $bytes);
> print "$bytes contains ".length($bytes)." characters\n";
> print "after decode $bytes contains ".length($chars)." characters\n";
> #my @values = ($bytes); #=======>with this line, Database shows “ on EL7 but â\u0080\u009C on EL8
> my @values = ($chars); #======>Database shows “ on both EL8 and EL7, so decode fixed the issue
> $query->execute(@values);
>
> xxx.com> ./testutf_decode.pl #running on EL8
> DBI version: 1.641
> “ contains 3 characters
> after decode “ contains 1 characters
>
> xxx.com> ./testutf_decode.pl #running on EL7
> DBI version: 1.627
> “ contains 3 characters
> after decode “ contains 1 characters
>
> Thank you!!
> Shirley
>
> On Tue, Dec 17, 2024 at 3:30 PM Alexander Foken via dbi-users
> <dbi-users@perl.org> wrote:
> Hi,
> DBD::ODBC has several tests related to Unicode handling
> (40UnicodeRoundTrip.t, 41Unicode.t, 45_unicode_varchar.t), they should also
> work with other DBDs. They should tell you if your problem is between Perl
> and Postgres or if it is simply in the encoding of your terminal.
> Alexander
> On 17.12.2024 13:31, Felipe Gasper via dbi-users wrote:
>> Respectfully to Dan & others, I don’t advocate adding “use utf8” to existing
>> code without a clear understanding of where your program’s decode & encode
>> points are.
>>
>> Check to see what DBD::Pg actually writes to the database. If it suddenly
>> started encoding, that’s a breaking change that either was documented or
>> should be reported upstream.
>>
>>> On Dec 16, 2024, at 17:13, Shaomei Liu <sliu.newjer...@gmail.com> wrote:
>>>
>>> Hello,
>>> very happy to find this mailing list as it is my last resort!!
>>> I have a project which uses DBI to write to postgres DB.
>>> after upgrading from RHEL7 to RHEL8, the utf-8 character is not displayed
>>> properly in the DB. DB has correct utf-8 encoding set.
>>> for example, left double quotation mark “ is displayed as â\u0080\u009C.
>>> You can use this link to check hex utf-8 bytes
>>> https://www.cogsci.ed.ac.uk/~richard/utf-8.cgi?input=%E2%80%9C&mode=char
>>>
>>> below is the file testutf.pl which writes left double quotation mark “ to
>>> the database. it also shows the query results from psql for both EL8 and
>>> EL7.
>>>
>>> ==========file testutf.pl==========
>>> #!/usr/bin/perl
>>> use strict;
>>> use warnings;
>>> use DBI;
>>> print "DBI version: $DBI::VERSION\n";
>>>
>>> my $db = "debugutf";
>>> my $host = "db";
>>> my $user = "postgres";
>>> my $pass = "";
>>> my $dbh = DBI->connect("DBI:Pg:dbname=$db;host=$host",$user,$pass);
>>> my $sql = 'INSERT INTO table1 (title) VALUES (?)';
>>> my $query = $dbh->prepare($sql);
>>> my @values = ('“');
>>> $query->execute(@values);
>>> ===================================
>>>
>>> ==============on RHEL8
>>> #execute testutf.pl which wrote “ to database on RHEL8
>>> text.tac1.dev.bia-boeing.com> ./testutf.pl
>>> DBI version: 1.641
>>>
>>> #from psql
>>> debugutf=# select * from table1;
>>> title
>>> ---------------
>>> â\u0080\u009C =========>unexpected
>>> (1 row)
>>>
>>>
>>> ==============on RHEL7
>>> #execute testutf.pl which wrote “ to database on RHEL7
>>> text.tac1.dev.bia-boeing.com> ./testutf.pl
>>> DBI version: 1.627
>>>
>>> #from psql
>>> debugutf=# select * from table1;
>>> title
>>> ---------------
>>> “ ============>expected
>>> (1 row)
>>>
>>> Any feedback is appreciated.
>>> thank you
>>> Shirley
> --
> Alexander Foken
> mailto:alexan...@foken.de