Hello Shirley

This is a complex topic which includes "encodings" and the way in which
Perl is able to deal with binary data (e.g.: a string of bytes) vs
character data (in which each character might be represented by one or
more bytes).
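
To make the bytes vs characters distinction concrete, here is a minimal
sketch. It is not taken from the original script and uses only the core
Encode module. The byte string is written with hex escapes so the result
does not depend on how the source file is saved. The last step is an
assumption about how output like â\u0080\u009C can arise (undecoded bytes
being treated as characters and encoded a second time); it is not a
statement about what DBD::Pg does internally.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# The three UTF-8 bytes for U+201C (LEFT DOUBLE QUOTATION MARK).
my $bytes = "\xE2\x80\x9C";

# Decode at the input boundary: bytes in, characters out.
my $chars = decode('UTF-8', $bytes);

printf "bytes: length %d\n", length $bytes;                 # 3
printf "chars: length %d, U+%04X\n", length $chars, ord $chars;  # 1, U+201C

# Encode at the output boundary: characters in, bytes out.
my $out = encode('UTF-8', $chars);
print "round trip gives the original bytes\n" if $out eq $bytes;

# Assumed failure mode: the undecoded bytes are treated as three
# Latin-1 characters (U+00E2, U+0080, U+009C) and encoded again.
my $double = encode('UTF-8', $bytes);
printf "double-encoded: %s\n",
    join ' ', map { sprintf '%02X', ord } split //, $double;
# prints: double-encoded: C3 A2 C2 80 C2 9C
```

The double-encoded bytes decode to â followed by U+0080 and U+009C, which
matches the "â\u0080\u009C" seen in the database.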

There is no "quick fix". You really need to understand when you're
dealing with bytes vs characters. As a rule of thumb you'll want to
"decode" data that is coming into your program and "encode" data that is
being output to the world (e.g.: to a file or in a web page response).

If you're prepared to make the effort to understand, here's a link to a
video I made on the subject:

https://www.youtube.com/watch?v=cgswnneFp-s

There are a number of reasons why the behaviour of your code might have
changed following the upgrade.

- The newer versions of libraries and utilities might have different
  defaults for handling bytes vs character data.
- The "locale" setting in the upgraded system might be different
  (e.g.: LANG="C" vs LANG="en_US.UTF-8").
- The environment in which the code executes might be different.

Well-written code that is explicit about handling character data and
where the encoding/decoding should happen would be resistant to those
types of outside influences.

In the video I walk through a scenario where some code appeared to be
working correctly, but then one small change broke things in different
ways. The fixes are to add in explicit handling of encoding.

However this is not really an issue that is specific to DBI or DBD::Pg -
apart from being explicit about your use of the "pg_enable_utf8"
attribute on your database handle:

https://metacpan.org/pod/DBD::Pg#pg_enable_utf8-(integer)

I hope that sets you on the right path.

Regards
Grant McLean

On Wed, 2024-12-18 at 16:33 -0500, Shaomei Liu wrote:
> send again after subscribing.
>
> On Wed, Dec 18, 2024 at 11:20 AM Shaomei Liu
> <[email protected]> wrote:
> > Hello,
> > I have a project which uses DBI to write to postgres DB.
> > after upgrading from RHEL7 to RHEL8, the utf-8 character is not
> > displayed properly in the DB. DB has correct utf-8 encoding set.
> > for example, left double quotation mark “ is displayed
> > as â\u0080\u009C.
> > with support from DBI community, the issue was solved by calling
> > decode from Encode module before writing to DB.
> > wondering what is the change from DBD::pg cause this issue.
> >
> > perl version is 5.26.3 and 5.16.3 on EL8 and EL7 respectively.
> > DBI version is 1.641 and 1.627 on EL8 and EL7 respectively.
> >
> > here is the program and execution results.
> > Any feedback are greatly appreciated!
> > thank you
> > Shirley
> >
> > xxx.com> cat testutf_decode.pl
> > #!/usr/bin/perl
> > use strict;
> > use warnings;
> > use DBI;
> > use Encode 'decode';
> > print "DBI version: $DBI::VERSION\n";
> >
> > my $db = "debugutf";
> > my $host = "db";
> > my $user = "postgres";
> > my $pass = "";
> > my $dbh = DBI->connect("DBI:Pg:dbname=$db;host=$host",$user,$pass);
> > my $sql = 'INSERT INTO table1 (title) VALUES (?)';
> > my $query = $dbh->prepare($sql);
> > my $bytes = '“';
> > my $chars = decode('UTF-8', $bytes);
> > print "$bytes contains ".length($bytes)." characters\n";
> > print "after decode $bytes contains ".length($chars)." characters\n";
> > #my @values = ($bytes); #=======>without decode, Database shows “ on EL7 but â\u0080\u009C on EL8
> > my @values = ($chars); #======>with decode, Database shows “ on both EL8 and EL7, decode fixed the issue
> > $query->execute(@values);
> >
> > ############### running on EL8
> > xxx.com> ./testutf_decode.pl
> > DBI version: 1.641
> > “ contains 3 characters
> > after decode “ contains 1 characters
> >
> > [yyy.com]$ psql -Upostgres -hdb debugutf
> > psql (16.6)
> > debugutf=# select * from table1;
> > title
> > ---------------
> > â\u0080\u009C ==========>NOK without decode
> > “ =============>OK with decode, so decode fixed the issue
> > (2 rows)
> >
> > ############### running on EL7
> > xxx.com> ./testutf_decode.pl
> > DBI version: 1.627
> > “ contains 3 characters
> > after decode “ contains 1 characters
> >
> > [yyy.com]$ psql -Upostgres -hdb debugutf
> > psql (16.6)
> > debugutf=# select * from table1;
> > title
> > ---------------
> > “ =============>OK without decode
> > “ =============>OK with decode
> > (2 rows)
