Thank you, Grant!
Thanks for sharing the video!
I liked your rule of thumb!
Shirley

On Wed, Dec 18, 2024 at 9:01 PM Grant McLean wrote:

> Hello Shirley
>
> This is a complex topic which includes "encodings" and the way in which
> Perl is able to deal with binary data (e.g. a string of bytes) vs
> character data (in which each character might be represented by one or
> more bytes).
>
> There is no "quick fix". You really need to understand when you're
> dealing with bytes vs characters. As a rule of thumb, you'll want to
> "decode" data that is coming into your program and "encode" data that
> is being output to the world (e.g. to a file or in a web page response).
>
> If you're prepared to make the effort to understand, here's a link to a
> video I made on the subject:
>
>     https://www.youtube.com/watch?v=cgswnneFp-s
>
> There are a number of reasons why the behaviour of your code might have
> changed following the upgrade:
> - The newer versions of libraries and utilities might have different
>   defaults for handling bytes vs character data.
> - The "locale" setting in the upgraded system might be different
>   (e.g. LANG="C" vs LANG="en_US.UTF-8").
> - The environment in which the code executes might be different.
>
> Well-written code that is explicit about handling character data, and
> about where the encoding/decoding should happen, would be resistant to
> those types of outside influences.
>
> In the video I walk through a scenario where some code appeared to be
> working correctly, but then one small change broke things in different
> ways. The fixes are to add explicit handling of encoding.
>
> However, this is not really an issue that is specific to DBI or DBD::Pg,
> apart from being explicit about your use of the "pg_enable_utf8"
> attribute on your database handle:
>
>     https://metacpan.org/pod/DBD::Pg#pg_enable_utf8-(integer)
>
> I hope that sets you on the right path.
>
> Regards
> Grant McLean
>
> On Wed, 2024-12-18 at 16:33 -0500, Shaomei Liu wrote:
>
> send again after subscribing.
>
> On Wed, Dec 18, 2024 at 11:20 AM Shaomei Liu wrote:
>
> Hello,
> I have a project which uses DBI to write to a Postgres DB.
> After upgrading from RHEL7 to RHEL8, UTF-8 characters are not displayed
> properly in the DB. The DB has the correct UTF-8 encoding set.
> For example, the left double quotation mark “ is displayed as
> â\u0080\u009C.
> With support from the DBI community, the issue was solved by calling
> decode from the Encode module before writing to the DB.
> I am wondering what change in DBD::Pg caused this issue.
>
> perl version is 5.26.3 and 5.16.3 on EL8 and EL7 respectively.
> DBI version is 1.641 and 1.627 on EL8 and EL7 respectively.
>
> Here is the program and its execution results.
> Any feedback is greatly appreciated!
> Thank you
> Shirley
>
> xxx.com> cat testutf_decode.pl
> #!/usr/bin/perl
> use strict;
> use warnings;
> use DBI;
> use Encode 'decode';
> print "DBI version: $DBI::VERSION\n";
>
> my $db = "debugutf";
> my $host = "db";
> my $user = "postgres";
> my $pass = "";
> my $dbh = DBI->connect("DBI:Pg:dbname=$db;host=$host",$user,$pass);
> my $sql = 'INSERT INTO table1 (title) VALUES (?)';
> my $query = $dbh->prepare($sql);
> my $bytes = '“';
> my $chars = decode('UTF-8', $bytes);
> print "$bytes contains ".length($bytes)." characters\n";
> print "after decode $bytes contains ".length($chars)." characters\n";
> # Without decode: the database shows “ on EL7 but â\u0080\u009C on EL8.
> #my @values = ($bytes);
> # With decode: the database shows “ on both EL8 and EL7, so decode
> # fixed the issue.
> my @values = ($chars);
> $query->execute(@values);
>
> ############### running on EL8
> xxx.com> ./testutf_decode.pl
> DBI version: 1.641
> “ contains 3 characters
> after decode “ contains 1 characters
>
> [yyy.com]$ psql -Upostgres -hdb debugutf
> psql (16.6)
> debugutf=# select * from table1;
>      title
> ---------------
>  â\u0080\u009C  ==========>NOK without decode
>  “              =============>OK with decode, so decode fixed the issue
> (2 rows)
>
> ############### running on EL7
> xxx.com> ./testutf_decode.pl
> DBI version: 1.627
> “ contains 3 characters
> after decode “ contains 1 characters
>
> [yyy.com]$ psql -Upostgres -hdb debugutf
> psql (16.6)
> debugutf=# select * from table1;
>      title
> ---------------
>  “           =============>OK without decode
>  “           =============>OK with decode
> (2 rows)
>
>
>
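The rule of thumb quoted above (decode data coming into the program, encode data going out) can be sketched roughly as follows. This is only an illustration, not code from the thread: the variable names are made up, and the input is a hard-coded byte string standing in for data read from a file or socket.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical input: the raw UTF-8 bytes for U+201C (left double
# quotation mark), as they might arrive from a file, socket or STDIN.
my $bytes_in = "\xE2\x80\x9C";

# Decode at the input boundary: bytes become characters.
my $chars = decode('UTF-8', $bytes_in);

# Inside the program, length() now counts characters rather than bytes.
my $byte_count = length($bytes_in);
my $char_count = length($chars);

# Encode at the output boundary: characters become bytes again.
my $bytes_out = encode('UTF-8', $chars);

print "bytes in: $byte_count, characters: $char_count\n";
print "round trip ok\n" if $bytes_out eq $bytes_in;
```

Keeping the decode and encode calls at the edges of the program means everything in between can treat strings as characters.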

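For anyone wondering where â\u0080\u009C comes from: it is what you would expect if the three UTF-8 bytes of “ are treated as three separate characters and then encoded as UTF-8 a second time. The sketch below reproduces that effect with Encode alone. It is an assumption about the mechanism, not a confirmed diagnosis of what changed between EL7 and EL8, and the variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# The UTF-8 bytes for U+201C, never decoded into characters.
my $bytes = "\xE2\x80\x9C";

# If some layer treats those bytes as characters and encodes them as
# UTF-8, each byte is encoded again (assumed mechanism, see above).
my $double = encode('UTF-8', $bytes);

# A UTF-8 aware reader then decodes once and sees three characters.
my @code_points = map { sprintf 'U+%04X', ord($_) }
                  split //, decode('UTF-8', $double);

print "double-encoded length: ", length($double), " bytes\n";
print join(' ', @code_points), "\n";   # U+00E2 U+0080 U+009C
```

U+00E2 is â, and U+0080 and U+009C are control characters, which matches the value shown by psql in the EL8 run.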