Re: Some strings corrupted when inserting into liststore model
On Tue, 2021-10-19 at 11:34 +1100, Daniel Kasak via gtk-perl-list wrote:
> Right. I found a hack on https://perldoc.perl.org/perlunicode ( which
> you directed me to ) that appears to have fixed *this* particular
> issue ( though it's not clear what I've then broken as a result )
> Calling:
>
> Encode::_utf8_on($_)
>
> ... for every value just prior to being pushed into the model
> appears to work. Yay :)

The operative word here is "appears". This hack will work for most
characters but not all.

The general advice for working with encodings from Perl is that you
should:

 * decode bytes on input to give you strings in Perl's internal
   representation which supports multi-byte characters; and
 * encode strings to bytes in a particular encoding on output

These days the most common encoding you will encounter is UTF-8. To do
the relevant decoding of a UTF-8 file you might open it like this:

    open(my $fh, '<:utf8', $filename);

Or, if the string was not read from a file but was simply defined in
your script, you would tell Perl to decode the bytes of your script
from UTF-8 by including this pragma:

    use utf8;

For output to a file you might use:

    open(my $fh, '>:utf8', $filename);

Your experience seems to suggest that the Perl Gtk bindings will do the
right thing when presented with a string that has the internal "utf8"
flag set. But if your string has non-ASCII characters and does not
already have that flag set then it seems the decoding step has been
missed.

Data that came from a DB connection rather than a file might need to be
decoded with something like:

    $perl_string = Encode::decode_utf8($db_string);

However most of the DBD drivers allow you to set a flag so that this
happens automatically.

The reason messing with the utf8 flag on the Perl string appears to
work is that Perl's internal encoding is almost-but-not-quite UTF-8.
For historical reasons (and arguably as a memory optimisation)
sometimes Perl will encode some characters in the range 0x80-0xFF as a
single byte ("Latin-1" encoding) rather than the two bytes that UTF-8
would require.

For example chr(0x20AC) would return a Perl string which was
represented in memory using UTF-8 bytes. Whereas chr(0xE9) would return
a Perl string which was represented in memory using a single Latin-1
byte. Simply setting the utf8 flag on the first string would do no harm
(since it's already set) but it would make a mess of the second string
because it's only one byte long and not a valid UTF-8 sequence.

If you really want to understand this stuff here's a link to a
conference talk I did on the subject:

https://www.youtube.com/watch?v=cgswnneFp-s

Regards
Grant
___
gtk-perl-list mailing list
gtk-perl-list@gnome.org
https://mail.gnome.org/mailman/listinfo/gtk-perl-list
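
The difference between decoding and merely flipping the flag, as described in the message above, can be sketched with core modules only (Encode and the built-in utf8:: functions). The sample byte string and the variable names are illustrative assumptions, not taken from Dan's application:

```perl
use strict;
use warnings;
use Encode ();

# UTF-8 encoded bytes for "└─Selection_6", as a driver might hand them over
# (hypothetical sample data).
my $bytes = "\xE2\x94\x94\xE2\x94\x80Selection_6";

# Recommended: decode the bytes into a character string.
my $chars = Encode::decode('UTF-8', $bytes);
printf "bytes: %d, characters: %d\n", length($bytes), length($chars);

# The two strings from the message above are stored differently internally.
my $euro   = chr(0x20AC);    # held as UTF-8, flag on
my $eacute = chr(0xE9);      # held as one Latin-1 byte, flag off
printf "euro flag: %s, e-acute flag: %s\n",
    (utf8::is_utf8($euro)   ? 'on' : 'off'),
    (utf8::is_utf8($eacute) ? 'on' : 'off');

# Flipping the flag on the one-byte string leaves a lone 0xE9, which is not
# a complete UTF-8 sequence, so Perl now holds a malformed string.
my $flagged = $eacute;
Encode::_utf8_on($flagged);
my $flagged_ok = utf8::valid($flagged) ? 1 : 0;
print "flagged string well-formed: ", ($flagged_ok ? 'yes' : 'no'), "\n";
```

Decoding is only safe to apply once, to data that is known to be UTF-8 bytes; running it over a string that has already been decoded is a separate bug.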
Re: Some strings corrupted when inserting into liststore model
Right. I found a hack on https://perldoc.perl.org/perlunicode ( which
you directed me to ) that appears to have fixed *this* particular
issue ( though it's not clear what I've then broken as a result )
Calling:

Encode::_utf8_on($_)

... for every value just prior to being pushed into the model appears
to work. Yay :)

Thanks!

Dan

On Mon, Oct 18, 2021 at 11:12 PM Jeremy Volkening via gtk-perl-list wrote:
>
> On Mon, Oct 18, 2021 at 08:39:36PM +1100, Daniel Kasak via gtk-perl-list wrote:
> > It's not really clear if there's something *else* I'm
> > supposed to do to these strings coming out of the DB or not?
>
> Typically you need to tell Perl to treat them as UTF-8. Without knowing
> exactly how you're getting your strings into Perl, there are a number of
> ways to do this (https://perldoc.perl.org/perlunicode). For instance, if
> you're reading from a filehandle you can set its mode:
>
> binmode(\*STDIN, ':utf8');
>
> Or you can specifically mark the string after it's imported:
>
> use Encode;
> $str = Encode::decode("UTF-8", $str);
>
> One of these might help with your issue.
>
> Jeremy
Re: Some strings corrupted when inserting into liststore model
On Mon, Oct 18, 2021 at 08:39:36PM +1100, Daniel Kasak via gtk-perl-list wrote:
> It's not really clear if there's something *else* I'm
> supposed to do to these strings coming out of the DB or not?

Typically you need to tell Perl to treat them as UTF-8. Without knowing
exactly how you're getting your strings into Perl, there are a number of
ways to do this (https://perldoc.perl.org/perlunicode). For instance, if
you're reading from a filehandle you can set its mode:

    binmode(\*STDIN, ':utf8');

Or you can specifically mark the string after it's imported:

    use Encode;
    $str = Encode::decode("UTF-8", $str);

One of these might help with your issue.

Jeremy
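
The filehandle approach suggested above can be tried without any external file by using an in-memory filehandle. This is only a sketch: the sample bytes and variable names are hypothetical, and ':encoding(UTF-8)' is used in place of ':utf8' because it also validates the input, which is a choice made for this sketch and not something the message prescribes:

```perl
use strict;
use warnings;

# Hypothetical input: one line of UTF-8 encoded bytes.
my $raw = "\xE2\x94\x94\xE2\x94\x80Selection_6\n";

# Open an in-memory filehandle on the bytes, then set its mode so that
# reads are decoded from UTF-8 into characters.
open(my $in, '<', \$raw) or die "Cannot open in-memory input: $!";
binmode($in, ':encoding(UTF-8)');
my $line = <$in>;
close($in);
chomp($line);

# Encode again on output by giving the output handle the same layer.
my $written = '';
open(my $out, '>', \$written) or die "Cannot open in-memory output: $!";
binmode($out, ':encoding(UTF-8)');
print {$out} "$line\n";
close($out);

printf "read %d characters, wrote %d bytes\n", length($line), length($written);
```

Between the read and the write, $line is an ordinary character string, which is the form the rest of the program should work with.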
Re: Some strings corrupted when inserting into liststore model
Hi Jeremy. Thanks for the response :)

> In the case of your example script, you need 'use utf8;' in the preamble.
> This fixes handling of the hard-coded unicode characters in the script --
> it won't necessarily fix the issue with the strings coming from your
> database.

Interesting. Yeah I don't usually embed unicode in source - I'd
forgotten about that :)

> Are you certain they are being stored in the database correctly?

Yeah actually they're coming from an execution plan, so they're not
"stored" as such in the database. Example:

+-------------------+---------+-----------+------------------+--------------------------------+
| id                | estRows | task      | access object    | operator info                  |
+-------------------+---------+-----------+------------------+--------------------------------+
| TableReader_7     | 6656.67 | root      |                  | data:Selection_6               |
| └─Selection_6     | 6656.67 | cop[tikv] |                  | ne(dett.ffa_client.city, "")   |
| └─TableFullScan_5 | 1.00    | cop[tikv] | table:ffa_client | keep order:false, stats:pseudo |
+-------------------+---------+-----------+------------------+--------------------------------+
3 rows in set (0.010 sec)

It renders correctly in a console in the MySQL client. I can inspect
the values in a debugger and copy them out, and they render correctly
both inside the IDE, and in a text editor when I paste the values.

I was under the impression that Perl handled strings as kinda-utf8
internally. It's not really clear if there's something *else* I'm
supposed to do to these strings coming out of the DB or not? Anyway,
they totally look correct if I print() them or inspect them in an IDE.

Dan
Re: Some strings corrupted when inserting into liststore model
In the case of your example script, you need 'use utf8;' in the preamble.
This fixes handling of the hard-coded unicode characters in the script --
it won't necessarily fix the issue with the strings coming from your
database. Are you certain they are being stored in the database correctly?

Jeremy

On Mon, Oct 18, 2021 at 02:42:42PM +1100, Daniel Kasak via gtk-perl-list wrote:
> Hi all. I'm seeing some strings ( coming from a database ) corrupted
> when I insert into a liststore model. I've pasted a bare-bones script
> below which demonstrates the issue ( hard-coded string value in this
> case ). Any ideas what's happening and how to get the original string
> rendering? Interestingly, I can copy/paste directly into the
> treeview/cell and it handles data input this way.
>
> ---
>
> #!/usr/bin/perl
>
> use strict;
> use warnings;
>
> use Gtk3 '-init';
> use Glib 'TRUE', 'FALSE';
>
> use Encode;
>
> my $window = Gtk3::Window->new;
> $window->signal_connect( destroy => sub { Gtk3->main_quit } );
> $window->set_border_width(8);
> $window->set_default_size( 300, 250 );
>
> my $box = Gtk3::Box->new( 'vertical', 8 );
> $box->set_homogeneous(FALSE);
> $window->add($box);
>
> my $sw = Gtk3::ScrolledWindow->new( undef, undef );
> $sw->set_shadow_type('etched-in');
> $sw->set_policy( 'never', 'automatic' );
> $box->pack_start( $sw, TRUE, TRUE, 5 );
>
> # Create TreeModel
> my $model = Gtk3::ListStore->new( 'Glib::String', );
> my $iter = $model->append();
> $model->set( $iter , 0 , "└─Selection_6" );
>
> # Create a TreeView
> my $treeview = Gtk3::TreeView->new($model);
> $treeview->set_rules_hint(TRUE);
> $treeview->set_search_column(0);
> $sw->add($treeview);
>
> # Add columns to TreeView
> add_columns($treeview);
>
> $window->show_all;
> Gtk3->main();
>
> sub add_columns {
>     my $treeview = shift;
>     my $model    = $treeview->get_model();
>     my $renderer = Gtk3::CellRendererText->new;
>     # Column for description
>     my $column = Gtk3::TreeViewColumn->new_with_attributes(
>         'Description', $renderer, text => 0 );
>     $column->set_sort_column_id(0);
>     $treeview->append_column($column);
> }
Some strings corrupted when inserting into liststore model
Hi all. I'm seeing some strings ( coming from a database ) corrupted
when I insert into a liststore model. I've pasted a bare-bones script
below which demonstrates the issue ( hard-coded string value in this
case ). Any ideas what's happening and how to get the original string
rendering? Interestingly, I can copy/paste directly into the
treeview/cell and it handles data input this way.

---

#!/usr/bin/perl

use strict;
use warnings;

use Gtk3 '-init';
use Glib 'TRUE', 'FALSE';

use Encode;

my $window = Gtk3::Window->new;
$window->signal_connect( destroy => sub { Gtk3->main_quit } );
$window->set_border_width(8);
$window->set_default_size( 300, 250 );

my $box = Gtk3::Box->new( 'vertical', 8 );
$box->set_homogeneous(FALSE);
$window->add($box);

my $sw = Gtk3::ScrolledWindow->new( undef, undef );
$sw->set_shadow_type('etched-in');
$sw->set_policy( 'never', 'automatic' );
$box->pack_start( $sw, TRUE, TRUE, 5 );

# Create TreeModel
my $model = Gtk3::ListStore->new( 'Glib::String', );
my $iter = $model->append();
$model->set( $iter , 0 , "└─Selection_6" );

# Create a TreeView
my $treeview = Gtk3::TreeView->new($model);
$treeview->set_rules_hint(TRUE);
$treeview->set_search_column(0);
$sw->add($treeview);

# Add columns to TreeView
add_columns($treeview);

$window->show_all;
Gtk3->main();

sub add_columns {
    my $treeview = shift;
    my $model    = $treeview->get_model();
    my $renderer = Gtk3::CellRendererText->new;
    # Column for description
    my $column = Gtk3::TreeViewColumn->new_with_attributes(
        'Description', $renderer, text => 0 );
    $column->set_sort_column_id(0);
    $treeview->append_column($column);
}
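
What happens to the hard-coded literal in the script above can be seen without Gtk at all. The following is a rough sketch that assumes the source file is saved as UTF-8; the variable names are made up for illustration. It compares the same literal compiled with and without the 'use utf8;' pragma, which is lexically scoped and so can be switched per block:

```perl
use strict;
use warnings;

# Assumption: this source file is saved as UTF-8.

# Without the pragma, the literal is the raw bytes of the source file.
my $as_bytes = do { no utf8; "└─Selection_6" };

# With the pragma, Perl decodes the literal into characters.
my $as_chars = do { use utf8; "└─Selection_6" };

printf "without 'use utf8': length %d, utf8 flag %s\n",
    length($as_bytes), (utf8::is_utf8($as_bytes) ? 'on' : 'off');
printf "with 'use utf8':    length %d, utf8 flag %s\n",
    length($as_chars), (utf8::is_utf8($as_chars) ? 'on' : 'off');
```

The script above hands the first form to $model->set; the replies in this thread suggest that the second form is what the Gtk bindings expect, though that part is not exercised here because it needs a display.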