Re: Some strings corrupted when inserting into liststore model

2021-10-19 Thread Grant McLean via gtk-perl-list
On Tue, 2021-10-19 at 11:34 +1100, Daniel Kasak via gtk-perl-list
wrote:
> Right. I found a hack on https://perldoc.perl.org/perlunicode ( which
> you directed me to ) that appears to have fixed *this* particular
> issue ( though it's not clear what I've then broken as a result )
> Calling:
> 
> Encode::_utf8_on($_)
> 
>  ... for every value just prior to being pushed into the model
> appears to work. Yay :)

The operative word here is "appears".  This hack will work for most
characters but not all.

The general advice for working with encodings from Perl is that you
should:

 * decode bytes on input to give you strings in Perl's internal
   representation which supports multi-byte characters; and

 * encode strings to bytes in a particular encoding on output
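
A minimal sketch of that decode-on-input, encode-on-output pattern, done
in memory with the Encode module. The byte string is the UTF-8 encoding of
the "└─Selection_6" value from the original post, written with escapes so
the example does not depend on how the script file itself is encoded; the
variable names are only for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# UTF-8 bytes as they might arrive from a file, socket or database:
# U+2514, U+2500, then "Selection_6".
my $bytes = "\xE2\x94\x94\xE2\x94\x80Selection_6";

# Decode on input: the 17 bytes become a 13-character Perl string.
my $string = decode('UTF-8', $bytes);
printf "bytes: %d, characters: %d\n", length($bytes), length($string);

# ... work with $string as text here ...

# Encode on output: back to the same 17 bytes.
my $out = encode('UTF-8', $string);
print "round trip ok\n" if $out eq $bytes;
```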

These days the most common encoding you will encounter is UTF-8.
To do the relevant decoding of a UTF-8 file you might open it like
this:

open(my $fh, '<:utf8', $filename);

Or, if the string was not read from a file but was simply defined in
your script, you would tell Perl to decode the bytes of your script
from UTF-8 by including this pragma:

use utf8;

For output to a file you might use:

open(my $fh, '>:utf8', $filename);
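
Putting the two open() calls together, here is a small round trip through
a temporary file. This is only a sketch: the sample text and the temporary
file are made up for the example, and it uses the ':encoding(UTF-8)' layer,
which does the same job as ':utf8' but also checks that input is valid
UTF-8.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my $text = "caf\x{E9} \x{20AC}5";    # 7 characters, two of them non-ASCII

# Temporary file for the example; removed automatically at exit.
my ($tmp_fh, $filename) = tempfile(UNLINK => 1);
close $tmp_fh;

# Encode on output.
open(my $out, '>:encoding(UTF-8)', $filename) or die "open for write: $!";
print {$out} $text;
close $out or die "close: $!";

# Decode on input.
open(my $in, '<:encoding(UTF-8)', $filename) or die "open for read: $!";
my $read_back = do { local $/; <$in> };
close $in;

printf "file size: %d bytes, string length: %d characters\n",
    (-s $filename), length($read_back);
```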

Your experience seems to suggest that the Perl Gtk bindings will do the
right thing when presented with a string that has the internal "utf8"
flag set.  But if your string has non-ASCII characters but does not
already have that flag set then it seems the decoding step has been
missed.

Data that came from a DB connection rather than a file might need to be
decoded with something like:

$perl_string = Encode::decode_utf8($db_string);

However most of the DBD drivers allow you to set a flag so that this
happens automatically.
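
As a rough illustration, these are the connection attributes I believe the
common drivers document for this; check the documentation for the DBD
module and version you actually use. The DSN, user and password below are
hypothetical, and the sketch only builds the attribute hash rather than
connecting.

```perl
use strict;
use warnings;

# Driver-specific connection attributes that ask the driver to return
# decoded Perl strings. Which one applies depends on the DBD module in use.
my %utf8_attr_for = (
    mysql  => { mysql_enable_utf8mb4 => 1 },
    Pg     => { pg_enable_utf8       => 1 },
    SQLite => { sqlite_unicode       => 1 },
);

# Hypothetical connection details, for illustration only.
my $driver = 'mysql';
my $dsn    = "dbi:$driver:database=example;host=localhost";
my %attr   = ( RaiseError => 1, %{ $utf8_attr_for{$driver} } );

# With a real server available, the connection would then be made with:
#   require DBI;
#   my $dbh = DBI->connect($dsn, 'user', 'password', \%attr);
print join(', ', map { "$_=$attr{$_}" } sort keys %attr), "\n";
```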

The reason messing with the utf8 flag on the Perl string appears to
work is that Perl's internal encoding is almost-but-not-quite UTF-8.
For historical reasons (and arguably as a memory optimisation)
sometimes Perl will encode some characters in the range 0x80-0xFF as a
single byte ("Latin-1" encoding) rather than the two bytes that UTF-8
would require.

For example chr(0x20AC) would return a Perl string which was
represented in memory using UTF-8 bytes. Whereas chr(0xE9) would return
a Perl string which was represented in memory using a single Latin-1
byte.  Simply setting the utf8 flag on the first string would do no
harm (since it's already set) but it would make a mess of the second
string because it's only one byte long and not a valid UTF-8 sequence.
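
The difference can be seen with a few lines of Perl. This is a sketch
using the internal inspection functions utf8::is_utf8() and utf8::valid(),
which are meant for debugging rather than everyday code. utf8::upgrade()
is included only to show what a consistent flagged version of the second
string looks like; it is not a substitute for decoding bytes that came
from outside the program.

```perl
use strict;
use warnings;
use Encode ();

my $euro    = chr(0x20AC);   # stored internally as UTF-8, flag on
my $e_acute = chr(0xE9);     # stored internally as one Latin-1 byte, flag off

printf "euro flagged: %s, e-acute flagged: %s\n",
    (utf8::is_utf8($euro)    ? 'yes' : 'no'),
    (utf8::is_utf8($e_acute) ? 'yes' : 'no');

# Forcing the flag on leaves the lone byte 0xE9 in place, which is not
# a valid UTF-8 sequence, so the string is now internally inconsistent.
my $hacked = $e_acute;
Encode::_utf8_on($hacked);
printf "after _utf8_on: %s\n", utf8::valid($hacked) ? 'valid' : 'malformed';

# utf8::upgrade() re-encodes the internal buffer instead, so the flag
# goes on and the string still holds the single character U+00E9.
my $upgraded = $e_acute;
utf8::upgrade($upgraded);
printf "after upgrade: flagged=%s length=%d ord=0x%X\n",
    (utf8::is_utf8($upgraded) ? 'yes' : 'no'),
    length($upgraded), ord($upgraded);
```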

If you really want to understand this stuff here's a link to a
conference talk I did on the subject:

https://www.youtube.com/watch?v=cgswnneFp-s

Regards
Grant

___
gtk-perl-list mailing list
gtk-perl-list@gnome.org
https://mail.gnome.org/mailman/listinfo/gtk-perl-list


Re: Some strings corrupted when inserting into liststore model

2021-10-18 Thread Daniel Kasak via gtk-perl-list
Right. I found a hack on https://perldoc.perl.org/perlunicode ( which
you directed me to ) that appears to have fixed *this* particular
issue ( though it's not clear what I've then broken as a result )
Calling:

Encode::_utf8_on($_)

 ... for every value just prior to being pushed into the model appears
to work. Yay :)

Thanks!

Dan

On Mon, Oct 18, 2021 at 11:12 PM Jeremy Volkening via gtk-perl-list
 wrote:
>
> On Mon, Oct 18, 2021 at 08:39:36PM +1100, Daniel Kasak via gtk-perl-list 
> wrote:
> > It's not really clear if there's something *else* I'm
> > supposed to do to these strings coming out of the DB or not?
>
> Typically you need to tell Perl to treat them as UTF-8. Without knowing 
> exactly how you're getting your strings into Perl, there are a number of ways 
> to do this (https://perldoc.perl.org/perlunicode). For instance, if you're 
> reading from a filehandle you can set its mode:
>
> binmode(\*STDIN, ':utf8');
>
> Or you can specifically mark the string after it's imported:
>
> use Encode;
> $str = Encode::decode("UTF-8", $str);
>
> One of these might help with your issue.
>
> Jeremy


Re: Some strings corrupted when inserting into liststore model

2021-10-18 Thread Jeremy Volkening via gtk-perl-list
On Mon, Oct 18, 2021 at 08:39:36PM +1100, Daniel Kasak via gtk-perl-list wrote:
> It's not really clear if there's something *else* I'm
> supposed to do to these strings coming out of the DB or not?

Typically you need to tell Perl to treat them as UTF-8. Without knowing exactly 
how you're getting your strings into Perl, there are a number of ways to do 
this (https://perldoc.perl.org/perlunicode). For instance, if you're reading 
from a filehandle you can set its mode:

binmode(\*STDIN, ':utf8');

Or you can specifically mark the string after it's imported:

use Encode;
$str = Encode::decode("UTF-8", $str);
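
Applied to a whole database row, that might look something like the sketch
below. It assumes the driver is handing back raw UTF-8 bytes (if the
driver already decodes, decoding a second time is wrong), and decode_row()
and the sample row are made up for the example.

```perl
use strict;
use warnings;
use Encode ();

# Decode one fetched row: every defined value is treated as UTF-8 bytes.
# FB_CROAK makes invalid UTF-8 die rather than pass silently; LEAVE_SRC
# stops decode() from modifying the original value in place.
sub decode_row {
    my @row = @_;
    return map {
        defined $_
            ? Encode::decode('UTF-8', $_, Encode::FB_CROAK | Encode::LEAVE_SRC)
            : undef
    } @row;
}

# Stand-in for a row from fetchrow_array: UTF-8 bytes plus a NULL column.
my @raw     = ("\xE2\x94\x94\xE2\x94\x80Selection_6", '6656.67', undef);
my @decoded = decode_row(@raw);

printf "first column: %d bytes before, %d characters after\n",
    length($raw[0]), length($decoded[0]);
```

The decoded values are the ones to pass on to the model.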

One of these might help with your issue.

Jeremy


Re: Some strings corrupted when inserting into liststore model

2021-10-18 Thread Daniel Kasak via gtk-perl-list
Hi Jeremy. Thanks for the response :)

> In the case of your example script, you need 'use utf8;' in the preamble. This
> fixes handling of the hard-coded unicode characters in the script -- it won't
> necessarily fix the issue with the strings coming from your database.

Interesting. Yeah I don't usually embed unicode in source - I'd
forgotten about that :)

>  Are you certain they are being stored in the database correctly?

Yeah actually they're coming from an execution plan, so they're not
"stored" as such in the database. Example:

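+---------------------+---------+-----------+------------------+--------------------------------+
| id                  | estRows | task      | access object    | operator info                  |
+---------------------+---------+-----------+------------------+--------------------------------+
| TableReader_7       | 6656.67 | root      |                  | data:Selection_6               |
| └─Selection_6       | 6656.67 | cop[tikv] |                  | ne(dett.ffa_client.city, "")   |
|   └─TableFullScan_5 | 1.00    | cop[tikv] | table:ffa_client | keep order:false, stats:pseudo |
+---------------------+---------+-----------+------------------+--------------------------------+
3 rows in set (0.010 sec)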

It renders correctly in a console in the MySQL client. I can inspect
the values in a debugger and copy them out, and they render correctly
both inside the IDE, and in a text editor when I paste the values. I
was under the impression that Perl handled strings as kinda-utf8
internally. It's not really clear if there's something *else* I'm
supposed to do to these strings coming out of the DB or not? Anyway,
they totally look correct if I print() them or inspect them in an IDE.

Dan


Re: Some strings corrupted when inserting into liststore model

2021-10-17 Thread Jeremy Volkening via gtk-perl-list
In the case of your example script, you need 'use utf8;' in the preamble. This 
fixes handling of the hard-coded unicode characters in the script -- it won't 
necessarily fix the issue with the strings coming from your database. Are you 
certain they are being stored in the database correctly?
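
To see what 'use utf8;' changes, here is a small sketch that compiles the
same literal with and without the pragma (it is lexically scoped, so both
can live in one file). It assumes the script file itself is saved as
UTF-8.

```perl
use strict;
use warnings;

# The same literal compiled twice: once as raw source bytes, once with
# "use utf8" telling Perl the source text is UTF-8.
my $as_bytes = do { no utf8;  "└─Selection_6" };
my $as_chars = do { use utf8; "└─Selection_6" };

printf "without 'use utf8': length %d\n", length($as_bytes);
printf "with 'use utf8':    length %d\n", length($as_chars);
```

Without the pragma each box-drawing character is seen as three separate
bytes, which is what ends up in the model.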

Jeremy

On Mon, Oct 18, 2021 at 02:42:42PM +1100, Daniel Kasak via gtk-perl-list wrote:
> Hi all. I'm seeing some strings ( coming from a database ) corrupted
> when I insert into a liststore model. I've pasted a bare-bones script
> below which demonstrates the issue ( hard-coded string value in this
> case ). Any ideas what's happening and how to get the original string
> rendering? Interestingly, I can copy/paste directly into the
> treeview/cell and it handles data input this way.
> 
> ---
> 
> #!/usr/bin/perl
> 
> use strict;
> use warnings;
> 
> use Gtk3 '-init';
> use Glib 'TRUE', 'FALSE';
> 
> use Encode;
> 
> my $window = Gtk3::Window->new;
> $window->signal_connect( destroy => sub { Gtk3->main_quit } );
> $window->set_border_width(8);
> $window->set_default_size( 300, 250 );
> 
> my $box = Gtk3::Box->new( 'vertical', 8 );
> $box->set_homogeneous(FALSE);
> $window->add($box);
> 
> my $sw = Gtk3::ScrolledWindow->new( undef, undef );
> $sw->set_shadow_type('etched-in');
> $sw->set_policy( 'never', 'automatic' );
> $box->pack_start( $sw, TRUE, TRUE, 5 );
> 
> # Create TreeModel
> my $model = Gtk3::ListStore->new( 'Glib::String', );
> my $iter = $model->append();
> $model->set( $iter , 0 , "└─Selection_6" );
> 
> # Create a TreeView
> my $treeview = Gtk3::TreeView->new($model);
> $treeview->set_rules_hint(TRUE);
> $treeview->set_search_column(0);
> $sw->add($treeview);
> 
> # Add columns to TreeView
> add_columns($treeview);
> 
> $window->show_all;
> Gtk3->main();
> 
> sub add_columns {
>     my $treeview = shift;
>     my $model    = $treeview->get_model();
>     my $renderer = Gtk3::CellRendererText->new;
>     # Column for description
>     my $column = Gtk3::TreeViewColumn->new_with_attributes(
>         'Description', $renderer, text => 0 );
>     $column->set_sort_column_id(0);
>     $treeview->append_column($column);
> }


Some strings corrupted when inserting into liststore model

2021-10-17 Thread Daniel Kasak via gtk-perl-list
Hi all. I'm seeing some strings ( coming from a database ) corrupted
when I insert into a liststore model. I've pasted a bare-bones script
below which demonstrates the issue ( hard-coded string value in this
case ). Any ideas what's happening and how to get the original string
rendering? Interestingly, I can copy/paste directly into the
treeview/cell and it handles data input this way.

---

#!/usr/bin/perl

use strict;
use warnings;

use Gtk3 '-init';
use Glib 'TRUE', 'FALSE';

use Encode;

my $window = Gtk3::Window->new;
$window->signal_connect( destroy => sub { Gtk3->main_quit } );
$window->set_border_width(8);
$window->set_default_size( 300, 250 );

my $box = Gtk3::Box->new( 'vertical', 8 );
$box->set_homogeneous(FALSE);
$window->add($box);

my $sw = Gtk3::ScrolledWindow->new( undef, undef );
$sw->set_shadow_type('etched-in');
$sw->set_policy( 'never', 'automatic' );
$box->pack_start( $sw, TRUE, TRUE, 5 );

# Create TreeModel
my $model = Gtk3::ListStore->new( 'Glib::String', );
my $iter = $model->append();
$model->set( $iter , 0 , "└─Selection_6" );

# Create a TreeView
my $treeview = Gtk3::TreeView->new($model);
$treeview->set_rules_hint(TRUE);
$treeview->set_search_column(0);
$sw->add($treeview);

# Add columns to TreeView
add_columns($treeview);

$window->show_all;
Gtk3->main();

sub add_columns {
    my $treeview = shift;
    my $model    = $treeview->get_model();
    my $renderer = Gtk3::CellRendererText->new;
    # Column for description
    my $column = Gtk3::TreeViewColumn->new_with_attributes(
        'Description', $renderer, text => 0 );
    $column->set_sort_column_id(0);
    $treeview->append_column($column);
}