Hi!

On Thu, 2024-06-13 at 00:02:43 +0200, Martin Quinson wrote:
> On Wednesday, 12 June 2024 at 17:14 +0200, Guillem Jover wrote:
> > I think the charset comparison is too naive though, and does not cover,
> > for example, any aliases listed in «man Encode::Supported». I think a
> > better comparison logic could look like this:
> >
> > ,---
> > use Encode;
> > use feature 'say';
> >
> > my $enc_charset = Encode::find_encoding($charset);
> > my $enc_master_charset = Encode::find_encoding($master_charset);
> >
> > say 'match' if $enc_charset->mime_name eq $enc_master_charset->mime_name;
> > `---
> >
> > Unfortunately neither Encode::find_encoding()->name nor
> > Encode::resolve_alias() seem helpful here because they return
> > "utf-8-strict" for "UTF-8", which will not match against "utf-8" for
> > the canonical "utf8".
>
> The problem is that Perl has a rather unexpected behavior wrt utf8, UTF-8 and
> UTF8. These names are not aliases of each other in Perl. See
> https://perldoc.perl.org/Encode#UTF-8-vs.-utf8-vs.-UTF8
Sure, but I think this does not matter here (in theory), because it
depends on how the POD parser interprets the encoding name, and from
checking the Perl code it seems to map /utf-?8/i to ":encoding(UTF-8)".
So they are really treated the same, at least when it comes to POD,
although that does not mean Perl lacks that distinction for the
encoding in other contexts.

I found the following two places in charge of parsing POD; both
lowercase the encoding name and remove or ignore "-" (and "_") when
parsing it:

  perl/Pod-Perldoc/lib/Pod/Perldoc.pm:set_encoding
  https://sources.debian.org/src/perl/5.38.2-5/cpan/Pod-Perldoc/lib/Pod/Perldoc.pm/#L1054

  perl/Pod-Simple/lib/Pod/Simple/BlackBox.pm:_handle_encoding_line
  https://sources.debian.org/src/perl/5.38.2-5/cpan/Pod-Simple/lib/Pod/Simple/BlackBox.pm/#L604

> So I don't think that you will find a piece of Perl code that will allow
> merging these names to declare that they are equivalent. In some sense, that's
> the essence of all issues I introduced in po4a v0.70 when introducing PerlIO to
> control the encoding issues. I went from a lax system being utf8 by default but
> happily taking Latin-1 encoded files to a much stricter UTF-8 system by
> default and forcing the user to specify if they want another encoding.

I think the problem is that this might be conflating independent
contexts, such as IO, Perl source code, POD, etc.

> Instead, I just pushed a commit changing the error message when charsets are
> utf8 and UTF-8 to insist on the difference between these encodings in Perl.
> https://github.com/mquinson/po4a/commit/afe6e1344ffad9d87dd807a81ed6467d6101b15f

I think this might break more than dpkg though. And given the above POD
handling, it might be unnecessary?

Thanks,
Guillem