Hi!

On Thu, 2024-06-13 at 00:02:43 +0200, Martin Quinson wrote:
> On Wednesday, 12 June 2024 at 17:14 +0200, Guillem Jover wrote:
> > I think the charset comparison is too naive though, and does not cover
> > for example any aliases listed in «man Encode::Supported». I think a
> > better comparison logic could look like this:
> > 
> >   ,---
> >   use Encode;
> > 
> >   my $enc_charset = Encode::find_encoding($charset);
> >   my $enc_master_charset = Encode::find_encoding($master_charset);
> > 
> >   say 'match' if $enc_charset->mime_name eq $enc_master_charset->mime_name;
> >   `---
> > 
> > Unfortunately neither Encode::find_encoding()->name nor
> > Encode::resolve_alias() seem helpful here because they return
> > "utf-8-strict" for "UTF-8" which will not match against "utf-8" for
> > the canonical "utf8".
> 
> The problem is that Perl has a rather unexpected behavior wrt utf8, UTF-8 and
> UTF8. These names are not aliases of others in Perl. See
> https://perldoc.perl.org/Encode#UTF-8-vs.-utf8-vs.-UTF8

Sure, but here I think this does not matter (in theory), because it
depends on how the POD parser interprets the encoding name, and from
checking the perl code it seems it maps /utf-?8/i to ":encoding(UTF-8)".
So they are really treated the same, at least when it comes to POD, even
though perl does make that distinction for the encoding in other
contexts. The two places I found that are in charge of parsing POD
lowercase the encoding name and remove or ignore "-" (and "_") when
parsing it:

  perl/Pod-Perldoc/lib/Pod/Perldoc.pm:set_encoding
  
https://sources.debian.org/src/perl/5.38.2-5/cpan/Pod-Perldoc/lib/Pod/Perldoc.pm/#L1054

  perl/Pod-Simple/lib/Pod/Simple/BlackBox.pm:_handle_encoding_line
  
https://sources.debian.org/src/perl/5.38.2-5/cpan/Pod-Simple/lib/Pod/Simple/BlackBox.pm/#L604
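
As a rough illustration of that kind of folding, here is a minimal sketch.
It is not the actual Pod::Simple or Pod::Perldoc code, and the helper names
normalize_pod_charset() and same_pod_charset() are made up for this example.
It only assumes that comparing the lowercased names with "-" and "_"
stripped is close enough to what the POD parsers do:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of po4a, Pod::Simple or Pod::Perldoc):
# fold an encoding name by lowercasing it and dropping "-" and "_".
sub normalize_pod_charset {
    my ($name) = @_;
    my $norm = lc $name;
    $norm =~ tr/-_//d;
    return $norm;
}

# Hypothetical helper: true if both names fold to the same string.
sub same_pod_charset {
    my ($left, $right) = @_;
    return normalize_pod_charset($left) eq normalize_pod_charset($right);
}

for my $pair (['UTF-8', 'utf8'], ['UTF_8', 'Utf-8'], ['utf8', 'iso-8859-1']) {
    my ($x, $y) = @$pair;
    printf "%s vs %s: %s\n", $x, $y,
        same_pod_charset($x, $y) ? 'match' : 'differ';
}
```

With something like this, "UTF-8" and "utf8" would compare as equal for the
purposes of the POD charset check, without touching how Encode itself
treats those names in other contexts.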

> So I don't think that you will find a piece of Perl code that will allow to
> merge these names to declare that they are equivalent. In some sense, that's
> the essence of all issues I introduced in po4a v0.70 when introducing perlIO
> to control the encoding issues. I went from a lax system being utf8 by
> default but happily taking latin-1 encoding files to a much stricter UTF-8
> system by default and forcing the user to specify if they want another
> encoding.

I think the problem is that this might be conflating independent contexts,
such as IO, perl source code, POD, etc.

> Instead, I just pushed a commit changing the error message when charsets are
> utf8 and UTF-8 to insist on the difference between these encodings in Perl.
> https://github.com/mquinson/po4a/commit/afe6e1344ffad9d87dd807a81ed6467d6101b15f

I think this might break more than dpkg though. And from the above Pod
handling it might be unnecessary?

Thanks,
Guillem
