Bug#441835: debsums: UTF-8 encoded man pages
clone 441835 -1
reassign 441835 debsums
tags 441835 patch
retitle -1 po4a: [man] option to recode output with *roff sequences
severity -1 wishlist
thanks

On Wed, Sep 12, 2007 at 09:06:59PM +1000, [EMAIL PROTECTED] wrote:
> On Wed, Sep 12, 2007 at 08:59:22PM +1000, Brendan O'Dea wrote:
> >I'm not very familiar with the po4a code, but have attached a simple
> >proof of concept script which could perhaps be of help in adding this
> >feature to po4a.
>
> Bugger.  Really attaching this script this time...

Thanks for the patch. I will use it to add an option to recode the output
using the appropriate *roff sequences.

Regarding the original bug, here is a patch which will generate the Swedish
manpages in ISO-8859-1.
(The po4a_alias line redefines the default options for the man module for
all the documents; specifying the output encoding will avoid breakage in
case a PO gets its encoding changed to UTF-8).

Best Regards,
-- 
Nekral

--- ../orig/debsums-2.0.32/man/po4a.cfg 2006-02-20 15:40:49.0 +0100
+++ debsums-2.0.32/man/po4a.cfg 2007-09-12 20:23:10.0 +0200
@@ -1,15 +1,19 @@
 [po4a_langs] fr pt_BR ru sv
 [po4a_paths] po/debsums.pot $lang:po/$lang.po
 
+[po4a_alias:man] man opt:"-o groff_code=verbatim -o untranslated=Id" \
+        opt_fr:"-L ISO-8859-1" \
+        opt_pt_BR:"-L ISO-8859-1" \
+        opt_ru:"-L KOI8-R" \
+        opt_sv:"-L ISO-8859-1"
+
 [type: man] debsums.1 $lang:$lang/debsums.$lang.1 \
         add_fr:fr/addendum.fr \
-        add_pt_BR:fr/addendum.pt_BR \
-        add_ru:ru/addendum.ru opt_ru:"-L KOI8-R" \
-        opt:"-o groff_code=verbatim -o untranslated=Id"
+        add_pt_BR:pt_BR/addendum.pt_BR \
+        add_ru:ru/addendum.ru
 
 [type: man] debsums_gen.8 \
         $lang:$lang/debsums_gen.$lang.8 \
         add_fr:fr/addendum.fr \
         add_pt_BR:pt_BR/addendum.pt_BR \
-        add_ru:ru/addendum.ru opt_ru:"-L KOI8-R" \
-        opt:"-o groff_code=verbatim -o untranslated=Id"
+        add_ru:ru/addendum.ru
Bug#441835: debsums: UTF-8 encoded man pages
On Wed, Sep 12, 2007 at 08:59:22PM +1000, Brendan O'Dea wrote:
> On Tue, Sep 11, 2007 at 12:41:58PM +0100, Reuben Thomas wrote:
> >The following manpages in your package appear to be UTF-8 encoded:
> >
> >/usr/share/man/sv/man1/debsums.1.gz
> >/usr/share/man/sv/man8/debsums_gen.8.gz
> >
> >According to Colin Watson, "With some exceptions for Far Eastern
> >languages, manual pages should not at present be encoded using UTF-8.
> >As such, it's reasonable to file bugs against packages attempting to
> >use UTF-8 where it is not yet supported."
> >
> >Hence I'm filing this bug. The solution for now seems to be to encode
> >the pages as 7-bit ASCII, using the appropriate *roff sequences to
> >produce accented characters.

Actually ISO-8859-1 is just fine right now, at least for Swedish.

> debsums uses po4a to generate that manual page using sv.po which the
> translator has chosen to encode in utf8.
>
> I seem to recall hearing that utf8 is the preferred encoding for po
> files where possible, and in any case using utf8 is certainly much
> easier and more readable than *roff escapes like \('e ...
>
> As such, it would seem preferable to get po4a to do the transliteration
> from utf8 to the appropriate escapes.

At this point, perhaps it would be better to wait for UTF-8 manual page
support, which should be coming soon. See recent discussion on
debian-policy and debian-mentors.

(Sorry for the rapid change of tune, but things have been going quite
quickly here recently.)

-- 
Colin Watson [EMAIL PROTECTED]
Bug#441835: debsums: UTF-8 encoded man pages
On Wed, 12 Sep 2007, Brendan O'Dea wrote:

> As such, it would seem preferable to get po4a to do the transliteration
> from utf8 to the appropriate escapes.

One other option is to use ISO-8859-1 (or other appropriate encoding) if
that is sufficient to encode the characters you need.

-- 
http://rrt.sc3d.org/ | Crews help mock terror casualties (BBC)
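Whether a legacy encoding is "sufficient to encode the characters you need" can be checked mechanically. The following is a minimal Python sketch, not part of the thread; the function name fits_encoding and the sample strings are illustrative assumptions.

```python
def fits_encoding(text: str, encoding: str = "iso-8859-1") -> bool:
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Swedish text fits ISO-8859-1; Russian text does not, but fits KOI8-R.
print(fits_encoding("Det här är fri programvara"))
print(fits_encoding("Привет"))
print(fits_encoding("Привет", "koi8-r"))
```

A translation that fails this check for its target encoding would need either a different encoding or *roff escapes for the offending characters.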
Bug#441835: debsums: UTF-8 encoded man pages
On Wed, Sep 12, 2007 at 08:59:22PM +1000, Brendan O'Dea wrote:
>I'm not very familiar with the po4a code, but have attached a simple
>proof of concept script which could perhaps be of help in adding this
>feature to po4a.

Bugger.  Really attaching this script this time...

--bod

#!/usr/bin/perl

# proof of concept converter to exchange utf8 strings for groff escapes

use strict;
use warnings;

package Groff::Font::utf8;

use Carp;
use IO::File;

# Build a map from Unicode code point to groff escape by reading the
# roman font description of groff's utf8 output device.
sub new
{
    my $class = shift;
    my $self = {};
    my ($datadir) = grep -d,
        '/usr/share/groff',
        '/usr/local/share/groff' # etc
            or croak "can't find groff data directory: $!";

    my $fontdir = "$datadir/current/font";
    unless (-d $fontdir)
    {
        # MacOS has no "current" symlink... punt
        ($fontdir) = grep -d, reverse glob "$datadir/[0-9]*/font";
    }

    my $font = IO::File->new("$fontdir/devutf8/R")
        or croak "can't open groff utf8 roman font: $!";

    while (<$font>)
    {
        /^(\S+) \s+ \d+ \s+ \d+ \s+ 0x([\da-f]+)/xi or next;
        my ($esc, $code) = ($1, hex $2);

        # skip unrepresentable chars and ascii range
        next if $esc eq '---' or $code < 0x80;

        # \x for one-character names, \(xx for two, \[xxx] otherwise
        my $len = length $esc;
        $self->{$code} = $len == 1 ? "\\$esc"   :
                         $len == 2 ? "\\($esc"  :
                                     "\\[$esc]";
    }

    bless $self, $class;
}

# convert utf8 string
sub convert
{
    my $self = shift;
    my @s = @_;
    s/([^\x00-\x7f])/$self->{ord $1} || $1/ge for @s;
    wantarray ? @s : join '', @s;
}

# convert raw bytes
sub convert_raw
{
    my $self = shift;
    require Encode;
    $self->convert(map Encode::decode_utf8($_), @_);
}

package main;

# Assumption: the driver lines were garbled in the archive. This version
# reads the sample text below __END__ as raw bytes and prints the result.
my $f = Groff::Font::utf8->new;
my $bytes = do { local $/; <DATA> };
print $f->convert_raw($bytes);

__END__
Det här är fri programvara, licenserad under villkoren för GNU General Public
License. Det finns INGEN garanti; inte ens för SÄLJBARHET eller LÄMPLIGHET FÖR
NÅGOT SPECIELLT ÄNDAMÅL.
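For readers without a groff installation at hand, the core idea of the script above (replace each non-ASCII character with its *roff escape and leave everything else alone) can be sketched in a few lines of Python. This is an illustration only: the table GROFF_ESCAPES and the function to_roff are hypothetical names, and the table is hand-written for a few Swedish and French letters, whereas the script builds its map from groff's devutf8 font file.

```python
# Hypothetical hand-written table (assumption: only these characters matter).
# The names are standard groff glyph names: ":a" is a-dieresis, "oa" is
# a-ring, "'e" is e-acute; two-character names use the \(xx form.
GROFF_ESCAPES = {
    "ä": r"\(:a", "ö": r"\(:o", "å": r"\(oa",
    "Ä": r"\(:A", "Ö": r"\(:O", "Å": r"\(oA",
    "é": r"\('e",
}


def to_roff(text: str) -> str:
    """Replace characters found in the table; pass everything else through."""
    return "".join(GROFF_ESCAPES.get(ch, ch) for ch in text)


print(to_roff("Det här är fri programvara"))
```

Characters missing from the table pass through unchanged, which mirrors the `|| $1` fallback in the Perl convert routine.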
Bug#441835: debsums: UTF-8 encoded man pages
reassign 441835 po4a
thanks

On Tue, Sep 11, 2007 at 12:41:58PM +0100, Reuben Thomas wrote:
>The following manpages in your package appear to be UTF-8 encoded:
>
>/usr/share/man/sv/man1/debsums.1.gz
>/usr/share/man/sv/man8/debsums_gen.8.gz
>
>According to Colin Watson, "With some exceptions for Far Eastern
>languages, manual pages should not at present be encoded using UTF-8.
>As such, it's reasonable to file bugs against packages attempting to
>use UTF-8 where it is not yet supported."
>
>Hence I'm filing this bug. The solution for now seems to be to encode
>the pages as 7-bit ASCII, using the appropriate *roff sequences to
>produce accented characters.

debsums uses po4a to generate that manual page using sv.po which the
translator has chosen to encode in utf8.

I seem to recall hearing that utf8 is the preferred encoding for po
files where possible, and in any case using utf8 is certainly much
easier and more readable than *roff escapes like \('e ...

As such, it would seem preferable to get po4a to do the transliteration
from utf8 to the appropriate escapes.

I'm not very familiar with the po4a code, but have attached a simple
proof of concept script which could perhaps be of help in adding this
feature to po4a.

--bod
Bug#441835: debsums: UTF-8 encoded man pages
Package: debsums
Version: 2.0.32
Severity: minor

The following manpages in your package appear to be UTF-8 encoded:

/usr/share/man/sv/man1/debsums.1.gz
/usr/share/man/sv/man8/debsums_gen.8.gz

According to Colin Watson, "With some exceptions for Far Eastern
languages, manual pages should not at present be encoded using UTF-8.
As such, it's reasonable to file bugs against packages attempting to
use UTF-8 where it is not yet supported."

Hence I'm filing this bug. The solution for now seems to be to encode
the pages as 7-bit ASCII, using the appropriate *roff sequences to
produce accented characters.

-- System Information:
Debian Release: lenny/sid
  APT prefers testing
  APT policy: (700, 'testing'), (600, 'unstable')
Architecture: i386 (i686)

Kernel: Linux 2.6.21-2-686 (SMP w/1 CPU core)
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages debsums depends on:
ii  debconf [debconf-2.0]         1.5.14     Debian configuration management sy
ii  perl                          5.8.8-7    Larry Wall's Practical Extraction

debsums recommends no packages.

-- debconf information excluded
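The report does not say how the pages were found to "appear to be UTF-8 encoded". One plausible heuristic, sketched here in Python as an assumption rather than the reporter's actual method, is to flag a page that contains non-ASCII bytes and still decodes cleanly as UTF-8. The function names looks_utf8 and manpage_looks_utf8 are hypothetical.

```python
import gzip


def looks_utf8(data: bytes) -> bool:
    """True if data contains non-ASCII bytes and is valid UTF-8."""
    if data.isascii():
        return False  # pure 7-bit ASCII: nothing to report
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False  # most likely a legacy 8-bit encoding
    return True


def manpage_looks_utf8(path: str) -> bool:
    """Apply the heuristic to a gzip-compressed manual page."""
    with gzip.open(path, "rb") as fh:
        return looks_utf8(fh.read())
```

For example, manpage_looks_utf8("/usr/share/man/sv/man1/debsums.1.gz") would apply the check to the first page named in the report. The heuristic can misfire on short legacy-encoded files that happen to form valid UTF-8 sequences, so it is a screening aid, not proof.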