Bug#441835: debsums: UTF-8 encoded man pages

2007-09-12 Thread Nicolas François
clone 441835 -1
reassign 441835 debsums
tags 441835 patch
retitle -1 po4a: [man] option to recode output with *roff sequences
severity -1 wishlist
thanks

On Wed, Sep 12, 2007 at 09:06:59PM +1000, [EMAIL PROTECTED] wrote:
> On Wed, Sep 12, 2007 at 08:59:22PM +1000, Brendan O'Dea wrote:
> >I'm not very familiar with the po4a code, but have attached a simple
> >proof of concept script which could perhaps be of help in adding this
> >feature to po4a.
> 
> Bugger.  Really attaching this script this time...

Thanks for the patch.

I will use it to add an option to recode the output using the appropriate
*roff sequences.

Regarding the original bug, here is a patch which will generate the Swedish
manpages in ISO-8859-1.
(The po4a_alias line redefines the default options for the man module for
all the documents; specifying the output encoding will avoid breakage in
case a PO gets its encoding changed to UTF-8).

Best Regards,
-- 
Nekral
--- ../orig/debsums-2.0.32/man/po4a.cfg	2006-02-20 15:40:49.0 +0100
+++ debsums-2.0.32/man/po4a.cfg	2007-09-12 20:23:10.0 +0200
@@ -1,15 +1,19 @@
 [po4a_langs] fr pt_BR ru sv
 [po4a_paths] po/debsums.pot $lang:po/$lang.po
 
+[po4a_alias:man] man opt:"-o groff_code=verbatim -o untranslated=Id" \
+ opt_fr:"-L ISO-8859-1" \
+ opt_pt_BR:"-L ISO-8859-1" \
+ opt_ru:"-L KOI8-R" \
+ opt_sv:"-L ISO-8859-1"
+
 [type: man] debsums.1 $lang:$lang/debsums.$lang.1 \
 	add_fr:fr/addendum.fr \
-	add_pt_BR:fr/addendum.pt_BR \
-	add_ru:ru/addendum.ru opt_ru:"-L KOI8-R" \
-	opt:"-o groff_code=verbatim -o untranslated=Id"
+	add_pt_BR:pt_BR/addendum.pt_BR \
+	add_ru:ru/addendum.ru
 
 [type: man] debsums_gen.8 \
 	$lang:$lang/debsums_gen.$lang.8 \
 	add_fr:fr/addendum.fr \
 	add_pt_BR:pt_BR/addendum.pt_BR \
-	add_ru:ru/addendum.ru opt_ru:"-L KOI8-R" \
-	opt:"-o groff_code=verbatim -o untranslated=Id"
+	add_ru:ru/addendum.ru


Bug#441835: debsums: UTF-8 encoded man pages

2007-09-12 Thread Colin Watson
On Wed, Sep 12, 2007 at 08:59:22PM +1000, Brendan O'Dea wrote:
> On Tue, Sep 11, 2007 at 12:41:58PM +0100, Reuben Thomas wrote:
> >The following manpages in your package appear to be UTF-8 encoded:
> >
> >/usr/share/man/sv/man1/debsums.1.gz
> >/usr/share/man/sv/man8/debsums_gen.8.gz
> >
> >According to Colin Watson, "With some exceptions for Far Eastern
> >languages, manual pages should not at present be encoded using UTF-8.
> >As such, it's reasonable to file bugs against packages attempting to
> >use UTF-8 where it is not yet supported."
> >
> >Hence I'm filing this bug. The solution for now seems to be to encode
> >the pages as 7-bit ASCII, using the appropriate *roff sequences to
> >produce accented characters.

Actually ISO-8859-1 is just fine right now, at least for Swedish.

> debsums uses po4a to generate that manual page using sv.po which the
> translator has chosen to encode in utf8.
> 
> I seem to recall hearing that utf8 is the preferred encoding for po
> files where possible, and in any case using utf8 is certainly much
> easier and more readable than *roff escapes like \('e ...
> 
> As such, it would seem preferable to get po4a to do the transliteration
> from utf8 to the appropriate escapes.

At this point, perhaps it would be better to wait for UTF-8 manual page
support, which should be coming soon. See recent discussion on
debian-policy and debian-mentors.

(Sorry for the rapid change of tune, but things have been going quite
quickly here recently.)

-- 
Colin Watson   [EMAIL PROTECTED]



-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]



Bug#441835: debsums: UTF-8 encoded man pages

2007-09-12 Thread Reuben Thomas

On Wed, 12 Sep 2007, Brendan O'Dea wrote:

> As such, it would seem preferable to get po4a to do the transliteration
> from utf8 to the appropriate escapes.

One other option is to use ISO-8859-1 (or other appropriate encoding) if 
that is sufficient to encode the characters you need.
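
Whether ISO-8859-1 (or another legacy charset) is sufficient can be tested
mechanically. The following is only a sketch in Perl, the language of the
script posted in this thread; the helper name fits_charset is hypothetical,
and it assumes the input is a UTF-8 byte string such as the contents of a PO
file. It relies on the core Encode module croaking on unmappable characters.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: returns 1 if the UTF-8 byte string $bytes can be
# re-encoded in $charset (default ISO-8859-1) without losing any character,
# and 0 otherwise (including when $bytes is not valid UTF-8).
sub fits_charset {
    my ($bytes, $charset) = @_;
    $charset = 'ISO-8859-1' unless defined $charset;

    # FB_CROAK makes Encode die on malformed or unmappable input;
    # LEAVE_SRC stops it from modifying the source string in place.
    my $check = Encode::FB_CROAK() | Encode::LEAVE_SRC();

    my $ok = eval {
        my $text = Encode::decode('UTF-8', $bytes, $check);
        Encode::encode($charset, $text, $check);
        1;
    };
    return $ok ? 1 : 0;
}

# "Det här är fri programvara" as raw UTF-8 bytes
my $swedish = "Det h\xc3\xa4r \xc3\xa4r fri programvara";
print fits_charset($swedish) ? "fits\n" : "does not fit\n";
```

The second argument lets the same check be run against another charset, for
example 'KOI8-R' for Russian text.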


--
http://rrt.sc3d.org/ | Crews help mock terror casualties (BBC)






Bug#441835: debsums: UTF-8 encoded man pages

2007-09-12 Thread Brendan O'Dea
On Wed, Sep 12, 2007 at 08:59:22PM +1000, Brendan O'Dea wrote:
>I'm not very familiar with the po4a code, but have attached a simple
>proof of concept script which could perhaps be of help in adding this
>feature to po4a.

Bugger.  Really attaching this script this time...

--bod
#!/usr/bin/perl

# proof of concept converter to exchange utf8 strings for groff escapes

use strict;
use warnings;

package Groff::Font::utf8;

use Carp;
use IO::File;

sub new
{
    my $class = shift;
    my $self = {};

    my ($datadir) = grep -d, '/usr/share/groff',
                             '/usr/local/share/groff' # etc
        or croak "can't find groff data directory: $!";

    my $fontdir = "$datadir/current/font";
    unless (-d $fontdir)
    {
        # MacOS has no "current" symlink... punt
        ($fontdir) = grep -d, reverse glob "$datadir/[0-9]*/font";
    }

    my $font = IO::File->new("$fontdir/devutf8/R")
        or croak "can't open groff utf8 roman font: $!";

    while (<$font>)
    {
        /^(\S+) \s+ \d+ \s+ \d+ \s+ 0x([\da-f]+)/xi or next;
        my ($esc, $code) = ($1, hex $2);
        # skip unrepresentable chars and ascii range
        next if $esc eq '---' or $code < 0x80;
        my $len = length $esc;
        $self->{$code} = $len == 1 ? "\\$esc"  :
                         $len == 2 ? "\\($esc" :
                                     "\\[$esc]";
    }

    bless $self, $class;
}

# convert utf8 string
sub convert
{
    my $self = shift;
    my @s = @_;
    s/([^\x00-\x7f])/$self->{ord $1} || $1/ge for @s;
    wantarray ? @s : join '', @s;
}

# convert raw bytes
sub convert_raw
{
    my $self = shift;
    require Encode;
    $self->convert(map Encode::decode_utf8($_), @_);
}

package main;

my $f = Groff::Font::utf8->new;

# convert the sample text following __END__ and print the result
print $f->convert_raw(<DATA>);

__END__
Det här är fri programvara, licenserad under villkoren för GNU General
Public License.  Det finns INGEN garanti; inte ens för SÄLJBARHET eller
LÄMPLIGHET FÖR NÅGOT SPECIELLT ÄNDAMÅL.


Bug#441835: debsums: UTF-8 encoded man pages

2007-09-12 Thread Brendan O'Dea
reassign 441835 po4a
thanks

On Tue, Sep 11, 2007 at 12:41:58PM +0100, Reuben Thomas wrote:
>The following manpages in your package appear to be UTF-8 encoded:
>
>/usr/share/man/sv/man1/debsums.1.gz
>/usr/share/man/sv/man8/debsums_gen.8.gz
>
>According to Colin Watson, "With some exceptions for Far Eastern
>languages, manual pages should not at present be encoded using UTF-8.
>As such, it's reasonable to file bugs against packages attempting to
>use UTF-8 where it is not yet supported."
>
>Hence I'm filing this bug. The solution for now seems to be to encode
>the pages as 7-bit ASCII, using the appropriate *roff sequences to
>produce accented characters.

debsums uses po4a to generate that manual page using sv.po which the
translator has chosen to encode in utf8.

I seem to recall hearing that utf8 is the preferred encoding for po
files where possible, and in any case using utf8 is certainly much
easier and more readable than *roff escapes like \('e ...

As such, it would seem preferable to get po4a to do the transliteration
from utf8 to the appropriate escapes.

I'm not very familiar with the po4a code, but have attached a simple
proof of concept script which could perhaps be of help in adding this
feature to po4a.

--bod






Bug#441835: debsums: UTF-8 encoded man pages

2007-09-11 Thread Reuben Thomas
Package: debsums
Version: 2.0.32
Severity: minor

The following manpages in your package appear to be UTF-8 encoded:

/usr/share/man/sv/man1/debsums.1.gz
/usr/share/man/sv/man8/debsums_gen.8.gz

According to Colin Watson, "With some exceptions for Far Eastern
languages, manual pages should not at present be encoded using UTF-8.
As such, it's reasonable to file bugs against packages attempting to
use UTF-8 where it is not yet supported."

Hence I'm filing this bug. The solution for now seems to be to encode
the pages as 7-bit ASCII, using the appropriate *roff sequences to
produce accented characters.
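
A rough way to check whether a page "appears to be UTF-8 encoded" is
sketched below in Perl. This is an assumption about how such a check could
be done, not a description of how the pages above were found. The helper
name guess_encoding is hypothetical, and the input is assumed to be the
already decompressed bytes of the manual page.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: classify a byte string as
#   'ascii' - no byte above 0x7f,
#   'utf8'  - has high bytes and decodes as strict UTF-8,
#   'other' - has high bytes that are not valid UTF-8
#             (for example ISO-8859-1 or KOI8-R text).
sub guess_encoding {
    my ($bytes) = @_;
    return 'ascii' unless $bytes =~ /[\x80-\xff]/;

    # FB_CROAK makes decode die on malformed input; LEAVE_SRC keeps
    # $bytes untouched.
    my $ok = eval {
        Encode::decode('UTF-8', $bytes,
                       Encode::FB_CROAK() | Encode::LEAVE_SRC());
        1;
    };
    return $ok ? 'utf8' : 'other';
}

print guess_encoding("h\xc3\xa4r"), "\n";   # the word "här" as UTF-8 bytes
print guess_encoding("h\xe4r"), "\n";       # the same word in ISO-8859-1
```

The heuristic is not proof: a legacy 8-bit text can happen to be valid
UTF-8, although that is unlikely for ordinary accented prose.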


-- System Information:
Debian Release: lenny/sid
  APT prefers testing
  APT policy: (700, 'testing'), (600, 'unstable')
Architecture: i386 (i686)

Kernel: Linux 2.6.21-2-686 (SMP w/1 CPU core)
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages debsums depends on:
ii  debconf [debconf-2.0]  1.5.14   Debian configuration management sy
ii  perl                   5.8.8-7  Larry Wall's Practical Extraction

debsums recommends no packages.

-- debconf information excluded


