Bug#446741: man-db: cat page locale handling is poor

2010-02-13 Thread Colin Watson
tags 446741 fixed-upstream
thanks

On Mon, Oct 15, 2007 at 11:26:02AM +0100, Colin Watson wrote:
> man-db has excessively rigid logic on when to save cat pages with
> respect to locale configuration. It will only save cat pages if the
> current locale's character set matches the standard output encoding
> for the current manual hierarchy. This means that UTF-8 locales tend not
> to work well (except for pages in *.UTF-8 directories), and nor do
> locale configurations using different character sets in different locale
> categories.
>
> I think it would be reasonable to save the cat page in the appropriate
> encoding for the directory, but to recode to the current locale's
> character set at display time. There would then be no need for this
> restriction in most cases.
>
> There is still a wart with regard to English-language manual pages,
> which can't easily be in directories tagged with an explicit character
> set. So, perhaps a better answer would be to save all cat pages as UTF-8
> by default and rely on manconv to provide backward compatibility for old
> cat pages in legacy encodings. This might be more reliable and
> comprehensible in the long term.

I've fixed this upstream for man-db 2.5.7.

Sun Feb 14 00:19:47 GMT 2010  Colin Watson  cjwat...@debian.org

Always save cat pages in UTF-8 (Debian bug #446741).

* src/encodings.c (struct directory_entry): Remove
  standard_output_encoding member.
  (directory_table): Likewise.
  (get_standard_output_encoding): Remove.
* src/encodings.h (get_standard_output_encoding): Remove prototype.
* src/man.c (my_locale_charset): New function, with code moved from
  make_roff_command.
  (make_roff_command): Return pipeline output encoding in a new
  output parameter.  Remove enforcement that cat pages could only be
  saved for the manual hierarchy's default character set.  Move
  post-cat-page pipeline elements to ...
  (add_output_iconv, make_display_command): ... here.
  (make_display_command): Remove code for handling a named input
  file, which has been unused for some time.  New encoding argument.
  (open_cat_stream): New encoding argument.  Convert from it to
  UTF-8 while saving the cat page.
  (format_display_and_save): New encoding argument, passed to
  open_cat_stream.
  (display_catman): New encoding argument.  Convert from it to UTF-8
  while saving the cat page.
  (display): Get formatted encoding from make_roff_command and pass
  it to display_catman, make_display_command, and
  format_display_and_save.  Assume UTF-8 when displaying an existing
  cat page.
* NEWS: Document this.
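
For readers who do not want to read src/man.c, the data flow described in
the ChangeLog entry above can be sketched roughly as follows. This is an
illustration in Python, not the man-db code: the function names
to_cat_page_bytes and to_display_bytes are hypothetical, and the choice to
replace unrepresentable characters at display time is an assumption made
for this sketch only.

```python
def to_cat_page_bytes(formatted: bytes, pipeline_encoding: str) -> bytes:
    """Convert formatter output to the bytes stored in the cat page.

    The cat page is always stored as UTF-8, whatever encoding the
    formatting pipeline produced (hypothetical helper for illustration).
    """
    return formatted.decode(pipeline_encoding).encode("utf-8")


def to_display_bytes(cat_page: bytes, locale_charset: str) -> bytes:
    """Recode a stored cat page for display in the user's locale.

    An existing cat page is assumed to be UTF-8.  Characters that the
    locale's charset cannot represent are replaced with "?" here; that
    policy is an assumption of this sketch.
    """
    return cat_page.decode("utf-8").encode(locale_charset, errors="replace")
```

With this split, the stored cat page no longer depends on the locale of
whoever happened to format the page first; only the display step does.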

Regards,

-- 
Colin Watson   [cjwat...@debian.org]



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/20100214003146.ga29...@riva.ucam.org



Bug#446741: man-db: cat page locale handling is poor

2007-10-15 Thread Colin Watson
Package: man-db
Version: 2.5.0-3
Severity: normal

man-db has excessively rigid logic on when to save cat pages with
respect to locale configuration. It will only save cat pages if the
current locale's character set matches the standard output encoding
for the current manual hierarchy. This means that UTF-8 locales tend not
to work well (except for pages in *.UTF-8 directories), and nor do
locale configurations using different character sets in different locale
categories.
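
As a rough illustration of the restriction described above (a Python
sketch, not the man-db source; the function name and the example charsets
are hypothetical), the current rule amounts to a simple equality test
between two charset names:

```python
import codecs


def old_rule_allows_saving(locale_charset: str, hierarchy_encoding: str) -> bool:
    """Return True only if the two charset names refer to the same codec.

    codecs.lookup() normalises aliases such as "latin1" and "ISO-8859-1"
    to one canonical name, so spelling differences do not matter.
    """
    return codecs.lookup(locale_charset).name == codecs.lookup(hierarchy_encoding).name
```

Under such a rule, a user in a UTF-8 locale reading a page from a
hierarchy whose output encoding is ISO-8859-1 never gets a cat page saved.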

I think it would be reasonable to save the cat page in the appropriate
encoding for the directory, but to recode to the current locale's
character set at display time. There would then be no need for this
restriction in most cases.

There is still a wart with regard to English-language manual pages,
which can't easily be in directories tagged with an explicit character
set. So, perhaps a better answer would be to save all cat pages as UTF-8
by default and rely on manconv to provide backward compatibility for old
cat pages in legacy encodings. This might be more reliable and
comprehensible in the long term.
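
The backward-compatibility idea in the last paragraph can be sketched as
"try UTF-8 first, then fall back to a legacy encoding". The Python below
only illustrates that idea; it is not a description of how manconv is
implemented, and decode_cat_page and its legacy_encoding parameter are
hypothetical names.

```python
def decode_cat_page(raw: bytes, legacy_encoding: str) -> str:
    """Decode a cat page that is either UTF-8 or in a legacy encoding.

    New cat pages would be UTF-8.  If the bytes are not valid UTF-8, the
    page is assumed to be an old one written in legacy_encoding.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(legacy_encoding)
```

This works in practice because text in a legacy 8-bit encoding that
contains non-ASCII characters is rarely valid UTF-8, while pure ASCII
pages decode identically either way.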

-- 
Colin Watson   [EMAIL PROTECTED]


