DJ Lucas wrote:
> Guys, I'm obviously lacking creativity tonight. ;-) I've posted a
> local copy of the book in my home dir on quantum. I would like
> someone else (or many somebody elses) to review the textual changes
> on the man-db page for both technical and grammatical errors.
>
> http://www.linuxfromscratch.org/~dj/LFS-MANDB/chapter06/man-db.html
>
> Thanks in advance.

> Some packages provide UTF-8 man pages, which previous versions of
> Man-DB were unable to display. This limitation has been overcome in
> recent versions, and Man-DB can now convert man pages from legacy
> 8-bit encodings to UTF-8 (and vice-versa) on the fly.

I don't like the wording here. We need to mention two features
separately:

1) conversion TO an arbitrary encoding on the fly (this was present in
   old versions of Man-DB, too, and is just a distracting factor here);
2) expectations about the input encoding (this changed: it was
   hard-coded; now Man-DB also looks at the extension of the directory
   name).

Better, but IMHO still not acceptable for anything except the -dev book:

================
Some packages provide UTF-8 man pages, which previous versions of
Man-DB were unable to display correctly, because the expected (8-bit)
encoding for each language was hard-coded in the source of Man-DB. Now
Man-DB uses the extension of the directory name in order to determine
the encoding of the manual pages stored there, and uses the built-in
table only if the encoding is not specified in the directory name.
E.g., because of "UTF-8" in the directory name, it knows that all
manual pages residing in /usr/share/man/fr.UTF-8 are UTF-8 encoded and,
according to the built-in table, expects all manual pages residing in
/usr/share/man/ru to be in KOI8-R. On the other hand, the setup in
Fedora Core expected all manual pages to be UTF-8 encoded and stored in
directories without the ".UTF-8" suffix.
================

Bruce: could you please try to criticise or shorten this?

> This used to be

"This" => "Disagreement about the expected encoding of manual pages".

> a rather annoying problem across different distributions, as packages
> written for one distribution would require changes to work on
> another.
> This script was written, and included in LFS to overcome
> this problem. The script will allow you to pass an in and out value
> to convert man pages to and from legacy 8-bit and UTF-8 encodings.

Technically, we don't need it. But it is still abused in BLFS to
convert Midnight Commander hints after patching. We definitely don't
need the script so close to the beginning of the page; I propose moving
it to the "Non-English Manual Pages in LFS" section.

> 6.47.2. Non-English Manual Pages in LFS
>
> Linux distributions have different policies concerning the character
> encoding in which manual pages are stored in the filesystem. E.g.,
> RedHat stores all manual pages in UTF-8, while Debian previously used

and still uses predominantly

> language-specific (mostly 8-bit) encodings. As mentioned above, this
> leads to incompatibility of packages with manual pages designed for
> different distributions.
> LFS previously used the same convention as Debian. This was chosen
> because Man-DB did not understand man pages stored in UTF-8 at the
> time of its introduction into LFS. For our purposes at that time,
> Man-DB was preferable to Man as it worked without any additional
> configuration in any locale.

OK.

> This is still true today as Man-DB with
> Debian patched Groff will now properly convert UTF-8 encoded man
> pages to the user's locale on the fly.

Only if they are placed correctly.

> Additionally, this combination
> provides support for Chinese and Japanese locales, and limited
> support for Korean, whereas Man does not.

Wrong. Man does work (if we ignore translations of error messages) with
the same languages if used together with Debian-patched groff. The only
difference is that Man has the pipeline constructed in the
configuration file by the user, while Man-DB constructs the pipeline
programmatically by applying knowledge about the expected input and
output encoding of various programs. Obviously, a user can write the
same pipeline into the Man configuration file, but this would take
several pages to explain.

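To make the "knowledge about the expected input encoding" part
concrete, here is a rough sketch of the directory-name rule from the
text I proposed at the top of this post. It is NOT Man-DB's actual
code: expected_encoding is a hypothetical helper, and the table holds
only three illustrative entries (ru is taken from the example above;
ISO-8859-1 for de and fr is my assumption about the book's table).

```shell
# Hypothetical helper, not part of Man-DB: print the encoding expected
# for the manual pages stored under the given directory.
expected_encoding() {
    dir_name=$(basename "$1")
    case "$dir_name" in
        # An extension in the directory name wins: fr.UTF-8 -> UTF-8
        *.*)   printf '%s\n' "${dir_name#*.}" ;;
        # Otherwise fall back to a (here: tiny, illustrative) built-in table
        ru)    printf '%s\n' "KOI8-R" ;;
        de|fr) printf '%s\n' "ISO-8859-1" ;;
        *)     printf '%s\n' "unknown" ;;
    esac
}

expected_encoding /usr/share/man/fr.UTF-8   # prints UTF-8
expected_encoding /usr/share/man/ru         # prints KOI8-R
```

Something of this size in the theory part of the page would let the
table and the examples follow from the rule, instead of the other way
round.
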
> The current offering of Man
> as used in RedHat requires major modifications to both the Man and
> Groff packages,

True.

> and still falls short on Chinese, Japanese, and
> Korean encodings.

Not sure.

> Finally, it should be noted that most distributions, including
> Debian, are rapidly migrating to all UTF-8 encoded man pages.

Wrong. Most distributions (including Gentoo and Arch) completely ignore
the problem, present the user with an unreadable mix of 8-bit and UTF-8
pages in the same directory, and are thus broken. The leading and
government-sponsored Russian distribution (Alt Linux) still uses 8-bit
(KOI8-R) manual pages. The only distributions that have converted fully
are RedHat derivatives. Debian is only starting to get ready.

> Upstream packagers will very likely drop legacy encodings in favor of
> UTF-8, though adoption has been slow due to the hacks required to
> make the current Man and Groff packages work correctly together.

I don't know how to comment on this. Modern desktop packages come with
DocBook documentation, not manual pages.

> The relationship between language codes and the expected encoding of
> legacy manual pages is listed below.
>
> Table 6.1.

<snip>

Up to this point, nothing is said (except in the text I proposed at the
very top of my post) about HOW Man-DB determines the encoding of a
manual page. Theory should be given before examples, not in examples.
This worked before, because the whole theory was expressed in the
table.

> If upstream distributes the manual pages in a legacy encoding the
> manual pages can simply be copied to /usr/share/man/<language code>.
> For example, German manual pages can be installed with the following
> commands:
>
> mkdir -p /usr/share/man/de
> cp -rv man? /usr/share/man/de

OK

> If upstream distributes manual pages in UTF-8 (i.e., “for RedHat”)
> instead of the encoding listed in the table above, they can either be
> converted from UTF-8 to the encoding listed in the table above, or
> they can be installed directly into /usr/share/man/<language
> code>.UTF-8.

OK. Here the script would go. I'd also like to see a comparison of both
approaches. E.g., if the manual pages are installed with a Makefile, it
is often easier to convert the manual pages before installation than to
patch the Makefile.

> For example, to install Spanish manual pages

Let's drop this buggy package and explain both techniques with French
manual pages.

-- 
Alexander E. Patrakov
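
P.S. A rough sketch of what "both techniques with French manual pages"
could look like. It runs in a scratch directory so that it can be tried
without touching the system. MANROOT, the sample page exemple.1 and its
contents are made up for illustration, and ISO-8859-1 as the legacy
encoding for "fr" is my assumption about the table. On a real system
MANROOT would be /usr/share/man and the pages would come from the
upstream tarball.

```shell
# Scratch setup (hypothetical names): a fake upstream tree with one
# UTF-8 encoded French page containing an accented character.
work=$(mktemp -d)
MANROOT="$work/usr/share/man"
mkdir -p "$work/src/man1"
printf '.TH EXEMPLE 1\n.SH NOM\nexemple \\- cr\303\251ation de test\n' \
    > "$work/src/man1/exemple.1"

# Technique 1: install the UTF-8 pages unchanged into a directory whose
# name carries the encoding.
mkdir -p "$MANROOT/fr.UTF-8"
cp -r "$work/src/man1" "$MANROOT/fr.UTF-8"

# Technique 2: convert each page to the legacy encoding first, then
# install into the plain language directory.
mkdir -p "$MANROOT/fr/man1"
for page in "$work"/src/man1/*; do
    iconv -f UTF-8 -t ISO-8859-1 "$page" \
        > "$MANROOT/fr/man1/$(basename "$page")"
done
```

The first technique needs no conversion step but changes the
installation directory, which may mean patching a Makefile. The second
keeps the usual directory, so the conversion can be done on the source
tree before an unmodified "make install".
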
