Bruce Perens wrote:
> From: "Scott K. Ellis" <[EMAIL PROTECTED]>
> > I have no problem when HTML is the provided upstream documentation source,
> > and don't want to cripple my ability to read that. However, when the
> > upstream source is something else, such as info/texinfo, I don't want HTML
> > as well.
>
> Well, you're going to need a script to implement that policy. Probably
> the best way to handle this is to provide a way to tell the package system
> that you have deliberately removed a file, and that this file should not
> be replaced. I wouldn't expect this in version 1.0 .

Bruce,

Your answer sounds as if you think this is a particular problem and not
a general one. I think this issue affects a considerable proportion of
Debian users, and therefore a more general solution should be provided.

In my opinion, the current policy on documentation is inadequate for
the following very important reasons:

*) It forces users to waste bandwidth and disk space by not providing
an easy way to select which documents to keep (and move).

*) It is inconsistent: it treats some source formats (man) as
acceptable for binary packages and others (texinfo) as not acceptable.
I have no use for groff other than viewing man pages. Groff and texinfo
have equivalent functionality, but they are treated differently.

*) It is not flexible: the packaging system should prevent users from
installing mutually conflicting packages, but it should allow them to
install any document at all.

*) It is incomplete: the policy says that users must be able to view
any document with an HTML browser, but it does not specify a default
method (i.e. a default web server and a default browser, both installed
as part of the base system).

If we compare this with what is necessary to view a man page (which
nobody ever complains about), we can see that something similar might
be necessary for other types of documentation. A man page is in source
format. We need a compiler (groff), a caching manpage server (man) and
a viewer (less). They are usually on every system by default, and
nobody complains that, for example, an alternative, less bloated
manpage compiler should be used.

I suggest the following changes to the policy:

[Note: I use "base system" to mean the basic default system, not just
the packages in "base"]

*) The default format should be HTML, but everything necessary to view
the documents should be provided as part of the base system.
This includes a default HTML viewer (lynx), which users can override by
means of the update-alternatives method. It would also include a
default HTML server, which should also be installed as part of the base
system and changeable by the update-alternatives method (or maybe
conflicting with others?). Suggestions for the HTML server:

  boa, small and fast
  cern, if you want caching

*) The package dwww should be marked important. It should provide
on-the-fly converters (as CGI programs) for as many formats as
possible. No converter should depend on a non-required package: they
should be self-contained or depend on required packages only.

*) A default searching/indexing engine should be chosen. It would be
marked standard, but not important. (I don't know which one is good;
maybe Bruce's idea of shell+zgrep can be made into a package.)

*) Documents should be provided in the least processed format (closest
to the original source) for which an on-the-fly converter exists. Given
the choice of several formats, the most versatile one (the one which
can most easily be converted to other formats) should be chosen.

*) Until there is a better option available, dpkg should include a
script to automate the process of unpacking the /usr/doc/package part
of a package without installing it. This is to allow users to install
documentation of packages which conflict with an installed one. Users
might need to manually remove the directory /usr/doc/package when they
no longer need it. Scripts to register and unregister documents with
dwww should be provided in order to handle this case properly.

*) The project should stick to the policy and not include alternative
formats or viewers by default. If we are convinced that HTML is the
format, then we must show it.

*) Man pages should be installed in raw format and converted to HTML
on-the-fly. Since a man->html converter which does not depend on groff
is possible, neither "groff" nor "man" should be installed by default.
Use of the "man" program should be discouraged. (Should we still insist
on having man pages for every program? For Unix compatibility, maybe.
But man pages are one thing and the man program is another.)

*) Texinfo sources should be installed in raw format and not in info
format. "Info" should be an optional package which, on installation,
scans /usr/doc looking for texinfo files, compiles them and places the
output in /usr/info. On deinstallation it should erase /usr/info. This
means that "info" should depend on "texinfo" and would be needed for
organizing info files even for emacs users. Emacs should not provide
info. (Emacs could be used as an info browser, though.) Besides, dwww
should have a hook for calling the info installer (if present) when a
texinfo document is registered. If info is not installed, "info
package" would invoke the regular texinfo->html on-the-fly converter.
(Maybe some Perl hacking is needed to convert texinfo2html from a
static compiler into an on-the-fly CGI translator, but it should not be
difficult.)

*) TeX, SGML, groff, HTML and plaintext sources should be installed in
raw format, since they either can easily be converted to HTML or are
somehow viewable with an HTML browser. It would be nice to have
good-looking conversions, but functionality and format consistency
should be the main concern. A file explaining which packages are needed
to generate a specific output format from a given input format should
be included in the main Debian documentation. Scripts should be
provided within those packages to minimize user work if they need to
generate alternative formats.

Note: don't forget the ability of browsers to produce plaintext (lynx),
HTML or PostScript (Netscape) copies of CGI-generated pages.

*) Documents originally in binary format (PS, DVI, PDF, MS-WORD) for
which no conversion is possible should be provided separately.
A file explaining how to get the documentation (including which
programs the user will need: ghostscript, xdvi, MS Word) and a brief
summary of the document (developers _do_ read the documents, don't
they?) should be included in the binary package (in a convertible
format, of course).

*) Similarly, documentation in any format which exceeds a maximum size
should be provided as a separate package. However, an overview of what
a package does and its basic functionality should be included with the
package, along with a reference to the rest of the documentation.

*) The README.debian file should be replaced by the index.html of the
/usr/doc/package directory. The file README.debian should contain a
text explaining how to use "doc package" for viewing an index of the
documents about a package. Similarly, we could consider displaying that
text when the user types "man package" or "info package" and man or
info are not installed. (Or just fall through to the HTML browser?)

I think all this is necessary if we really want to have HTML as the
default documentation format. If we chicken out of requiring a default
browser and a default server plus a set of CGI converters to be base
packages, then we should forget about having HTML as the default and go
with Chris's idea of having separate packages in different formats and
letting users choose.
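
To illustrate the suggestion above of unpacking only the
/usr/doc/package part of a package without installing it, here is a
minimal sketch in Python. It assumes the package contents are available
as a tar stream (for instance the output of "dpkg-deb --fsys-tarfile").
The function name extract_docs, the dest_root argument and the
"usr/doc/" default prefix are hypothetical choices for this example,
not anything dpkg itself provides.

```python
import io
import os
import tarfile


def extract_docs(tar_stream, dest_root, doc_prefix="usr/doc/"):
    """Copy only the regular files under doc_prefix from a tar stream.

    tar_stream is any binary file-like object holding a tar archive
    (assumed here to be a package's filesystem archive). Files are
    written below dest_root; the sorted list of relative paths that
    were written is returned.
    """
    written = []
    # "r|*" reads a non-seekable stream such as a pipe, one member at a time.
    with tarfile.open(fileobj=tar_stream, mode="r|*") as archive:
        for member in archive:
            # Members are often stored as "./usr/doc/..."; normalise the name.
            # Normalising also collapses ".." parts, so a name that escapes
            # the documentation tree no longer matches the prefix below.
            name = os.path.normpath(member.name).replace(os.sep, "/")
            if not member.isfile() or not name.startswith(doc_prefix):
                continue
            target = os.path.join(dest_root, *name.split("/"))
            os.makedirs(os.path.dirname(target), exist_ok=True)
            source = archive.extractfile(member)
            with open(target, "wb") as out:
                out.write(source.read())
            written.append(name)
    return sorted(written)
```

Nothing outside the documentation directory is touched, so this could
be run for a package that conflicts with an installed one. Registering
the extracted files with dwww, and removing them again later, would
still be separate steps.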