On Fri, Feb 29, 2008 at 12:32:29AM +0100, Adam Borowski wrote:
> On Thu, Feb 28, 2008 at 10:10:32PM +0000, brian m. carlson wrote:
> > On Thu, Feb 28, 2008 at 09:30:55PM +0000, Colin Watson wrote:
> > >man-db really does have some special-casing here. Trust me. It was
> > >necessary at the time. There are a finite number of known aliases for
> > >the very small number of locales in question, and until it becomes
> > >unnecessary I will simply support those.
>
> Of course, encodings for _source_ pages are those we can't get away with.
>
> But for all intermediate steps, I don't see any reason to not go to a
> well-known encoding, do everything there and finally convert to whatever
> locale is set -- and you don't even need to name the charset there.
>
> Special-casing _output_ locales seems quite strange to me.

    /* An ugly special case is needed here. The utf8 device normally
     * takes ISO-8859-1 input. However, with the multibyte patch, when
     * recoding from CJK character sets it takes UTF-8 input instead.
     * This is evil, but there's not much that can be done about it
     * apart from waiting for groff 2.0.
     */

> > >(And I agree that it should go away, but can't easily just yet.)
>
> Could you tell us what keeps us with all the old cruft?

Sanity. I am not interested in making the groff package even more
incredibly difficult to update to a new upstream in the future. Official
groff does not yet support proper CJK typography. Until that is in place
it is not a viable replacement.

(I'm also really fed up of explaining this again and again. I think I'm
fairly clearly active in man-db; could you please accept that I have my
reasons beyond laziness, and look up what has been said on this topic
over and over again in the past?)

> > Is there some way to query what character set a locale uses? If not, I
> > think that man-db should default to UTF-8 (since that *is* the standard
> > on Debian) and handle exceptions to that. Processing an ASCII manpage
> > as UTF-8 is a no-op. And it's pretty easy to tell if something isn't
> > valid UTF-8, and man-db can handle that as it normally would.
>
> AOL. I agree with Brian 100%. As you already added code to detect if the
> source is valid UTF-8 or not, all that needs to be done is using UTF-8
> instead of ISO-8859-1 as the intermediate format.

There is a lot more to it than that or upstream would be recommending
that already; the version of groff we are using does not have the
internal capabilities that are needed (our changes are a band-aid at
best). Reading this thread may be a helpful summary:

  http://www.mail-archive.com/[EMAIL PROTECTED]/msg01378.html

In short, I am not interested in doing this on top of our current groff
package.
I want to do it on top of a whole new upstream that actually has the
features we need with an upstream maintainer prepared to support them
(note that nobody has stepped forward to do any maintenance work on the
Debian multibyte patch for years). Doing that without also
forward-porting our patches for features such as kinsoku shori would
introduce regressions. Forward-porting these patches hackily is
incredibly difficult (I've tried). Forward-porting those patches in a
way that is consistent with upstream's direction (i.e. reimplementing
them) is essentially Brian's work.

> I see. So, in very short term, groff would be able to output PostScript
> only for limited locales. That's no regression.
>
> And on tty and html, which are 99.99% of uses of man, suddenly all bugs like
> "man iso-8859-2", Kanji names in English manpages, regressions in KOI8-R
> (#424655) or no support for Indic scripts would disappear overnight with a
> minimal patch.

I would love to have these new features, but I want them on top of a
sane, supportable upstream release. I am sick of the mess we have now
and don't want to make it worse. I also want to actually have us
contribute something useful to groff upstream beyond confused users
showing up on their mailing list and having to be told that this is a
weirdness of Debian's groff package.

I am honestly not willing to support a backport of -K/preconv to our
groff package, with all of the other Unicode support that should come
along with it in order to do a good job. I also enjoy maintaining this
stuff too much to resign. Therefore I must encourage you to help
upstream with the last few pieces needed in order to get this all merged
properly.

Finally, I suspect you'll find that e.g. the specialised kerning code
that's in Debian's groff for proper rendering of ASCII/EUC-JP boundaries
will cause problems with generalised UTF-8 rendering unless properly
forward-ported.
I'm fairly sure there are more such examples; that's just the first I
could find easily having been away from that particular code for a
while. If you don't speak all the languages in question, you might not
notice this kind of thing on casual inspection of the output. Typography
involves more than just getting all the characters into the right
encoding.

> > >He has been working on a solution acceptable to groff upstream, which is,
> > >frankly, the only way I want to go now. He has already made substantial
> > >progress with character class support.
>
> Sounds great. And that's the way to go.

Of course. But wholesale, not with temporary hacks that just make my
life harder. I am still the maintainer and have to consider my ability
to merge future upstream releases, which is already all but impossible;
introducing yet more divergence will make it even less likely that we'll
ever get to a clean upstream state.

I appreciate your research into this. But please, I beg you, focus your
energies on upstream. There is really not much left to do; Brian's done
the heavy lifting of character class support (or most of it, anyway),
and now somebody just needs to take the specialised typographic rules
and make them sufficiently general for inclusion.

> Likewise, I'm nearly unavailable for the next two days. I'll be able to
> help later, but bear in mind that groff is not my area of expertise, and I
> plan only minimal changes.

I hope you will take my advice born of nearly seven years of maintaining
groff in Debian.

Thanks,

-- 
Colin Watson                                       [EMAIL PROTECTED]