Bug#504023: netrik: should get compiled against libncursesw to support utf8 environments
Hi,

On Thu, Nov 13, 2008 at 10:02:32AM +0100, Gerfried Fuchs wrote:
> * [EMAIL PROTECTED] [EMAIL PROTECTED] [2008-11-13 01:55:40 CET]:
> > (There are many other problems though: Aside from the broken
> > entities, screen positions are miscalculated, resulting in
> > misplaced link highlights and stray characters at line ends. Also,
> > if the input charset differs from the terminal charset, things
> > won't work at all. All this requires proper charset support to fix,
> > which is on the top of my ToDo list. However, I'm still not sure
> > how to implement this, so I doubt I could do it in time for lenny,
> > even if the release managers would actually accept such a late
> > change...)
>
> I'm not sure, but shouldn't libiconv be able to help you here?

Yes, libiconv is clearly the right tool for the actual charset
conversion.

There are a lot of open questions though. At which point should the
conversion be done? How to determine the right document charset, and
turn it into something iconv understands? What do we need to adapt for
the fact that we are dealing with different charsets? How to properly
do line wrapping in view of multibyte characters? (And wide characters
too, if we want to do it really properly...) How does this interact
with characters not coming directly from the document, but rather
generated internally, like entity references, or the various helper
characters inserted in the output?

The overall amount of code required is probably not big; there are just
a lot of things to consider. (Or else I would have implemented it a
long time ago :-) )

Perhaps I should split this in two tasks: first implement only proper
handling of utf8 documents in utf8 locales, leaving actual charset
conversion aside for now...

Anyways, I guess this kind of discussion would be more appropriate on
the netrik mailing list :-)

BTW, I integrated your patch into upstream CVS -- will probably release
it as 1.16.1 one of these days...

Many thanks for looking into this :-)

-antrik-

--
To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]
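
To make the line-wrapping question above concrete: for UTF-8 text, the
number of bytes, the number of characters and the number of screen
columns can all differ, so wrapping by byte count puts line ends in the
wrong place. The following is only a rough sketch in Python, not
netrik's C code. The helper names display_width and wrap_columns are
hypothetical, and the width rule (two columns for East Asian
Wide/Fullwidth characters, none for combining marks, one otherwise) is
a simplifying assumption; in C, wcwidth() would be the usual starting
point.

```python
import unicodedata


def display_width(text: str) -> int:
    """Approximate how many terminal columns `text` occupies.

    Assumption: Wide/Fullwidth characters take two columns, combining
    marks take none, everything else takes one. Real terminals handle
    more special cases than this.
    """
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # drawn on top of the previous cell
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


def wrap_columns(text: str, columns: int) -> list:
    """Split `text` into chunks that each fit into `columns` cells."""
    lines, current, used = [], "", 0
    for ch in text:
        w = display_width(ch)
        if used + w > columns and current:
            lines.append(current)
            current, used = "", 0
        current += ch
        used += w
    if current:
        lines.append(current)
    return lines


sample = "Bön 日本"
print(len(sample.encode("utf-8")))  # 11 bytes
print(len(sample))                  # 6 characters
print(display_width(sample))        # 8 columns
```

A wrapper that counts bytes would treat the sample as 11 columns wide,
which is one plausible source of the stray characters at line ends
mentioned above.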
Bug#504023: netrik: should get compiled against libncursesw to support utf8 environments
Hi,

On Thu, Oct 30, 2008 at 05:08:02PM +0100, Gerfried Fuchs wrote:
> Please take a look at these two outputs:
>
> $ export LANG=C
> $ echo 'here is a german umlaut o: &ouml;' | netrik -
> here is a german umlaut o: ö
>
> versus
>
> $ export LANG=de_AT.UTF-8
> $ echo 'here is a german umlaut o: &ouml;' | netrik -
> here is a german umlaut o: M-v
>
> Please notice that you can use any utf8 locale, it's just that I have
> de_AT.UTF-8 locally enabled. I built myself a local test build with
> the attached straightforward patch and it worked for me.
>
> Please notice that the very same problem has affected pal already,
> see #499403 for a reference of the issue. After patching it with the
> attached diff and recompiling I was able to see the &ouml; correctly
> with both locales.

Note that netrik doesn't really do the right thing with your test in
either case: It won't produce the right character when actually run on
a UTF-8 terminal... (The problem is that netrik is totally unaware of
utf8, or any charsets in fact, and will always try to output the
entities as iso-8859-1 -- which is obviously wrong when using a
different locale.)

What your test case does show though is that ncurses now escapes any
non-ASCII codes when running in a UTF-8 locale, while using ncursesw
restores the old behaviour of simply passing through the extended
codes.

This is an important fix indeed, as it is required to keep netrik at
least somewhat working in the common situation of viewing a UTF-8 page
on a UTF-8 terminal.

(There are many other problems though: Aside from the broken entities,
screen positions are miscalculated, resulting in misplaced link
highlights and stray characters at line ends. Also, if the input
charset differs from the terminal charset, things won't work at all.
All this requires proper charset support to fix, which is on the top of
my ToDo list. However, I'm still not sure how to implement this, so I
doubt I could do it in time for lenny, even if the release managers
would actually accept such a late change...)

-antrik-
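
The "M-v" in the test output is consistent with the explanation above:
the entity is emitted as the single ISO-8859-1 byte 0xF6, and a byte
above 0x7F can be shown in "meta" notation as "M-" followed by the
character with the high bit cleared (0xF6 - 0x80 = 0x76, which is "v").
A small Python sketch of that notation follows. The helper
meta_notation is hypothetical and only mimics the escaping style known
from `cat -v`; it is not ncurses code.

```python
def meta_notation(byte: int) -> str:
    """Render one byte in `cat -v` style escape notation."""
    if byte >= 0x80:
        # High bit set: prefix with M- and render the low seven bits.
        return "M-" + meta_notation(byte - 0x80)
    if byte < 0x20:
        return "^" + chr(byte + 0x40)  # control characters, e.g. ^A
    if byte == 0x7F:
        return "^?"
    return chr(byte)


latin1 = "ö".encode("iso-8859-1")  # one byte: 0xF6
utf8 = "ö".encode("utf-8")         # two bytes: 0xC3 0xB6

print("".join(meta_notation(b) for b in latin1))  # M-v
print("".join(meta_notation(b) for b in utf8))    # M-CM-6
```

The two encodings of the same character differ, which is why output
that is always ISO-8859-1 cannot be right on a UTF-8 terminal even when
the bytes are passed through unescaped.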
Bug#504023: netrik: should get compiled against libncursesw to support utf8 environments
Hi!

* [EMAIL PROTECTED] [EMAIL PROTECTED] [2008-11-13 01:55:40 CET]:
> (There are many other problems though: Aside from the broken
> entities, screen positions are miscalculated, resulting in misplaced
> link highlights and stray characters at line ends. Also, if the input
> charset differs from the terminal charset, things won't work at all.
> All this requires proper charset support to fix, which is on the top
> of my ToDo list. However, I'm still not sure how to implement this,
> so I doubt I could do it in time for lenny, even if the release
> managers would actually accept such a late change...)

I'm not sure, but shouldn't libiconv be able to help you here? From
what I understand it's meant as a charset encoding conversion library.
If the data comes from a website, the server sends the charset along,
but even for locally generated data it shouldn't be too hard to figure
out the charset encoding. The easiest approach would be what the
standard claims: assume iso-8859-1 when no charset is declared, then
convert to what the locale defines (through libiconv).

Unfortunately I don't know the netrik code well, nor am I deep into
iconv or C, so it would take me quite a while to come up with a
sensible patch for that. I am offering, though, to assist as consultant
or tester or whatever you might need.

Thanks for your response, antrik. :)
Rhonda
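
The approach suggested above (take the charset the server sends, fall
back to iso-8859-1 when none is declared, then convert to the terminal
charset) could look roughly like the following. This is a sketch in
Python, with Python's built-in codecs standing in for libiconv; the
function names charset_from_content_type and convert_for_terminal are
hypothetical, and nothing here reflects how netrik is actually
structured.

```python
import codecs
from email.message import Message

DEFAULT_CHARSET = "iso-8859-1"  # fallback when nothing is declared


def charset_from_content_type(header, default=DEFAULT_CHARSET):
    """Extract the charset parameter from a Content-Type header value."""
    if not header:
        return default
    msg = Message()
    msg["Content-Type"] = header
    charset = msg.get_content_charset() or default
    try:
        codecs.lookup(charset)  # is this a charset name we can convert?
    except LookupError:
        return default
    return charset


def convert_for_terminal(raw, content_type, terminal_charset):
    """Convert document bytes to the terminal charset.

    Characters that cannot be decoded or encoded are replaced rather
    than aborting the conversion.
    """
    source = charset_from_content_type(content_type)
    text = raw.decode(source, errors="replace")
    return text.encode(terminal_charset, errors="replace")


# No declared charset: treated as iso-8859-1, converted to UTF-8.
print(convert_for_terminal(b"\xf6", None, "utf-8"))
# Declared UTF-8 document shown on an iso-8859-1 terminal.
print(convert_for_terminal(b"\xc3\xb6", "text/html; charset=UTF-8",
                           "iso-8859-1"))
```

In C, nl_langinfo(CODESET) would typically supply the terminal charset
after setlocale(), and iconv_open(), iconv() and iconv_close() would do
the conversion itself.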
Bug#504023: netrik: should get compiled against libncursesw to support utf8 environments
Hi Edelhard,

* Edelhard Becker [EMAIL PROTECTED] [2008-11-02 11:34:03 CET]:
> On Thu, Oct 30, 2008 at 05:08:02PM +0100, Gerfried Fuchs wrote:
> > If you are too busy I can jump in to fix this issue for lenny and
> > request freeze exception approval, if that's fine with you.
>
> yeah, no problem, go ahead with it in testing and good luck! For
> unstable I'll include your patch with my next update...

Erm, I will upload it to unstable: updates for testing have to go
through unstable wherever possible, and given that the package is the
same version in testing and unstable I wouldn't know why I shouldn't
upload there.

Thanks for your approval, will start the NMU soon.
Rhonda
Bug#504023: netrik: should get compiled against libncursesw to support utf8 environments
Hi Rhonda,

On Wed, Nov 05, 2008 at 10:10:01AM +0100, Gerfried Fuchs wrote:
> Erm, I will upload it to unstable: Updates for testing has to go
> through unstable wherever possible, and given that the package is the
> same version for testing and unstable I wouldn't know why I shouldn't
> upload to there.

ah, yes, of course... I was talking about 1.16, which is still waiting
in my Todo-queue and not in unstable yet.

Greetings and sorry for the confusion,
Edelhard
--
~
~
:wq
Bug#504023: netrik: should get compiled against libncursesw to support utf8 environments
Hi Rhonda,

On Thu, Oct 30, 2008 at 05:08:02PM +0100, Gerfried Fuchs wrote:
> If you are too busy I can jump in to fix this issue for lenny and
> request freeze exception approval, if that's fine with you.

yeah, no problem, go ahead with it in testing and good luck! For
unstable I'll include your patch with my next update...

Thanks and greetings,
Edelhard
--
~
~
:wq
Bug#504023: netrik: should get compiled against libncursesw to support utf8 environments
Package: netrik
Version: 1.15.7-2
Severity: important

Hi!

Please take a look at these two outputs:

$ export LANG=C
$ echo 'here is a german umlaut o: &ouml;' | netrik -
here is a german umlaut o: ö

versus

$ export LANG=de_AT.UTF-8
$ echo 'here is a german umlaut o: &ouml;' | netrik -
here is a german umlaut o: M-v

Please notice that you can use any utf8 locale, it's just that I have
de_AT.UTF-8 locally enabled. I built myself a local test build with the
attached straightforward patch and it worked for me.

Please notice that the very same problem has affected pal already, see
#499403 for a reference of the issue. After patching it with the
attached diff and recompiling I was able to see the &ouml; correctly
with both locales.

If you are too busy I can jump in to fix this issue for lenny and
request freeze exception approval, if that's fine with you.

Thanks in advance,
Rhonda

-- System Information:
Debian Release: lenny/sid
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: powerpc (ppc)

Kernel: Linux 2.6.26-1-powerpc
Locale: LANG=de_AT.UTF-8, LC_CTYPE=de_AT.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages netrik depends on:
ii  libc6        2.7-15          GNU C Library: Shared libraries
ii  libncurses5  5.6+20080830-1  shared libraries for terminal hand

netrik recommends no packages.

netrik suggests no packages.

-- no debconf information
Bug#504023: netrik: should get compiled against libncursesw to support utf8 environments
* Gerfried Fuchs [EMAIL PROTECTED] [2008-10-30 17:08:02 CET]:
> I did build me a local test build with the attached straight-forward
> patch and it worked for me.

Erm, yes, of course ... But attached now.
Rhonda

netrik_1.15.7-2.1.interdiff.gz
Description: Binary data