Bug#504023: netrik: should get compiled against libncursesw to support utf8 environments

2008-11-21 Thread olafBuddenhagen
Hi,

On Thu, Nov 13, 2008 at 10:02:32AM +0100, Gerfried Fuchs wrote:
 * [EMAIL PROTECTED] [EMAIL PROTECTED] [2008-11-13
 01:55:40 CET]:

  (There are many other problems though: Aside from the broken
  entities, screen positions are miscalculated, resulting in misplaced
  link highlights and stray characters at line ends. Also, if the
  input charset differs from the terminal charset, things won't work
  at all. All this requires proper charset support to fix, which is on
  the top of my ToDo list. However, I'm still not sure how to
  implement this, so I doubt I could do it in time for lenny, even if
  the release managers would actually accept such a late change...)
 
  I'm not sure, but shouldn't libiconv be able to help you here?

Yes, libiconv is clearly the right tool for the actual charset
conversion.

There are a lot of open questions though. At which point should the
conversion be done? How to determine the right document charset, and
turn it into something iconv understands? What do we need to adapt for
the fact that we are dealing with different charsets? How to properly do
line wrapping in view of multibyte characters? (And wide characters too,
if we want to do it really properly...) How does this interact with
characters not coming directly from the document, but rather generated
internally, like entity references, or the various helper characters
inserted in the output?
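
To make the line wrapping question concrete, here is a minimal sketch in C. It is not netrik code: the helpers utf8_columns() and utf8_wrap_point() are hypothetical names, and the sketch assumes valid UTF-8 input in which every character occupies exactly one screen column. Wide and combining characters would additionally need something like wcwidth().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of netrik: count the characters in a
 * UTF-8 string by skipping continuation bytes (bit pattern 10xxxxxx).
 * This equals the number of screen columns only under the assumption
 * that every character is exactly one column wide. */
size_t utf8_columns(const char *s)
{
    size_t cols = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            cols++;
    }
    return cols;
}

/* Hypothetical helper: return the byte offset at which a line may be
 * broken so that at most max_cols characters fit, without ever
 * splitting a multibyte character. */
size_t utf8_wrap_point(const char *s, size_t max_cols)
{
    size_t i = 0, cols = 0;

    assert(s != NULL);
    while (s[i] != '\0' && cols < max_cols) {
        /* Advance past the lead byte and all its continuation bytes. */
        size_t next = i + 1;
        while (((unsigned char)s[next] & 0xC0) == 0x80)
            next++;
        cols++;
        i = next;
    }
    return i;
}
```

The point of the sketch is only that byte counts and column counts diverge as soon as multibyte characters appear, which is the source of the miscalculated screen positions.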

The overall amount of code required is probably not big; there are just
a lot of things to consider. (Or else I would have implemented it a long
time ago :-) )

Perhaps I should split this in two tasks: first implement only proper
handling of utf8 documents in utf8 locales, leaving actual charset
conversion aside for now...

Anyways, I guess this kind of discussion would be more appropriate on
the netrik mailing list :-)

BTW, I integrated your patch into upstream CVS -- will probably release
it as 1.16.1 one of these days... Many thanks for looking into this :-)

-antrik-



-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Bug#504023: netrik: should get compiled against libncursesw to support utf8 environments

2008-11-13 Thread olafBuddenhagen
Hi,

On Thu, Oct 30, 2008 at 05:08:02PM +0100, Gerfried Fuchs wrote:

  Please take a look at these two outputs:
 
 $ export LANG=C
 $ echo 'here is a german umlaut o: &ouml;' | netrik -
 here is a german umlaut o: ö
 
 versus
 
 $ export LANG=de_AT.UTF-8
 $ echo 'here is a german umlaut o: &ouml;' | netrik -
 here is a german umlaut o: M-v
 
  Please notice that you can use any utf8 locale, it's just that I have
 de_AT.UTF-8 locally enabled.
 
  I did build me a local test build with the attached straight-forward
 patch and it worked for me. Please notice that the very same problem has
 affected pal already, see #499403 for a reference of the issue. After
 patching it with the attached diff and recompiling I was able to see the
 &ouml; correctly with both locales.

Note that netrik doesn't really do the right thing with your test in
either case: It won't produce the right character when actually run on
a UTF-8 terminal... (The problem is that netrik is totally unaware of
utf8, or any charsets in fact, and will always try to output the
entities as iso-8859-1 -- which is obviously wrong when using a
different locale.)
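
As an illustration of what emitting an entity in the terminal's charset would involve in a UTF-8 locale, here is a minimal C sketch. It is an assumption-laden example rather than netrik code: encode_utf8() is a hypothetical helper, and the caller is assumed to have already mapped the entity name to its Unicode code point (U+00F6 for &ouml;).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of netrik: encode the Unicode code
 * point cp as UTF-8 into out, which must have room for 4 bytes.
 * Returns the number of bytes written, or 0 if cp is a surrogate or
 * lies beyond U+10FFFF. */
size_t encode_utf8(unsigned long cp, unsigned char *out)
{
    assert(out != NULL);
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0; /* surrogates are not valid characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

In iso-8859-1 the same character is the single byte 0xF6, which is the byte that shows up escaped as M-v in the UTF-8 locale.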

What your test case does show though is that ncurses now escapes any
non-ASCII codes when running in a UTF-8 locale, while using ncursesw
restores the old behaviour of simply passing through the extended codes.
This is an important fix indeed, as it is required to keep netrik at
least somewhat working in the common situation of viewing a UTF-8 page
on a UTF-8 terminal.

(There are many other problems though: Aside from the broken entities,
screen positions are miscalculated, resulting in misplaced link
highlights and stray characters at line ends. Also, if the input charset
differs from the terminal charset, things won't work at all. All this
requires proper charset support to fix, which is on the top of my ToDo
list. However, I'm still not sure how to implement this, so I doubt I
could do it in time for lenny, even if the release managers would
actually accept such a late change...)

-antrik-






Bug#504023: netrik: should get compiled against libncursesw to support utf8 environments

2008-11-13 Thread Gerfried Fuchs
Hi!

* [EMAIL PROTECTED] [EMAIL PROTECTED] [2008-11-13 01:55:40 CET]:
 (There are many other problems though: Aside from the broken entities,
 screen positions are miscalculated, resulting in misplaced link
 highlights and stray characters at line ends. Also, if the input charset
 differs from the terminal charset, things won't work at all. All this
 requires proper charset support to fix, which is on the top of my ToDo
 list. However, I'm still not sure how to implement this, so I doubt I
 could do it in time for lenny, even if the release managers would
 actually accept such a late change...)

 I'm not sure, but shouldn't libiconv be able to help you here? From
what I understand it's meant as a charset encoding conversion library. If
the data comes from a website the server sends the charset along, but
even for locally generated data it shouldn't be too hard to figure out the
charset encoding. The easiest approach would be what the standard
prescribes: assume iso-8859-1 when no charset is declared, then convert
to whatever the locale defines (through libiconv).
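
 A rough sketch of that idea in C, using the standard iconv_open(), iconv() and iconv_close() calls. The wrapper convert_charset() is a hypothetical name and not part of netrik or libiconv, and the sketch assumes the whole input fits into the output buffer in one call. In a real program the target charset would probably come from nl_langinfo(CODESET) after setlocale(); here it is simply passed in.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: convert the NUL-terminated string in from
 * charset `from` to charset `to`, writing a NUL-terminated result
 * into out. Returns the number of bytes written (without the NUL),
 * or (size_t)-1 if the conversion is unsupported or fails, for
 * example because out is too small. */
size_t convert_charset(const char *from, const char *to,
                       const char *in, char *out, size_t out_size)
{
    iconv_t cd;
    char *inp = (char *)in; /* iconv() takes char **, but only reads */
    char *outp = out;
    size_t in_left, out_left, rc;

    assert(in != NULL && out != NULL && out_size > 0);

    cd = iconv_open(to, from); /* note the order: target first */
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    in_left = strlen(in);
    out_left = out_size - 1; /* reserve room for the terminator */
    rc = iconv(cd, &inp, &in_left, &outp, &out_left);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;

    *outp = '\0';
    return (size_t)(outp - out);
}
```

 For a document without a declared charset, the call would then be something like convert_charset("ISO-8859-1", "UTF-8", text, buf, sizeof buf) on a UTF-8 terminal.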

 Unfortunately I don't know the netrik code well, nor am I deep into
iconv or C business, so it would take me quite a while to come up
with a sensible patch for that. I am though offering to assist as
consultant or tester or whatever you might need.

 Thanks for your response, antrik. :)
Rhonda






Bug#504023: netrik: should get compiled against libncursesw to support utf8 environments

2008-11-05 Thread Gerfried Fuchs
Hi Edelhard,

* Edelhard Becker [EMAIL PROTECTED] [2008-11-02 11:34:03 CET]:
 On Thu, Oct 30, 2008 at 05:08:02PM +0100, Gerfried Fuchs wrote:
   If you are too busy I can jump in to fix this issue for lenny and
  request freeze exception approval, if that's fine with you.
 
 yeah, no problem, go ahead with it in testing and good luck!  For
 unstable i'll include your patch with my next update...

 Erm, I will upload it to unstable: Updates for testing have to go
through unstable wherever possible, and given that the package is the
same version for testing and unstable I wouldn't know why I shouldn't
upload to there.

 Thanks for your approval, will start the NMU soon.
Rhonda






Bug#504023: netrik: should get compiled against libncursesw to support utf8 environments

2008-11-05 Thread Edelhard Becker
Hi Rhonda,

On Wed, Nov 05, 2008 at 10:10:01AM +0100, Gerfried Fuchs wrote:
  Erm, I will upload it to unstable: Updates for testing have to go
 through unstable wherever possible, and given that the package is the
 same version for testing and unstable I wouldn't know why I shouldn't
 upload to there.

ah, yes, of course... I was talking about 1.16, which is still waiting
in my Todo-queue and is not in unstable yet.

Greetings and sorry for the confusion,
Edelhard
-- 
~
~
:wq




Bug#504023: netrik: should get compiled against libncursesw to support utf8 environments

2008-11-02 Thread Edelhard Becker
Hi Rhonda,

On Thu, Oct 30, 2008 at 05:08:02PM +0100, Gerfried Fuchs wrote:
  If you are too busy I can jump in to fix this issue for lenny and
 request freeze exception approval, if that's fine with you.

yeah, no problem, go ahead with it in testing and good luck!  For
unstable i'll include your patch with my next update...

Thanks and greetings,
Edelhard
-- 
~
~
:wq




Bug#504023: netrik: should get compiled against libncursesw to support utf8 environments

2008-10-30 Thread Gerfried Fuchs
Package: netrik
Version: 1.15.7-2
Severity: important

Hi!

 Please take a look at these two outputs:

$ export LANG=C
$ echo 'here is a german umlaut o: &ouml;' | netrik -
here is a german umlaut o: ö

versus

$ export LANG=de_AT.UTF-8
$ echo 'here is a german umlaut o: &ouml;' | netrik -
here is a german umlaut o: M-v

 Please notice that you can use any utf8 locale, it's just that I have
de_AT.UTF-8 locally enabled.

 I did build me a local test build with the attached straight-forward
patch and it worked for me. Please notice that the very same problem has
affected pal already, see #499403 for a reference of the issue. After
patching it with the attached diff and recompiling I was able to see the
&ouml; correctly with both locales.

 If you are too busy I can jump in to fix this issue for lenny and
request freeze exception approval, if that's fine with you.

 Thanks in advance,
Rhonda

-- System Information:
Debian Release: lenny/sid
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: powerpc (ppc)

Kernel: Linux 2.6.26-1-powerpc
Locale: LANG=de_AT.UTF-8, LC_CTYPE=de_AT.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages netrik depends on:
ii  libc6 2.7-15 GNU C Library: Shared libraries
ii  libncurses5   5.6+20080830-1 shared libraries for terminal hand

netrik recommends no packages.

netrik suggests no packages.

-- no debconf information






Bug#504023: netrik: should get compiled against libncursesw to support utf8 environments

2008-10-30 Thread Gerfried Fuchs
* Gerfried Fuchs [EMAIL PROTECTED] [2008-10-30 17:08:02 CET]:
  I did build me a local test build with the attached straight-forward
 patch and it worked for me.

 Erm, yes, of course ...  But attached now.
Rhonda


netrik_1.15.7-2.1.interdiff.gz
Description: Binary data