Bug#673385: lynx-cur: lynx displays search text at the wrong column if preceded by UTF-8 characters

2012-08-16 Thread Thomas Dickey
On Wed, Aug 15, 2012 at 11:38:38PM +0200, Vincent Lefevre wrote:
> retitle 673385 lynx-cur: in UTF-8 locales, lynx displays search text at the
> wrong column if preceded by non-ASCII (multibyte) characters
> thanks
>
> On 2012-08-15 16:46:03 -0400, Thomas Dickey wrote:
> > On Wed, Aug 15, 2012 at 01:01:56PM +0200, Vincent Lefevre wrote:
> > > On 2012-08-15 05:46:53 -0400, Thomas Dickey wrote:
> > > > On Fri, May 18, 2012 at 10:37:06AM +0200, Vincent Lefevre wrote:
> > > > > lynx displays search text at the wrong column if preceded by UTF-8
> > > > > characters. For instance, consider:
> > > >
> > > > This is addressed by setting
> > > >
> > > > XHTML_PARSING:true
> > > >
> > > > in lynx.cfg
> > >
> > > I don't see why a display problem should be affected by parsing.
> >
> > The character-set information is given only as an xml processing instruction,
> > which in turn is used by lynx only when XHTML_PARSING is set.
>
> But lynx got the charset information right. Otherwise it couldn't
> have output the ellipsis characters correctly!

not exactly (I think this would be a long explanation...)

Running your example (default lynx.cfg, uxterm, Debian/testing),
I pressed = to see the document charset is iso-8859-1.
I pressed o to see that the display charset is UTF-8, while the document
charset is iso-8859-1.

So far I'm not seeing any details that contradict my advice above.

-- 
Thomas E. Dickey dic...@invisible-island.net
http://invisible-island.net
ftp://invisible-island.net




Bug#673385: lynx-cur: lynx displays search text at the wrong column if preceded by UTF-8 characters

2012-08-16 Thread Thomas Dickey
On Wed, Aug 15, 2012 at 11:38:38PM +0200, Vincent Lefevre wrote:
> Actually there's *exactly* the same problem with an ASCII XHTML file
> (here, ASCII refers to the source): in the example, just replace the
> <p>...</p> line by:
>
> <p>&#8230;&#8230;&#8230; In <cite>lynx</cite>, search for foo by typing: /foo</p>

that's a different aspect (offhand, it should work- will investigate)

-- 
Thomas E. Dickey dic...@invisible-island.net
http://invisible-island.net
ftp://invisible-island.net




Bug#673385: lynx-cur: lynx displays search text at the wrong column if preceded by UTF-8 characters

2012-08-16 Thread Vincent Lefevre
On 2012-08-16 05:50:20 -0400, Thomas Dickey wrote:
> Running your example (default lynx.cfg, uxterm, Debian/testing),
> I pressed = to see the document charset is iso-8859-1.

This is what I get here too, but the charset is really utf-8:
if I do the following change:

Index: lynx-search.html
===
--- lynx-search.html	(revision 54212)
+++ lynx-search.html	(working copy)
@@ -1,4 +1,4 @@
-<?xml version="1.0" encoding="utf-8"?>
+<?xml version="1.0" encoding="iso-8859-1"?>
 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
 

then the displayed content is different (it is what one obtains with an
iso-8859-1 encoding). This shows that the specified encoding was
really taken into account.
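
To make the argument above concrete, here is a minimal Python sketch
(an illustration only, not lynx code; the variable names are made up).
It shows that the same three bytes read as one ellipsis when the declared
encoding is utf-8, but as three unrelated characters when it is iso-8859-1,
so a change in the displayed content does indicate that the declaration
was used.

```python
# Illustration only: one byte sequence read under two declared encodings.
raw = "\u2026".encode("utf-8")          # the ellipsis as stored in the file

as_utf8 = raw.decode("utf-8")           # file declares encoding=utf-8
as_latin1 = raw.decode("iso-8859-1")    # file declares encoding=iso-8859-1

print(raw)              # the three stored bytes
print(len(as_utf8))     # number of characters when read as utf-8
print(len(as_latin1))   # number of characters when read as iso-8859-1
```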

-- 
Vincent Lefèvre vinc...@vinc17.net - Web: http://www.vinc17.net/
100% accessible validated (X)HTML - Blog: http://www.vinc17.net/blog/
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)


--
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#673385: lynx-cur: lynx displays search text at the wrong column if preceded by UTF-8 characters

2012-08-16 Thread Vincent Lefevre
On 2012-08-16 06:53:54 -0400, Thomas Dickey wrote:
> On Wed, Aug 15, 2012 at 11:38:38PM +0200, Vincent Lefevre wrote:
> > Actually there's *exactly* the same problem with an ASCII XHTML file
> > (here, ASCII refers to the source): in the example, just replace the
> > <p>...</p> line by:
> >
> > <p>&#8230;&#8230;&#8230; In <cite>lynx</cite>, search for foo by typing: /foo</p>
>
> that's a different aspect (offhand, it should work- will investigate)

This is exactly the bug I reported, i.e. I see no differences
between the use of &#8230; and the use of Unicode characters
directly.
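
A short Python sketch of why the two forms would be expected to behave
alike (an illustration under the assumption that the numeric reference is
resolved to the same code point before display; it says nothing about
lynx internals). It uses the standard `html.unescape` function; the sample
strings are shortened from the test file.

```python
import html

# The paragraph start written with literal ellipsis characters...
literal = "\u2026\u2026\u2026 In lynx"
# ...and the same text written with numeric character references.
with_refs = html.unescape("&#8230;&#8230;&#8230; In lynx")

# Once the references are resolved, the text is identical, so the
# UTF-8 output contains the same multibyte sequences in both cases.
same_text = literal == with_refs
extra_bytes = len(with_refs.encode("utf-8")) - len(with_refs)

print(same_text, extra_bytes)
```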

-- 
Vincent Lefèvre vinc...@vinc17.net - Web: http://www.vinc17.net/
100% accessible validated (X)HTML - Blog: http://www.vinc17.net/blog/
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)





Bug#673385: lynx-cur: lynx displays search text at the wrong column if preceded by UTF-8 characters

2012-08-15 Thread Thomas Dickey
On Fri, May 18, 2012 at 10:37:06AM +0200, Vincent Lefevre wrote:
> Package: lynx-cur
> Version: 2.8.8dev.12-2
> Severity: normal
>
> lynx displays search text at the wrong column if preceded by UTF-8
> characters. For instance, consider:

This is addressed by setting

XHTML_PARSING:true

in lynx.cfg

-- 
Thomas E. Dickey dic...@invisible-island.net
http://invisible-island.net
ftp://invisible-island.net




Bug#673385: lynx-cur: lynx displays search text at the wrong column if preceded by UTF-8 characters

2012-08-15 Thread Vincent Lefevre
On 2012-08-15 05:46:53 -0400, Thomas Dickey wrote:
> On Fri, May 18, 2012 at 10:37:06AM +0200, Vincent Lefevre wrote:
> > lynx displays search text at the wrong column if preceded by UTF-8
> > characters. For instance, consider:
>
> This is addressed by setting
>
> XHTML_PARSING:true
>
> in lynx.cfg

I don't see why a display problem should be affected by parsing.

-- 
Vincent Lefèvre vinc...@vinc17.net - Web: http://www.vinc17.net/
100% accessible validated (X)HTML - Blog: http://www.vinc17.net/blog/
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)





Bug#673385: lynx-cur: lynx displays search text at the wrong column if preceded by UTF-8 characters

2012-08-15 Thread Thomas Dickey
On Wed, Aug 15, 2012 at 01:01:56PM +0200, Vincent Lefevre wrote:
> On 2012-08-15 05:46:53 -0400, Thomas Dickey wrote:
> > On Fri, May 18, 2012 at 10:37:06AM +0200, Vincent Lefevre wrote:
> > > lynx displays search text at the wrong column if preceded by UTF-8
> > > characters. For instance, consider:
> >
> > This is addressed by setting
> >
> > XHTML_PARSING:true
> >
> > in lynx.cfg
>
> I don't see why a display problem should be affected by parsing.

The character-set information is given only as an xml processing instruction,
which in turn is used by lynx only when XHTML_PARSING is set.

(There are other settings to override the defaults, but that's the most
direct way).

-- 
Thomas E. Dickey dic...@invisible-island.net
http://invisible-island.net
ftp://invisible-island.net




Bug#673385: lynx-cur: lynx displays search text at the wrong column if preceded by UTF-8 characters

2012-08-15 Thread Vincent Lefevre
retitle 673385 lynx-cur: in UTF-8 locales, lynx displays search text at the 
wrong column if preceded by non-ASCII (multibyte) characters
thanks

On 2012-08-15 16:46:03 -0400, Thomas Dickey wrote:
> On Wed, Aug 15, 2012 at 01:01:56PM +0200, Vincent Lefevre wrote:
> > On 2012-08-15 05:46:53 -0400, Thomas Dickey wrote:
> > > On Fri, May 18, 2012 at 10:37:06AM +0200, Vincent Lefevre wrote:
> > > > lynx displays search text at the wrong column if preceded by UTF-8
> > > > characters. For instance, consider:
> > >
> > > This is addressed by setting
> > >
> > > XHTML_PARSING:true
> > >
> > > in lynx.cfg
> >
> > I don't see why a display problem should be affected by parsing.
>
> The character-set information is given only as an xml processing instruction,
> which in turn is used by lynx only when XHTML_PARSING is set.

But lynx got the charset information right. Otherwise it couldn't
have output the ellipsis characters correctly!

Getting incorrect charset information from an (X)HTML file can
lead to incorrect characters being displayed, but certainly not to
a display consistency problem like the one reported here.

Actually there's *exactly* the same problem with an ASCII XHTML file
(here, ASCII refers to the source): in the example, just replace the
<p>...</p> line by:

<p>&#8230;&#8230;&#8230; In <cite>lynx</cite>, search for foo by typing: /foo</p>

I've retitled the bug, because "UTF-8" in the title was ambiguous. The problem
is not related to the encoding used in the HTML file, but IMHO, to the
internal use of UTF-8 for the output to a terminal with UTF-8 locales.
I think that lynx assumes that the column is obtained by counting the
number of bytes, but in UTF-8 locales, this is wrong due to multibyte
characters.
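
The hypothesis above can be sketched in a few lines of Python. This is
only an illustration of byte counting versus column counting, not code
taken from lynx, and it assumes that every character in the line occupies
exactly one terminal cell. The variable names are made up for the example.

```python
# Illustration only: where "foo" starts, counted two different ways.
line = "\u2026\u2026\u2026 In lynx, search for foo by typing: /foo"
target = "foo"

# Column as the terminal sees it: one cell per character
# (assumption: all characters here are single-width).
char_col = line.index(target)

# "Column" obtained by counting bytes of the UTF-8 output instead.
byte_col = line.encode("utf-8").index(target.encode("utf-8"))

# Each ellipsis is 3 bytes but 1 cell, so every one adds 2 to the error.
shift = byte_col - char_col

print(char_col, byte_col, shift)
```

With three ellipsis characters before the match, the byte-based position
lands several cells to the right of the real one, which is the kind of
displacement described in the report.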

-- 
Vincent Lefèvre vinc...@vinc17.net - Web: http://www.vinc17.net/
100% accessible validated (X)HTML - Blog: http://www.vinc17.net/blog/
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)





Bug#673385: lynx-cur: lynx displays search text at the wrong column if preceded by UTF-8 characters

2012-05-18 Thread Vincent Lefevre
Package: lynx-cur
Version: 2.8.8dev.12-2
Severity: normal

lynx displays search text at the wrong column if preceded by UTF-8
characters. For instance, consider:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<title>Search test in lynx in a UTF-8 terminal</title>
</head>
<body>
<p>……… In <cite>lynx</cite>, search for foo by typing: /foo</p>
</body>
</html>

(before "In" there are 3 ellipsis characters, but other non-ASCII
characters will trigger the same problem: I suppose that lynx is
confused by multibyte characters).

Run lynx on this file in a UTF-8 terminal (e.g. xterm under UTF-8
locales), and search for foo by typing: /foo

One gets:

   ……… In lynx, search for foo bfooyping: /foo   foo
                                ^^^              ^^^

where the foo over ^^^ are colored, i.e. this text has been
displayed (for the colored version) at the wrong column.

-- System Information:
Debian Release: wheezy/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'testing'), (500, 'stable'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 3.2.0-2-amd64 (SMP w/2 CPU cores)
Locale: LANG=POSIX, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages lynx-cur depends on:
ii  libbsd0       0.3.0-2
ii  libbz2-1.0    1.0.6-1
ii  libc6         2.13-32
ii  libgcrypt11   1.5.0-3
ii  libgnutls26   2.12.19-1
ii  libidn11      1.24-2
ii  libncursesw5  5.9-7
ii  libtinfo5     5.9-7
ii  zlib1g        1:1.2.7.dfsg-1

Versions of packages lynx-cur recommends:
ii  mime-support  3.52-1

lynx-cur suggests no packages.

-- debconf information:
  lynx-cur/defaulturl: http://www.vinc17.org/
  lynx-cur/etc_lynx.cfg:


