Bug#673385: lynx-cur: lynx displays search text at the wrong column if preceded by UTF-8 characters
On Wed, Aug 15, 2012 at 11:38:38PM +0200, Vincent Lefevre wrote:
> retitle 673385 lynx-cur: in UTF-8 locales, lynx displays search text at the wrong column if preceded by non-ASCII (multibyte) characters
> thanks
>
> On 2012-08-15 16:46:03 -0400, Thomas Dickey wrote:
> > On Wed, Aug 15, 2012 at 01:01:56PM +0200, Vincent Lefevre wrote:
> > > On 2012-08-15 05:46:53 -0400, Thomas Dickey wrote:
> > > > On Fri, May 18, 2012 at 10:37:06AM +0200, Vincent Lefevre wrote:
> > > > > lynx displays search text at the wrong column if preceded by UTF-8
> > > > > characters. For instance, consider:
> > > >
> > > > This is addressed by setting XHTML_PARSING:true in lynx.cfg
> > >
> > > I don't see why a display problem should be affected by parsing.
> >
> > The character-set information is given only as an xml processing instruction,
> > which in turn is used by lynx only when XHTML_PARSING is set.
>
> But lynx got the charset information right. Otherwise it couldn't
> have output the ellipsis characters correctly!

not exactly (I think this would be a long explanation...)

Running your example (default lynx.cfg, uxterm, Debian/testing),
I pressed = to see the document charset is iso-8859-1.

I pressed o to see that the display charset is UTF-8,
while the document charset is iso-8859-1.

So far I'm not seeing any details that contradict my advice above.

-- 
Thomas E. Dickey dic...@invisible-island.net
http://invisible-island.net
ftp://invisible-island.net
Bug#673385: lynx-cur: lynx displays search text at the wrong column if preceded by UTF-8 characters
On Wed, Aug 15, 2012 at 11:38:38PM +0200, Vincent Lefevre wrote:
> Actually there's *exactly* the same problem with an ASCII XHTML file
> (here, ASCII refers to the source): in the example, just replace
> the <p>...</p> line by:
>
> <p>&#8230;&#8230;&#8230; In <cite>lynx</cite>, search for foo by typing: /foo</p>

that's a different aspect (offhand, it should work- will investigate)

-- 
Thomas E. Dickey dic...@invisible-island.net
http://invisible-island.net
ftp://invisible-island.net
Bug#673385: lynx-cur: lynx displays search text at the wrong column if preceded by UTF-8 characters
On 2012-08-16 05:50:20 -0400, Thomas Dickey wrote:
> Running your example (default lynx.cfg, uxterm, Debian/testing),
> I pressed = to see the document charset is iso-8859-1.

This is what I get here too, but the charset is really utf-8:
if I make the following change:

Index: lynx-search.html
===================================================================
--- lynx-search.html	(revision 54212)
+++ lynx-search.html	(working copy)
@@ -1,4 +1,4 @@
-<?xml version="1.0" encoding="utf-8"?>
+<?xml version="1.0" encoding="iso-8859-1"?>
 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

then the displayed content is different (it is what one obtains with
an iso-8859-1 encoding). This shows that the specified encoding was
really taken into account.

-- 
Vincent Lefèvre vinc...@vinc17.net - Web: http://www.vinc17.net/
100% accessible validated (X)HTML - Blog: http://www.vinc17.net/blog/
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)

-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
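
The experiment in the message above (same file, different declared encoding, different display) can be illustrated outside lynx. The following is only a sketch of what happens when the bytes of one ellipsis character are decoded under each of the two encodings; it does not reproduce lynx's own decoding path, and the variable names are made up for the illustration.

```python
# Illustration only: the bytes of one ellipsis, decoded under each declared encoding.
raw = "\u2026".encode("utf-8")         # U+2026 HORIZONTAL ELLIPSIS as UTF-8 bytes

as_utf8 = raw.decode("utf-8")          # decoded as utf-8: one character
as_latin1 = raw.decode("iso-8859-1")   # decoded as iso-8859-1: one character per byte

print(len(raw), len(as_utf8), len(as_latin1))  # prints: 3 1 3
```

If the declared encoding were ignored, changing it in the XML declaration would not alter the displayed text; the differing results here are the kind of change the message describes.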
Bug#673385: lynx-cur: lynx displays search text at the wrong column if preceded by UTF-8 characters
On 2012-08-16 06:53:54 -0400, Thomas Dickey wrote:
> On Wed, Aug 15, 2012 at 11:38:38PM +0200, Vincent Lefevre wrote:
> > Actually there's *exactly* the same problem with an ASCII XHTML file
> > (here, ASCII refers to the source): in the example, just replace
> > the <p>...</p> line by:
> >
> > <p>&#8230;&#8230;&#8230; In <cite>lynx</cite>, search for foo by typing: /foo</p>
>
> that's a different aspect (offhand, it should work- will investigate)

This is exactly the bug I reported, i.e. I see no differences between
the use of &#8230; and the use of Unicode characters directly.

-- 
Vincent Lefèvre vinc...@vinc17.net - Web: http://www.vinc17.net/
100% accessible validated (X)HTML - Blog: http://www.vinc17.net/blog/
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
Bug#673385: lynx-cur: lynx displays search text at the wrong column if preceded by UTF-8 characters
On Fri, May 18, 2012 at 10:37:06AM +0200, Vincent Lefevre wrote:
> Package: lynx-cur
> Version: 2.8.8dev.12-2
> Severity: normal
>
> lynx displays search text at the wrong column if preceded by UTF-8
> characters. For instance, consider:

This is addressed by setting XHTML_PARSING:true in lynx.cfg

-- 
Thomas E. Dickey dic...@invisible-island.net
http://invisible-island.net
ftp://invisible-island.net
Bug#673385: lynx-cur: lynx displays search text at the wrong column if preceded by UTF-8 characters
On 2012-08-15 05:46:53 -0400, Thomas Dickey wrote:
> On Fri, May 18, 2012 at 10:37:06AM +0200, Vincent Lefevre wrote:
> > lynx displays search text at the wrong column if preceded by UTF-8
> > characters. For instance, consider:
>
> This is addressed by setting XHTML_PARSING:true in lynx.cfg

I don't see why a display problem should be affected by parsing.

-- 
Vincent Lefèvre vinc...@vinc17.net - Web: http://www.vinc17.net/
100% accessible validated (X)HTML - Blog: http://www.vinc17.net/blog/
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
Bug#673385: lynx-cur: lynx displays search text at the wrong column if preceded by UTF-8 characters
On Wed, Aug 15, 2012 at 01:01:56PM +0200, Vincent Lefevre wrote:
> On 2012-08-15 05:46:53 -0400, Thomas Dickey wrote:
> > On Fri, May 18, 2012 at 10:37:06AM +0200, Vincent Lefevre wrote:
> > > lynx displays search text at the wrong column if preceded by UTF-8
> > > characters. For instance, consider:
> >
> > This is addressed by setting XHTML_PARSING:true in lynx.cfg
>
> I don't see why a display problem should be affected by parsing.

The character-set information is given only as an xml processing instruction,
which in turn is used by lynx only when XHTML_PARSING is set.

(There are other settings to override the defaults, but that's the most
direct way).

-- 
Thomas E. Dickey dic...@invisible-island.net
http://invisible-island.net
ftp://invisible-island.net
Bug#673385: lynx-cur: lynx displays search text at the wrong column if preceded by UTF-8 characters
retitle 673385 lynx-cur: in UTF-8 locales, lynx displays search text at the wrong column if preceded by non-ASCII (multibyte) characters
thanks

On 2012-08-15 16:46:03 -0400, Thomas Dickey wrote:
> On Wed, Aug 15, 2012 at 01:01:56PM +0200, Vincent Lefevre wrote:
> > On 2012-08-15 05:46:53 -0400, Thomas Dickey wrote:
> > > On Fri, May 18, 2012 at 10:37:06AM +0200, Vincent Lefevre wrote:
> > > > lynx displays search text at the wrong column if preceded by UTF-8
> > > > characters. For instance, consider:
> > >
> > > This is addressed by setting XHTML_PARSING:true in lynx.cfg
> >
> > I don't see why a display problem should be affected by parsing.
>
> The character-set information is given only as an xml processing instruction,
> which in turn is used by lynx only when XHTML_PARSING is set.

But lynx got the charset information right. Otherwise it couldn't
have output the ellipsis characters correctly!

Getting incorrect charset information from an (X)HTML file can lead
to incorrect characters being displayed, but certainly not to a display
consistency problem as reported here.

Actually there's *exactly* the same problem with an ASCII XHTML file
(here, ASCII refers to the source): in the example, just replace
the <p>...</p> line by:

<p>&#8230;&#8230;&#8230; In <cite>lynx</cite>, search for foo by typing: /foo</p>

I've retitled the bug, because "UTF-8" was ambiguous. The problem
is not related to the encoding used in the HTML file, but IMHO, to
the internal use of UTF-8 for the output to a terminal with UTF-8
locales. I think that lynx assumes that the column is obtained by
counting the number of bytes, but in UTF-8 locales, this is wrong
due to multibyte characters.

-- 
Vincent Lefèvre vinc...@vinc17.net - Web: http://www.vinc17.net/
100% accessible validated (X)HTML - Blog: http://www.vinc17.net/blog/
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
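
The hypothesis at the end of the message above (column computed by counting bytes rather than characters) can be sketched in a few lines. This is not lynx's code: `byte_column` and `char_column` are hypothetical helpers written for this illustration, and counting code points only approximates terminal columns (it ignores wide and combining characters).

```python
# Hypothetical helpers illustrating the suspected miscalculation; not lynx's code.

def byte_column(line: str, needle: str) -> int:
    """Offset of the first match, counted in UTF-8 bytes (the suspected bug)."""
    return line.encode("utf-8").find(needle.encode("utf-8"))

def char_column(line: str, needle: str) -> int:
    """Offset of the first match, counted in characters."""
    return line.find(needle)

# The test line from the report: three ellipsis characters (U+2026), then ASCII.
line = "\u2026\u2026\u2026 In lynx, search for foo by typing: /foo"

print(byte_column(line, "foo"), char_column(line, "foo"))  # prints: 30 24
```

Each ellipsis occupies three bytes in UTF-8 but one character, so the byte count runs ahead of the character count by six for this line. With a pure-ASCII line the two functions agree, which fits the observation that only non-ASCII characters before the match trigger the problem.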
Bug#673385: lynx-cur: lynx displays search text at the wrong column if preceded by UTF-8 characters
Package: lynx-cur
Version: 2.8.8dev.12-2
Severity: normal

lynx displays search text at the wrong column if preceded by UTF-8
characters. For instance, consider:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<title>Search test in lynx in a UTF-8 terminal</title>
</head>
<body>
<p>……… In <cite>lynx</cite>, search for foo by typing: /foo</p>
</body>
</html>

(before "In" there are 3 ellipsis characters, but other non-ASCII
characters will trigger the same problem: I suppose that lynx is
confused by multibyte characters).

Run lynx on this file in a UTF-8 terminal (e.g. xterm under UTF-8
locales), and search for foo by typing: /foo

One gets:

  ……… In lynx, search for foo bfooyping: /foo foo
                               ^^^            ^^^

where the foo over ^^^ are colored, i.e. this text has been displayed
(for the colored version) at the wrong column.

-- System Information:
Debian Release: wheezy/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'testing'), (500, 'stable'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 3.2.0-2-amd64 (SMP w/2 CPU cores)
Locale: LANG=POSIX, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages lynx-cur depends on:
ii  libbsd0       0.3.0-2
ii  libbz2-1.0    1.0.6-1
ii  libc6         2.13-32
ii  libgcrypt11   1.5.0-3
ii  libgnutls26   2.12.19-1
ii  libidn11      1.24-2
ii  libncursesw5  5.9-7
ii  libtinfo5     5.9-7
ii  zlib1g        1:1.2.7.dfsg-1

Versions of packages lynx-cur recommends:
ii  mime-support  3.52-1

lynx-cur suggests no packages.

-- debconf information:
  lynx-cur/defaulturl: http://www.vinc17.org/
  lynx-cur/etc_lynx.cfg: