Re: [Lynx-dev] Missing First Letter?

2021-04-05 Thread Thomas Dickey
On Mon, Apr 05, 2021 at 09:49:35AM -0500, Tim Chase wrote:
> On 2021-04-05 09:19, Tim Chase wrote:
> > That's odd.  I get the inverse behavior from what you describe.  If
> > I use
> > 
> >   $ LANG=C lynx chime.html
> > 
> > I get the unicode placeholder character for the opening
> > fancy-double-quote and lynx displays the full "Hello", but if I do
> > 
> >   $ LANG=en_US.UTF-8 lynx chime.html
> > 
> > I get the vanishing "H".

I don't see this, but when I test locales, I usually use this:

#!/bin/sh
# $Id: with-locale,v 1.7 2015/08/16 21:20:39 tom Exp $
unset LANG
unset LC_ALL
unset LC_CTYPE
unset LESSCHARSET

LANG=$1
LC_ALL=$1
GDM_LANG=$1

export LANG
export LC_ALL
export GDM_LANG
if test $# != 0
then
    shift 1
    exec "$@"
fi
 
...and in a quick check, I did

with-locale C sh
$ LANG=en_US.UTF-8 lynx chime.html

I haven't seen a combination which makes that "H" vanish, though the
double-quote can be lost...

> To provide additional context, this is in an xterm on FreeBSD 12.2p4
> 
It may depend on what other locale-related environment variables you have set.
FreeBSD's manpage for setlocale says of LANG:

 LANG Sets the generic locale category for native language, local
  customs and coded character set in the absence of more
  specific locale variables.

but LC_ALL and LC_CTYPE are more specific.

On my Debian/testing, the manpage gives more details:

   If locale is an empty string, "", each part of the locale that should
   be modified is set according to the environment variables.  The details
   are implementation-dependent.  For glibc, first (regardless of category),
   the environment variable LC_ALL is inspected, next the environment
   variable with the same name as the category (see the table above), and
   finally the environment variable LANG.  The first existing environment
   variable is used.  If its value is not a valid locale specification, the
   locale is unchanged, and setlocale() returns NULL.
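
A minimal sketch of the lookup order that manpage describes, applied to the LC_CTYPE category (the one that decides the character encoding). The helper name effective_ctype is hypothetical. Treating an empty variable as unset and falling back to "C" when nothing is set are assumptions about the C library, not something stated in this thread.

```shell
#!/bin/sh
# Sketch of the lookup order for one category (LC_CTYPE):
# LC_ALL first, then LC_CTYPE, then LANG; the first one that is set wins.
# effective_ctype is a hypothetical helper, not part of lynx or libc.
effective_ctype() {
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_CTYPE:-}" ]; then
        printf '%s\n' "$LC_CTYPE"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        # Assumed fallback when none of the three is set.
        printf '%s\n' "C"
    fi
}
```

Calling effective_ctype in the shell that starts lynx shows which variable wins; the locale command in the same shell reports what the C library actually resolved, and that is the answer to trust if the two disagree.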

> $ ident `which xterm`
> /usr/local/bin/xterm:
>  $FreeBSD: releng/12.2/lib/csu/amd64/reloc.c 339351 2018-10-13 23:52:55Z kib $
>  $FreeBSD: releng/12.2/lib/csu/amd64/crt1.c 339351 2018-10-13 23:52:55Z kib $
>  $FreeBSD: releng/12.2/lib/csu/common/ignore_init.c 339351 2018-10-13 23:52:55Z kib $
>  $FreeBSD: releng/12.2/lib/csu/amd64/crti.S 217105 2011-01-07 16:07:51Z kib $
>  $FreeBSD: releng/12.2/lib/csu/common/crtbrand.c 366954 2020-10-23 00:00:52Z gjb $
>  $FreeBSD: releng/12.2/lib/csu/amd64/crtn.S 217105 2011-01-07 16:07:51Z kib $
> 
> (and for context, I have LANG=en_US.UTF-8 in my default environment,
> so the second example was only to be explicit about what would
> otherwise be the default behavior)
> 
> -tim
> 
> 

-- 
Thomas E. Dickey 
https://invisible-island.net
ftp://ftp.invisible-island.net


___
Lynx-dev mailing list
Lynx-dev@nongnu.org
https://lists.nongnu.org/mailman/listinfo/lynx-dev


Re: [Lynx-dev] Missing First Letter?

2021-04-05 Thread Tim Chase
On 2021-04-05 09:19, Tim Chase wrote:
> That's odd.  I get the inverse behavior from what you describe.  If
> I use
> 
>   $ LANG=C lynx chime.html
> 
> I get the unicode placeholder character for the opening
> fancy-double-quote and lynx displays the full "Hello", but if I do
> 
>   $ LANG=en_US.UTF-8 lynx chime.html
> 
> I get the vanishing "H".

To provide additional context, this is in an xterm on FreeBSD 12.2p4

$ ident `which xterm`
/usr/local/bin/xterm:
 $FreeBSD: releng/12.2/lib/csu/amd64/reloc.c 339351 2018-10-13 23:52:55Z kib $
 $FreeBSD: releng/12.2/lib/csu/amd64/crt1.c 339351 2018-10-13 23:52:55Z kib $
 $FreeBSD: releng/12.2/lib/csu/common/ignore_init.c 339351 2018-10-13 23:52:55Z kib $
 $FreeBSD: releng/12.2/lib/csu/amd64/crti.S 217105 2011-01-07 16:07:51Z kib $
 $FreeBSD: releng/12.2/lib/csu/common/crtbrand.c 366954 2020-10-23 00:00:52Z gjb $
 $FreeBSD: releng/12.2/lib/csu/amd64/crtn.S 217105 2011-01-07 16:07:51Z kib $

(and for context, I have LANG=en_US.UTF-8 in my default environment,
so the second example was only to be explicit about what would
otherwise be the default behavior)

-tim





Re: [Lynx-dev] Missing First Letter?

2021-04-05 Thread Tim Chase
On 2021-03-31 19:45, Thomas Dickey wrote:
> From: "Tim Chase" 
>> $ xxd -r > chime.html << EOF
>> 0000: 3c68 746d 6c3e 3c62 6f64 793e e280 9c48 <html><body>...H
>> 0010: 656c 6c6f 3c2f 626f 6479 3e3c 2f68 746d ello</body></htm
>> there's a UTF-8 double-quote before the "Hello" as marked by the
>> bytes 0xE2, 0x80, 0x9C. 
> 
> you'll get that behavior if your locale is set to non-UTF-8, e.g.,
> "C" (using "en_US" rather than "en_US.UTF-8" may also look like
> this, depending on the terminal)

That's odd.  I get the inverse behavior from what you describe.  If I
use

  $ LANG=C lynx chime.html

I get the unicode placeholder character for the opening
fancy-double-quote and lynx displays the full "Hello", but if I do

  $ LANG=en_US.UTF-8 lynx chime.html

I get the vanishing "H".
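
For anyone who wants to recreate the test file without xxd, here is a minimal sketch that writes only the 32 bytes visible in the dump above: "<html><body>", the UTF-8 left double quote (0xE2 0x80 0x9C), then "Hello</body></htm". The dump as archived stops there, so nothing further is written. The helper names make_sample and dump_hex are hypothetical.

```shell
#!/bin/sh
# make_sample FILE: write the 32 bytes shown in the hex dump to FILE.
# Octal 342 200 234 is the byte sequence E2 80 9C (left double quote).
make_sample() {
    printf '<html><body>\342\200\234Hello</body></htm' > "$1"
}

# dump_hex FILE: print the file's bytes as one run of hex digits,
# so they can be compared against the dump in the thread.
dump_hex() {
    od -An -tx1 "$1" | tr -d ' \n'
}
```

A typical use would be make_sample chime.html followed by the two lynx invocations above, once with LANG=C and once with LANG=en_US.UTF-8.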

-tim





Re: [Lynx-dev] Missing First Letter?

2021-03-31 Thread Thomas Dickey
- Original Message -
| From: "Tim Chase" 
| To: "Thorsten Glaser" 
| Cc: "Chime Hart" , "lynx-dev" 
| Sent: Wednesday, March 31, 2021 3:06:12 PM
| Subject: Re: [Lynx-dev] Missing First Letter?

| On 2021-03-31 18:36, Thorsten Glaser wrote:
|> This helps nothing without a way to reproduce this locally,
|> for example a URL in question.
| 
| The source seems to have been the text/html component of an email.
| However, here's a reproduction case:
| 
| $ xxd chime.html
| 0000: 3c68 746d 6c3e 3c62 6f64 793e e280 9c48  <html><body>...H
| 0010: 656c 6c6f 3c2f 626f 6479 3e3c 2f68 746d  ello</body></htm
| 
| $ xxd -r > chime.html << EOF
| 0000: 3c68 746d 6c3e 3c62 6f64 793e e280 9c48  <html><body>...H
| 0010: 656c 6c6f 3c2f 626f 6479 3e3c 2f68 746d  ello</body></htm
http://invisible-island.net
ftp://ftp.invisible-island.net



Re: [Lynx-dev] Missing First Letter?

2021-03-31 Thread Tim Chase
On 2021-03-31 18:36, Thorsten Glaser wrote:
> This helps nothing without a way to reproduce this locally,
> for example a URL in question.

The source seems to have been the text/html component of an email.
However, here's a reproduction case:

$ xxd chime.html
0000: 3c68 746d 6c3e 3c62 6f64 793e e280 9c48  <html><body>...H
0010: 656c 6c6f 3c2f 626f 6479 3e3c 2f68 746d  ello</body></htm

$ xxd -r > chime.html << EOF
0000: 3c68 746d 6c3e 3c62 6f64 793e e280 9c48  <html><body>...H
0010: 656c 6c6f 3c2f 626f 6479 3e3c 2f68 746d  ello</body></htm


Re: [Lynx-dev] Missing First Letter?

2021-03-31 Thread Chime Hart
Well Thorsten, these are either newsletters I am signed up to or items I 
receive in Alpine, which I view in Lynx as it's a smoother read.

Chime




Re: [Lynx-dev] Missing First Letter?

2021-03-31 Thread Thorsten Glaser
Chime Hart dixit:

> Well Thorsten, this goes beyond Politico. Here are 2 lines from a story in
> Hollywood Reporter. Notice a w is missing from the first word, also what

This helps nothing without a way to reproduce this locally,
for example a URL in question.

Sorry,
//mirabilos



Re: [Lynx-dev] Missing First Letter?

2021-03-31 Thread Chime Hart
Hi Tim: Glad you received that sample. Also, thanks for your analysis. I also 
bounced that over to Shellworld, but reading over there, instead of a missing 
beginning letter, there is an "a" beginning a word. Both sequences are annoying. 
Thanks

Chime




Re: [Lynx-dev] Missing First Letter?

2021-03-31 Thread Tim Chase
I received the sample Politico email you forwarded.  Looking at the
underlying HTML, it looks like there are "fancy quotes" before the
characters that I noticed missing.  For example, I see a fancy-quote
(0x201c) followed by the display of "eter Navarro" where the
underlying HTML has the fancy-quote followed by "Peter Navarro".
When using "\" to toggle the view-source, I see the same symptoms
within

So the document at least contains the proper characters.  It looks
like Lynx is rendering them (both in regular and view-source) in such
a fashion that the fancy-quote eats the following character. Bug?
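
One way to check whether a saved message contains these fancy quotes at all is to count the raw byte sequence 0xE2 0x80 0x9C, which is U+201C in UTF-8. This is only a sketch: count_left_dquotes is a hypothetical helper, and it assumes the saved file is UTF-8 encoded.

```shell
#!/bin/sh
# count_left_dquotes FILE: print how many times the bytes E2 80 9C
# (the UTF-8 left double quote) occur in FILE.
# LC_ALL=C makes awk compare raw bytes regardless of the caller's locale.
count_left_dquotes() {
    pattern=$(printf '\342\200\234')
    LC_ALL=C awk -v q="$pattern" '
        { n += gsub(q, "&") }   # gsub returns the number of matches on this line
        END { print n + 0 }     # n + 0 prints 0 when nothing matched
    ' "$1"
}
```

A non-zero count on a page where letters go missing, and zero on a page that renders cleanly, would support the idea that the quote is what eats the following character.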

-tkc


On 2021-03-31 09:54, Chime Hart wrote:
> Hi Tim: Well, short of bouncing or forwarding my next Politico
> newsletter your way? Anyway, I am in Debian SID in Linux.
> lynx is already the newest version (2.9.0dev.6-2).
> You can just imagine how annoying it would be to cut-and-paste an
> article with these sorts of inconsistencies.
> Chime
> 





Re: [Lynx-dev] Missing First Letter?

2021-03-31 Thread Chime Hart
Hi Tim: Well, short of bouncing or forwarding my next Politico newsletter your 
way? Anyway, I am in Debian SID in Linux.

lynx is already the newest version (2.9.0dev.6-2).
You can just imagine how annoying it would be to cut-and-paste an article with 
these sorts of inconsistencies.

Chime




Re: [Lynx-dev] Missing First Letter?

2021-03-31 Thread Chime Hart
Well Thorsten, this goes beyond Politico. Here are 2 lines from a story in 
Hollywood Reporter. Notice a w is missing from the first word, also what 
finishes line 2:
hat is a big deal. It is a big deal to the city of Boston, it is a big deal 
to the United States, it is a big deal for Black and brown communities, BNC 
president and CEO Princell Hair tells The Hollywood Reporter.  hat is what

   you get on BNC that you won  get anywhere else.^J
Back again live, my 2 lines in nano became 4 lines in Alpine. The only thing I can 
try would be to bounce the next Politico over to Shellworld, where I have never 
had this issue, although I think they are on 2.8.9.

Chime




Re: [Lynx-dev] Missing First Letter?

2021-03-31 Thread Tim Chase
Do you have a particular URL that demonstrates the problem?  This
sounds suspiciously like the traditional "drop cap" (which can be
done in CSS alone, keeping the text uninterrupted, but before that
became popular, folks would do it by swapping out the first real
letter with a fancy image of that letter).

I'm not sure about that caret-J aspect of things (it's a common way
to render control characters, and ^J is a newline, so it's not
uncommon).
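
As a small illustration of that convention: caret notation shows a control character as "^" followed by the character whose code is the control code XOR 64, so code 10 (newline) becomes ^J. The helper name caret_notation below is hypothetical, and the sketch only makes sense for codes 0 to 31 and 127.

```shell
#!/bin/sh
# caret_notation CODE: print the caret form of a control character,
# given its decimal code (0-31, or 127 for DEL).
caret_notation() {
    # XOR with 64 maps 10 -> 74 ('J'), 13 -> 77 ('M'), 127 -> 63 ('?').
    octal=$(printf '%o' "$(( $1 ^ 64 ))")
    # %b expands the \0ddd octal escape into the corresponding character.
    printf '^%b\n' "\\0$octal"
}
```

For example, caret_notation 10 gives the ^J seen at the ends of those lines, and caret_notation 13 gives ^M for a carriage return.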

I poked around at multiple Politico pages but didn't encounter any
issues.

It might also help to know which version of lynx you're using and in
what environment (at a console, in the Mac terminal, in Windows, in an
xterm/rxvt/urxvt/st/gnome-terminal/whatever on Linux or a BSD, etc)
which might also help track down any encoding issues.

-Tim

On 2021-03-31 09:09, Chime Hart wrote:
> Well, this seems strange? This is on my local machine running the
> latest Debian Lynx 2.9.0dev.6. When reading sites such as Politico, many
> times the first letter of a word is missing. Many times this surrounds
> a bracketed link, but more often, it's the first word of a line.
> Speakup seems to realize there is a symbol as it makes a sound as I
> arrow over it. Has anyone ever seen this? Do I need to switch to
> another character set? I also notice if the page has no HTML, then
> it is seemingly perfect. Some lines of these Politico pages end
> with a caret-J. Thanks so much in advance for any guidance or
> items I can change. Chime
> 





Re: [Lynx-dev] Missing First Letter?

2021-03-31 Thread Thorsten Glaser
Hi Chime,

I don't know Politico, but I assume they replaced the first letter
by a picture of an intricately designed letter like in old monks'
books and forgot to degrade gracefully for text browsers. What you
describe certainly sounds like that.

bye,
//mirabilos
-- 
22:20⎜ The crazy that persists in his craziness becomes a master
22:21⎜ And the distance between the craziness and geniality is
only measured by the success 18:35⎜ "Psychotics are consistently
inconsistent. The essence of sanity is to be inconsistently inconsistent



[Lynx-dev] Missing First Letter?

2021-03-31 Thread Chime Hart
Well, this seems strange? This is on my local machine running the latest Debian 
Lynx 2.9.0dev.6. When reading sites such as Politico, many times the first letter 
of a word is missing. Many times this surrounds a bracketed link, but more 
often, it's the first word of a line. Speakup seems to realize there is a symbol 
as it makes a sound as I arrow over it. Has anyone ever seen this? Do I need to 
switch to another character set? I also notice if the page has no HTML, then it 
is seemingly perfect. Some lines of these Politico pages end with a caret-J. 
Thanks so much in advance for any guidance or items I can change.

Chime
