Hi,

On Sun, Jan 11, 2026 at 08:35:13PM +0100, Thorsten Glaser wrote:
> On Sun, 11 Jan 2026, Andreas Mohr via Lynx-dev wrote:
> 
> >- "outer-scope" MIME multi-part attribution is utf-8
> 
> You can ignore that.

For purposes of document scope, indeed.

One could argue that a MIME attribution could be made to
extend/govern a document's encoding state, but OTOH it *is*
the document scope proper which is b0rken;
there's no denying that.



> >- the HTML document body/content is UTF-8-based
> >  (as can be verified via
> >  iconv -f utf-8 -t utf-8 <file>)
> >- the document (the authoritative container scope unit) declares
> >  iso-8859-1 encoding for its body/content
> 
> Yes, this is a bug… in the eMail, not in lynx.

Yup indeed.
The only remaining complaint might be
that lynx is perhaps not flexible enough to
offer a post-mortem workaround for such a trainwreck,
i.e. a way to override a wrong in-document charset declaration.
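
To make the mismatch concrete, here is a minimal Python sketch. It is an
illustration only, not anything lynx does internally. It checks whether
bytes are valid UTF-8 (roughly what the iconv -f utf-8 -t utf-8 test
verifies) and shows the mojibake produced when the same bytes are decoded
according to the declared iso-8859-1. The sample string and the name
is_valid_utf8 are made up for this example.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8.

    Similar in spirit to the `iconv -f utf-8 -t utf-8 <file>` check.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical sample body: the UTF-8 bytes of "Gr\u00fcn" (u with umlaut).
body = "Gr\u00fcn".encode("utf-8")

# What a reader gets when honouring the (wrong) iso-8859-1 declaration:
# each byte of the two-byte UTF-8 sequence becomes its own character.
mojibake = body.decode("iso-8859-1")

print(is_valid_utf8(body))
print(ascii(mojibake))
```

A lone latin1 byte such as 0xFC is rejected by the same check, so the test
does distinguish real UTF-8 content from real iso-8859-1 content.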



> Overriding the charset is not easy, you have to edit the document
> for that. (It gets even funnier if an XML PI with a charset is
> present… not.)

Yuck, that is somewhat unexpected.



> For the case of declared latin1, contains utf-8, you could do
> a rather evil thing of temporarily switching the display charset
> to latin1 and “Raw 8-bit” to ON. That might just work, if you
> use lynx in the C.UTF-8 locale and don’t have any nōn-ASCII UI
> strings.

Hmm, I cannot quite follow the [weirdly twisted?] processing chain here
(I have to admit that I did not try it either).
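
A best guess at the idea, as a small Python sketch: iso-8859-1 maps every
byte value 0x00-0xFF to exactly one character, so decoding as latin1 and
writing back out as latin1 returns the original bytes unchanged, and a
UTF-8 terminal then renders them correctly. Whether this is a fair model
of lynx's actual chain is an assumption; the function names and the sample
string are hypothetical.

```python
# Hypothetical sample: UTF-8 bytes of a document wrongly declared iso-8859-1.
original = "caf\u00e9".encode("utf-8")


def passthrough_latin1(data: bytes) -> bytes:
    """Model 'document charset latin1, display charset latin1'.

    Decode per the declaration, then encode for display without any
    translation in between.
    """
    text = data.decode("iso-8859-1")  # never fails: all 256 byte values map
    return text.encode("iso-8859-1")  # yields the very same bytes again


def translate_to_utf8(data: bytes) -> bytes:
    """Model 'document charset latin1, display charset utf-8'.

    The misdecoded text gets re-encoded, i.e. double-encoded output.
    """
    return data.decode("iso-8859-1").encode("utf-8")


print(passthrough_latin1(original) == original)
print(translate_to_utf8(original) == original)
```

In this model the first variant hands the untouched UTF-8 bytes to the
terminal, while the second one is what produces the visible mojibake.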



> For your scenario of…
> 
> >- mailcap entry
> >   text/html; lynx -assume_charset=%{charset} -display_charset=utf-8 -collapse_br_tags -dump %s; nametemplate=andi_%s.html; copiousoutput
> 
> … you could do something like…
> 
>     text/html; <%s perl -0pe 's!<meta\s+http-equiv="Content-Type"\s+content="[^"]*"\s*/?>!!ig;' | lynx -assume_charset=%{charset} -display_charset=utf-8 -collapse_br_tags -dump -stdin; nametemplate=andi_%s.html; copiousoutput
> 
> … to automatically remove such charset declaration.

Ah, right, or possibly some sed -e 's/...' alter{c|n}ation...
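
For comparison, the same substitution as a Python sketch. It reuses the
regex from the perl one-liner quoted above and works on bytes, so the
wrongly declared encoding never has to be decoded. The name
strip_meta_charset and the sample input are hypothetical.

```python
import re

# Same pattern as the quoted perl one-liner, applied case-insensitively.
META_CONTENT_TYPE = re.compile(
    rb'<meta\s+http-equiv="Content-Type"\s+content="[^"]*"\s*/?>',
    re.IGNORECASE,
)


def strip_meta_charset(html: bytes) -> bytes:
    """Remove every <meta http-equiv="Content-Type" ...> declaration."""
    return META_CONTENT_TYPE.sub(b"", html)


# Hypothetical sample input with a wrong charset declaration.
sample = (
    b'<html><head><META HTTP-EQUIV="Content-Type" '
    b'CONTENT="text/html; charset=iso-8859-1"></head></html>'
)
print(strip_meta_charset(sample))
```

Like the perl version, this only catches double-quoted attributes in
exactly this order; a <meta charset=...> tag or swapped attributes would
pass through unchanged.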



On the [e]links side, I have filed the "same" issue:
"0.18.0: broken HTML file (charset declaration *wrong*) - override possibility??"
  https://github.com/rkd77/elinks/issues/417


> (EN: “[…]uhr.gz is a reason to install mksh on every system.”)

Oh wow indeed!

Greetings

Andreas Mohr

-- 
Klimaverschandel - weil weniger Wirtschaftskrieg einfach uncool ist.
