On Tuesday 18 August 2015 10:55:46 Andries E. Brouwer wrote:
> On Tue, Aug 18, 2015 at 10:29:40AM +0200, Tim Ruehsen wrote:
> > I am going with Eli that we should use iconv.
> > We know the remote encoding and the local encoding
>
> Do we?
> How do you guess the remote encoding?
> Is there any particular encoding?

Yes, we do. Starting with 'wget URL', the URL is in the local encoding (which can be overridden with --local-encoding). Using wget -r will download documents (HTML and CSS right now) and parse them for more URLs. These documents have a well-known encoding (either the default, or one set explicitly via HTTP header or within the document itself). For broken servers, we still have --remote-encoding.

> Unix filenames are sequences of bytes, they do not have a character set.

The character encoding determines which symbols these bytes (or byte sequences, aka multibyte characters / code points) are displayed as. I gave an example in my last email: change your locale to ISO-8859-1 and run 'touch äöü'. 'ls' will show it correctly. Then change your locale to UTF-8, and 'ls' will show garbage even though the file name did not change.

Tim
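
The 'touch äöü' example can also be reproduced without switching locales. What follows is only a minimal sketch in Python, not wget code: the helper names `display` and `recode` are made up for illustration, and `errors="replace"` merely approximates what a terminal does with bytes that are invalid in its encoding. `recode` shows the kind of conversion iconv would perform once both encodings are known.

```python
# The file name as raw bytes, as 'touch äöü' would store it
# when run in an ISO-8859-1 locale.
name_bytes = "äöü".encode("iso-8859-1")  # three bytes: e4 f6 fc


def display(raw: bytes, encoding: str) -> str:
    """Interpret raw file-name bytes the way a terminal using
    `encoding` would (hypothetical helper, for illustration only)."""
    return raw.decode(encoding, errors="replace")


def recode(raw: bytes, src: str, dst: str) -> bytes:
    """Convert bytes from encoding `src` to encoding `dst`,
    roughly what iconv does (hypothetical helper)."""
    return raw.decode(src).encode(dst)


# Same bytes, two interpretations.
latin1_view = display(name_bytes, "iso-8859-1")  # readable
utf8_view = display(name_bytes, "utf-8")         # replacement characters

# After converting to UTF-8, a UTF-8 terminal shows the name correctly.
utf8_bytes = recode(name_bytes, "iso-8859-1", "utf-8")
fixed_view = display(utf8_bytes, "utf-8")

print(name_bytes.hex(), "->", latin1_view, "|", utf8_view)
print(utf8_bytes.hex(), "->", fixed_view)
```

The bytes on disk never change in the first two views; only the encoding used to interpret them does. The conversion step, by contrast, really does produce a different byte sequence.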
