On Monday, 14 December 2015, 18:33:38, Eli Zaretskii wrote:
> > Date: Sun, 13 Dec 2015 20:04:31 +0100
> > From: "Andries E. Brouwer" <andries.brou...@cwi.nl>
> > Cc: "Andries E. Brouwer" <andries.brou...@cwi.nl>, bug-wget@gnu.org
> >
> > On Sun, Dec 13, 2015 at 08:01:27PM +0200, Eli Zaretskii wrote:
> > > If no one is going to pick up the gauntlet, I will sit down and do it
> > > myself, although I'm terribly busy with Emacs 25.1 release.
> >
> > Good!
>
> While working on this, I bumped into 2 related issues:
>
>   1. The functions that call 'iconv' (in iri.c) don't make a point of
>      flushing the last portion of the converted URL after 'iconv'
>      returns successfully having converted the input string in its
>      entirety.  IME, you need then to call 'iconv' one last time with
>      either the 2nd or the 3rd argument set to NULL, otherwise
>      sometimes the last converted character doesn't get output.  In my
>      case, some URLs converted from CP1255 to UTF-8 lost their last
>      character.  It sounds like no one has actually used this
>      conversion in iri.c, except for trivially converting UTF-8 to
>      itself.  Is that possible/reasonable?
Possibly. Could you please give an example string? I would like to test it
on GNU/Linux, BSD and Solaris to see whether the output is always the same.

>   2. Wget assumes that the URL given on its command line is encoded in
>      the locale's encoding.  This is a good assumption when the user
>      herself types the URL at the shell prompt, but not when the URL is
>      copy-pasted from a browser's address bar.  In the latter case, the
>      URL tends to be in UTF-8 (sometimes hex-encoded).  At least that's
>      what I get from Firefox.  We don't seem to have in wget any
>      facilities to specify a separate (3rd) encoding for the URLs on
>      the command line, do we?

I stumbled upon this a while ago when thinking about the design of wget2,
and wget2 already has a working --input-encoding option for such cases.

AFAIK, nobody has asked for such an option in recent years, so I assume it
is a somewhat 'expert' or 'fancy' option, or at least a low-priority one.
It is an optional goodie.

Tim
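
For reference, the final flush described in issue 1 could look roughly like the sketch below. It is not the code in iri.c: convert_with_flush is a hypothetical helper, and the buffer sizing is an arbitrary choice. After 'iconv' has consumed all input, it is called once more with a NULL input buffer so that any character the converter is still holding back gets written out.

```c
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not wget's actual code): convert INLEN bytes at IN
   from encoding FROM to encoding TO.  Returns a malloc'ed, NUL-terminated
   string that the caller must free, or NULL on any failure.  */
char *convert_with_flush(const char *from, const char *to,
                         const char *in, size_t inlen)
{
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t) -1)
        return NULL;

    size_t cap = 4 * inlen + 16;        /* initial guess; grown on E2BIG */
    char *out = malloc(cap + 1);        /* +1 leaves room for the NUL */
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *inp = (char *) in;            /* iconv takes a non-const pointer */
    size_t inleft = inlen;
    char *outp = out;
    size_t outleft = cap;
    int flushing = 0;

    for (;;) {
        /* Normal pass converts the input; the flush pass passes a NULL
           input buffer so pending output is emitted.  */
        size_t rc = flushing
            ? iconv(cd, NULL, NULL, &outp, &outleft)
            : iconv(cd, &inp, &inleft, &outp, &outleft);

        if (rc != (size_t) -1) {
            if (flushing)
                break;                  /* flush done: conversion complete */
            flushing = 1;               /* input consumed: flush next */
            continue;
        }
        if (errno != E2BIG) {           /* invalid or truncated input */
            free(out);
            iconv_close(cd);
            return NULL;
        }

        /* Output buffer full: double it and retry the same step.  */
        size_t used = (size_t) (outp - out);
        cap *= 2;
        char *tmp = realloc(out, cap + 1);
        if (tmp == NULL) {
            free(out);
            iconv_close(cd);
            return NULL;
        }
        out = tmp;
        outp = out + used;
        outleft = cap - used;
    }

    *outp = '\0';
    iconv_close(cd);
    return out;
}
```

A call such as convert_with_flush("CP1255", "UTF-8", url, strlen(url)) would exercise the case from issue 1, assuming the local iconv implementation knows the name "CP1255"; whether the last character is actually lost without the flush pass may differ between platforms.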