On Saturday, 19 December 2015 at 14:11:20, Eli Zaretskii wrote:
> > Date: Sat, 19 Dec 2015 10:15:03 +0200
> > From: Eli Zaretskii <e...@gnu.org>
> > Cc: bug-wget@gnu.org
> >
> > > 2. contrib/check-hard fails with
> > > TESTS_ENVIRONMENT="LC_ALL=tr_TR.utf8 VALGRIND_TESTS=0" make check
> > >
> > > FAIL: Test-iri-forced-remote
> > >
> > > My son has birthday tomorrow, so I am not sure how much time I can
> > > spend on the weekend on this issue. Maybe Eli or you could have a
> > > look ?
> >
> > I cannot bootstrap the Git repo (too many prerequisites I don't have).
> > Can you or someone else produce a distribution tarball out of Git that
> > I could then build "as usual"?
> >
> > Also, can you show me the log of the failed test? Turkish locales
> > have "an issue" with certain upper/lower-case characters, maybe that's
> > the problem. Or maybe it's something else; looking at the log might
> > give good clues.
>
> Tim sent me the tarball and the log off-list (thanks!). I didn't yet
> try to build Wget, but just looking at the test, I guess I don't
> understand its idea. It has an index.html page that's encoded in
> ISO-8859-15, but Wget is invoked with --remote-encoding=iso-8859-1,
> and the URLs themselves in "my %urls" are all encoded in UTF-8. How's
> this supposed to work?
According to the wget man page, --remote-encoding just sets the *default*
server encoding. It only comes into play when the HTTP header does not
contain a Content-Type with a charset set *and* the HTML page does not
contain a <meta http-equiv="Content-Type" content="...; charset=..."> tag.
'index.html' in this test correctly has a meta tag with charset=utf-8,
and its URLs are encoded in UTF-8.

> Also, I'm not following the logic of overriding Content-type by the
> remote encoding: p1_fran%C3%A7ais.html states "charset=UTF-8", but
> includes a link encoded in ISO-8859-1, and the test seems to expect
> Wget to use the remote encoding in preference to what "charset=" says.

Either the test is wrong here or the man page. I would say the man page
is correct; its behavior makes the most sense to me. In that case the
test is wrong, as is the comment in it.

> Does the remote encoding override the encoding for the _contents_ of
> the URL, not just for the URL itself? That seems to make little sense
> to me: the contents and the name can legitimately be encoded
> differently, I think.

The filenames in %expected_downloaded_files depend on --local-encoding.
Since that option is not given on the command line, the test behaves
differently under different settings of LC_ALL ('make check' uses
LC_ALL=C; contrib/check-hard additionally runs 'make check' with a
Turkish UTF-8 locale). To fix the test, we should set --local-encoding to
some kind of UTF-8 encoding (or to something else, but then we have to
fix the expected filenames for that locale).

Regards, Tim
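P.S. To illustrate why the expected filenames depend on the assumed
encoding, here is a small Python sketch (not part of the wget test
suite; the path is the one from the test above). The same
percent-encoded URL path decodes to different filenames depending on
which charset the percent-escaped bytes are interpreted in:

```python
from urllib.parse import unquote

path = "p1_fran%C3%A7ais.html"

# Interpreted as UTF-8: %C3 %A7 is the two-byte UTF-8 sequence for 'ç'.
utf8_name = unquote(path, encoding="utf-8")
print(utf8_name)    # p1_français.html

# Interpreted as ISO-8859-1: every byte is one character, so the same
# two bytes become 'Ã' and '§'.
latin1_name = unquote(path, encoding="iso-8859-1")
print(latin1_name)  # p1_franÃ§ais.html
```

This is exactly the kind of difference that makes the expected local
filenames locale-dependent when --local-encoding is left unset.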