On 2013-09-12 17:37, Tim Ruehsen wrote:
> On Thursday 12 September 2013 12:59:00 Björn Mattsson wrote:
>> I ran into a bug in wget last week.
>> I have done some digging but can't solve it by myself.
>> If I try to wget a file whose name contains capital ÅÄÖ, they get converted
>> wrongly, while åäö works fine.
>> I use wget -m to back up one of my web sites to another machine. It has
>> worked like a charm for the last 4-5 years, but a couple of weeks ago one
>> of the files came down wrong. I thought a colleague had uploaded
>> something wrong, but after some digging it turns out that wget converts it wrongly.
>> I have UTF-8 as the charset on my machine.
>> If you want to test/see the problem:
>> wget -m http://bmit.se/wget
> A request to http://bmit.se/wget/ returns a text/html document without
> specifying the charset (AFAIR, the default is iso-8859-1).
> Either your server has to tag the response as utf-8 (Content-Type: text/html;
> charset=utf-8) or you have to specify utf-8 in your document header.
Ok.
But why is åäö working but not ÅÄÖ (same letters but capital)?
> Or you specify --remote-encoding=utf-8 when calling wget.
That didn't work.
> Could you give it a try, maybe with -d to see what is going on.
Done, and I have attached the log file.
> Tim
// Björn
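
A minimal sketch, not taken from the thread, of one possible reason the capitals behave differently from the lowercase letters. It assumes the file name is sent as UTF-8 and that something along the way reads those bytes as iso-8859-1, as suggested above. The helper name `describe` and the constant `C1_CONTROLS` are made up for illustration and are not part of wget.

```python
# Compare the UTF-8 bytes of the lowercase and uppercase letters from the report.
# Assumption: the bytes are later interpreted as ISO-8859-1, where 0x80-0x9F
# is the C1 control range (non-printable).
C1_CONTROLS = range(0x80, 0xA0)


def describe(ch):
    """Return (hex bytes, octal escapes, has_c1_byte) for ch encoded as UTF-8."""
    raw = ch.encode("utf-8")
    hex_bytes = " ".join(f"{b:02X}" for b in raw)
    octal = "".join(f"\\{b:03o}" for b in raw)
    has_c1 = any(b in C1_CONTROLS for b in raw)
    return hex_bytes, octal, has_c1


for ch in "åäöÅÄÖ":
    hex_bytes, octal, has_c1 = describe(ch)
    note = "contains a C1 control byte" if has_c1 else "printable in ISO-8859-1"
    print(ch, hex_bytes, octal, note)
```

Under these assumptions the second byte of each capital letter (0x85, 0x84, 0x96) falls in the control range, and its octal form corresponds to the \205, \204 and \226 escapes that appear in the debug log, while the lowercase second bytes (0xA5, 0xA4, 0xB6) are ordinary printable characters in iso-8859-1. Whether this is what actually trips up wget here is not confirmed by the thread.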
Script started on Thu 12 Sep 2013 09:16:23 PM CEST
bmt@ronneby:/tmp$ wget -m -d http://bmit.se/wget/
DEBUG output created by Wget 1.12 on linux-gnu.
Enqueuing http://bmit.se/wget/ at depth 0
Queue count 1, maxcount 1.
[IRI Enqueuing http://bmit.se/wget/; with None
Dequeuing http://bmit.se/wget/ at depth 0
Queue count 0, maxcount 1.
--2013-09-12 21:16:39-- http://bmit.se/wget/
Resolving bmit.se... 31.209.29.190
Caching bmit.se = 31.209.29.190
Connecting to bmit.se|31.209.29.190|:80... connected.
Created socket 3.
Releasing 0x09089948 (new refcount 1).
---request begin---
GET /wget/ HTTP/1.0
User-Agent: Wget/1.12 (linux-gnu)
Accept: */*
Host: bmit.se
Connection: Keep-Alive
---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 200 OK
Date: Thu, 12 Sep 2013 19:16:39 GMT
Server: Apache/2.2.22 (Debian)
Last-Modified: Wed, 11 Sep 2013 15:24:38 GMT
ETag: ac004-78-4e61d3830b980
Accept-Ranges: bytes
Content-Length: 120
Vary: Accept-Encoding
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Content-Type: text/html
---response end---
200 OK
Registered socket 3 for persistent reuse.
Length: 120 [text/html]
Saving to: bmit.se/wget/index.html
0% [ ] 0 --.-K/s
100%[=] 120 --.-K/s in 0s
2013-09-12 21:16:40 (2.64 MB/s) - bmit.se/wget/index.html saved [120/120]
Loaded bmit.se/wget/index.html (size 120).
bmit.se/wget/index.html: merge(http://bmit.se/wget/;, index.html) -> http://bmit.se/wget/index.html
appending http://bmit.se/wget/index.html; to urlpos.
bmit.se/wget/index.html: merge(http://bmit.se/wget/;, test) -> http://bmit.se/wget/test
appending http://bmit.se/wget/test; to urlpos.
bmit.se/wget/index.html: merge(http://bmit.se/wget/;, teståäöÃ\205Ã\204Ã\226) -> http://bmit.se/wget/teståäöÃ\205Ã\204Ã\226
appending http://bmit.se/wget/teståäöÃ\205Ã\204Ã\226; to urlpos.
no-follow in bmit.se/wget/index.html: 0
Deciding whether to enqueue http://bmit.se/wget/index.html;.
Loading robots.txt; please ignore errors.
--2013-09-12 21:16:40-- http://bmit.se/robots.txt
Reusing existing connection to bmit.se:80.
Reusing fd 3.
---request begin---
GET /robots.txt HTTP/1.0
User-Agent: Wget/1.12 (linux-gnu)
Accept: */*
Host: bmit.se
Connection: Keep-Alive
---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 404 Not Found
Date: Thu, 12 Sep 2013 19:16:40 GMT
Server: Apache/2.2.22 (Debian)
Vary: Accept-Encoding
Content-Length: 281
Keep-Alive: timeout=5, max=99
Connection: Keep-Alive
Content-Type: text/html; charset=iso-8859-1
---response end---
404 Not Found
Skipping 281 bytes of body: [<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>404 Not Found</title>
</head><body>
<h1>Not Found</h1>
<p>The requested URL /robots.txt was not found on this server.</p>
<hr>
<address>Apache/2.2.22 (Debian) Server at bmit.se Port 80</address>
</body></html>
] done.
2013-09-12 21:16:40 ERROR 404: Not Found.
Decided to load it.
Enqueuing http://bmit.se/wget/index.html at depth 1
Queue count 1, maxcount 1.
[IRI Enqueuing http://bmit.se/wget/index.html; with None
Deciding whether to enqueue http://bmit.se/wget/test;.
Decided to load it.
Enqueuing http://bmit.se/wget/test at depth 1
Queue count 2, maxcount 2.
[IRI Enqueuing http://bmit.se/wget/test; with None
Deciding whether to enqueue http://bmit.se/wget/teståäöÅÄÖ;.
Decided to load it.
Enqueuing http://bmit.se/wget/teståäöÃ\205Ã\204Ã\226 at depth 1
Queue count 3, maxcount