Re: Character encoding

2005-03-31 Thread Hrvoje Niksic
Wget shouldn't alter the page contents, except for converted links. Is the funny character in places which Wget should know about (e.g. URLs in links) or in the page text? Could you post a minimal excerpt from the page, before and after the garbling done by Wget? Alternatively, could you post a URL wher

RE: Character encoding

2005-03-31 Thread Alan Hunter
Hi, thanks for the reply. It is the page text that is the problem. When I started to investigate it further I found that it actually only happens when the page being "wgot" is an .aspx (ASP.NET) file. I made 3 identical files (as below), one with .html ext, one with .aspx ext and one with .zzz e

Re: Character encoding

2005-03-31 Thread Hrvoje Niksic
I'm not sure what causes this problem, but I suspect it does not come from Wget doing something wrong. That Notepad opens the file correctly is indicative enough. Maybe those browsers don't understand UTF-8 (or other) encoding of Unicode when the file is opened on-disk?

Re: Character encoding

2005-04-01 Thread Georg Bauhaus
Hrvoje Niksic wrote: > Maybe those browsers don't understand UTF-8 (or other) encoding of Unicode when the file is opened on-disk? Or they may not have been told the encoding of the .aspx stream? (A Windows code page?) Just speculating: The apostrophe might have been typed as an accent (acute) really, so it

RE: Character encoding

2005-04-01 Thread Alan Hunter
c svr in the next few days. -Original Message- From: Hrvoje Niksic [mailto:[EMAIL PROTECTED] Sent: March 31, 2005 3:19 PM To: Alan Hunter Cc: 'wget@sunsite.dk' Subject: Re: Character encoding I'm not sure what causes this problem, but I suspect it does not come from Wget doing some

RE: Character encoding

2005-04-05 Thread Alan Hunter
om: Hrvoje Niksic [mailto:[EMAIL PROTECTED] Sent: March 31, 2005 3:19 PM To: Alan Hunter Cc: 'wget@sunsite.dk' Subject: Re: Character encoding I'm not sure what causes this problem, but I suspect it does not come from Wget doing something wrong. That Notepad opens the file correctly is

Re: Character encoding

2005-04-06 Thread Alain Bench
Hello Georg, On Friday, April 1, 2005 at 12:01:15 PM +0200, Georg Bauhaus wrote: > The apostrophe might have been typed as an accent (acute) really Most probably the RIGHT SINGLE QUOTATION MARK U+2019, <’>, encoded in UTF-8, then wrongly interpreted as CP-1252. It would look like "’" (a ci
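
The effect described in the last message can be reproduced in a few lines. The following is only a minimal sketch in Python, not anything taken from Wget or from the page under discussion; the helper names misread_as_cp1252 and repair are hypothetical and exist purely for illustration. It encodes U+2019 as UTF-8, decodes the resulting bytes as CP-1252, and then reverses the mistake.

```python
# Sketch of the suspected failure: UTF-8 bytes displayed as if they were CP-1252.
# The function names are hypothetical helpers, not part of Wget or any browser.

def misread_as_cp1252(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as CP-1252 (the mistake)."""
    return text.encode("utf-8").decode("cp1252")


def repair(garbled: str) -> str:
    """Undo the mistake: recover the original bytes, then decode them as UTF-8."""
    return garbled.encode("cp1252").decode("utf-8")


quote = "\u2019"                      # RIGHT SINGLE QUOTATION MARK
utf8_bytes = quote.encode("utf-8")    # three bytes: E2 80 99
garbled = misread_as_cp1252(quote)    # one CP-1252 character per byte

print(utf8_bytes.hex(" "))
print(ascii(garbled))
print(repair(garbled) == quote)
```

Each of the three UTF-8 bytes maps to its own CP-1252 character, which is why a single quotation mark turns into three symbols. If that is what happens here, the downloaded file itself is intact and only the decoding applied at display time is wrong, which would be consistent with Notepad showing the file correctly.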