Thanks, Erik, some useful tools and advice. I've solved the problem:
Using the emacs hexl-find-file, I could see that the wget file was OK:

000021b0: 2d33 3638 320a 3232 3109 4163 7461 204f -3682.221.Acta O
000021c0: 7274 6f70 c3a9 6469 6361 2042 7261 7369 rtop..dica Brasi
000021d0: 6c65 6972 6109 6874 7470 3a2f 2f77 7777 leira.http://www

But the file saved from Firefox was not, since c3a9 has become c383 c2a9:

000021b0: 2d33 3638 320a 3232 3109 4163 7461 204f -3682.221.Acta O
000021c0: 7274 6f70 c383 c2a9 6469 6361 2042 7261 rtop....dica Bra
000021d0: 7369 6c65 6972 6109 6874 7470 3a2f 2f77 sileira.http://w

I checked my default character encoding in Firefox [3.0.4: Edit-->Preferences; Content.Default Font.Advanced; Character encoding.Default Character Encoding] and it turned out it was 'Western ISO-Latin 8859-1' (!). I changed it to 'UTF-8' and all the diacritic problems went away.

So it was a client software configuration problem, not the tictocs site. I'll send tictocs an update email.

But I don't understand why Firefox was ignoring the "Content-Type: text/plain; charset=utf-8" header. It should not be using the default charset (ISO-Latin 8859-1) for this content, as it has been told the text encoding is UTF-8...

--
Thanks to all who helped (on- and off-list),
Glen

------------------------------
From: Erik Hetzner <erik.hetz...@ucop.edu>
Sender: Code for Libraries <CODE4LIB@LISTSERV.ND.EDU>
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Character problems with tictoc
Date: Mon, 21 Dec 2009 11:24:49 -0800
Message-ID: <p-irc-exbe01l9ntdej00001...@ex.ucop.edu>

At Mon, 21 Dec 2009 14:09:28 -0500,
Glen Newton wrote:
>
> It seems that different people are seeing different things in their
> respective viewers (i.e some are OK and others are like what I am
> seeing).
>
> When I use wget and view the local file in Firefox (3.0.4, Linux Suse
> 11.0) I see:
> http://cuvier.cisti.nrc.ca/~gnewton/tictoc1.gif
> [gif used as it is not lossy]
>
> The text is clearly not correct.
>
> The file I got with wget is:
> http://cuvier.cisti.nrc.ca/~gnewton/tictoc.txt
>
> Is this just a question of different client software (and/or OSes)
> viewing or mangling the content?

When dealing with character set issues (especially the dreaded double-encoding!) I find it best to use hex editors or dumpers. If in emacs, try M-x hexl-find-file. On a Unix command line, the od or hd commands are useful.

For the record:

00000000 48 54 54 50 2f 31 2e 31 20 32 30 30 20 4f 4b 0d |HTTP/1.1 200 OK.|
00000010 0a 44 61 74 65 3a 20 4d 6f 6e 2c 20 32 31 20 44 |.Date: Mon, 21 D|
00000020 65 63 20 32 30 30 39 20 31 39 3a 32 32 3a 33 38 |ec 2009 19:22:38|
00000030 20 47 4d 54 0d 0a 53 65 72 76 65 72 3a 20 41 70 | GMT..Server: Ap|
00000040 61 63 68 65 2f 32 2e 32 2e 31 33 20 28 55 6e 69 |ache/2.2.13 (Uni|
00000050 78 29 20 6d 6f 64 5f 73 73 6c 2f 32 2e 32 2e 31 |x) mod_ssl/2.2.1|
00000060 33 20 4f 70 65 6e 53 53 4c 2f 30 2e 39 2e 38 6b |3 OpenSSL/0.9.8k|
00000070 20 50 48 50 2f 35 2e 33 2e 30 20 44 41 56 2f 32 | PHP/5.3.0 DAV/2|
00000080 0d 0a 58 2d 50 6f 77 65 72 65 64 2d 42 79 3a 20 |..X-Powered-By: |
00000090 50 48 50 2f 35 2e 33 2e 30 0d 0a 43 6f 6e 74 65 |PHP/5.3.0..Conte|
000000a0 6e 74 2d 54 79 70 65 3a 20 74 65 78 74 2f 70 6c |nt-Type: text/pl|
000000b0 61 69 6e 3b 20 63 68 61 72 73 65 74 3d 75 74 66 |ain; charset=utf|
000000c0 2d 38 0d 0a 54 72 61 6e 73 66 65 72 2d 45 6e 63 |-8..Transfer-Enc|
000000d0 6f 64 69 6e 67 3a 20 63 68 75 6e 6b 65 64 0d 0a |oding: chunked..|
...
00002230 4f 72 74 68 6f 70 61 65 64 69 63 61 09 68 74 74 |Orthopaedica.htt|
00002240 70 3a 2f 2f 69 6e 66 6f 72 6d 61 68 65 61 6c 74 |p://informahealt|
00002250 68 63 61 72 65 2e 63 6f 6d 2f 61 63 74 69 6f 6e |hcare.com/action|
00002260 2f 73 68 6f 77 46 65 65 64 3f 6a 63 3d 6f 72 74 |/showFeed?jc=ort|
00002270 26 74 79 70 65 3d 65 74 6f 63 26 66 65 65 64 3d |&type=etoc&feed=|
00002280 72 73 73 09 31 37 34 35 2d 33 36 37 34 09 31 37 |rss.1745-3674.17|
00002290 34 35 2d 33 36 38 32 0a 32 32 31 09 41 63 74 61 |45-3682.221.Acta|
000022a0 20 4f 72 74 6f 70 c3 a9 64 69 63 61 20 42 72 61 | Ortop..dica Bra|
000022b0 73 69 6c 65 69 72 61 09 68 74 74 70 3a 2f 2f 77 |sileira.http://w|
...

best,
Erik Hetzner

----------------------------------------------------------------------
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3
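
For anyone reading this thread later: the byte pattern in the dumps (c3 a9 in the wget copy, c3 83 c2 a9 in the Firefox-saved copy) is what you would expect if UTF-8 bytes were read as ISO-8859-1 and then written out again as UTF-8. The Python sketch below reproduces that and reverses it. It is only an illustration, not a description of what Firefox does internally; the names hex_pairs and repair are made up for this example, and the repair assumes exactly one round of double encoding with plain ISO-8859-1.

```python
# Sketch: reproduce and undo the double encoding seen in the Firefox-saved file.
# Assumption: the damage is one round of UTF-8 -> (misread as ISO-8859-1) -> UTF-8.

original = "Acta Ortop\u00e9dica Brasileira"  # \u00e9 is e with acute accent

# What the server sent and wget saved: the accent is the two bytes c3 a9.
utf8_bytes = original.encode("utf-8")

# Misreading those bytes as ISO-8859-1 turns c3 and a9 into two separate
# characters; writing them back out as UTF-8 yields c3 83 c2 a9.
double_encoded = utf8_bytes.decode("iso-8859-1").encode("utf-8")


def hex_pairs(data: bytes) -> str:
    """Format bytes as space-separated hex pairs, like a hex dump row."""
    return " ".join(f"{b:02x}" for b in data)


def repair(data: bytes) -> str:
    """Undo one round of double encoding (hypothetical helper)."""
    return data.decode("utf-8").encode("iso-8859-1").decode("utf-8")


if __name__ == "__main__":
    print(hex_pairs("\u00e9".encode("utf-8")))
    print(hex_pairs("\u00e9".encode("utf-8").decode("iso-8859-1").encode("utf-8")))
    print(repair(double_encoded) == original)
```

If the text was never double-encoded, repair will usually raise UnicodeEncodeError or UnicodeDecodeError instead of silently corrupting it, which makes it reasonably safe to try on a suspect file.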