On Fri, 2010-06-04 at 08:06 +0100, Brian Butterworth wrote:
> The "short form" of the headlines are destined for Ceefax - where 0x23
> is £ and 0x5F is #...

No, this is definitely something like ISO8859-1 or ISO8859-15. The byte
where the pound sign should be is 0xA3, not 0x23. For example...

$ curl -s http://news.bbc.co.uk/1/hi/business/10252263.stm | grep '[^£]1.1bn' | hexdump -C
00000000  09 09 09 3c 68 33 3e 3c  61 20 68 72 65 66 3d 22  |...<h3><a href="|
00000010  2f 31 2f 68 69 2f 75 6b  2f 31 30 32 33 37 32 36  |/1/hi/uk/1023726|
00000020  38 2e 73 74 6d 22 3e 4d  61 6e 20 55 74 64 20 6f  |8.stm">Man Utd o|
00000030  77 6e 65 72 73 20 66 61  63 69 6e 67 20 a3 31 2e  |wners facing .1.|
00000040  31 62 6e 20 64 65 62 74  3c 2f 61 3e 3c 2f 68 33  |1bn debt</a></h3|
00000050  3e 0a                                             |>.|

The same headline is shown correctly in the 'most popular' section and
the RSS feed (albeit encoded as an HTML entity for £ in the latter).

-- 
dwmw2

-
Sent via the backstage.bbc.co.uk discussion group. To unsubscribe,
please visit http://backstage.bbc.co.uk/archives/2005/01/mailing_list.html.
Unofficial list archive: http://www.mail-archive.com/backstage@lists.bbc.co.uk/
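
For anyone who wants to check the byte-level argument in the message above without fetching the page, here is a minimal Python sketch. It takes the bytes for "facing ... debt" from the hexdump and decodes them under the candidate encodings. The variable names are illustrative only, and the snippet assumes nothing about how the BBC pages are actually generated.

```python
# Bytes copied from the hexdump in the message: "facing <0xA3>1.1bn debt"
raw = bytes.fromhex("66 61 63 69 6e 67 20 a3 31 2e 31 62 6e 20 64 65 62 74")

# 0xA3 is the pound sign in both ISO8859-1 and ISO8859-15.
as_latin1 = raw.decode("iso-8859-1")
as_latin9 = raw.decode("iso-8859-15")

# A lone 0xA3 is not valid UTF-8; a UTF-8 pound sign is the pair C2 A3.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# In plain ASCII, 0x23 is '#', so a 0x23-for-pound mapping would have
# shown 0x23 in the dump rather than 0xA3.
ascii_0x23 = bytes([0x23]).decode("ascii")

print(as_latin1)
print(as_latin9)
print("valid UTF-8:", valid_utf8)
print("UTF-8 bytes for the pound sign:", "£".encode("utf-8").hex())
print("0x23 in ASCII:", ascii_0x23)
```

If the page is served with a UTF-8 charset declaration while containing a bare 0xA3, a browser would be expected to show a replacement character in place of the pound sign, which would fit the symptom being discussed.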