On Fri, 2010-06-04 at 08:06 +0100, Brian Butterworth wrote:
> The "short form" of the headlines are destined for Ceefax - where 0x23
> is £ and 0x5F is #... 

No, this is definitely something like ISO8859-1 or ISO8859-15. The byte
where the pound sign should be is 0xA3, not 0x23.

For example...

$ curl -s http://news.bbc.co.uk/1/hi/business/10252263.stm | grep '[^£]1.1bn' | hexdump -C
00000000  09 09 09 3c 68 33 3e 3c  61 20 68 72 65 66 3d 22  |...<h3><a href="|
00000010  2f 31 2f 68 69 2f 75 6b  2f 31 30 32 33 37 32 36  |/1/hi/uk/1023726|
00000020  38 2e 73 74 6d 22 3e 4d  61 6e 20 55 74 64 20 6f  |8.stm">Man Utd o|
00000030  77 6e 65 72 73 20 66 61  63 69 6e 67 20 a3 31 2e  |wners facing .1.|
00000040  31 62 6e 20 64 65 62 74  3c 2f 61 3e 3c 2f 68 33  |1bn debt</a></h3|
00000050  3e 0a                                             |>.|
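
The byte at offset 0x3d in that dump can be checked with a few lines of
Python. This is only a sketch: the hex string is copied from the 0x30 row
of the hexdump above, and the variable names are mine. It decodes the row
as ISO 8859-1 and ISO 8859-15, and shows that the same bytes are not
valid UTF-8.

```python
# Row 0x30 of the hexdump above; 0xA3 sits where the pound sign belongs.
raw = bytes.fromhex("77 6e 65 72 73 20 66 61 63 69 6e 67 20 a3 31 2e")

# Both Latin-1 and Latin-9 map 0xA3 to the pound sign.
as_latin1 = raw.decode("iso-8859-1")
as_latin9 = raw.decode("iso-8859-15")

# A lone 0xA3 is a UTF-8 continuation byte, so strict decoding fails.
try:
    as_utf8 = raw.decode("utf-8")
except UnicodeDecodeError:
    as_utf8 = None

# 0x23, the value mentioned for Ceefax, is a plain ASCII hash sign.
hash_sign = bytes([0x23]).decode("ascii")

print(repr(as_latin1), repr(as_latin9), as_utf8, hash_sign)
```

If the page is served with a UTF-8 charset while carrying this byte, a
browser has nothing valid to render at that position, which would fit the
missing pound sign.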

The same headline is shown correctly in the 'most popular' section and
the RSS feed (albeit encoded as the HTML entity &#163; in the latter).
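
As a quick cross-check of the entity form, a sketch using only the Python
standard library (the variable names are illustrative): the numeric
entity &#163; is code point 163, i.e. 0xA3, which is the same value as
the single byte that ISO 8859-1 uses for the pound sign.

```python
import html

# Resolve the numeric character reference used in the RSS feed.
char = html.unescape("&#163;")

# Code point of the resulting character, and its ISO 8859-1 encoding.
code_point = ord(char)
latin1_bytes = char.encode("iso-8859-1")

print(char, hex(code_point), latin1_bytes)
```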

-- 
dwmw2

-
Sent via the backstage.bbc.co.uk discussion group.  To unsubscribe, please 
visit http://backstage.bbc.co.uk/archives/2005/01/mailing_list.html.  
Unofficial list archive: http://www.mail-archive.com/backstage@lists.bbc.co.uk/
