
Linedata Services (UK) Ltd
Registered Office: Bishopsgate Court, 4-12 Norton Folgate, London, E1 6DB
Registered in England and Wales No 3027851    VAT Reg No 778499447

-----Original Message-----


> From: Dale Worley [mailto:[email protected]]
> Sent: 14 August 2009 15:08
> To: [email protected]
> Subject: RE: Invalid byte 1 (£) of a 1-byte sequence
>
> On Fri, 2009-08-14 at 09:29 +0100, Giulio Troccoli wrote:
> > My XML document is not in UTF-8. The pound sign is just A3, not
> > C2 A3.
> >
> > But I'm telling my application that the document IS in UTF-8 (using
> > the encoding="UTF-8" option).
> >
> > Windows correctly rejects it. AIX does not.
> >
> > When you say "make sure all of your processing is in UTF-8", I can't
> > do that. The XML is not in UTF-8 and I can not change that (it's
> > created by a C programme and I have no idea how to do that).
>
> One question to ask is why the contents of the file say that
> it is in UTF-8, but the file is not, in fact, in UTF-8.
> Whatever is generating the file is malfunctioning badly.

Mystery solved!!

Xerces does not have a bug. The reason AIX was accepting the pound sign
without complaint is that we were still using the old Xerces 2.1 there (which
has never given us this problem on any platform). As soon as I relinked the
application with the new Xerces, we got the same behaviour as on Windows.

The file is, as I said, created by a C programme written 7-8 years ago. From
the beginning the XML files have had encoding="UTF-8" at the top without
actually being UTF-8. But because Xerces never complained (I'm sure we had
pound signs before), we didn't notice the inconsistency. Now that we have, I am
going to change the programme to declare that the XML is in fact ISO-8859-1.
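
For anyone finding this thread later, here is a small sketch of the mismatch. It uses Python's standard-library XML parser rather than Xerces, so it only illustrates the encoding issue and is not a reproduction of our application; the helper names parse_amount and is_well_formed and the <amount> element are made up for the example.

```python
import xml.etree.ElementTree as ET

POUND_LATIN1 = b"\xa3"      # pound sign as a single ISO-8859-1 byte
POUND_UTF8 = b"\xc2\xa3"    # the same character encoded as UTF-8


def parse_amount(declared_encoding: str, payload: bytes) -> str:
    """Parse a tiny XML document and return the text of its root element.

    The <amount> element is hypothetical; only the declared encoding and
    the raw payload bytes matter for this illustration.
    """
    header = f'<?xml version="1.0" encoding="{declared_encoding}"?>'.encode("ascii")
    return ET.fromstring(header + b"<amount>" + payload + b"</amount>").text


def is_well_formed(declared_encoding: str, payload: bytes) -> bool:
    """Return True if the parser accepts the payload under the declared encoding."""
    try:
        parse_amount(declared_encoding, payload)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    # A lone A3 byte is not valid UTF-8, so a strict parser rejects it.
    print(is_well_formed("UTF-8", POUND_LATIN1))
    # Declaring the encoding the bytes are really in makes the file parse.
    print(is_well_formed("ISO-8859-1", POUND_LATIN1))
    # Properly encoded UTF-8 (C2 A3) is also accepted.
    print(is_well_formed("UTF-8", POUND_UTF8))
```

The point is that the declaration and the bytes have to agree: either the declaration changes to ISO-8859-1, as I plan to do, or the generator would have to write C2 A3 instead of A3.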

Thanks to everyone for their help.

Giulio
