RE: Invalid byte 1 (£) of a 1-byte sequence

Giulio Troccoli Thu, 13 Aug 2009 08:23:50 -0700

Well, I configured the build as follows:

runConfigure -paix -cxlc -xxlC_r


So it should have used the 'native' transcoder. I'm afraid I don't know 
enough about this to be of more help.

> -----Original Message-----
> From: John Lilley [mailto:[email protected]] 
> Sent: 13 August 2009 16:11
> To: [email protected]
> Subject: RE: Invalid byte 1 (£) of a 1-byte sequence
> 
> Could the difference be that on AIX Xerces is built to use 
> ICU for transcoding, and ICU does not throw an exception for 
> an invalid UTF8 sequence?
> 
> If you give the AIX parser a valid UTF-8 sequence above ASCII 
> range does it work correctly?  The UTF8 for pound sign is 0xC2 0xA3.
> 
> john
> 
> -----Original Message-----
> From: Giulio Troccoli [mailto:[email protected]]
> Sent: Thursday, August 13, 2009 8:50 AM
> To: '[email protected]'
> Subject: RE: Invalid byte 1 (£) of a 1-byte sequence
> 
> Thanks John, but my XML file is not in UTF-8, as I was trying 
> to demonstrate with the hex dump extract (but if I'm wrong 
> please let me know).
> 
> I tell Xerces that it is UTF-8, so it throws the error, which is 
> correct. All I wanted to know is why the same thing does not 
> happen on AIX. Instead, on AIX, Xerces parses the file and 
> even gets the pound sign. It seems it silently switches to 
> ISO-8859-1 when it gets to the pound sign, seeing that it's 
> not in UTF-8.
> 
> > -----Original Message-----
> > From: John Lilley [mailto:[email protected]]
> > Sent: 13 August 2009 15:46
> > To: [email protected]
> > Subject: RE: Invalid byte 1 (£) of a 1-byte sequence
> > 
> > We've had no trouble reading and writing UTF-8.  I set up an example
> > using the British pound symbol like you have, and Xerces 2.8 correctly
> > encodes and decodes it.  We do not even supply an XML header as it
> > defaults to UTF-8.  Do you have a code snippet that produces this
> > output?
> > 
> > john
> > 
> > -----Original Message-----
> > From: Giulio Troccoli [mailto:[email protected]]
> > Sent: Thursday, August 13, 2009 4:19 AM
> > To: [email protected]
> > Subject: Invalid byte 1 (£) of a 1-byte sequence
> > 
> > Hello everybody.
> > 
> > We have been using Xerces and Xalan for a few years and recently I have
> > upgraded to Xerces 2.8 and Xalan 1.10. I have personally built both on
> > Windows and AIX.
> > 
> > One of our applications produces an XML file that another application
> > processes. This second application throws the error in this email
> > subject.
> > 
> > First of all I would like to make sure I have my facts right.
> > 
> > The XML specifies an encoding of UTF-8 with <?xml version="1.0"
> > encoding="UTF-8"?> in the first line. However, I don't think it's been
> > saved in UTF-8 because if I open it as binary and go to where the £
> > is, I can see the following
> > 
> > BEFCE0: 20 A3 30 2E 35 E8 2E 3C  2F 6C 69 6E 65 3E 0A 20   
> > £0.58.</line>.
> > 
> > I was expecting £ to be encoded with 2 bytes. Am I correct in 
> > assuming this?
> > 
> > If I am correct, then the error is correct too.
> > 
> > My question is about AIX. I don't get the error on AIX and the XML
> > document is parsed correctly. Also I didn't have any problem with the
> > pound sign with the old versions of Xerces and Xalan, 2.1 and 1.4
> > respectively, but that doesn't matter now.
> > 
> > Would anyone be in a position to confirm that this is a bug in Xerces
> > 2.8 on AIX?
> > 
> > If, of course, I change the encoding in the XML to ISO-8859-1 it works
> > on Windows too, and that's probably what we will do, as it's the right
> > thing to do. Still, I'd like to know whether there is a bug on AIX (so
> > that I can say "it's a bug" when they ask me "why does it work on AIX
> > then?")
> > 
> > Thanks
> > Giulio
> > 
> > 
> > Linedata Services (UK) Ltd
> > Registered Office: Bishopsgate Court, 4-12 Norton Folgate, London, E1 6DB
> > Registered in England and Wales No 3027851    VAT Reg No 778499447
> > 
> > 
> > 
> >
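
For anyone following the byte-level argument in this thread, here is a rough sketch of the check being discussed. It is illustrative only: it uses Python's built-in codecs rather than Xerces, so it says nothing about how either the Windows or the AIX transcoder behaves. The helper name first_invalid_utf8_offset is made up for this example, and the sample bytes are the first five from the quoted hex dump.

```python
# First five bytes from the hex dump quoted in the thread: 20 A3 30 2E 35
dump = bytes([0x20, 0xA3, 0x30, 0x2E, 0x35])


def first_invalid_utf8_offset(data: bytes):
    """Return the offset of the first byte that breaks strict UTF-8
    decoding, or None if the whole buffer decodes cleanly.

    Hypothetical helper for illustration; not part of any library.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


# A lone 0xA3 cannot start a UTF-8 sequence, so strict decoding fails there.
print(first_invalid_utf8_offset(dump))

# Read as ISO-8859-1, the same bytes decode without error and 0xA3 is U+00A3.
print(dump.decode("iso-8859-1"))

# The UTF-8 form of the pound sign (U+00A3) is the two bytes C2 A3.
print("\u00a3".encode("utf-8").hex())
```

This matches what was said above: the file declares UTF-8 but stores the pound sign as the single ISO-8859-1 byte 0xA3, so a strict UTF-8 decoder is right to reject it, while a decoder that treats the input as ISO-8859-1 accepts it.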
