Well, I configured the build as follows:

    runConfigure -paix -cxlc -xxlC_r

So it should have used the 'native' transcoder. I'm afraid I don't know enough about this to be of more help.

> -----Original Message-----
> From: John Lilley [mailto:[email protected]]
> Sent: 13 August 2009 16:11
> To: [email protected]
> Subject: RE: Invalid byte 1 (£) of a 1-byte sequence
>
> Could the difference be that on AIX Xerces is built to use
> ICU for transcoding, and ICU does not throw an exception for
> an invalid UTF-8 sequence?
>
> If you give the AIX parser a valid UTF-8 sequence above the ASCII
> range, does it work correctly? The UTF-8 for the pound sign is 0xC2 0xA3.
>
> john
>
> -----Original Message-----
> From: Giulio Troccoli [mailto:[email protected]]
> Sent: Thursday, August 13, 2009 8:50 AM
> To: '[email protected]'
> Subject: RE: Invalid byte 1 (£) of a 1-byte sequence
>
> Thanks John, but my XML file is not in UTF-8, as I was trying
> to demonstrate with the hex dump extract (but if I'm wrong
> please let me know).
>
> And I tell Xerces that it is, so it throws the error, which is
> correct. All I wanted to know is why the same thing does not
> happen on AIX. Instead, on AIX, Xerces parses the file and
> even gets the pound sign. It seems it silently switches to
> ISO-8859-1 when it gets to the pound sign, seeing that it's
> not in UTF-8.
>
> > -----Original Message-----
> > From: John Lilley [mailto:[email protected]]
> > Sent: 13 August 2009 15:46
> > To: [email protected]
> > Subject: RE: Invalid byte 1 (£) of a 1-byte sequence
> >
> > We've had no trouble reading and writing UTF-8. I set up an example
> > using the British pound symbol like you have, and Xerces 2.8 correctly
> > encodes and decodes it. We do not even supply an XML header, as it
> > defaults to UTF-8. Do you have a code snippet that produces this
> > output?
> >
> > john
> >
> > -----Original Message-----
> > From: Giulio Troccoli [mailto:[email protected]]
> > Sent: Thursday, August 13, 2009 4:19 AM
> > To: [email protected]
> > Subject: Invalid byte 1 (£) of a 1-byte sequence
> >
> > Hello everybody.
> >
> > We have been using Xerces and Xalan for a few years, and recently I
> > upgraded to Xerces 2.8 and Xalan 1.10. I personally built both on
> > Windows and AIX.
> >
> > One of our applications produces an XML file that another application
> > processes. This second application throws the error in this email's
> > subject.
> >
> > First of all, I would like to make sure I have my facts right.
> >
> > The XML specifies an encoding of UTF-8 with <?xml version="1.0"
> > encoding="UTF-8"?> in the first line. However, I don't think it's been
> > saved in UTF-8, because if I open it as binary and go to where the £
> > is, I can see the following:
> >
> > BEFCE0: 20 A3 30 2E 35 E8 2E 3C 2F 6C 69 6E 65 3E 0A 20  £0.58.</line>.
> >
> > I was expecting £ to be encoded with 2 bytes. Am I correct in
> > assuming this?
> >
> > If I am correct, then the error is correct too.
> >
> > My question is about AIX. I don't get the error on AIX, and the XML
> > document is parsed correctly. Also, I didn't have any problem with the
> > pound sign with the old versions of Xerces and Xalan, 2.1 and 1.4
> > respectively, but that doesn't matter now.
> >
> > Would anyone be in a position to confirm that this is a bug in Xerces
> > 2.8 on AIX?
> >
> > If, of course, I change the encoding in the XML to ISO-8859-1, it works
> > on Windows too, and that's probably what we will do, as it's the right
> > thing to do. Still, I'd like to know whether there is a bug on AIX (so
> > that I can say "it's a bug" when they ask me "why does it work on AIX
> > then?")
> >
> > Thanks
> > Giulio
> >
> > Linedata Services (UK) Ltd
> > Registered Office: Bishopsgate Court, 4-12 Norton Folgate, London, E1 6DB
> > Registered in England and Wales No 3027851 VAT Reg No 778499447
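
For anyone following the byte-level argument in this thread, here is a minimal Python sketch (not Xerces code, and it says nothing about what either transcoder actually does) that checks the two claims made above: a lone 0xA3 byte is not valid UTF-8 but is the pound sign in ISO-8859-1, and the UTF-8 encoding of the pound sign is 0xC2 0xA3. The sample bytes and the helper name `try_decode` are illustrative assumptions, loosely modelled on the hex dump rather than taken from the real file.

```python
# Illustrative sample only: a lone 0xA3 byte where the pound sign should be,
# as in an ISO-8859-1 file that has been labelled UTF-8.
latin1_bytes = b"<line> \xa30.58.</line>"


def try_decode(data: bytes, encoding: str) -> str:
    """Strictly decode data; return the text or a short error description."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as err:
        # err.start is the offset of the first byte the decoder rejected.
        return f"error at byte offset {err.start}: 0x{data[err.start]:02X}"


# A strict UTF-8 decoder rejects 0xA3 because it is a continuation byte
# with no lead byte in front of it.
utf8_result = try_decode(latin1_bytes, "utf-8")

# The same bytes decode cleanly as ISO-8859-1, where 0xA3 is the pound sign.
latin1_result = try_decode(latin1_bytes, "iso-8859-1")

# The pound sign (U+00A3) takes two bytes in UTF-8 and one in ISO-8859-1.
pound_utf8 = "\u00a3".encode("utf-8")
pound_latin1 = "\u00a3".encode("iso-8859-1")

print(utf8_result)
print(latin1_result)
print(pound_utf8.hex(), pound_latin1.hex())
```

A decoder that accepts the file without complaint, as reported on AIX, is therefore either not validating UTF-8 strictly or not treating the input as UTF-8 at all; the sketch cannot tell which of these applies to the AIX build.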
