On 9 Nov 2009, at 17:00, RPB wrote:
>
> Hello,
>
> I am retrieving XML data from the Amazon UK api which returns XML
> including a £ (GBP) sign. I found that XMLParser.parse(xmlText) will
> throw an exception (com.google.gwt.xml.client.impl.DOMParseException:
> Failed to parse ) unless i remove the £ signs from the XML.

The £ sign is not part of the 7-bit US-ASCII character set. That means
character encoding becomes critical if you want to avoid corrupted data.

If your data was encoded in ISO 8859-1 (Latin 1) but you were treating it
as though it were encoded in UTF-8, or as some similar mismatched pair,
you'd see problems of this kind. In fact, be thankful that an exception
was thrown: in some cases you'd just get silent data corruption!

>
> I am hoping someone can explain why this happens? It doesn't seem to
> make sense to me to have to pre-process the XML by removing the £
> signs or adding CDATA sections - please let me know if there is a
> better way.

Take steps to preserve character encoding information at each stage, or
else find a single encoding that works through every stage of the chain.
UTF-8 is becoming a de facto standard, but not all systems support it
yet...

>
> Thanks!
>

--
Bill Michell
billmich...@gmail.com
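As an illustration of the mismatch described above, here is a minimal sketch in plain Java, assuming it runs on an ordinary JVM (Java 7 or later) rather than in GWT client code. The class name PoundSignEncoding, its helper methods, and the sample <price> element are hypothetical and are not taken from the Amazon response. It shows the same £ sign producing an error under strict decoding, or silent corruption under lenient decoding, when the bytes and the assumed encoding disagree.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

public class PoundSignEncoding {

    // Lenient decode: malformed bytes are silently replaced with U+FFFD.
    static String decodeAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // ISO 8859-1 maps every byte to a character, so this never fails,
    // even when the bytes were really UTF-8.
    static String decodeAsLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Strict check: a fresh decoder reports malformed input by throwing.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample document; U+00A3 is the pound sign.
        String xml = "<price>\u00A3" + "9.99</price>";

        // The pound sign is one byte (A3) in Latin 1, two bytes (C2 A3) in UTF-8.
        byte[] latin1Bytes = xml.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8Bytes = xml.getBytes(StandardCharsets.UTF_8);

        // Latin 1 bytes treated as UTF-8: strict decoding rejects them,
        // lenient decoding swaps in a replacement character.
        System.out.println(isValidUtf8(latin1Bytes));
        System.out.println(decodeAsUtf8(latin1Bytes));

        // UTF-8 bytes treated as Latin 1: no error, but an extra character appears.
        System.out.println(decodeAsLatin1(utf8Bytes));

        // Matching encodings on both sides round-trip cleanly.
        System.out.println(decodeAsUtf8(utf8Bytes).equals(xml));
    }
}
```

If the bytes can be inspected before they are decoded, a strict check like isValidUtf8 is one way to make a mismatch fail loudly at the point where it happens instead of surfacing later as a parse error.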