Re: RFR (JAXP): 8035469 : Xerces Update: EncodingMap does not recognise Java-style encodings Cp1141-Cp1149

David Li Sat, 01 Mar 2014 10:14:27 -0800

Joe probably knows more about this, but we did some preliminary investigation, summarized below.

One test that was considered was creating an XML file encoded in one of the formats and then seeing whether the parser would process the file after our updates were added. This looked like it would require generating sample XML files whose bytes are actually in the target encoding, which we could not figure out in a reasonable amount of time. It's not sufficient to specify the encoding in the XML header (<?xml version="1.0" encoding="CP1140"?>, also tried IBM01140) if all the text in the file is UTF-8, since the parser complains. It was decided that since the changes were minor, and the original Xerces bug did not include any tests or any way of reproducing the error, we would not spend too much time on the issue. For reference, the IBM01140-IBM01149 encodings appear to cover various European languages: http://www.iana.org/assignments/character-sets/character-sets.xhtml.
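
For what it's worth, here is a rough sketch of how such a sample might be generated in Java rather than by hand: build the document as a String and let the JDK charset produce the bytes. The class and method names (EbcdicXmlSample, encodeSample, parseText) are made up for illustration, and it assumes the runtime ships the extended charsets so that Cp1140/IBM01140 is available. Whether the parser then accepts the result is exactly what a real test would have to check, so no particular outcome is claimed here.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper, not part of the webrev.
public class EbcdicXmlSample {

    // Build a small XML document and encode it with the given Java charset.
    // xmlName is the encoding name written into the XML declaration;
    // javaName is the charset name used to produce the bytes.
    static byte[] encodeSample(String xmlName, String javaName) {
        String xml = "<?xml version=\"1.0\" encoding=\"" + xmlName + "\"?>"
                   + "<root>price: 10 \u20AC</root>";
        return xml.getBytes(Charset.forName(javaName));
    }

    // Parse the encoded bytes with the default JAXP parser and
    // return the text content of the root element.
    static String parseText(byte[] data) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = db.parse(new ByteArrayInputStream(data));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) {
        String javaName = "Cp1140"; // assumed to be available in this runtime
        if (!Charset.isSupported(javaName)) {
            System.out.println(javaName + " is not available in this runtime");
            return;
        }
        byte[] data = encodeSample("IBM01140", javaName);
        try {
            System.out.println("parsed text: " + parseText(data));
        } catch (Exception e) {
            // A failure here would point at the encoding-name mapping or the parser.
            System.out.println("parser rejected the document: " + e);
        }
    }
}
```

The same loop could be repeated for Cp1141 through Cp1149, with the XML declaration naming either the IANA form or the Java-style form, to exercise both directions of the mapping.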


- David

On 3/1/2014 1:06 AM, Alan Bateman wrote:

On 28/02/2014 22:11, David Li wrote:
Hi,
This is an update from Xerces for a fixed encoding map entry in file EncodingMap.java. For details, please refer to: https://bugs.openjdk.java.net/browse/JDK-8035469
Webrevs: http://cr.openjdk.java.net/~joehw/jdk9/8035469/webrev/
(I don't have an openjdk username yet, so Joe Wang uploaded it)
No new tests since the change is minor. There were no tests from Apache fixes.
Maybe this is a question for Joe but I wonder if it would be possible to create a test that exercises these encodings? I realize the change is minor but it is also subtle and this may be an area where we should have better tests.
-Alan
