[ 
https://issues.apache.org/jira/browse/XERCESJ-1359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675222#action_12675222
 ] 

liaomingxue commented on XERCESJ-1359:
--------------------------------------

I have found a solution to this problem. I think Xerces should support the 
GB2312 encoding (Chinese).
The solution also fixes a second problem: Xerces does not support files 
encoded in GB2312.
The solution is as follows:

In org.apache.xerces.impl.XMLEntityManager.createReader(InputStream,String,Boolean), 
add:
                /**
                 * why not supporting GB2312?
                 * @author [email protected]
                 */
                // Compare case-insensitively: a.xml declares encoding="gb2312".
                if ("GB2312".equalsIgnoreCase(encoding))
                {
                   return new InputStreamReader(inputStream, encoding);
                }
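
For illustration, here is a minimal standalone sketch of what that branch relies on: decoding GB2312 bytes through a plain InputStreamReader. The class and helper names are hypothetical and not part of Xerces. GB2312 is an optional charset in the JDK, so the sketch checks for it first.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

public class Gb2312ReaderSketch {

    // Hypothetical helper (not Xerces code): decode a whole stream with the
    // named charset, using the same InputStreamReader constructor as the patch.
    static String readAll(InputStream in, String encoding) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, encoding)) {
            char[] buf = new char[256];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            // Also covers UnsupportedEncodingException for unknown charsets.
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // GB2312 is an extended charset; not every Java runtime ships it.
        if (!Charset.isSupported("GB2312")) {
            System.out.println("GB2312 is not available in this runtime");
            return;
        }
        // 0xD6 0xD0 is the GB2312 encoding of U+4E2D (the character in the file name).
        byte[] bytes = { (byte) 0xD6, (byte) 0xD0 };
        String text = readAll(new ByteArrayInputStream(bytes), "GB2312");
        System.out.println(text.equals("\u4e2d"));
    }
}
```

Whether a generic reader is the right choice inside createReader is a separate question; the sketch only shows that the JDK can decode the bytes when the charset is installed.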

and in org.apache.xerces.util.URI.initializePath(String, int) update:
                else if (!isPathCharacter(testChar))
                {
                  /**
                   * @author [email protected]
                   * The path part of a URI may contain characters which are
                   * not included in URI Spec.
                   */
                  if (Character.isUnicodeIdentifierStart(testChar)
                      || Character.isUnicodeIdentifierPart(testChar))
                  {
                    ++index;
                    continue;
                  }

                  if (testChar == '?' || testChar == '#')
                  {
                    break;
                  }
                  throw new MalformedURIException(
                      "Path contains invalid character: " + testChar);
                }
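
As a rough check of what the added condition lets through, the following sketch uses only the JDK. The class and method names are hypothetical. It evaluates the same test for a Chinese character, a space, and a control character. Note that the Javadoc for Character.isUnicodeIdentifierPart also counts characters for which isIdentifierIgnorable returns true, so the condition may be broader than intended.

```java
public class IdentifierCheckSketch {

    // Mirrors the condition added in the patch; not Xerces code.
    static boolean acceptedByPatch(char c) {
        return Character.isUnicodeIdentifierStart(c)
            || Character.isUnicodeIdentifierPart(c);
    }

    public static void main(String[] args) {
        // U+4E2D, the CJK ideograph used in the file name from the report.
        System.out.println("U+4E2D: " + acceptedByPatch('\u4e2d'));
        // A space is not an identifier character.
        System.out.println("space:  " + acceptedByPatch(' '));
        // A control character, to see how identifier-ignorable input is treated.
        System.out.println("U+0000: " + acceptedByPatch('\u0000'));
    }
}
```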

and in the opaque-part check of the same method, 
org.apache.xerces.util.URI.initializePath(String, int), update:
                else if (!isURICharacter(testChar))
                {
                  /**
                   * A path may contain Chinese characters,
                   * but I am not sure that the method used here is right.
                   * And I believe that there must be other parts of this file
                   * to be corrected.
                   * And why not use java.net.URI?
                   * By [email protected]
                   */
                  if (Character.isUnicodeIdentifierPart(testChar)
                      || Character.isUnicodeIdentifierStart(testChar))
                  {
                    index++;
                    continue;
                  }
                  throw new MalformedURIException(
                      "Opaque part contains invalid character: " + testChar);
                }
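
On the java.net.URI question raised in the comment above, here is a hedged sketch of how the JDK class treats the same kind of path. The class and helper names are hypothetical, and the path is the one from the report written with a leading slash. The multi-argument constructor accepts the non-ASCII character, and toASCIIString() percent-encodes it as UTF-8. Whether Xerces accepts the resulting string as a system ID is not shown here.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class JavaNetUriSketch {

    // Hypothetical helper: build a file URI from an absolute path that
    // starts with '/'. The multi-argument constructor quotes illegal characters.
    static URI fileUri(String absolutePath) {
        try {
            return new URI("file", null, absolutePath, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // "\u4e2d" is the Chinese character from the file name in the report.
        URI uri = fileUri("/E:/\u4e2d.xml");
        // Decoded path, with the original character intact.
        System.out.println(uri.getPath());
        // ASCII-only form: the character becomes %E4%B8%AD (its UTF-8 bytes).
        System.out.println(uri.toASCIIString());
    }
}
```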




> DOMParser exception with an xml file which name contains Chinese characters
> ---------------------------------------------------------------------------
>
>                 Key: XERCESJ-1359
>                 URL: https://issues.apache.org/jira/browse/XERCESJ-1359
>             Project: Xerces2-J
>          Issue Type: Bug
>          Components: JAXP (javax.xml.parsers)
>    Affects Versions: 2.9.1
>         Environment: Windows in China
>            Reporter: liaomingxue
>            Priority: Minor
>
> In the same directory there are an XML file, a.xml, and a schema file, 
> r.xsd. 
> With the code below, everything works. But if a.xml is renamed to a 
> name containing Chinese characters (e.g. 中.xml), the DOMParser throws an 
> exception:
> java.net.MalformedURLException: unknown protocol: e
>       at java.net.URL.<init>(URL.java:586)
>       at java.net.URL.<init>(URL.java:476)
>       at java.net.URL.<init>(URL.java:425)
>       at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
>       at org.apache.xerces.impl.XMLVersionDetector.determineDocVersion(Unknown Source)
>       at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>       at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>       at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
>       at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
>       at xml.DOMParserDemo.main(DOMParserDemo.java:36)
>  
> And if parser.parse("E:/a.xml"); is replaced with parser.parse("file:E:/中.xml"); 
> the parser reports these warnings and errors:
> [Warning] 中.xml:3:117: schema_reference.4: Failed to read schema document 
> 'r.xsd', because 1) could not find the document; 2) the document could not be 
> read; 3) the root element of the document is not <xsd:schema>.
> [Error] 中.xml:3:117: cvc-elt.1: Cannot find the declaration of element 
> 'ResourceReg'.
> [Warning] 中.xml:5:16: schema_reference.4: Failed to read schema document 
> 'r.xsd', because 1) could not find the document; 2) the document could not be 
> read; 3) the root element of the document is not <xsd:schema>.
> [Warning] 中.xml:8:18: schema_reference.4: Failed to read schema document 
> 'r.xsd', because 1) could not find the document; 2) the document could not be 
> read; 3) the root element of the document is not <xsd:schema>.
> [Warning] 中.xml:10:15: schema_reference.4: Failed to read schema document 
> 'r.xsd', because 1) could not find the document; 2) the document could not be 
> read; 3) the root element of the document is not <xsd:schema>.
> [Warning] 中.xml:12:18: schema_reference.4: Failed to read schema document 
> 'r.xsd', because 1) could not find the document; 2) the document could not be 
> read; 3) the root element of the document is not <xsd:schema>.
> [Warning] 中.xml:18:19: schema_reference.4: Failed to read schema document 
> 'r.xsd', because 1) could not find the document; 2) the document could not be 
> read; 3) the root element of the document is not <xsd:schema>.
> [Warning] 中.xml:20:11: schema_reference.4: Failed to read schema document 
> 'r.xsd', because 1) could not find the document; 2) the document could not be 
> read; 3) the root element of the document is not <xsd:schema>.
> [Warning] 中.xml:22:11: schema_reference.4: Failed to read schema document 
> 'r.xsd', because 1) could not find the document; 2) the document could not be 
> read; 3) the root element of the document is not <xsd:schema>.
> [Warning] 中.xml:24:11: schema_reference.4: Failed to read schema document 
> 'r.xsd', because 1) could not find the document; 2) the document could not be 
> read; 3) the root element of the document is not <xsd:schema>.
> [Warning] 中.xml:26:10: schema_reference.4: Failed to read schema document 
> 'r.xsd', because 1) could not find the document; 2) the document could not be 
> read; 3) the root element of the document is not <xsd:schema>.
> The code:
>    try 
>     { 
>       DOMParser parser = new DOMParser(); 
>       parser.setFeature("http://xml.org/sax/features/validation",true); 
>       parser.setFeature("http://apache.org/xml/features/validation/schema",true); 
>       parser.parse("E:/a.xml"); 
>       Document doc = parser.getDocument(); 
>     } 
>     catch(Exception e) 
>     { 
>       e.printStackTrace(); 
>     } 
> a.xml:
> <?xml version="1.0" encoding="gb2312"?> 
> <ResourceReg xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
> xsi:noNamespaceSchemaLocation="r.xsd"> 
>   <ResourceFig> 
>     <ResourceKID>512 </ResourceKID> 
>     <PortAddr>192.192.192.222:1:1 </PortAddr> 
>     <ResourceSID>3 </ResourceSID> 
>   </ResourceFig> 
>   <ResourceStatus> 
>     <ZBWZ>135,26 </ZBWZ> 
>     <YXZT>1 </YXZT> 
>     <CSQB>true </CSQB> 
>     <BKF>2 </BKF> 
>   </ResourceStatus> 
> </ResourceReg> 
> r.xsd:  
> <?xml version="1.0"?> 
> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> 
> <xs:element name="ResourceReg"> 
>   <xs:complexType> 
>   <xs:all> 
>     <xs:element name="ResourceFig" type="ResourceFig" minOccurs="1" 
> maxOccurs="1" /> 
>     <xs:element name="ResourceStatus" minOccurs="1" maxOccurs="1" /> 
>   </xs:all> 
>   </xs:complexType> 
> </xs:element> 
> <xs:complexType name="ResourceFig"> 
>   <xs:all> 
>   <xs:element name="ResourceKID" type="xs:unsignedShort" /> 
>   <xs:element name="PortAddr" type="xs:token" /> 
>   <xs:element name="ResourceSID" type="xs:unsignedByte" /> 
>   </xs:all> 
> </xs:complexType> 
> </xs:schema>
