mbeckerle opened a new pull request #519:
URL: https://github.com/apache/daffodil/pull/519


   DAFFODIL-1422 is a ticket about restricting the XML we accept so that we do 
not allow DOCTYPE declarations in it.
   This is a security-related provision: DOCTYPEs leave XML loaders open to a 
variety of problems, such as documents that explode in size as the entity 
declarations in the DOCTYPE are expanded (the "billion laughs" attack).
   DOCTYPEs are an old, obsolete mechanism, so simply disallowing them entirely 
is the best option.
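   A minimal sketch of what "disallowing DOCTYPEs" can look like at a single 
load site, assuming the JDK's built-in Xerces-based SAX parser. It sets the 
Xerces feature `http://apache.org/xml/features/disallow-doctype-decl`, which 
turns any DOCTYPE into a fatal parse error before any entity is expanded. The 
object name `NoDoctypeParse` is hypothetical and not part of Daffodil.

```scala
import java.io.StringReader
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.{InputSource, SAXParseException}
import org.xml.sax.helpers.DefaultHandler

// Hypothetical helper, for illustration only.
object NoDoctypeParse {
  // Xerces feature honored by the JDK parser: any <!DOCTYPE ...> is a fatal error.
  val DisallowDoctype = "http://apache.org/xml/features/disallow-doctype-decl"

  // A fresh factory per call; JAXP factories are not guaranteed thread-safe.
  private def newFactory(): SAXParserFactory = {
    val f = SAXParserFactory.newInstance()
    f.setNamespaceAware(true)
    f.setFeature(DisallowDoctype, true)
    f
  }

  /** Right(()) if the document parsed, Left(message) if the parser rejected it. */
  def parse(xml: String): Either[String, Unit] =
    try {
      newFactory().newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler)
      Right(())
    } catch {
      case e: SAXParseException => Left(e.getMessage)
    }
}
```

   A document carrying even a tiny internal DOCTYPE subset is rejected outright, 
so there is nothing left to expand.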
   
   There are other JIRA tickets about preventing resolvers and loaders from 
dereferencing URIs as internet URLs.
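   One way to keep a loader from reaching out to the network is a resolver that 
refuses any system identifier whose scheme is not on an allow-list. The sketch 
below assumes the allowed schemes are `file` and `jar` (a guess at a sensible 
policy, not a decision recorded in these tickets) and uses the standard SAX 
`EntityResolver` interface; `LocalOnlyEntityResolver` is a hypothetical name. 
JAXP's `XMLConstants.ACCESS_EXTERNAL_DTD` and `ACCESS_EXTERNAL_SCHEMA` 
properties are a built-in alternative for parsers and validators that support 
them.

```scala
import java.net.URI
import java.util.Locale
import org.xml.sax.{EntityResolver, InputSource, SAXException}
import scala.util.Try

// Hypothetical resolver: the allowed-scheme policy is an assumption.
class LocalOnlyEntityResolver(allowedSchemes: Set[String] = Set("file", "jar"))
    extends EntityResolver {

  override def resolveEntity(publicId: String, systemId: String): InputSource = {
    // Extract the URI scheme; null ids, unparsable ids, or ids without a scheme yield None.
    val scheme: Option[String] =
      Option(systemId)
        .flatMap(id => Try(new URI(id)).toOption)
        .flatMap(u => Option(u.getScheme))
        .map(_.toLowerCase(Locale.ROOT))

    scheme match {
      // Returning null tells the parser to open the (local) resource itself.
      case Some(s) if allowedSchemes(s) => null
      // Everything else (http, https, ftp, unknown) is refused.
      case _ => throw new SAXException(s"Refusing to dereference system id '$systemId'")
    }
  }
}
```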
   
   The upshot of all this is that "loading" XML is tricky. It needs to be done 
carefully, via a centralized library that provides the options the different 
kinds of loading require, without exposing the security vulnerabilities.
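   To illustrate the idea of such a library (it does not exist yet; see below), 
here is a sketch of a single entry point where the security settings are 
fixed and only the per-site choices are options. `CentralXMLLoader`, 
`LoadOptions`, and `StrictHandler` are hypothetical names, and the option set 
(namespace awareness plus an optional schema to validate against) is an 
assumption about what callers would need.

```scala
import java.io.StringReader
import javax.xml.XMLConstants
import javax.xml.parsers.{SAXParser, SAXParserFactory}
import javax.xml.validation.Schema
import org.xml.sax.{InputSource, SAXParseException}
import org.xml.sax.helpers.DefaultHandler

// Hypothetical per-call-site options; illustrative only.
final case class LoadOptions(namespaceAware: Boolean = true, schema: Option[Schema] = None)

// Hypothetical central loader: every call site would obtain its parser here.
object CentralXMLLoader {
  private val DisallowDoctype = "http://apache.org/xml/features/disallow-doctype-decl"

  /** Security settings are always applied; only LoadOptions vary. */
  def newParser(opts: LoadOptions): SAXParser = {
    val f = SAXParserFactory.newInstance()
    f.setNamespaceAware(opts.namespaceAware)
    f.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true)
    f.setFeature(DisallowDoctype, true)
    opts.schema.foreach(s => f.setSchema(s)) // validate while parsing, if requested
    f.newSAXParser()
  }

  /** DefaultHandler ignores validation errors; this handler makes them fail the load. */
  class StrictHandler extends DefaultHandler {
    override def error(e: SAXParseException): Unit = throw e
  }

  def load(xml: String, opts: LoadOptions, handler: DefaultHandler = new StrictHandler): Unit =
    newParser(opts).parse(new InputSource(new StringReader(xml)), handler)
}
```

   Because callers can only choose among `LoadOptions`, no call site can forget 
to disable DOCTYPEs, which is the drift problem described next.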
   
   This change set does not yet include establishing a single central library 
that all XML loading goes through. 
   
   It is now apparent that such a library is needed: there are too many places 
invoking XML loaders to rely on each of them being "done the right way" 
independently. Under maintenance, they are too likely to drift apart.
   
   A key starting point is to survey every place where Daffodil loads XML. 
These include loading of:
   
   * DFDL schemas and the include/import schema files they reference
   ** These include DFDL schemas, but also XSDs for annotation languages 
(e.g., Schematron annotations).
   ** Note that this validating loader loads the schema not as a schema, but as 
ordinary XML, which should be validated against the schema for DFDL schemas.
   * XML Infosets being unparsed
   ** (Currently not a validating loader)
   * TDML files for testing - this can in turn lead to loading of DFDL schemas
   ** Test cases can load XML Infoset files. 
   ** Validation here involves validating the defineSchema elements, which 
contain DFDL schemas.
   * Config files
   * daffodil-propgen loads the XSD files for DFDL annotations, tunables, etc.
   * The Xerces validator loads a schema, and the include/import files it references
   * The Xerces validator loads the XML being validated
   
   This may not be a comprehensive list. 
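   For the last two items (the Xerces validator loading a schema with its 
includes/imports, and the XML being validated), JAXP's 
`XMLConstants.ACCESS_EXTERNAL_SCHEMA` and `ACCESS_EXTERNAL_DTD` properties 
restrict which protocols may be followed. The policy below (only `file:` for 
schema includes/imports during compilation, nothing external during 
validation) is an assumption for illustration, and `RestrictedValidation` is a 
hypothetical name. Note that these properties limit URL dereferencing but do 
not by themselves reject a DOCTYPE's internal subset in the validated document.

```scala
import java.io.StringReader
import javax.xml.XMLConstants
import javax.xml.transform.stream.StreamSource
import javax.xml.validation.{Schema, SchemaFactory}
import scala.util.Try

// Hypothetical wrapper; the access policy is an assumption.
object RestrictedValidation {
  /** Compile a schema; include/import may only follow file: URIs, and no external DTDs. */
  def compileSchema(xsd: String): Schema = {
    val sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
    sf.setProperty(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "file")
    sf.setProperty(XMLConstants.ACCESS_EXTERNAL_DTD, "")
    sf.newSchema(new StreamSource(new StringReader(xsd)))
  }

  /** Validate a document; with no ErrorHandler set, validation errors throw. */
  def validate(schema: Schema, xml: String): Try[Unit] = Try {
    val v = schema.newValidator()
    v.setProperty(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "") // no schemaLocation fetching
    v.setProperty(XMLConstants.ACCESS_EXTERNAL_DTD, "")    // no external DTDs/entities
    v.validate(new StreamSource(new StringReader(xml)))
  }
}
```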
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.