mbeckerle opened a new pull request #519: URL: https://github.com/apache/daffodil/pull/519
DAFFODIL-1422 is a ticket about restricting the XML we accept so that DOCTYPE declarations are not allowed in it. This is a security-related provision, as use of DOCTYPEs leaves XML loaders subject to a variety of problems, such as documents exploding in size as the declarations in the DOCTYPE are expanded. DOCTYPEs are an old, obsolete idea, and simply disallowing them entirely is the best option.

There are other Jira tickets about disallowing resolvers and loaders from dereferencing URIs by treating them as internet URLs.

The upshot of all this is that "loading" XML is tricky, and needs to be done carefully via a centralized library that provides the options the various kinds of loading require, while not exposing or allowing the security vulnerabilities.

This change set does not yet establish a single central library that all XML loading goes through. Right now it is at the point where it is apparent that such a library is needed, because there are too many places invoking XML loaders for all of them to simply be "done the right way". Under maintenance this is too likely to drift.

A key starting point is to survey every place Daffodil does XML loading. These include loading of:

* DFDL schemas and the include/import schema files they reference
** This can include DFDL schemas, but also XSD for annotation languages (e.g., schematron annotations)
** Note that this validating loader is loading a schema, but it loads it as ordinary XML, not as a schema. The result should be validated against the schema for DFDL schemas.
* XML Infosets being unparsed
** (Currently not a validating loader)
* TDML files for testing; this can in turn lead to loading of DFDL schemas
** Test cases can load XML Infoset files.
** Validation here involves validating defineSchema elements, which contain DFDL schemas.
* Config files
* daffodil-propgen loads the XSD files for DFDL annotations, tunables, etc.
* Xerces validator loads a schema, and the include/import files it references
* Xerces validator loads XML being validated

This may not be a comprehensive list.
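
To make the idea of a centralized, DOCTYPE-rejecting loader concrete, here is a minimal sketch in Java using only the JDK's JAXP API. It is not the code in this pull request: the class name `SafeXmlLoader` and its methods are hypothetical, and it assumes the default JDK parser (a Xerces derivative), which recognizes the `disallow-doctype-decl` feature and the JAXP 1.5 external-access properties.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical central loader; names are illustrative, not Daffodil's API.
final class SafeXmlLoader {
    private SafeXmlLoader() {}

    // Builds a factory with the restrictive settings applied in one place,
    // so callers cannot forget any of them.
    static DocumentBuilderFactory newFactory() throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        f.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // Reject any document that contains a DOCTYPE declaration.
        f.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        // Belt and braces: do not resolve external entities either.
        f.setFeature("http://xml.org/sax/features/external-general-entities", false);
        f.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        // JAXP 1.5 properties: an empty string means no protocol is permitted,
        // so the parser will not dereference external DTD or schema URIs.
        f.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "");
        f.setAttribute(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "");
        f.setExpandEntityReferences(false);
        return f;
    }

    // Parses XML text into a DOM; throws SAXParseException on a DOCTYPE.
    static Document load(String xml) throws Exception {
        DocumentBuilder b = newFactory().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors instead of printing to stderr.
        b.setErrorHandler(new DefaultHandler());
        return b.parse(new InputSource(new StringReader(xml)));
    }
}
```

With this sketch, `SafeXmlLoader.load("<a><b/></a>")` returns a document, while any input that begins with `<!DOCTYPE ...>` fails with a `SAXParseException` before entity expansion can occur. A real central library would presumably also need variants for validating loads and for reading from files or URIs, which this sketch leaves out.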
