(Apologies in advance if this has already been answered. I did attempt to search the archives at lists.apache.org, Stack Overflow, etc., but did not find an answer.)
We use Xerces-J to validate XML files (XSD 1.0; we are not yet using XSD 1.1). The schemas for these files are huge: think 300+ fairly large XSD files, all included/imported together, i.e., megabytes of XSD. In contrast, the XML documents we're validating are typically small, a few hundred to a few thousand bytes each. But there are many of them, so we call Xerces to parse and validate them in a loop. We are already using the Xerces APIs in such a way that the XSD is loaded once and the parser is then called repeatedly for each input data document.

I imagine that to validate XML, Xerces does something akin to "compiling" the XSD into lower-level data structures for faster use when actually parsing (and validating) the incoming XML.

Question 1: Is that true? Is there much compiling/lowering of the XSD into fast parser-runtime structures?

If the answer to that is yes, then my next question also applies.

Question 2: Is it possible to perform this "compilation" of the large XSD schema once, serialize the resulting Java object to a file, and then reload this pre-compiled form so as not to pay the compilation overhead at startup time?

I have seen some discussion of serializing an XSModel of the XSD schema, but that is more or less isomorphic to the XSD file objects, i.e., it would not really save any "compilation" overhead.

Any advice appreciated.

Mike Beckerle
Apache Daffodil PMC | daffodil.apache.org
OGF DFDL Workgroup Co-Chair | www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
Owl Cyber Defense | www.owlcyberdefense.com
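P.S. For concreteness, the load-once/validate-many pattern we use is roughly the following. This is a minimal JAXP sketch with a toy inline schema, not our real code; the class and element names are made up for illustration:

```java
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import java.io.StringReader;

public class ValidateLoop {
    public static void main(String[] args) throws Exception {
        // Toy schema standing in for our megabytes of XSD.
        String xsd =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" +
            "  <xs:element name='item' type='xs:int'/>" +
            "</xs:schema>";

        // The schema is loaded (and presumably "compiled") exactly once, here.
        SchemaFactory sf =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = sf.newSchema(new StreamSource(new StringReader(xsd)));

        // The Schema object is then reused for every small input document.
        // (One Validator per thread, since Validator is not thread-safe.)
        Validator v = schema.newValidator();
        for (String doc : new String[] {"<item>1</item>", "<item>2</item>"}) {
            v.validate(new StreamSource(new StringReader(doc)));
        }
        System.out.println("validated OK");
    }
}
```

In the real application the per-process cost we would like to avoid is the one-time newSchema() call, which is why we are asking whether its result can be serialized and reloaded.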