On Tue, Jun 10, 2014 at 2:58 AM, Andre Fischer <awf....@gmail.com> wrote:

> Another update of my progress.
>
> I can now create a validating parser, i.e. one that checks that a document
> conforms to the specs while it parses its content.
> At the moment the validation is restricted to complex types (as opposed to
> simple types and attributes) but I think that is the hardest part.
>
> One NFA (non-deterministic finite automaton) is created for each complex
> type and one for the top level elements.  The NFAs are then converted into
> equivalent DFAs (deterministic FAs) and finally minimized (via the Hopcroft
> algorithm).  The minimization step became necessary when I added support
> for the 'all' schema element which states that its children each occur once
> in arbitrary order. Recognizing this with an FA leads to enumerate all
> permutations of the children.  With n children there are n! permutations.
>  Luckily the 'all' element is used only once and then only for 7 children
> (7! = 5040).
>
> Here are some numbers:
> The 1st and 4th edition of the ECMA-376 specification (1st edition is what
> is used by MS Office, 4th edition is equivalent to the ISO standard) have
> 40 schema files.
> These contain 1917 complex types and 781 simple types.
> Used are 1851 complex types and 727 simple types (have to check if there
> are really unused complex types or if my optimization is broken).
>
> The non-validating parser has 1853 states and 6987 transitions.
>
> The validating parser has 129530 states and 43512 transitions after
> creating the NFAs.
> After conversion to DFAs there remain 20999 states and 73772 transitions.
> After minimization there are 6097 states and 34286 transitions.
>
> Please note that the time for parsing OOXML documents does not depend on
> the number of states or transitions.   It only depends on the length of the
> input.  The number of states and transitions only make the parser bigger.
>
> Progress and commits are tracked in issue 125035.


Thanks for the update -- very impressive!


>
>
> Best regards,
> Andre
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@openoffice.apache.org
> For additional commands, e-mail: dev-h...@openoffice.apache.org
>
>


-- 
-------------------------------------------------------------------------------------------------
MzK

"In the midst of winter, I found there was, within me,
 an invincible summer."
                              -- Albert Camus

Reply via email to