[
https://issues.apache.org/jira/browse/ODFTOOLKIT-458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16249684#comment-16249684
]
Svante Schubert commented on ODFTOOLKIT-458:
--------------------------------------------
I have reached an interesting new milestone for the code generation, which I
have placed for now in my GitHub repo:
https://github.com/svanteschubert/odftoolkit/tree/code-generation
[Note: Apart from our ongoing SVN2GIT transition, the reason not to commit it
to trunk yet is that I am currently testing with some W3C schema files from
the invoice domain (e.g. UN/CEFACT Cross Industry Invoice and the new Italian
XML invoice format according to the new EU semantic specification on
electronic invoicing), which likely do not fit the Apache license.]
*My milestone:*
In the code generator (generator/schema2template/), each of our XML
PuzzlePieces, which expresses a part of the schema (i.e. starting from one
element, including all nodes down to its element descendants and stopping at
them), is now represented by a subgraph of the RelaxNG XML schema stored in the
Apache TinkerPop graph database reference implementation.
Even better, I was able to dump the graph model of every puzzle piece of every
schema (including some W3C invoice schemas) as GraphML and to render it nicely
using the Gephi tool.
I will attach one version of the table:table graph to show you the advantage.
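To give an idea of what such a dump looks like, here is a minimal sketch in Python, assuming the dump format is GraphML (which Gephi can open). It does not use TinkerPop; the to_graphml function, the "kind" and "name" properties and the sample nodes are hypothetical and only for illustration.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def to_graphml(nodes, edges):
    """Serialise a small directed graph as a GraphML string.

    nodes: {node_id: {"kind": ..., "name": ...}}  (hypothetical properties)
    edges: [(source_id, target_id), ...]
    """
    ET.register_namespace("", GRAPHML_NS)  # emit GraphML as the default namespace
    root = ET.Element(f"{{{GRAPHML_NS}}}graphml")
    # GraphML requires node properties to be declared as <key> elements first.
    for key in ("kind", "name"):
        ET.SubElement(root, f"{{{GRAPHML_NS}}}key", {
            "id": key, "for": "node", "attr.name": key, "attr.type": "string"})
    graph = ET.SubElement(root, f"{{{GRAPHML_NS}}}graph",
                          {"id": "G", "edgedefault": "directed"})
    for node_id, props in nodes.items():
        node = ET.SubElement(graph, f"{{{GRAPHML_NS}}}node", {"id": node_id})
        for key in ("kind", "name"):
            data = ET.SubElement(node, f"{{{GRAPHML_NS}}}data", {"key": key})
            data.text = props.get(key, "")
    for i, (source, target) in enumerate(edges):
        ET.SubElement(graph, f"{{{GRAPHML_NS}}}edge",
                      {"id": f"e{i}", "source": source, "target": target})
    return ET.tostring(root, encoding="unicode")


# Toy puzzle piece: one element with one attribute and one child element.
nodes = {
    "n0": {"kind": "ELEMENT", "name": "table:table"},
    "n1": {"kind": "ATTRIBUTE", "name": "table:name"},
    "n2": {"kind": "ELEMENT", "name": "table:table-row"},
}
edges = [("n0", "n1"), ("n0", "n2")]
xml_text = to_graphml(nodes, edges)
print(xml_text)
```

Writing xml_text to a file gives something Gephi should be able to import; the real dumps of course carry many more node kinds and properties.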
The next step for me is to simplify the graph after dumping the raw version by
removing noise that distracts human readers, for instance:
# Remove all REFs to elements that are simply named after the element (only
exchanging ':' with '_')
# Replace the (Choice -> Epsilon) pattern with an optional attribute
# Drop the interleave node for attributes
More patterns will likely follow during further rounds of optimization.
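The three simplification rules above can be sketched as follows. This is a minimal Python sketch on a toy in-memory graph, not the TinkerPop-based generator code. The Node and Graph classes, the node kind names and the sample table:table data are assumptions made up for illustration, and rule 2 is read here as "mark the remaining child as optional", which is only one possible interpretation.

```python
from dataclasses import dataclass


@dataclass
class Node:
    kind: str              # e.g. ELEMENT, ATTRIBUTE, REF, CHOICE, EPSILON, INTERLEAVE
    name: str = ""
    optional: bool = False


class Graph:
    def __init__(self):
        self.nodes = {}     # node id -> Node
        self.children = {}  # node id -> ordered list of child ids

    def add(self, nid, kind, name="", parent=None):
        self.nodes[nid] = Node(kind, name)
        self.children[nid] = []
        if parent is not None:
            self.children[parent].append(nid)

    def splice_out(self, nid):
        """Delete a node and hand its children to each of its parents."""
        kids = self.children.pop(nid)
        del self.nodes[nid]
        for siblings in self.children.values():
            if nid in siblings:
                i = siblings.index(nid)
                siblings[i:i + 1] = kids


def simplify(g):
    # Rule 1: drop a REF that only repeats its element's name (':' -> '_').
    for nid in list(g.nodes):
        node, kids = g.nodes[nid], g.children[nid]
        if (node.kind == "REF" and len(kids) == 1
                and g.nodes[kids[0]].kind == "ELEMENT"
                and node.name == g.nodes[kids[0]].name.replace(":", "_")):
            g.splice_out(nid)
    # Rule 2: CHOICE(x, EPSILON) becomes x flagged as optional.
    for nid in list(g.nodes):
        if nid not in g.nodes:  # an EPSILON may have been deleted already
            continue
        kids = g.children[nid]
        eps = [k for k in kids if g.nodes[k].kind == "EPSILON"]
        rest = [k for k in kids if g.nodes[k].kind != "EPSILON"]
        if g.nodes[nid].kind == "CHOICE" and len(eps) == 1 and len(rest) == 1:
            g.nodes[rest[0]].optional = True
            g.children[nid] = rest
            g.splice_out(nid)
            # Delete the EPSILON node unless another parent still uses it.
            if not any(eps[0] in cs for cs in g.children.values()):
                del g.nodes[eps[0]], g.children[eps[0]]
    # Rule 3: drop an INTERLEAVE node that only groups attributes.
    for nid in list(g.nodes):
        kids = g.children[nid]
        if (g.nodes[nid].kind == "INTERLEAVE" and kids
                and all(g.nodes[k].kind == "ATTRIBUTE" for k in kids)):
            g.splice_out(nid)
    return g


# Toy puzzle piece, loosely modelled on table:table (not taken from the schema).
g = Graph()
g.add("table", "ELEMENT", "table:table")
g.add("mix", "INTERLEAVE", parent="table")
g.add("name", "ATTRIBUTE", "table:name", parent="mix")
g.add("choice", "CHOICE", parent="mix")
g.add("style", "ATTRIBUTE", "table:style-name", parent="choice")
g.add("eps", "EPSILON", parent="choice")
g.add("ref", "REF", "table_table-row", parent="table")
g.add("row", "ELEMENT", "table:table-row", parent="ref")
simplify(g)
for child in g.children["table"]:
    n = g.nodes[child]
    print(n.kind, n.name, "(optional)" if n.optional else "")
```

Rule 2 runs before rule 3 on purpose, so that an optional attribute hidden behind a choice already counts as a plain attribute when the interleave node is checked.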
> Map the ODF XML RelaxNG schema into a GraphDB for Analysis
> ----------------------------------------------------------
>
> Key: ODFTOOLKIT-458
> URL: https://issues.apache.org/jira/browse/ODFTOOLKIT-458
> Project: ODF Toolkit
> Issue Type: Wish
> Reporter: Svante Schubert
> Assignee: Svante Schubert
> Attachments: edge.properties, odf12-graph.xml, vertex.properties
>
>
> *PROBLEM*
> The ODF XML (RelaxNG) schema is too big to easily read or be analysed by
> humans.
> In version ODF 1.2 it has 598 elements and 1300 attributes.
> *SOLUTION*
> Therefore I would love to load the ODF XML RelaxNG schema into a GraphDB (for
> instance Neo4J) and do some basic analysis (sanity checks) on it.
> For instance, I am curious about queries such as:
> a) is a certain ODF element able to become nested (e.g. <text:p>)
> b) is every ODF element with an ID allowed to exist more than once (this
> issue occurred)
> c) what is the minimum mandatory ODF XML document
> etc.
> These queries could help a lot to understand and test the XML schema.
> Certainly, I would love to have more tooling afterwards.
> For instance, to be able to add metadata to the nodes to categorise them
> (which are meant for metadata, styles or text containers, and which are just
> plain boilerplate, e.g. office:body).
> The idea is to improve the generation of ODFDOM source code to allow easier
> maintainability.
> *DESIGN IDEA*
> Instead of reading plain RelaxNG, I thought it might be a better idea to read
> an already 'normalised' document: the dumped internal model from MSV. You may
> find the dump for each ODF version as test references in
> <ODFTOOLKIT_ROOT>/generator/schema2template/src/test/resources/examples/odf
> e.g.
> http://svn.apache.org/viewvc/incubator/odf/trunk/generator/schema2template/src/test/resources/examples/odf/odf12-msvtree.ref?revision=1167972&view=co
>
> NOTE:
> You may find more information about the dump and the MSV model in:
> <ODFTOOLKIT_ROOT>/generator/schema2template/src/main/java/schema2template/example/odf/OdfHelper.java
> and
> <ODFTOOLKIT_ROOT>/generator/schema2template/target/apidocs/index.html
> https://incubator.apache.org/odftoolkit/0.6.2-incubating/schema2template/
> I would love to have a discussion on further thoughts of yours on the list.
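
Question a) from the issue description above (can an element become nested inside itself?) boils down to a reachability check once the schema is a graph. Here is a minimal sketch in Python, assuming the graph has already been reduced to a plain "element may contain element" mapping. The function names and the small toy mapping are hypothetical and not extracted from the ODF schema.

```python
def reachable(contains, start):
    """Return every element reachable from start via one or more child edges."""
    seen = set()
    stack = list(contains.get(start, ()))
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(contains.get(current, ()))
    return seen


def can_nest(contains, element):
    """True if element can occur, directly or indirectly, inside itself."""
    return element in reachable(contains, element)


# Toy containment mapping for illustration only.
toy_contains = {
    "text:p": {"text:span", "text:note", "text:line-break"},
    "text:span": {"text:span", "text:line-break"},
    "text:note": {"text:note-body"},
    "text:note-body": {"text:p"},
    "text:line-break": set(),
}

for name in ("text:p", "text:span", "text:line-break"):
    print(name, can_nest(toy_contains, name))
```

In a graph database the same question would be a cycle query over the element-to-element paths; the plain dictionary is only used here to keep the sketch self-contained.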
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)