[jira] [Commented] (ODFTOOLKIT-458) Map the ODF XML RelaxNG schema into a GraphDB for Analysis

Svante Schubert (JIRA) Thu, 29 Jun 2017 01:51:43 -0700

    [ 
https://issues.apache.org/jira/browse/ODFTOOLKIT-458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068025#comment-16068025
 ]


Svante Schubert commented on ODFTOOLKIT-458:
--------------------------------------------

I finally realised that the memory dump file of the Multi-Schema-Validator (MSV) 
model already represents a graph model.

Therefore, I now know what a simple algorithm for creating the graph in a plain 
programming language such as Java would look like, but I want to learn ANTLR for 
other use cases as well.
So I am now reading through the ANTLR 4 examples. If anybody already uses 
ANTLR, any help is welcome ;)

But let me explain what this initial algorithm, which reads the 
Multi-Schema-Validator model from file and creates the (Neo4J) graph, would look 
like.

This algorithm would create the graph step by step, that is, line by line from 
the file representing the internal graph model of the Multi-Schema-Validator:
http://svn.apache.org/viewvc/incubator/odf/trunk/generator/schema2template/src/test/resources/examples/odf/odf12-msvtree.ref?revision=1167972&view=co
In our initial version, every line creates a graph node and inserts it into a 
parent (of course, only the very first node, at level 0, has no parent).

Each node is added to the graph at the level indicated by the integer at the 
beginning of its line.
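Reading the level and the node kind from a single line could look roughly like the sketch below. The line format ("<level>: <KIND>", optionally followed by a 'single' or "double" quoted name and a trailing comma) is only inferred from the example lines in this comment, and the class and record names are hypothetical; lines that do not fit this assumed format are simply rejected (Java 16+ for the record).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MsvLineParser {

    /** One parsed dump line: tree level, node kind and optional name (hypothetical type). */
    public record MsvLine(int level, String kind, String name) {}

    // Assumed format, inferred from the example lines: <level>: <KIND> ['name' | "name"][,]
    private static final Pattern LINE = Pattern.compile(
            "^\\s*(\\d+):\\s*([A-Z_]+)(?:\\s+(['\"])(.*?)\\3)?\\s*,?\\s*$");

    /** Parses one dump line; returns null if it does not match the assumed format. */
    public static MsvLine parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        int level = Integer.parseInt(m.group(1));
        String kind = m.group(2);
        String name = m.group(4); // null for unnamed nodes such as CHOICE or SEQUENCE
        return new MsvLine(level, kind, name);
    }

    public static void main(String[] args) {
        System.out.println(parse("0: CHOICE"));                    // MsvLine[level=0, kind=CHOICE, name=null]
        System.out.println(parse("2: ELEMENT \"office:document\",")); // MsvLine[level=2, kind=ELEMENT, name=office:document]
        System.out.println(parse("6: REF 'string',"));             // MsvLine[level=6, kind=REF, name=string]
    }
}
```

The real dump may contain node kinds or name formats not shown in these nine lines, so the pattern would need to be checked against the full file.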
The example below shows the first 9 lines of the file, so 9 nodes will be added 
to the graph. First, the sequence from 0 to 7 adds each node as a child of the 
node on the previous line, which is one level higher in the tree (except for 
0: CHOICE, which is the first node of the graph).
0: CHOICE
1: REF 'office-document',
2: ELEMENT "office:document",
3: SEQUENCE
4: REF 'office-document-attrs',
5: ATTRIBUTE "office:mimetype",
6: REF 'string',
7: DATA 'string',
4: REF 'office-document-common-attrs',

Second, we have to memorise the last node of each level (the potential parent 
for the next deeper level), so we can jump back to it.
Whenever the level number of the following line drops again, we have to look up 
the last node memorised for the level directly above it, as at the end of our 
example:
7: DATA 'string',
4: REF 'office-document-common-attrs',
We will need to add the level 4 REF to the most recent level 3 node, which in our 
example is the SEQUENCE node.
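The two steps above (append to the previous line's node while the level rises, otherwise jump back to the last node memorised one level up) can be sketched in plain Java. This is only a minimal illustration: the hypothetical `Node` class stands in for a Neo4J node, each node's label is simply the rest of the line, and the input is assumed to be well formed (the first line is level 0 and a level never increases by more than one from one line to the next).

```java
import java.util.ArrayList;
import java.util.List;

public class MsvTreeBuilder {

    /** Hypothetical in-memory node; a real version would create a Neo4J node instead. */
    public static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();
        Node(String label) { this.label = label; }
    }

    /**
     * Builds the tree from "<level>: <rest>" lines. lastAtLevel.get(i) holds the most
     * recently created node of level i, i.e. the parent for the next node of level i + 1.
     */
    public static Node build(List<String> lines) {
        List<Node> lastAtLevel = new ArrayList<>();
        Node root = null;
        for (String line : lines) {
            int colon = line.indexOf(':'); // first colon ends the level number
            int level = Integer.parseInt(line.substring(0, colon).trim());
            Node node = new Node(line.substring(colon + 1).trim());
            if (level == 0) {
                root = node; // the very first node has no parent
            } else {
                // The parent is the latest node one level up, whether the level rose or dropped.
                lastAtLevel.get(level - 1).children.add(node);
            }
            // Forget the memorised nodes of this and deeper levels, then remember the new node.
            while (lastAtLevel.size() > level) {
                lastAtLevel.remove(lastAtLevel.size() - 1);
            }
            lastAtLevel.add(node);
        }
        return root;
    }

    /** Prints the tree with two spaces of indentation per level. */
    static void print(Node node, int depth) {
        System.out.println("  ".repeat(depth) + node.label);
        for (Node child : node.children) {
            print(child, depth + 1);
        }
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "0: CHOICE",
                "1: REF 'office-document',",
                "2: ELEMENT \"office:document\",",
                "3: SEQUENCE",
                "4: REF 'office-document-attrs',",
                "5: ATTRIBUTE \"office:mimetype\",",
                "6: REF 'string',",
                "7: DATA 'string',",
                "4: REF 'office-document-common-attrs',");
        print(build(lines), 0);
    }
}
```

Keeping only the last node per level works like a stack: a line of level n never needs anything deeper than level n - 1, so the deeper entries can be discarded as soon as the level drops.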

> Map the ODF XML RelaxNG schema into a GraphDB for Analysis
> ----------------------------------------------------------
>
>                 Key: ODFTOOLKIT-458
>                 URL: https://issues.apache.org/jira/browse/ODFTOOLKIT-458
>             Project: ODF Toolkit
>          Issue Type: Wish
>            Reporter: Svante Schubert
>            Assignee: Svante Schubert
>
> *PROBLEM*
> The ODF XML (RelaxNG) schema is too big for humans to easily read or 
> analyse.
> In ODF 1.2 it has 598 elements and 1300 attributes.
> *SOLUTION*
> Therefore I would love to load the ODF XML RelaxNG schema into a GraphDB (for 
> instance Neo4J) and do some basic analysis (sanity checks) on it.
> For instance, I am curious about queries such as:
> a) can a certain ODF element be nested (e.g. <text:p>)?
> b) is every ODF element with an ID allowed to exist more than once (this 
> issue has occurred)?
> c) what is the minimal mandatory ODF XML document?
> etc.
> These queries could help a lot in understanding and testing the XML schema.
> Afterwards, I would certainly love to have more tooling, for instance the 
> ability to add metadata to the nodes to categorise them (which are meant for 
> metadata, which for styles, which are text containers, and which are just plain 
> boilerplate, e.g. office:body).
> The idea is to improve the generation of ODFDOM source code to make it easier 
> to maintain.
> *DESIGN IDEA*
> Instead of reading plain RelaxNG, I thought it might be a better idea to read 
> an already 'normalised' document: the dumped internal model from MSV. You can 
> find the dump for each ODF version as test references in 
> <ODFTOOLKIT_ROOT>/generator/schema2template/src/test/resources/examples/odf
> e.g. 
> http://svn.apache.org/viewvc/incubator/odf/trunk/generator/schema2template/src/test/resources/examples/odf/odf12-msvtree.ref?revision=1167972&view=co
>  
> NOTE: 
> You can find more information about the dump and the MSV model in:
> <ODFTOOLKIT_ROOT>/generator/schema2template/src/main/java/schema2template/example/odf/OdfHelper.java
> and
> <ODFTOOLKIT_ROOT>/generator/schema2template/target/apidocs/index.html
> https://incubator.apache.org/odftoolkit/0.6.2-incubating/schema2template/
> I would love to discuss any further thoughts of yours on the list.
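Query a) from the issue description (can an element be nested?) boils down to a reachability question. The sketch below assumes a hypothetical intermediate step that is not part of the dump itself: the tree has already been reduced to a map from each element name to the element names allowed directly inside it (e.g. by walking through REF, SEQUENCE and CHOICE nodes down to the next ELEMENT nodes). An element can be nested if it is reachable from its own children. In Neo4J the same question would more likely be asked as a variable-length path query in Cypher.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class NestingCheck {

    /**
     * Returns true if 'element' may (directly or indirectly) contain itself, i.e. if it is
     * reachable from its own allowed child elements in the hypothetical map.
     */
    public static boolean canNest(String element, Map<String, List<String>> allowedChildren) {
        Deque<String> toVisit = new ArrayDeque<>(allowedChildren.getOrDefault(element, List.of()));
        Set<String> seen = new HashSet<>(); // guards against cycles not involving 'element'
        while (!toVisit.isEmpty()) {
            String current = toVisit.pop();
            if (current.equals(element)) {
                return true;
            }
            if (seen.add(current)) {
                toVisit.addAll(allowedChildren.getOrDefault(current, List.of()));
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Toy data for illustration only, NOT taken from the ODF schema.
        Map<String, List<String>> allowedChildren = Map.of(
                "A", List.of("B", "C"),
                "B", List.of(),
                "C", List.of("D"),
                "D", List.of("A"));
        System.out.println(canNest("A", allowedChildren)); // true (A -> C -> D -> A)
        System.out.println(canNest("B", allowedChildren)); // false (B has no children)
    }
}
```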



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
