[ 
https://issues.apache.org/jira/browse/DAFFODIL-1444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16858990#comment-16858990
 ] 

Michael Beckerle commented on DAFFODIL-1444:
--------------------------------------------

Attached a spreadsheet to this bug, which shows the near-exponential growth in 
the number of schema component instances as a function of schema size.

This growth is the core obstacle to resolving this issue.

[https://github.com/apache/incubator-daffodil/pull/228] contains 
TestRefMap.scala which provides the data for this spreadsheet.

That PR has infrastructure to allow getting rid of this copying overhead, but 
all the places that depend on the copying behavior for correctness must change.
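
To illustrate why copying on every reference blows up, here is a minimal 
sketch. It is not Daffodil code and is not taken from TestRefMap.scala; the 
schema model (a dict from a definition name to the names it references) and 
all function names are hypothetical. It compares the number of component 
instances when each reference gets its own copy against the number when 
each definition is shared.

```python
from functools import lru_cache


def layered_schema(depth, fanout):
    """Hypothetical schema: each level refers `fanout` times to the next level."""
    refs = {f"level{i}": [f"level{i + 1}"] * fanout for i in range(depth)}
    refs[f"level{depth}"] = []  # leaf definition with no references
    return refs


def instances_when_copied(refs, root):
    """Count instances if every reference creates a fresh copy of its target."""
    @lru_cache(maxsize=None)
    def count(name):
        # One instance for this component, plus a full copy per reference.
        return 1 + sum(count(child) for child in refs[name])
    return count(root)


def instances_when_shared(refs, root):
    """Count instances if each definition is represented exactly once."""
    seen = set()
    stack = [root]
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(refs[name])
    return len(seen)


if __name__ == "__main__":
    for depth in (5, 10, 20):
        schema = layered_schema(depth, fanout=2)
        print(depth,
              instances_when_copied(schema, "level0"),
              instances_when_shared(schema, "level0"))
```

With two references per level, the copied count roughly doubles with each 
added level while the shared count grows by one.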

> Performance - schema compilation
> --------------------------------
>
>                 Key: DAFFODIL-1444
>                 URL: https://issues.apache.org/jira/browse/DAFFODIL-1444
>             Project: Daffodil
>          Issue Type: Improvement
>          Components: Front End, Middle "End", Performance
>            Reporter: Michael Beckerle
>            Assignee: Michael Beckerle
>            Priority: Major
>             Fix For: 2.4.0
>
>         Attachments: Daffodil-exponential-component-growth.xlsx
>
>
> Large DFDL schemas are very slow to compile.
> We could focus on speeding this up, and should get some low-hanging fruit 
> here.
> But ultimately, a really large DFDL schema needs to be compiled in pieces. 
> (DEBATABLE - focus should FIRST be on speeding up and reducing the massive 
> copying that goes on. Separate compilation is a harder issue that we can 
> defer.)
> This means we need to be able to reload a compiled schema just to restore 
> its parsers/unparsers and associated runtime data structures to memory so 
> that another schema that depends on it can then be compiled. 
> DFDL schema compilation needs to be understood in order to decompose a schema 
> into separately compilable units. There's no point in trying to compile a 
> schema layer by layer - a DFDL schema containing all type definitions, for 
> example, doesn't compile to anything. There have to be top level elements in 
> order for DFDL schema compilation to do anything.
> So given a large data format with many top-level element types, we need the 
> compiler to recognize element references to pre-compiled top-level elements, 
> and avoid recompiling new instances of them if the surrounding environment is 
> the same. That is, the surrounding default format specification is the same.
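
The reuse rule in the last paragraph of the description can be sketched as a 
cache keyed on the element name together with its surrounding format 
environment. This is only an illustration under assumptions: FormatEnv and 
ElementCompiler are hypothetical names, not Daffodil classes, and the 
"compiled" result is a placeholder string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormatEnv:
    """Hypothetical stand-in for the surrounding default format specification."""
    properties: frozenset  # set of (property name, value) pairs

    @staticmethod
    def of(**props):
        return FormatEnv(frozenset(props.items()))


class ElementCompiler:
    """Hypothetical compiler that reuses results for identical environments."""

    def __init__(self):
        self._cache = {}
        self.compile_count = 0

    def compile_element(self, name, env):
        key = (name, env)
        if key not in self._cache:
            # Placeholder for the real (expensive) compilation step.
            self.compile_count += 1
            self._cache[key] = f"compiled<{name}>#{self.compile_count}"
        return self._cache[key]


if __name__ == "__main__":
    compiler = ElementCompiler()
    big = FormatEnv.of(byteOrder="bigEndian")
    little = FormatEnv.of(byteOrder="littleEndian")
    compiler.compile_element("header", big)     # compiled
    compiler.compile_element("header", big)     # reused, same environment
    compiler.compile_element("header", little)  # compiled again, new environment
    print(compiler.compile_count)
```

A reference to the same top-level element is recompiled only when the 
environment differs, which is the condition the description states.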



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
