[ https://issues.apache.org/jira/browse/DAFFODIL-2619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17469444#comment-17469444 ]
Mike Beckerle commented on DAFFODIL-2619:
-----------------------------------------

The overhead of this nextElementErd() operation does still matter. The overhead of the serialized representation has two parts. One is the library we call to parse it, e.g., an XML parser such as Xerces. The other is nextElementErd(), which is our code, but is overhead attributable to the serialization format. So I may have talked myself into wanting 3 different timings:

1) total elapsed time (starting from XML)
2) time spent in nextElementErd()
3) time spent in the rest of the unparser

Note that nextElementErd() is coded today as a very inefficient operation. It constructs StepQName objects and compares them, which pattern matches on them, allocating yet more objects. It also compares many strings that turn out to be identical, which is the most expensive kind of string comparison, since every character must be examined to confirm they match. And it uses a Scala Map for this, which may, depending on the implementation, allocate Some(x) objects. I just noticed that the QNameBase equals method doesn't even short-circuit when the two QNameBase objects are "eq", i.e., the same exact object.

This whole nextElementErd() operation could be far more efficient if all QName objects were interned so that "eq" comparison was always possible. In fact, the local names and namespace (or NS) objects could similarly be interned, so that retrieval of interned QName objects would also be much more efficient (mostly pointer "eq" operations). Java hash maps should be used instead of Scala ones to avoid the overhead of allocating Option/Some objects.
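A minimal sketch of the interning idea, in Java since the comment suggests Java hash maps. The class and method names here are illustrative only, not Daffodil's actual API; the point is that interning makes reference equality sufficient, and that java.util.HashMap.get returns null directly rather than allocating an Option/Some wrapper:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical interned QName (names are illustrative, not Daffodil's API).
// Each distinct (namespace, local) pair maps to exactly one instance, so
// comparison reduces to a pointer check instead of character-by-character
// string comparison.
final class InternedQName {
    final String namespace;
    final String local;

    private InternedQName(String namespace, String local) {
        this.namespace = namespace;
        this.local = local;
    }

    // Intern pool keyed by "namespace}local". A plain java.util.HashMap
    // lookup returns null on a miss -- no Some(x) allocation per lookup.
    private static final Map<String, InternedQName> pool = new HashMap<>();

    static synchronized InternedQName of(String namespace, String local) {
        String key = namespace + "}" + local;
        InternedQName q = pool.get(key);
        if (q == null) {
            q = new InternedQName(namespace, local);
            pool.put(key, q);
        }
        return q;
    }

    @Override
    public boolean equals(Object other) {
        // Interning guarantees one instance per name, so reference
        // equality is a complete equals implementation.
        return this == other;
    }

    @Override
    public int hashCode() {
        return System.identityHashCode(this);
    }
}
```

With this scheme, InternedQName.of("urn:ns", "foo") == InternedQName.of("urn:ns", "foo") holds, so the hot path in a hypothetical nextElementErd() never has to compare strings character by character.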
> Add InfosetInputter with minimal overhead
> -----------------------------------------
>
>                 Key: DAFFODIL-2619
>                 URL: https://issues.apache.org/jira/browse/DAFFODIL-2619
>             Project: Daffodil
>          Issue Type: Bug
>          Components: Performance
>            Reporter: Steve Lawrence
>            Assignee: Steve Lawrence
>            Priority: Major
>             Fix For: 3.3.0
>
>
> When unparsing, some amount of performance can be attributed to
> parsing/traversing the infoset representation (e.g. text, Scala, JDOM) and
> converting it to the infoset events that Daffodil requires. This is done via
> an InfosetInputter. Because an InfosetInputter is required during unparsing
> (unlike parse, which can have a null InfosetOutputter), it is difficult to
> determine how much overhead comes from the InfosetInputter and how much comes
> from the actual unparsing operations.
> One potential option to get a better picture of raw unparse speed vs.
> InfosetInputter overhead is to create a new InfosetInputter that uses a
> pre-created array of events that are as close as possible to what Daffodil
> expects. This way, the overhead of creating the event array can occur outside
> any measured unparser code, and the overhead of the InfosetInputter that does
> occur inside measured code is quite small--just the amount needed to index
> into this array and return the event information.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)