Re: The future of the daffodil DFDL schema debugger?

Beckerle, Mike Wed, 06 Jan 2021 09:45:08 -0800

I don't think of it as a Daffodil debug protocol, but just as a separation of 
concerns between the display of information and the parse/unparse behaviors: 
the points where users can pause, and the data structures available to 
display.


E.g., it is 100% a display issue that the infoset (shown as XML) is clumsy, too 
big, etc.  The infoset is available in the processor state, and one can examine 
the current node, enclosing node, prior sibling(s), following sibling(s), etc. 
One can elide contents that are too big for hexBinary, etc.

I think it is fairly easy to come up with some design for this problem (how 
to display the infoset with sensible limits on size) that will at least be 
(1) always fairly small and (2) much more useful in more cases. It won't be 
perfect, but it can be much better than what we do now.

One sensible display "mode" should be that displaying the context surrounding 
the current element (when parsing or unparsing) displays at most N lines 
(N/2 before, N/2 after) with a maximum length of L characters (settable within 
reason?).

Sibling and enclosing nodes would be displayed eliding their contents to at 
most 1 line.

Here's an example of what I mean. Displaying up to N=10 lines total:

...
<enclosingParent1>
   ...
   <priorSibling2>89ab782 ...</...>
   <priorSibling1>some text is here and some more text</...>
   <currentNode>value might be some big thing which needs to be elided ...</...>
   <followingSibling1> ... </...>
   ???
</enclosingParent1>
???

The </...> is just an idea to reduce XML matching end-tag clutter.

The ... on a line alone or where element content would appear generally means 1 
or more other siblings. The way the display above starts with ... means that 
this is a relative inner nest, not starting from the absolute root.

The ... within simple content means that content is elided to fit on one line. 
It always follows some text characters, to differentiate it from the 
child-element context.

The ??? means zero or more other siblings.

I used bold italic above to point out that the current node would be 
highlighted somehow. Probably a way to do this that doesn't require display 
modes would be useful. E.g., a text marker like ">>>" as in:

>>> <currentNode>value .... </...>

might be better, particularly for a trace output being dumped to a text file.

I made the above example an unparser kind of example by showing a following 
sibling that already exists after the current node.
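
A minimal sketch of the windowing idea described above, in Python purely for 
illustration (Daffodil itself is not written in Python). The function name 
context_window and its parameters are hypothetical. It assumes each sibling 
has already been rendered to a single line, keeps the current node plus up to 
half of the remaining budget on each side, marks the current node with ">>>", 
and uses a "..." line wherever siblings are hidden:

```python
def context_window(lines, current, max_lines=10):
    """Return at most max_lines display lines centred on lines[current].

    lines   -- one-line renderings of the siblings (hypothetical input shape)
    current -- index of the current node within lines
    """
    if max_lines < 3:
        raise ValueError("need room for the current node and one line per side")
    half = (max_lines - 1) // 2          # sibling lines allowed on each side
    start = max(0, current - half)
    end = min(len(lines), current + half + 1)

    out = []
    for i in range(start, end):
        prefix = ">>> " if i == current else "    "
        out.append(prefix + lines[i])

    # Replace the outermost shown lines with "..." when siblings are hidden,
    # so the marker counts against the line budget.
    if start > 0:
        out[0] = "    ..."
    if end < len(lines):
        out[-1] = "    ..."
    return out


siblings = [f"<sibling{i}>value {i}</...>" for i in range(20)]
for line in context_window(siblings, current=10, max_lines=10):
    print(line)
```

The "???" marker (zero or more other siblings) is left out of this sketch, 
since it depends on whether the processor knows yet that more siblings exist.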

I think the key concept is that any sibling node is displayed in a way that 
fits on one line.
E.g., even if the element name was really long, I'd suggest:

  <hereIsAnElementWithASuperLongName...>abcd ... </...>

Where the element name itself gets elided because it is too long.
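
As a rough illustration of the one-line rule, here is a small Python sketch. 
The function name one_line and the limits max_name and max_text are 
assumptions made up for this example, not anything Daffodil provides. It 
elides an overly long element name with a trailing "...", elides long simple 
content with " ..." after some text characters, and always closes with the 
</...> shorthand:

```python
def one_line(name, text, max_name=24, max_text=40):
    """Render a simple element on a single line, eliding name and content.

    max_name and max_text are hypothetical, user-settable limits.
    """
    # Collapse newlines and runs of whitespace so the content fits on one line.
    flat = " ".join(text.split())

    shown_name = name if len(name) <= max_name else name[:max_name] + "..."
    if len(flat) <= max_text:
        shown_text = flat
    else:
        # The " ..." always follows some text characters.
        shown_text = flat[:max_text].rstrip() + " ..."
    return f"<{shown_name}>{shown_text}</...>"


print(one_line("priorSibling1", "some text is here and some more text"))
print(one_line("currentNode", "value might be some big thing " * 10))
```

A real implementation would also need a rule for complex elements (for 
example, showing " ... " in place of the children), which this sketch does 
not cover.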

A thought. Note that the above presentation is shown as quasi-XML, but there's 
nothing XML-specific about it. A JSON-friendly equivalent could be done as well:

enclosingParent1 = {
   ...
   priorSibling2 = "89ab782..."
   priorSibling1 = "some text is here and some more text"
   currentNode = "value might be some big thing which needs to be elided ..."
   followingSibling1 = { ... }
   ???
}

That's enough for 1 email thread on this debug topic.


________________________________
From: Steve Lawrence <slawre...@apache.org>
Sent: Tuesday, January 5, 2021 2:26 PM
To: dev@daffodil.apache.org <dev@daffodil.apache.org>
Subject: The future of the daffodil DFDL schema debugger?


Now that we're in a new year, I'd like to start a discussion about the
Daffodil DFDL Schema debugger and how it might be improved to be more
useful.

Note that this is not the capability to debug Daffodil itself in
something like Eclipse/IntelliJ, but the ability for Daffodil to provide
enough extra information during a parse/unparse so that a schema
developer can get an idea of what Daffodil is doing. This makes it
easier for users (rather than developers) to determine why a schema
isn't giving the expected parse/unparse result (either because of bad data
or a faulty schema).

The debugger is currently enabled by providing the --debug or
--trace flags in the CLI. More information about that here:

https://daffodil.apache.org/debugger/

This enables a TUI and commands somewhat similar to GDB, providing things
like breakpoints, steps, displaying the current infoset, displaying a dump
of the data, etc.

Although I find this tool pretty useful, it definitely has some glaring
issues.

The most glaring to me is that it really isn't useful at all for
debugging unparse. The data dumps only include the main output stream,
so determining things like suspensions and buffered output is impossible.

Another issue is the infoset output. When outputting the infoset, the
debugger currently just walks the entire thing, converts it to XML,
and displays the XML. For large infosets, this is excessive and can make it
impossible to use, even with the configurations that limit how much of
that infoset is actually printed to the screen. Also, things like large
hex binary blobs create excessive and unusable output.

Another thing I feel is missing is a schema view. Right now it's very
difficult to know where in the schema Daffodil actually is.

I think these issues just need some thought and improvement. One could
imagine a better way to stringify our unparse buffers for debug. One
could imagine a way to receive infoset state changes so the debugger can
track things like backtracks and removed infosets. One could imagine a way
to display the schema.

We just need a better way to stringify the current state of the unparse
data including buffers, and we need a way for the debugger to receive
state change information about the infoset so it can update displays rather
than just constantly printing the entire infoset.

However, I think another big issue is just usability in general. I
think the CLI usage is reasonable, but it's not always user friendly,
and it is difficult to view multiple things at the same time. I think
because of this very few people even use this tool. So this is
perhaps something worth focusing on.

My first thought for improving this usability issue would be to implement
the Debug Adapter Protocol (DAP)
(https://microsoft.github.io/debug-adapter-protocol/) for Daffodil,
which many IDEs implement. With this implemented, Daffodil could be
plugged in to any IDE that supports it and essentially get debugging for
free, without the need to worry about the GUI elements.

I do have concerns that this just wouldn't have enough of the functionality
that we'd really need. For example, DAP really only has the ability to show
code (Daffodil's equivalent is the DFDL schema). There isn't a way to
show a live view of the infoset or data. Most DAP IDEs do have a
console output, so we could potentially make it so the console output is
a live view of infoset/data. But I'm not even sure most DAP-friendly
IDEs could support this kind of console output. Does anyone have
familiarity with DAP IDEs and what kinds of console capabilities are
available?

I also looked into TUI libraries with the idea that we could just extend
our current debugger user interface to be a bit friendlier.
Unfortunately, there aren't too many Java/Scala TUI libraries and those
that do exist don't have Apache-friendly licenses. We also want to be
careful about increasing dependencies just for a debugger that many people
might not use, so large graphics libraries are probably out of the question.

This all makes me wonder if an approach worth taking for the future of
Daffodil schema debugging is developing a sort of "Daffodil Debug
Protocol". I imagine it would be loosely based on DAP (which is
essentially JSON message based) but could be targeted to the things that
a DFDL schema debugger would really need. An added benefit with some
sort of protocol is the debugger interface can be uncoupled from
Daffodil itself, so we could implement a TUI/GUI/whatever in any
language/GUI framework and just have it communicate the protocol over
some form of IPC. Another benefit is that any future backends could
implement this protocol and so a single debugger could hook into
different backends without much issue. Unfortunately, defining such a
protocol might be a large task, but we do have our existing debug
infrastructure and things like DAP to guide its development/design.
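
To make the idea a little more concrete, here is a small Python sketch of 
what one message in such a protocol might look like if it borrowed DAP's 
framing: a Content-Length header, a blank line, then a UTF-8 JSON body 
carrying seq, type, and event fields. The event name "infosetChanged" and 
everything inside body are purely hypothetical, invented for this example. 
They are not part of DAP or of Daffodil.

```python
import json


def encode_event(seq, event, body):
    """Frame one event message using DAP-style Content-Length framing."""
    payload = json.dumps(
        {"seq": seq, "type": "event", "event": event, "body": body}
    ).encode("utf-8")
    header = f"Content-Length: {len(payload)}\r\n\r\n".encode("ascii")
    return header + payload


def decode_message(data):
    """Decode one framed message from a bytes buffer holding exactly one."""
    header, _, rest = data.partition(b"\r\n\r\n")
    length = None
    for line in header.split(b"\r\n"):
        key, _, value = line.partition(b":")
        if key.strip().lower() == b"content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("missing Content-Length header")
    return json.loads(rest[:length].decode("utf-8"))


# Hypothetical event telling a front end that one infoset node was added,
# so it can update its display instead of re-printing the whole infoset.
raw = encode_event(
    1,
    "infosetChanged",
    {"change": "added", "path": "/enclosingParent1/currentNode"},
)
print(decode_message(raw))
```

Because the framing is just bytes over some form of IPC, a front end in any 
language could consume these messages, which is the decoupling benefit 
described above.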

Thoughts? Does such a Daffodil Debug Protocol seem worth it? Perhaps we
really just need the few improvements mentioned to the existing
debugger. Is that enough to make it usable? Or is an entirely different
approach needed to debugging schemas?
