Re: The future of the daffodil DFDL schema debugger?

John Wass Thu, 08 Apr 2021 09:40:41 -0700

> lives in daffodil repo (new subproject?)

Not asking a question here, meant to snip out those parens.


The daffodil-debug-api and any daffodil-debug-io-NAME projects do represent
new subprojects.

Just wanted to clarify, never see those things till send is hit.



On Thu, Apr 8, 2021 at 12:36 PM John Wass <[email protected]> wrote:

> Revisiting this post after doing some debugger related work and thinking
> about debug protocol/adapters to connect external tooling to the debug
> process.
>
> This comment is good
>
> > This also makes me wonder if an approach worth taking for the future of
> Daffodil schema debugging is developing a sort of "Daffodil Debug Protocol".
> I imagine it would be loosely based on DAP (which is essentially JSON
> message based) but could be targeted to the things that a DFDL schema
> debugger would really need. An added benefit with some sort of protocol
> is the debugger interface can be uncoupled from Daffodil itself, so we
> could implement a TUI/GUI/whatever in any language/GUI framework and just
> have it communicate the protocol over some form of IPC. Another benefit
> is that any future backends could implement this protocol and so a single
> debugger could hook into different backends without much issue.
> Unfortunately, defining such a protocol might be a large task, but we do
> have our existing debug infrastructure and things like DAP to guide its
> development/design.
>
> Some thoughts on this
> - Defining the protocol will be a large task, but a minimal version should
> get up and round tripping quickly with a minimal subset of the protocol.
> - The new protocol being informed by the existing debugger and DAP is key
> - Uncoupling from Daffodil is key
> - Adapt the Daffodil protocol to produce DAP after the fact so as not to
> constrain Daffodil debugging capability
> - We don't need to tie the protocol or adapters to a single framework,
> implementations of the IO layer should be simple enough to support multiple
> things (eg Akka, Zio, "basic" ...)
> - The current debugger lives in runtime1, but can we make an abstract API
> that any runtime would implement?
>
> Maybe a solution is structured like this
> - daffodil-debug-api:
>   - protocol model
>   - interfaces: debugger / IO adapter / etc
>   - lives in daffodil repo (new subproject?)
> - daffodil-debug-io-NAME
>   - provides implementation of a specific IO adapter
>   - multiple projects possible (daffodil-debugger-akka,
> daffodil-debugger-zio, etc)
>   - supported ones live in their own subprojects, but others can be plugged
> in from external sources
>   - ability to support multiple implementations reduces risk of lock-in
> - debugger applications
>   - maintained in external repositories
>   - depending on the IO implementation these could execute in a separate
> process or on a separate machine
>   - like Steve said, could be any language / framework
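A minimal sketch of how the split above might look, written in Python only for brevity. Every name here (DebugEvent, IOAdapter, InMemoryAdapter, Debugger, on_step) is hypothetical and not part of Daffodil; the point is only that the runtime-facing debugger depends on an adapter interface, not on a specific IO framework.

```python
import json
from abc import ABC, abstractmethod
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class DebugEvent:
    """Protocol model: one message emitted by a runtime (hypothetical shape)."""
    kind: str
    payload: dict


class IOAdapter(ABC):
    """What a daffodil-debug-io-NAME project would implement (hypothetical)."""

    @abstractmethod
    def send(self, event: DebugEvent) -> None:
        ...


class InMemoryAdapter(IOAdapter):
    """A "basic" adapter that just records the serialized events."""

    def __init__(self) -> None:
        self.sent = []

    def send(self, event: DebugEvent) -> None:
        self.sent.append(json.dumps(asdict(event), sort_keys=True))


class Debugger:
    """Runtime-facing side: knows the protocol model, not the transport."""

    def __init__(self, adapter: IOAdapter) -> None:
        self._adapter = adapter

    def on_step(self, element: str, bit_offset: int) -> None:
        event = DebugEvent("step", {"element": element, "bitOffset": bit_offset})
        self._adapter.send(event)


adapter = InMemoryAdapter()
Debugger(adapter).on_step("currentNode", 64)
print(adapter.sent[0])
```

Swapping InMemoryAdapter for a socket-, Akka-, or Zio-backed adapter would not touch Debugger, which is the lock-in argument made in the list above.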
>
> Three types of reference implementations / sample applications could also
> guide the development of the API
>   1. a replacement for the existing TUI debugger, expected to end up with
> at minimum the same functionality as the current one.
>   2. a standalone GUI (JavaFX, Scala.js, ..) debugger
>   3. an IDE integration
>
> Thoughts?
>
> Also I'm working on some reference implementations of these concepts using
> Akka and Zio.  Not quite ready to talk through it yet, but the code is here
> https://github.com/jw3/example-daffodil-debug
>
>
>
> On Wed, Jan 6, 2021 at 1:42 PM Steve Lawrence <[email protected]>
> wrote:
>
>> Yep, something like that seems very reasonable for dealing with large
>> infosets. But it still feels like we run into usability issues.
>> For example, what if a user wants to see more? We need some
>> configuration options to increase what we've elided. It's not big, but
>> every new thing that needs configuration adds complexity and decreases
>> usability.
>>
>> And I think the only reason we are trying to spend effort eliding
>> things is because we're limited to this gdb-like interface where you can
>> only print out a little information at a time.
>>
>> I think what would really help is to dump this gdb interface and instead use
>> multiple windows/views. As a really close example to what I imagine, I
>> recently came across this hex editor:
>>
>> https://www.synalysis.net/
>>
>> The screenshots are a bit small so it's not super clear, but this tool
>> has one view for the data in hex, and one view for a tree of parsed
>> results (which is very similar to our infoset). The "infoset" view has
>> information like offset/length/value, and can be related back to the
>> data view to find the actual bits.
>>
>> I imagine the "next generation daffodil debugger" to look much like
>> this. As data is parsed, the infoset view fills up. This view could act
>> like a standard GUI tree so you could collapse sections or scroll around
>> to show just the parts you care about, and have search capabilities to
>> quickly jump around. The advantage here is you no longer really need
>> automated eliding or heuristics for what the user *might* care about.
>> You just show the whole thing and let user scroll around. As daffodil
>> parses and backtracks, this tree grows or shrinks.
>>
>> I also imagine you could have a cursor moving around the hex view, so as
>> daffodil moves around (e.g. scanning for delimiters, extracting
>> integers), one could update this data view to show what daffodil is
>> doing and where it is.
>>
>> I also imagine there could be other views as well. For example, a schema
>> view to show where in the schema daffodil is, and to add/remove
>> breakpoints. And an information view for things like variables, in-scope
>> delimiters, PoU's, etc.
>>
>> The only reason I mention a debug protocol is that it would allow this GUI
>> to be more easily written in something other than Java/Scala to take
>> advantage of other GUI toolkits. It's been a long while since I've done
>> anything with Java GUIs, but they seemed pretty poor the last I looked
>> at them. Would even allow for a TUI, which Java has little/no support
>> for. Also enables things like remote debugging if a socket IPC was
>> used. Though I'm not sure all of that is necessary. Just thinking what
>> would be ideal, and it can always be pared back.
>>
>>
>> On 1/6/21 12:44 PM, Beckerle, Mike wrote:
>> > I don't think of it as a daffodil debug protocol, but just a separation
>> of concerns between display of information and the behaviors of
>> parse/unparse that need to be points where users can pause, and data
>> structures available to display.
>> >
>> > E.g., it is 100% a display issue that the infoset (shown as XML) is
>> clumsy, too big, etc.  The infoset is available in the processor state, and
>> one can examine the current node, enclosing node, prior sibling(s),
>> following sibling(s), etc. One can elide contents that are too big for
>> hexBinary, etc.
>> >
>> > I think this problem, how to display the infoset with sensible limits
>> on sizing, is fairly easy to come up with some design for, that will at
>> least be (1) always fairly small (2) much more useful in more cases. It
>> won't be perfect but can be much better than what we do now.
>> >
>> > One sensible display "mode" should be that displaying the context
>> surrounding the current element (when parsing or unparsing) displays at
>> most N-lines. (N/2 before, N/2 after) with a maximum length of L characters
>> (settable within reason ?)
>> >
>> > Sibling and enclosing nodes would be displayed eliding their contents
>> to at most 1 line.
>> >
>> > Here's an example of what I mean. Displaying up to N=10 lines total:
>> >
>> > ...
>> > <enclosingParent1>
>> >    ...
>> >    <priorSibling2>89ab782 ...</...>
>> >    <priorSibling1>some text is here and some more text</...>
>> >    <currentNode>value might be some big thing which needs to be elided
>> ...</...>
>> >    <followingSibling1> ... </...>
>> >    ???
>> > </enclosingParent1>
>> > ???
>> >
>> > The </...> is just an idea to reduce XML matching end-tag clutter.
>> >
>> > The ... on a line alone or where element content would appear generally
>> means 1 or more other siblings. The way the display above starts with ...
>> means that this is a relative inner nest, not starting from the absolute
>> root.
>> >
>> > The ... within simple content means that content is elided to fit on
>> one line. Always follows some text characters to differentiate from the
>> child-element context.
>> >
>> > The ??? means zero or more other siblings.
>> >
>> > I used bold italic above to point out that the current node would be
>> highlighted somehow. Probably a way to do this that doesn't require display
>> modes would be useful. E.g., a text marker like ">>>" as in:
>> >
>> >>>> <currentNode>value .... </...>
>> >
>> > might be better, particularly for a trace output being dumped to a text
>> file.
>> >
>> > I made the above example an unparser kind of example by showing a
>> following sibling that exists that is after the current node.
>> >
>> > I think the key concept is that any sibling node is displayed in a way
>> that fits on one line.
>> > E.g., even if the element name was really long, I'd suggest:
>> >
>> >   <hereIsAnElementWithASuperLongName...>abcd ... </...>
>> >
>> > Where the element name itself gets elided because it is too long.
>> >
>> > A thought. Note that the above presentation is shown as quasi-XML, but
>> there's nothing XML-specific about it. A JSON-friendly equivalent could be
>> done as well:
>> >
>> > enclosingParent1 = {
>> >    ...
>> >    priorSibling2 = "89ab782..."
>> >    priorSibling1 = "some text is here and some more text"
>> >    currentNode = "value might be some big thing which needs to be
>> elided ..."
>> >    followingSibling1 = { ... }
>> >    ???
>> > }
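The one-line-per-sibling rule described above can be sketched roughly as follows. This is an illustration under assumptions, not the debugger's actual code: the function names, the default limits, and the choice to show up to n // 2 siblings on each side of the current node are all made up for the example.

```python
def elide(text: str, limit: int, marker: str = "...") -> str:
    """Cut text to at most `limit` characters, ending in `marker` if cut."""
    if len(text) <= limit:
        return text
    return text[: max(0, limit - len(marker))] + marker


def render_sibling(name: str, value: str, current: bool = False,
                   max_name: int = 24, max_value: int = 40) -> str:
    """Render one element on a single line, quasi-XML style."""
    prefix = ">>> " if current else "    "
    shown_name = elide(name, max_name)
    # A space before the dots keeps elided simple content distinct
    # from a bare "..." that stands for other siblings.
    shown_value = elide(value, max_value, " ...")
    return f"{prefix}<{shown_name}>{shown_value}</...>"


def window(children, current: int, n: int):
    """Show up to n // 2 siblings on each side of the current child."""
    half = n // 2
    start = max(0, current - half)
    end = min(len(children), current + half + 1)
    lines = []
    if start > 0:
        lines.append("    ...")  # one or more earlier siblings hidden
    for i in range(start, end):
        name, value = children[i]
        lines.append(render_sibling(name, value, current=(i == current)))
    if end < len(children):
        lines.append("    ...")  # one or more later siblings hidden
    return lines


children = [("a", "1"), ("b", "2"), ("c", "3"), ("d", "4"), ("e", "5")]
print("\n".join(window(children, current=2, n=2)))
```

The same window() output could feed the JSON-style rendering by swapping render_sibling for a different one-line formatter.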
>> >
>> > That's enough for 1 email thread on this debug topic.
>> >
>> >
>> > ________________________________
>> > From: Steve Lawrence <[email protected]>
>> > Sent: Tuesday, January 5, 2021 2:26 PM
>> > To: [email protected] <[email protected]>
>> > Subject: The future of the daffodil DFDL schema debugger?
>> >
>> >
>> > Now that we're in a new year, I'd like to start a discussion about the
>> > Daffodil DFDL Schema debugger and how it might be improved to be more
>> > useful.
>> >
>> > Note that this is not the capabilities to debug Daffodil itself in
>> > something like Eclipse/IntelliJ, but the ability for Daffodil to provide
>> > enough extra information during a parse/unparse so that a schema
>> > developer can get an idea of what Daffodil is doing. This makes it
>> > easier for users (rather than developers) to determine why a schema
>> > isn't giving the expected parse/unparse result (either because of bad data
>> > or a faulty schema).
>> >
>> > The current state of the debugger is enabled by providing the --debug or
>> > --trace flags in the CLI. More information about that here:
>> >
>> > https://daffodil.apache.org/debugger/
>> >
>> > This enables a TUI and commands somewhat similar to GDB, providing things
>> > like breakpoints, steps, displaying the current infoset, displaying a dump
>> > of the data, etc.
>> >
>> > Although I find this tool pretty useful, it definitely has some glaring
>> > issues.
>> >
>> > The most glaring to me is that it really isn't useful at all for
>> > debugging unparse. The data dumps only include the main output stream,
>> > so determining things like suspensions and buffered output is impossible.
>> >
>> > Another issue is the infoset output. When outputting the infoset, the
>> > debugger currently just walks the entire thing and converts it to XML
>> > and displays the XML. For large infosets, this is excessive and can make it
>> > impossible to use, even with some configurations that limit how much of
>> > that infoset is actually printed to the screen. Also things like large
>> > hex binary blobs create excessive and unusable output.
>> >
>> > Another thing I feel is missing is a schema view. Right now it's very
>> > difficult to know where in the schema Daffodil actually is.
>> >
>> > I think these issues just need some thought and improvement. One could
>> > imagine a better way to stringify our unparse buffers for debug. One
>> > could imagine a way to receive infoset state changes so the debugger can
>> > track things like backtracks and remove infosets. One could imagine a way
>> > to display the schema.
>> >
>> > We just need a better way to stringify the current state of the unparse
>> > data including buffers, and we need a way for the debugger to receive
>> > state change information about infoset so it can update displays rather
>> > than just constantly printing the entire infoset.
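A rough sketch of what receiving state changes could mean in practice: the debugger keeps its own mirror of the infoset and applies small events to it, so nothing is re-walked or re-printed. The event kinds used here (startElement, setValue, endElement, backtrack) and the InfosetMirror class are invented for illustration and do not exist in Daffodil.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    value: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


class InfosetMirror:
    """Debugger-side copy of the infoset, updated one event at a time."""

    def __init__(self, root_name: str) -> None:
        self.root = Node(root_name)
        self._open = [self.root]  # stack of elements currently being parsed

    def apply(self, event: dict) -> None:
        kind = event["kind"]
        if kind == "startElement":
            node = Node(event["name"])
            self._open[-1].children.append(node)
            self._open.append(node)
        elif kind == "setValue":
            self._open[-1].value = event["value"]
        elif kind == "endElement":
            self._open.pop()
        elif kind == "backtrack":
            # The abandoned element is always the last child of its parent.
            if len(self._open) < 2:
                raise ValueError("cannot backtrack past the root")
            self._open.pop()
            self._open[-1].children.pop()
        else:
            raise ValueError(f"unknown event kind: {kind}")


mirror = InfosetMirror("root")
for ev in [
    {"kind": "startElement", "name": "a"},
    {"kind": "setValue", "value": "1"},
    {"kind": "endElement"},
    {"kind": "startElement", "name": "b"},
    {"kind": "backtrack"},
]:
    mirror.apply(ev)
print([child.name for child in mirror.root.children])
```

A display layer would redraw only the node touched by each event, which is what lets the tree grow and shrink as the parse advances and backtracks.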
>> >
>> > However, I think another big issue is just usability in general. I
>> > think the CLI usage is reasonable, but it's not always user friendly,
>> > and it is difficult to view multiple things at the same time. I think
>> > because of this very few people even use this tool. So this is
>> > perhaps something worth focusing on.
>> >
>> > My first thought to improving this usability issue would be to implement
>> > the Debug Adapter Protocol (DAP)
>> > (https://microsoft.github.io/debug-adapter-protocol/) for Daffodil,
>> > which many IDEs implement. With this implemented, Daffodil could be
>> > plugged in to any IDE that supports it and essentially get debugging for
>> > free, without the need to worry about the GUI elements.
>> >
>> > I do have concerns that this just wouldn't have enough functionality
>> > that we'd really need. For example, DAP really only has the ability to show
>> > code (Daffodil's equivalent is the DFDL schema). There isn't a way to
>> > show a live view of the infoset or data. Most DAP IDEs do have a
>> > console output, so we could potentially make it so the console output is
>> > a live view of infoset/data. But I'm not even sure most DAP friendly
>> > IDEs could support this kind of console output. Does anyone have
>> > familiarity with DAP IDEs and what kinds of console capabilities are
>> > available?
>> >
>> > I also looked into TUI libraries with the idea that we could just extend
>> > our current debugger user interface to be a bit friendlier.
>> > Unfortunately, there aren't too many Java/Scala TUI libraries and those
>> > that do exist don't have Apache friendly licenses. We also want to be
>> > careful about increasing dependencies just for a debugger that many people
>> > might not use, so large graphics libraries are probably out of the
>> question.
>> >
>> > This also makes me wonder if an approach worth taking for the future of
>> > Daffodil schema debugging is developing a sort of "Daffodil Debug
>> > Protocol". I imagine it would be loosely based on DAP (which is
>> > essentially JSON message based) but could be targeted to the things that
>> > a DFDL schema debugger would really need. An added benefit with some
>> > sort of protocol is the debugger interface can be uncoupled from
>> > Daffodil itself, so we could implement a TUI/GUI/whatever in any
>> > language/GUI framework and just have it communicate the protocol over
>> > some form of IPC. Another benefit is that any future backends could
>> > implement this protocol and so a single debugger could hook into
>> > different backends without much issue. Unfortunately, defining such a
>> > protocol might be a large task, but we do have our existing debug
>> > infrastructure and things like DAP to guide its development/design.
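For a sense of scale, here is a small sketch of DAP-style message framing: a Content-Length header, a blank line, then a JSON body carrying seq and type fields. The framing and the seq/type/event/body field names follow DAP; the "infosetChanged" event and its body are hypothetical, made up to show the kind of DFDL-specific message such a protocol might add.

```python
import json


def encode_message(message: dict) -> bytes:
    """Frame a JSON message the way DAP does: header, blank line, body."""
    body = json.dumps(message, separators=(",", ":")).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def decode_message(data: bytes):
    """Return (message, remaining_bytes) for one framed message."""
    header, _, rest = data.partition(b"\r\n\r\n")
    length = None
    for line in header.split(b"\r\n"):
        key, _, value = line.partition(b":")
        if key.strip().lower() == b"content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("missing Content-Length header")
    return json.loads(rest[:length].decode("utf-8")), rest[length:]


# Hypothetical event: not part of DAP, shown only as an example payload.
event = {
    "seq": 1,
    "type": "event",
    "event": "infosetChanged",
    "body": {"path": "/enclosingParent1/currentNode", "bitOffset": 64},
}
framed = encode_message(event)
decoded, remaining = decode_message(framed)
print(decoded["event"], len(remaining))
```

Because the framing is just bytes, the same two functions would work over a pipe, a socket, or any other IPC channel.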
>> >
>> > Thoughts? Does such a Daffodil Debug Protocol seem worth it? Perhaps we
>> > really just need the few improvements mentioned to the existing
>> > debugger. Is that enough to make it usable? Or is an entirely different
>> > approach needed to debugging schemas?
>> >
>>
>>
