Re: The future of the daffodil DFDL schema debugger?

Beckerle, Mike Wed, 26 May 2021 12:42:38 -0700

I think the point was that to understand debugging in Daffodil, one must
understand, and potentially display, the data structures that the runtime
maintains.


Furthermore, some of the actions the parser/unparser takes are universal, like 
invoking a parser. Others require finer detail than that - e.g., delimiter 
scanning certainly needs more detailed treatment from the debugger.

But as a first approximation, there should be some way to display, inspect,
and potentially manipulate each piece of state.

________________________________
From: John Wass <jwa...@gmail.com>
Sent: Wednesday, May 26, 2021 2:46 PM
To: dev@daffodil.apache.org <dev@daffodil.apache.org>
Subject: Re: The future of the daffodil DFDL schema debugger?

> Some thoughts re: data format debugger
> I suggest we enumerate

Mike, are you saying there is some groundwork to lay for this in Daffodil
itself, or are these things which the debugger needs to model after
existing concepts?


On Mon, May 24, 2021 at 12:48 PM Beckerle, Mike <
mbecke...@owlcyberdefense.com> wrote:

> Some thoughts re: data format debugger
>
> I suggest we enumerate
>
>   *   every single piece of state of the parser,
>   *   every single piece of state of the unparser,
>   *   each action/step of the parser,  (every parse combinator or
> primitive, their subactions)
>   *   and of the unparser, (every unparse combinator, primitive,
> suspension,...)
>
> and wire-frame/mock-up some display for each piece of state, and how, if
> changed by a step, the change to that piece of state would be displayed.
>
> We can write down the nuances associated with these data items/actions
> that impact debugger display.
>
> Some of these states/actions will be analogous to things in conventional
> debuggers. (e.g., looking at the values of variables) Others will be
> specific to DFDL needs. (e.g., looking at layers in the data stream,
> visualizing delimiter scanning success/failure, backtracking)
>
> Core concepts a debugger needs are framing vs. content vs. value, and the
> "regions" in the data stream that make these up. The framing includes
> initiators, terminators, separators, alignment regions, prefix-length
> regions, leading/trailing skip regions, unused regions. Those surround the
> content region, and when padding/filling is involved (for simple types that
> are textual) the content region contains leading pad and trailing pad
> regions, surrounding the value region.
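
A minimal sketch of how those nested regions might be modeled for a display, in Python purely for illustration. The `Region` class, the kind names, and the bit offsets are all hypothetical; they are not Daffodil's actual data structures.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Region:
    """A contiguous bit range of the data stream (hypothetical model)."""
    kind: str          # e.g. "initiator", "content", "value", "terminator"
    start_bit: int     # inclusive, 0-based
    length_bits: int
    children: List["Region"] = field(default_factory=list)

    @property
    def end_bit(self) -> int:
        return self.start_bit + self.length_bits  # exclusive


def render(region: Region, depth: int = 0) -> List[str]:
    """Flatten the nesting into indented lines a text display could show."""
    lines = [f"{'  ' * depth}{region.kind} [{region.start_bit}, {region.end_bit})"]
    for child in region.children:
        lines.extend(render(child, depth + 1))
    return lines


# A padded text element: framing surrounds content, padding surrounds value.
element = Region("element", 0, 80, [
    Region("initiator", 0, 8),
    Region("content", 8, 64, [
        Region("leadingPad", 8, 16),
        Region("value", 24, 32),
        Region("trailingPad", 56, 16),
    ]),
    Region("terminator", 72, 8),
])

print("\n".join(render(element)))
```

A tree like this could back both a nested-box view and the highlighting of the matching bits in a data view.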
>
> An example of graphical nested box representation of these regions is here
> in a design note about Daffodil:
>
>
> https://daffodil.apache.org/dev/design-notes/term-sharing-in-schema-compiler/
> (see section "Details of Unique and Shared Regions")
>
> The way to start this effort is to look at the UState and PState classes.
> These are the state blocks. Every piece of these is potentially important
> to the debugger.
>
> Lastly, an important aspect of Daffodil is the streaming behavior of the
> parser and unparser. While I believe it is more important to get something
> working than for it to cover every feature, this is an area where not
> anticipating how it needs to work is likely to lock one out of a future
> scenario that accommodates it.
>
> So the parser doesn't produce an infoset. It produces a stream of infoset
> events, or call-backs to be exact.
> Due to backtracking in the parser, these events can be hung-up for
> substantial time while the parser continues. So we can't assume that there
> is any sort of correlation between parser activity and the producing of
> events.
>
> The unparser doesn't consume an infoset. It consumes a stream of infoset
> events. Specifically, the unparser is the callback-handler for unparse
> infoset events.
>
> The infoset gets trimmed so that we needn't build up the complete infoset
> tree in memory. As parse-events are produced, no-longer necessary parts of
> the infoset are pruned away. Similarly, when unparsing, once a part of the
> infoset has been unparsed, that part of the infoset tree is pruned away if
> no longer needed.
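
To make the "hung-up" events concrete, here is a small sketch of a buffer that holds infoset events while any point of uncertainty is open and releases them only once backtracking can no longer discard them. `EventBuffer` and its method names are assumptions for illustration, not the mechanism Daffodil actually uses.

```python
from typing import Callable, List


class EventBuffer:
    """Holds infoset events until no point of uncertainty (PoU) is open."""

    def __init__(self, sink: Callable[[str], None]) -> None:
        self._sink = sink
        self._pending: List[str] = []
        self._marks: List[int] = []  # index into _pending at each open PoU

    def emit(self, event: str) -> None:
        self._pending.append(event)
        self._flush_if_certain()

    def open_pou(self) -> None:
        self._marks.append(len(self._pending))

    def resolve_pou(self) -> None:
        """The speculative branch succeeded; its events may now go out."""
        self._marks.pop()
        self._flush_if_certain()

    def backtrack(self) -> None:
        """The speculative branch failed; discard the events it produced."""
        mark = self._marks.pop()
        del self._pending[mark:]
        self._flush_if_certain()

    def _flush_if_certain(self) -> None:
        if not self._marks:
            for event in self._pending:
                self._sink(event)
            self._pending.clear()


delivered: List[str] = []
buf = EventBuffer(delivered.append)
buf.emit("startElement a")
buf.open_pou()
buf.emit("startElement b")       # held: the parser may still backtrack
held_so_far = list(delivered)    # only "a" has been delivered at this point
buf.backtrack()                  # b is discarded and never delivered
buf.open_pou()
buf.emit("startElement c")
buf.resolve_pou()                # c is now safe to deliver
print(delivered)
```

A debugger front end that wants to show speculative state would need to observe the pending list as well as the delivered events.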
>
>
> ________________________________
> From: Steve Lawrence <slawre...@apache.org>
> Sent: Thursday, April 22, 2021 9:32 AM
> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
> Subject: Re: The future of the daffodil DFDL schema debugger?
>
> Some thoughts related to showing the infoset as if it were a variable as
> this is prototyped
>
> 1) How do DAP/IDEs represent very large hierarchical data? Infosets can
> be huge, and most of the time a user only cares about the most recent
> infoset item. So some way to follow and show just the most recent part of
> the infoset is important. The current Daffodil debugger has an
> "infosetLines" setting so that it only shows the most recent X number of
> lines, which is mostly all a user cares about when stepping through a parse.
>
> 2) Infoset items are added and removed very frequently during a parse.
> Currently, when the Daffodil debugger shows the infoset it just converts
> the entire thing to XML and displays that. This doesn't work at all for
> large infosets since this can take a long time. I was hoping this issue
> would get resolved with this new debugging infrastructure. When the
> infoset is modified, we ideally want a way to specify via DAP that parts
> of the variable hierarchy were added/removed rather than having to send
> the entire infoset during every variable update.
>
> 3) I can imagine a feature where a user would want to select an infoset
> item and jump to the associated schema element, or query information
> about that infoset item (e.g., what bit position did it start at, what
> was the length). We don't have this right now, but it would be really nice
> to have. This suggests that we need metadata associated with each of the
> variables. Does DAP have a concept of that, and do IDEs have a way to
> show it?
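
Independent of what DAP itself offers, points 2) and 3) can be sketched as a per-step delta over infoset paths, with metadata attached to each item. The `ItemInfo` fields, the path strings, and the schema locations below are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ItemInfo:
    """Per-item metadata a debugger UI could query (hypothetical fields)."""
    value: str
    start_bit: int
    length_bits: int
    schema_location: str  # e.g. "schema.dfdl.xsd:42"


Snapshot = Dict[str, ItemInfo]  # infoset path -> metadata


def infoset_delta(prev: Snapshot, curr: Snapshot) -> Tuple[List[str], List[str], List[str]]:
    """Return (added, removed, changed) paths between two steps."""
    added = sorted(p for p in curr if p not in prev)
    removed = sorted(p for p in prev if p not in curr)
    changed = sorted(p for p in curr if p in prev and curr[p] != prev[p])
    return added, removed, changed


step1 = {
    "/msg/header/len": ItemInfo("12", 0, 16, "schema.dfdl.xsd:10"),
    "/msg/body/guess": ItemInfo("x", 16, 8, "schema.dfdl.xsd:20"),
}
# After a backtrack the speculative item disappears and another is added.
step2 = {
    "/msg/header/len": ItemInfo("12", 0, 16, "schema.dfdl.xsd:10"),
    "/msg/body/text": ItemInfo("hello", 16, 40, "schema.dfdl.xsd:25"),
}

added, removed, changed = infoset_delta(step1, step2)
print(added, removed, changed)
```

Sending only the three path lists per step keeps the update proportional to what changed, not to the size of the infoset.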
>
> On 4/21/21 7:52 PM, Adam Rosien wrote:
> > I've been reading up on DAP and wanted to share...
> >
> >> There are many areas though that are unique to Daffodil that have no
> >> representation in the spec.  These things (like InputStream, Infoset,
> >> PoU, different variable types, backtracking, etc) will need an extension
> >> to DAP.  This really boils down to defining these things to fit under
> >> the DAP BaseProtocol and enabling handling of those objects on both the
> >> front and back ends.
> >
> > To me, much of the current state exposed by the (Daffodil) Debugger
> > translates directly to a DAP Variable [1]. DAP Variables can be
> > nested/hierarchical, so they could (potentially) model larger data like
> > the infoset. I can imagine shoving all the current state into Variables
> > as a proof-of-concept.
> >
> > It also seems like the processing stack maintained by the Daffodil
> > PState, where each item references the relevant schema element, could
> > translate to the DAP StackFrame type [2]. That is, the path from the
> > schema root to the currently processing schema element becomes the
> > "call stack". (Apologies if I don't have all the Daffodil terms lined up
> > correctly.)
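
As a rough illustration of that mapping, the sketch below turns a schema path into the body of a DAP `stackTrace` response. The `id`, `name`, `source`, `line`, and `column` fields come from the DAP StackFrame type; the path tuples, file name, and line numbers are made up, and the innermost-first ordering is assumed from how DAP clients typically display frames.

```python
from typing import Dict, List, Tuple

# (element name, schema file path, line) from schema root to current element.
SchemaStep = Tuple[str, str, int]


def to_stack_frames(path: List[SchemaStep]) -> Dict[str, object]:
    """Build the body of a DAP stackTrace response from a schema path.

    The current (innermost) element is listed first, so the path is reversed.
    """
    frames = []
    for depth, (name, schema_file, line) in reversed(list(enumerate(path))):
        frames.append({
            "id": depth,  # must be unique per frame
            "name": name,
            "source": {"name": schema_file.rsplit("/", 1)[-1], "path": schema_file},
            "line": line,
            "column": 1,
        })
    return {"stackFrames": frames, "totalFrames": len(frames)}


schema_path = [
    ("message", "/schemas/msg.dfdl.xsd", 12),
    ("header", "/schemas/msg.dfdl.xsd", 15),
    ("length", "/schemas/msg.dfdl.xsd", 17),
]
body = to_stack_frames(schema_path)
print([frame["name"] for frame in body["stackFrames"]])
```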
> >
> > For displaying the input data and processing progress, I looked at a few
> > existing VS Code extensions that provided non-builtin views, some of
> > which interact with their DAP debugger code [3] [4] [5] [6].
> >
> > Finally, I took a cursory look at scala-debug-adapter [7], which, for
> > reference, wraps Microsoft's java-debug [8] implementation of DAP. I was
> > curious about the set of request/response and event types. Additionally,
> > the TypeScript API to VS Code offers custom DAP requests and responses,
> > but I couldn't find the equivalent notion in the java-debug project.
> >
> > .. Adam
> >
> > [1] https://microsoft.github.io/debug-adapter-protocol/specification#Types_Variable
> > [2] https://microsoft.github.io/debug-adapter-protocol/specification#Types_StackFrame
> > [3] https://github.com/scalameta/metals-vscode (provides a debugger and
> >     non-debugger custom UI)
> > [4] https://github.com/microsoft/vscode-cpptools (debugger + memory view)
> > [5] https://marketplace.visualstudio.com/items?itemName=marus25.cortex-debug
> >     (debugger + memory view,
> >     https://github.com/Marus/cortex-debug/blob/master/src/frontend/memory_content_provider.ts)
> > [6] https://marketplace.visualstudio.com/items?itemName=slevesque.vscode-hexdump
> >     (extension for hexdumps that could be controlled by other extensions)
> > [7] https://github.com/scalacenter/scala-debug-adapter
> > [8] https://github.com/microsoft/java-debug
> > On Tue, Apr 20, 2021 at 7:08 AM John Wass <jwa...@gmail.com> wrote:
> >
> >>> Going to look deeper into how DAP might fit with Daffodil
> >>
> >> Have been looking over DAP and getting a good feeling about it. The
> >> specification [1] seems general enough that it could be applied to
> >> Daffodil and cover a swath of common operations (like start, stop,
> >> break, continue, code locations, variables, etc).
> >>
> >> There are many areas though that are unique to Daffodil that have no
> >> representation in the spec.  These things (like InputStream, Infoset,
> >> PoU, different variable types, backtracking, etc) will need an extension
> >> to DAP.  This really boils down to defining these things to fit under
> >> the DAP BaseProtocol and enabling handling of those objects on both the
> >> front and back ends.
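
For reference, the DAP base protocol frames every message as a `Content-Length` header, a blank line, and a UTF-8 JSON body, and events carry `seq`, `type`, `event`, and `body` fields. The sketch below frames one hypothetical Daffodil-specific event; the event name `daffodil.infoset` and its body are invented for illustration and are not part of DAP or of any existing extension.

```python
import json
from typing import Any, Dict, Tuple


def encode_message(message: Dict[str, Any]) -> bytes:
    """Frame a message the way the DAP base protocol does."""
    body = json.dumps(message, separators=(",", ":")).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def decode_message(data: bytes) -> Tuple[Dict[str, Any], bytes]:
    """Decode one framed message; return it plus any unread bytes."""
    header, _, rest = data.partition(b"\r\n\r\n")
    length = None
    for line in header.decode("ascii").split("\r\n"):
        key, _, value = line.partition(":")
        if key.strip().lower() == "content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("missing Content-Length header")
    return json.loads(rest[:length].decode("utf-8")), rest[length:]


# Hypothetical custom event; "daffodil.infoset" is not defined by DAP.
event = {
    "seq": 7,
    "type": "event",
    "event": "daffodil.infoset",
    "body": {"added": ["/msg/body/text"], "removed": []},
}
wire = encode_message(event)
decoded, leftover = decode_message(wire)
print(decoded["event"], leftover)
```

Both ends have to agree on the custom event names, which is the "handling on both the front and back ends" part.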
> >>
> >> On the backend we need a Daffodil DAP protocol server.  Existing JVM
> >> implementations (like Java [2], Scala [3]) are tied closely to JDI and
> >> would bring a lot of extra baggage to work around that.  Developing a
> >> Daffodil-specific implementation is no small task, but feasible.  There
> >> are several existing implementations on the JVM that are close and can
> >> be looked at for reference.
> >>
> >> The backend implementation would look similar to what was described in
> >> an earlier post.  We could use ZIO/Akka/etc to implement the backend
> >> Protocol Server to enable the IO between the Daffodil process and the
> >> DAP clients. This implementation would now be guided by the DAP
> >> specification.
> >>
> >> With the protocol and backend extended to fit Daffodil that leaves the
> >> frontend.  In theory an existing IDE plugin should get pretty close to
> >> being able to perform the common debug operations mentioned above.  To
> >> support the Daffodil extensions there will need to be handling of the
> >> extended protocol into whatever views are desired/applicable.
> >>
> >>> Also looking into the Java Debug Interface (JDI) for comparison.
> >>
> >> JDI appears to be the wrong level of abstraction for what we are talking
> >> about in debugging Daffodil for schema development.  While DAP does do
> >> JVM debugging (through a JDI DAP impl) it also generalizes to many other
> >> debugging scenarios.  JDI on the other hand is very tied to the JVM.
> >>
> >> Extending the JDI appears to be more complex than dealing with DAP, and
> >> even though the JDI API is mostly defined with interfaces, there are
> >> choke points that limit it to JVM concepts.  For example, jdi.Value has
> >> a finite set of JVM types that it works with; it's not clear where
> >> Daffodil types would plug in, if that is even possible.
> >>
> >> The final note is that unique Daffodil features wouldn’t get to IDE
> >> support any faster with JDI.  In some cases, like VS Code, you would
> >> still need an extended DAP to support these features.
> >>
> >>> and depending on how it shakes out will update the example to show
> >>> integration
> >>
> >> It would appear wise to investigate DAP further.  Next step is to refine
> >> these thoughts with a prototype. I started an implementation in the
> >> example debugger project [4] to try to run the current example on a
> >> _minimal_ DAP implementation.
> >>
> >>
> >> [1] https://microsoft.github.io/debug-adapter-protocol/specification
> >> [2] https://github.com/Microsoft/java-debug
> >> [3] https://github.com/scalacenter/scala-debug-adapter
> >> [4] https://github.com/jw3/example-daffodil-debug
> >>
> >>
> >> On Mon, Apr 12, 2021 at 9:58 AM John Wass <jwa...@gmail.com> wrote:
> >>
> >>>> the code is here https://github.com/jw3/example-daffodil-debug
> >>>
> >>> There is now a complete console-based example for Zio that demonstrates
> >>> controlling the debug flow while distributing the current state to
> >>> three "displays".
> >>> 1. infoset at current step
> >>> 2. diff of infoset against previous step
> >>> 3. bit position and value of data.
> >>>
> >>> These displays are very rudimentary but demonstrate the ability to
> >>> asynchronously populate multiple views while synchronously controlling
> >>> the debug loop.
> >>>
> >>>> - The new protocol being informed by existing debugger and DAP is key
> >>>
> >>> Going to look deeper into how DAP might fit with Daffodil, and
> >>> depending on how it shakes out will update the example to show
> >>> integration.
> >>>
> >>> Some interesting links to start with
> >>> - https://github.com/scalacenter/scala-debug-adapter
> >>> - https://scalameta.org/metals/docs/integrations/debug-adapter-protocol.html
> >>> - https://github.com/microsoft/java-debug
> >>>
> >>> Also looking into the Java Debug Interface (JDI) for comparison.
> >>>
> >>>
> >>> On Thu, Apr 8, 2021 at 12:36 PM John Wass <jwa...@gmail.com> wrote:
> >>>
> >>>> Revisiting this post after doing some debugger related work and
> >>>> thinking about debug protocol/adapters to connect external tooling to
> >>>> the debug process.
> >>>>
> >>>> This comment is good
> >>>>
> >>>>> This also makes me wonder if an approach worth taking for the future
> >>>>> of Daffodil schema debugging is developing a sort of "Daffodil Debug
> >>>>> Protocol". I imagine it would be loosely based on DAP (which is
> >>>>> essentially JSON message based) but could be targeted to the things
> >>>>> that a DFDL schema debugger would really need. An added benefit with
> >>>>> some sort of protocol is the debugger interface can be uncoupled from
> >>>>> Daffodil itself, so we could implement a TUI/GUI/whatever in any
> >>>>> language/GUI framework and just have it communicate the protocol over
> >>>>> some form of IPC. Another benefit is that any future backends could
> >>>>> implement this protocol and so a single debugger could hook into
> >>>>> different backends without much issue. Unfortunately, defining such a
> >>>>> protocol might be a large task, but we do have our existing debug
> >>>>> infrastructure and things like DAP to guide its development/design.
> >>>>
> >>>> Some thoughts on this
> >>>> - Defining the protocol will be a large task, but a minimal version
> >>>> should get up and round tripping quickly with a minimal subset of the
> >>>> protocol.
> >>>> - The new protocol being informed by existing debugger and DAP is key
> >>>> - Uncoupling from Daffodil is key
> >>>> - Adapt the Daffodil protocol to produce DAP after the fact so as not
> >>>> to constrain Daffodil debugging capability
> >>>> - We don't need to tie the protocol or adapters to a single framework;
> >>>> implementations of the IO layer should be simple enough to support
> >>>> multiple things (e.g., Akka, Zio, "basic" ...)
> >>>> - The current debugger lives in runtime1, but can we make an abstract
> >>>> API that any runtime would implement?
> >>>>
> >>>> Maybe a solution is structured like this
> >>>> - daffodil-debug-api:
> >>>>   - protocol model
> >>>>   - interfaces: debugger / IO adapter / etc
> >>>>   - lives in daffodil repo (new subproject?)
> >>>> - daffodil-debug-io-NAME
> >>>>   - provides implementation of a specific IO adapter
> >>>>   - multiple projects possible (daffodil-debugger-akka,
> >>>>     daffodil-debugger-zio, etc)
> >>>>   - supported ones live in their own subprojects, but others can be
> >>>>     plugged in from external sources
> >>>>   - ability to support multiple implementations reduces risk of
> >>>>     lock-in
> >>>> - debugger applications
> >>>>   - maintained in external repositories
> >>>>   - depending on the IO implementation these could execute in a
> >>>>     separate process or on a separate machine
> >>>>   - like Steve said, could be any language / framework
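
The split between the API and the IO adapters could look roughly like the sketch below. It is written in Python only to keep it short; every name here (`Debugger`, `IOAdapter`, `CountingDebugger`, `InMemoryAdapter`, `run`) is a placeholder, and a real daffodil-debug-api would presumably be Scala traits.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List

Message = Dict[str, Any]  # protocol model placeholder: a JSON-like message


class Debugger(ABC):
    """What a runtime would implement to be debuggable (hypothetical)."""

    @abstractmethod
    def step(self) -> Message:
        """Advance one step and describe the resulting state."""


class IOAdapter(ABC):
    """Carries protocol messages to a front end (hypothetical)."""

    @abstractmethod
    def send(self, message: Message) -> None:
        """Deliver one message; the transport is the adapter's business."""


class CountingDebugger(Debugger):
    """Stand-in runtime that only advances a bit position."""

    def __init__(self) -> None:
        self._bit_pos = 0

    def step(self) -> Message:
        self._bit_pos += 8
        return {"event": "stepped", "bitPosition": self._bit_pos}


class InMemoryAdapter(IOAdapter):
    """The "basic" adapter: no framework, just a list."""

    def __init__(self) -> None:
        self.sent: List[Message] = []

    def send(self, message: Message) -> None:
        self.sent.append(message)


def run(debugger: Debugger, adapter: IOAdapter, steps: int) -> None:
    """The debug loop depends only on the two interfaces."""
    for _ in range(steps):
        adapter.send(debugger.step())


adapter = InMemoryAdapter()
run(CountingDebugger(), adapter, steps=3)
print(adapter.sent[-1])
```

Because `run` sees only the two interfaces, swapping the in-memory adapter for an Akka- or Zio-backed one would not touch the runtime side.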
> >>>>
> >>>> Three types of reference implementations / sample applications could
> >>>> also guide the development of the API
> >>>>   1. a replacement for the existing TUI debugger, expected to end up
> >>>>      with at minimum the same functionality as the current one.
> >>>>   2. a standalone GUI (JavaFX, Scala.js, ..) debugger
> >>>>   3. an IDE integration
> >>>>
> >>>> Thoughts?
> >>>>
> >>>> Also I'm working on some reference implementations of these concepts
> >>>> using Akka and Zio.  Not quite ready to talk through it yet, but the
> >>>> code is here https://github.com/jw3/example-daffodil-debug
> >>>>
> >>>>
> >>>>
> >>>> On Wed, Jan 6, 2021 at 1:42 PM Steve Lawrence <slawre...@apache.org>
> >>>> wrote:
> >>>>
> >>>>> Yep, something like that seems very reasonable for dealing with large
> >>>>> infosets. But it feels like we still run into usability issues.
> >>>>> For example, what if a user wants to see more? We need some
> >>>>> configuration options to increase what we've elided. It's not big,
> >>>>> but every new thing that needs configuration adds complexity and
> >>>>> decreases usability.
> >>>>>
> >>>>> And I think the only reason we are trying to spend effort eliding
> >>>>> things is because we're limited to this gdb-like interface where you
> >>>>> can only print out a little information at a time.
> >>>>>
> >>>>> I think what would really help is to dump this gdb interface and
> >>>>> instead use multiple windows/views. As a really close example to what
> >>>>> I imagine, I recently came across this hex editor:
> >>>>>
> >>>>> https://www.synalysis.net/
> >>>>>
> >>>>> The screenshots are a bit small so it's not super clear, but this
> >>>>> tool has one view for the data in hex, and one view for a tree of
> >>>>> parsed results (which is very similar to our infoset). The "infoset"
> >>>>> view has information like offset/length/value, and can be related
> >>>>> back to the data view to find the actual bits.
> >>>>>
> >>>>> I imagine the "next generation daffodil debugger" to look much like
> >>>>> this. As data is parsed, the infoset view fills up. This view could
> >>>>> act like a standard GUI tree so you could collapse sections or scroll
> >>>>> around to show just the parts you care about, and have search
> >>>>> capabilities to quickly jump around. The advantage here is you no
> >>>>> longer really need automated eliding or heuristics for what the user
> >>>>> *might* care about. You just show the whole thing and let the user
> >>>>> scroll around. As daffodil parses and backtracks, this tree grows or
> >>>>> shrinks.
> >>>>>
> >>>>> I also imagine you could have a cursor moving around the hex view, so
> >>>>> as daffodil moves around (e.g. scanning for delimiters, extracting
> >>>>> integers), one could update this data view to show what daffodil is
> >>>>> doing and where it is.
> >>>>>
> >>>>> I also imagine there could be other views as well. For example, a
> >>>>> schema view to show where in the schema daffodil is, and to add/remove
> >>>>> breakpoints. And an information view for things like variables,
> >>>>> in-scope delimiters, PoU's, etc.
> >>>>>
> >>>>> The only reason I mention a debug protocol is that it would allow
> >>>>> this GUI to be more easily written in something other than Java/Scala
> >>>>> to take advantage of other GUI toolkits. It's been a long while since
> >>>>> I've done anything with Java GUIs, but they seemed pretty poor the
> >>>>> last time I looked at them. It would even allow for a TUI, which Java
> >>>>> has little/no support for. It also enables things like remote
> >>>>> debugging if a socket-based IPC were used. Though I'm not sure all of
> >>>>> that is necessary. Just thinking what would be ideal, and it can
> >>>>> always be pared back.
> >>>>>
> >>>>>
> >>>>> On 1/6/21 12:44 PM, Beckerle, Mike wrote:
> >>>>>> I don't think of it as a daffodil debug protocol, but just a
> >>>>>> separation of concerns between display of information and the
> >>>>>> behaviors of parse/unparse that need to be points where users can
> >>>>>> pause, and data structures available to display.
> >>>>>>
> >>>>>> E.g., it is 100% a display issue that the infoset (shown as XML) is
> >>>>>> clumsy, too big, etc.  The infoset is available in the processor
> >>>>>> state, and one can examine the current node, enclosing node, prior
> >>>>>> sibling(s), following sibling(s), etc. One can elide contents that
> >>>>>> are too big for hexBinary, etc.
> >>>>>>
> >>>>>> I think this problem, how to display the infoset with sensible
> >>>>>> limits on sizing, is fairly easy to come up with some design for,
> >>>>>> that will at least be (1) always fairly small (2) much more useful in
> >>>>>> more cases. It won't be perfect but can be much better than what we
> >>>>>> do now.
> >>>>>>
> >>>>>> One sensible display "mode" should be that displaying the context
> >>>>>> surrounding the current element (when parsing or unparsing) displays
> >>>>>> at most N lines (N/2 before, N/2 after), each with a maximum length
> >>>>>> of L characters (settable within reason?).
> >>>>>>
> >>>>>> Sibling and enclosing nodes would be displayed eliding their
> >>>>>> contents to at most 1 line.
> >>>>>>
> >>>>>> Here's an example of what I mean. Displaying up to N=10 lines total:
> >>>>>>
> >>>>>> ...
> >>>>>> <enclosingParent1>
> >>>>>>    ...
> >>>>>>    <priorSibling2>89ab782 ...</...>
> >>>>>>    <priorSibling1>some text is here and some more text</...>
> >>>>>>    <currentNode>value might be some big thing which needs to be elided ...</...>
> >>>>>>    <followingSibling1> ... </...>
> >>>>>>    ???
> >>>>>> </enclosingParent1>
> >>>>>> ???
> >>>>>>
> >>>>>> The </...> is just an idea to reduce XML matching end-tag clutter.
> >>>>>>
> >>>>>> The ... on a line alone or where element content would appear
> >>>>>> generally means 1 or more other siblings. The way the display above
> >>>>>> starts with ... means that this is a relative inner nest, not
> >>>>>> starting from the absolute root.
> >>>>>>
> >>>>>> The ... within simple content means that content is elided to fit on
> >>>>>> one line. It always follows some text characters to differentiate it
> >>>>>> from the child-element context.
> >>>>>>
> >>>>>> The ??? means zero or more other siblings.
> >>>>>>
> >>>>>> I used bold italic above to point out that the current node would be
> >>>>>> highlighted somehow. Probably a way to do this that doesn't require
> >>>>>> display modes would be useful. E.g., a text marker like ">>>" as in:
> >>>>>>
> >>>>>> >>> <currentNode>value .... </...>
> >>>>>>
> >>>>>> might be better, particularly for a trace output being dumped to a
> >>>>>> text file.
> >>>>>>
> >>>>>> I made the above example an unparser kind of example by showing a
> >>>>>> following sibling that exists after the current node.
> >>>>>>
> >>>>>> I think the key concept is that any sibling node is displayed in a
> >>>>>> way that fits on one line.
> >>>>>> E.g., even if the element name was really long, I'd suggest:
> >>>>>>
> >>>>>>   <hereIsAnElementWithASuperLongName...>abcd ... </...>
> >>>>>>
> >>>>>> Where the element name itself gets elided because it is too long.
> >>>>>>
> >>>>>> A thought. Note that the above presentation is shown as quasi-XML,
> >>>>>> but there's nothing XML-specific about it. A JSON-friendly equivalent
> >>>>>> could be done as well:
> >>>>>>
> >>>>>> enclosingParent1 = {
> >>>>>>    ...
> >>>>>>    priorSibling2 = "89ab782..."
> >>>>>>    priorSibling1 = "some text is here and some more text"
> >>>>>>    currentNode = "value might be some big thing which needs to be elided ..."
> >>>>>>    followingSibling1 = { ... }
> >>>>>>    ???
> >>>>>> }
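
The one-line rule above is simple enough to sketch. The helper names (`elide`, `one_line`, `context_window`) and the choice to give the element name at most a third of the line are assumptions made for the example, not a proposal for the actual limits.

```python
from typing import List, Tuple

ELLIPSIS = "..."


def elide(text: str, limit: int) -> str:
    """Truncate text to at most `limit` characters, marking the cut."""
    if len(text) <= limit:
        return text
    return text[: max(0, limit - len(ELLIPSIS))] + ELLIPSIS


def one_line(name: str, value: str, max_len: int, current: bool = False) -> str:
    """Render a simple element so that it fits on one line."""
    marker = ">>> " if current else "    "
    tag = elide(name, max_len // 3)          # long names are elided too
    overhead = len(marker) + len(f"<{tag}></...>")
    body = elide(value, max(len(ELLIPSIS), max_len - overhead))
    return f"{marker}<{tag}>{body}</...>"


def context_window(siblings: List[Tuple[str, str]], current: int,
                   n_lines: int, max_len: int) -> List[str]:
    """Show about n_lines/2 siblings before and after the current node."""
    half = n_lines // 2
    lo, hi = max(0, current - half), min(len(siblings), current + half + 1)
    lines = ["    ..."] if lo > 0 else []     # siblings omitted before
    for i in range(lo, hi):
        name, value = siblings[i]
        lines.append(one_line(name, value, max_len, current=(i == current)))
    if hi < len(siblings):
        lines.append("    ...")               # siblings omitted after
    return lines


siblings = [
    ("priorSibling2", "89ab782"),
    ("priorSibling1", "some text is here and some more text"),
    ("currentNode", "value might be some big thing which needs to be elided"),
    ("followingSibling1", "x"),
]
lines = context_window(siblings, current=2, n_lines=2, max_len=40)
print("\n".join(lines))
```

The same line builder would serve a trace file, since the ">>>" marker needs no display modes.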
> >>>>>>
> >>>>>> That's enough for 1 email thread on this debug topic.
> >>>>>>
> >>>>>>
> >>>>>> ________________________________
> >>>>>> From: Steve Lawrence <slawre...@apache.org>
> >>>>>> Sent: Tuesday, January 5, 2021 2:26 PM
> >>>>>> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
> >>>>>> Subject: The future of the daffodil DFDL schema debugger?
> >>>>>>
> >>>>>>
> >>>>>> Now that we're in a new year, I'd like to start a discussion about
> >>>>>> the Daffodil DFDL Schema debugger and how it might be improved to be
> >>>>>> more useful.
> >>>>>>
> >>>>>> Note that this is not about the capability to debug Daffodil itself
> >>>>>> in something like Eclipse/IntelliJ, but the ability for Daffodil to
> >>>>>> provide enough extra information during a parse/unparse so that a
> >>>>>> schema developer can get an idea of what Daffodil is doing. This
> >>>>>> makes it easier for users (rather than developers) to determine why a
> >>>>>> schema isn't giving the expected parse/unparse result (either because
> >>>>>> of bad data or a faulty schema).
> >>>>>>
> >>>>>> The current debugger is enabled by providing the --debug or --trace
> >>>>>> flags in the CLI. More information about that here:
> >>>>>>
> >>>>>> https://daffodil.apache.org/debugger/
> >>>>>>
> >>>>>> This enables a TUI and commands somewhat similar to GDB, providing
> >>>>>> things like breakpoints, steps, displaying the current infoset,
> >>>>>> displaying a dump of the data, etc.
> >>>>>>
> >>>>>> Although I find this tool pretty useful, it definitely has some
> >>>>>> glaring issues.
> >>>>>>
> >>>>>> The most glaring to me is that it really isn't useful at all for
> >>>>>> debugging unparse. The data dumps only include the main output
> >>>>>> stream, so determining things like suspensions and buffered output
> >>>>>> is impossible.
> >>>>>>
> >>>>>> Another issue is the infoset output. When outputting the infoset,
> >>>>>> the debugger currently just walks the entire thing and converts it
> >>>>>> to XML and displays the XML. For large infosets, this is excessive
> >>>>>> and can make it impossible to use, even with some configurations
> >>>>>> that limit how much of that infoset is actually printed to the
> >>>>>> screen. Also things like large hex binary blobs create excessive and
> >>>>>> unusable output.
> >>>>>>
> >>>>>> Another thing I feel is missing is a schema view. Right now it's
> >>>>>> very difficult to know where in the schema Daffodil actually is.
> >>>>>>
> >>>>>> I think these issues just need some thought and improvement. One
> >>>>>> could imagine a better way to stringify our unparse buffers for
> >>>>>> debug. One could imagine a way to receive infoset state changes so
> >>>>>> the debugger can track things like backtracks and remove infosets.
> >>>>>> One could imagine a way to display the schema.
> >>>>>>
> >>>>>> We just need a better way to stringify the current state of the
> >>>>>> unparse data including buffers, and we need a way for the debugger
> >>>>>> to receive state change information about the infoset so it can
> >>>>>> update displays rather than just constantly printing the entire
> >>>>>> infoset.
> >>>>>>
> >>>>>> However, I think another big issue is just usability in general. I
> >>>>>> think the CLI usage is reasonable, but it's not always user
> >>>>>> friendly, and it is difficult to view multiple things at the same
> >>>>>> time. I think because of this very few people even use this tool. So
> >>>>>> this is perhaps something worth focusing on.
> >>>>>>
> >>>>>> My first thought to improving this usability issue would be to
> >>>>>> implement the Debug Adapter Protocol (DAP)
> >>>>>> (https://microsoft.github.io/debug-adapter-protocol/) for Daffodil,
> >>>>>> which many IDEs implement. With this implemented, Daffodil could be
> >>>>>> plugged in to any IDE that supports it and essentially get debugging
> >>>>>> for free, without the need to worry about the GUI elements.
> >>>>>>
> >>>>>> I do have concerns that this just wouldn't have enough of the
> >>>>>> functionality that we'd really need. For example, DAP really only
> >>>>>> has the ability to show code (Daffodil's equivalent is the DFDL
> >>>>>> schema). There isn't a way to show a live view of the infoset or
> >>>>>> data. Most DAP IDEs do have a console output, so we could
> >>>>>> potentially make it so the console output is a live view of
> >>>>>> infoset/data. But I'm not even sure most DAP-friendly IDEs could
> >>>>>> support this kind of console output. Does anyone have familiarity
> >>>>>> with DAP IDEs and what kinds of console capabilities are available?
> >>>>>>
> >>>>>> I also looked into TUI libraries with the idea that we could just
> >>>>>> extend our current debugger user interface to be a bit friendlier.
> >>>>>> Unfortunately, there aren't too many Java/Scala TUI libraries and
> >>>>>> those that do exist don't have Apache-friendly licenses. We also
> >>>>>> want to be careful about increasing dependencies just for a debugger
> >>>>>> that many people might not use, so large graphics libraries are
> >>>>>> probably out of the question.
> >>>>>>
> >>>>>> This also makes me wonder if an approach worth taking for the future
> >>>>>> of Daffodil schema debugging is developing a sort of "Daffodil Debug
> >>>>>> Protocol". I imagine it would be loosely based on DAP (which is
> >>>>>> essentially JSON message based) but could be targeted to the things
> >>>>>> that a DFDL schema debugger would really need. An added benefit with
> >>>>>> some sort of protocol is the debugger interface can be uncoupled
> >>>>>> from Daffodil itself, so we could implement a TUI/GUI/whatever in
> >>>>>> any language/GUI framework and just have it communicate the protocol
> >>>>>> over some form of IPC. Another benefit is that any future backends
> >>>>>> could implement this protocol and so a single debugger could hook
> >>>>>> into different backends without much issue. Unfortunately, defining
> >>>>>> such a protocol might be a large task, but we do have our existing
> >>>>>> debug infrastructure and things like DAP to guide its
> >>>>>> development/design.
> >>>>>>
> >>>>>> Thoughts? Does such a Daffodil Debug Protocol seem worth it? Perhaps
> >>>>>> we really just need the few improvements mentioned to the existing
> >>>>>> debugger. Is that enough to make it usable? Or is an entirely
> >>>>>> different approach to debugging schemas needed?
> >>>>>>
> >>>>>
> >>>>>
> >>
> >
>
>
