Re: The future of the daffodil DFDL schema debugger?

Adam Rosien Thu, 22 Apr 2021 12:22:23 -0700

Sorry for the threading madness, my default of markdown quoting doesn't
interact well with mailing lists like this...


On Thu, Apr 22, 2021 at 11:14 AM Adam Rosien <[email protected]> wrote:

>
>
> On Thu, Apr 22, 2021 at 7:03 AM Steve Lawrence <[email protected]>
> wrote:
>
>> Some thoughts related to showing the infoset as if it were a variable, as
>> this gets prototyped:
>>
>> 1) How do DAP/IDEs represent very large hierarchical data? Infosets can
>> be huge, and most of the time a user only cares about the most recent
>> infoset item. So some way to follow and show just the most recent part of
>> the infoset is important. The current Daffodil debugger has an
>> "infosetLines" setting so that it only shows the most recent X number of
>> lines, which is usually all a user cares about when stepping through a parse.
>>
>
> DAP Variables, if nested, can be lazily loaded with children offsets, etc.
>
> > If the number of named or indexed children is large, the numbers should
> > be returned via the optional ‘namedVariables’ and ‘indexedVariables’
> > attributes.
>
>   - https://microsoft.github.io/debug-adapter-protocol/specification#Types_Variable
>
> Alternatively, as with the current behavior, only a window into the
> infoset could be reported.
>
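One way to combine lazy loading with the "infosetLines" idea, sketched in TypeScript: an infoset node with children becomes a DAP `Variable` whose `variablesReference` lets the client fetch its children later, and whose `indexedVariables` gives the child count so the client can page with the `start`/`count` arguments of a `variables` request. The field names follow the DAP specification; `InfosetNode` and `InfosetVariables` are hypothetical, not Daffodil or DAP library types.

```typescript
// Subset of the DAP Variable type (field names as in the DAP specification).
interface DapVariable {
  name: string;
  value: string;
  variablesReference: number; // 0 means the variable has no children
  indexedVariables?: number; // tells the client how many children it can page through
}

// Subset of the DAP VariablesArguments type used by the 'variables' request.
interface VariablesArguments {
  variablesReference: number;
  start?: number; // index of the first child to return
  count?: number; // number of children; missing or 0 means "all"
}

// Hypothetical infoset node; not a Daffodil type.
interface InfosetNode {
  name: string;
  value?: string;
  children: InfosetNode[];
}

// Hands out variablesReference ids so children are only serialized when the
// client expands a node or pages through it. References only need to stay
// valid while the debuggee is stopped, so a real adapter would clear this
// table when execution resumes.
class InfosetVariables {
  private refs: Record<number, InfosetNode> = {};
  private nextRef = 1;

  toVariable(node: InfosetNode): DapVariable {
    if (node.children.length === 0) {
      return { name: node.name, value: node.value ?? "", variablesReference: 0 };
    }
    const ref = this.nextRef++;
    this.refs[ref] = node;
    return {
      name: node.name,
      value: `{${node.children.length} children}`,
      variablesReference: ref,
      indexedVariables: node.children.length,
    };
  }

  // Answers a 'variables' request with only the requested slice of children.
  variables(args: VariablesArguments): DapVariable[] {
    const node = this.refs[args.variablesReference];
    if (!node) return [];
    const start = args.start ?? 0;
    const end = args.count ? start + args.count : node.children.length;
    return node.children.slice(start, end).map((child) => this.toVariable(child));
  }
}
```

For a node with 1,000 records, showing only the three most recent would mean answering `{ variablesReference: ref, start: 997, count: 3 }` rather than serializing the whole infoset.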
>
>>
>> 2) Infoset items are added and removed very frequently during a parse.
>> Currently, when the Daffodil debugger shows the infoset it just converts
>> the entire thing to XML and displays that. This doesn't work at all for
>> large infosets since this can take a long time. I was hoping this issue
>> would get resolved with this new debugging infrastructure. When the
>> infoset is modified, we ideally want a way to specify via DAP that parts
>> of the variable hierarchy were added/removed rather than having to send
>> the entire infoset during every variable update.
>>
>
> As I understand it, DAP only requests the current state when the debugger
> is stopped (due to a breakpoint, stepping, etc.):
>
> > Whenever the program stops (on program entry, because a breakpoint was
> > hit, an exception occurred, or the user requested execution to be paused),
> > the debug adapter sends a stopped event with the appropriate reason and
> > thread id.
> >
> > Upon receipt, the development tool first requests the threads (see
> > below) and then the stacktrace (a list of stack frames) for the thread
> > mentioned in the stopped event. If the user then drills into the stack
> > frame, the development tool first requests the scopes for a stack frame,
> > and then the variables for a scope. If a variable is itself structured, the
> > development tool requests its properties through additional variables
> > requests.
>
>   - https://microsoft.github.io/debug-adapter-protocol/overview
>     ("Stopping and accessing debuggee state")
>
> Large data like infosets could be lazily transferred, or some window into
> it sent.
>
>
>>
>> 3) I can imagine a feature where a user would want to select an infoset
>> item and jump to the associated schema element, or query information
>> about that infoset item (e.g., what bit position did it start at, what
>> was the length). We don't have this right now, but it would be really nice
>> to have. This suggests that we need metadata associated with each of the
>> variables. Does DAP have a concept of that and do IDE's have a way to
>> show it?
>>
>
> From what I can tell, DAP doesn't cover any view-related interaction with
> the debugger state. You can perform actions like "setVariable" if a user
> wants to override a reported value (not sure we'd want this, but just
> pointing it out), but there isn't a "jump to this resource at this
> location" view command defined within DAP.
>
> However, the VS Code extensions I previously mentioned *do* implement
> similar functionality to "jump to this resource at this location". I
> believe VS Code reacts to debugger events. For example, when a breakpoint
> is reached and the debuggee provides the current stack trace, the user can
> select a particular stack frame; that frame has a reference to the
> associated "source", which the UI can display. In the case of
> stacktrace-as-schema-processing, each frame would correspond to the
> location of a schema element, and the UI could focus on that location.
>
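A rough sketch of the stacktrace-as-schema-processing idea, assuming a hypothetical `SchemaContext` record (element name plus the schema file, line, and column of its declaration) for each entry on the processing stack. Each entry becomes a DAP `StackFrame` whose `source.path` is what an IDE such as VS Code would open and focus when the user selects that frame. The frame and source fields follow the DAP specification; everything else is illustrative.

```typescript
// Subset of the DAP Source and StackFrame types.
interface DapSource {
  name?: string;
  path?: string;
}

interface DapStackFrame {
  id: number;
  name: string;
  source?: DapSource;
  line: number;
  column: number;
}

// Hypothetical record of one schema component being processed.
interface SchemaContext {
  elementName: string;
  schemaFile: string; // absolute path to the schema file
  line: number; // 1-based line of the element declaration
  column: number;
}

function baseName(path: string): string {
  const parts = path.split("/");
  return parts[parts.length - 1];
}

// The stack is assumed to be stored root-first; a DAP stackTrace response
// lists the innermost (current) frame first, so the order is reversed.
function toStackFrames(stack: SchemaContext[]): DapStackFrame[] {
  return stack
    .slice()
    .reverse()
    .map((ctx, i) => ({
      id: i + 1, // frame ids must be unique; 1-based here
      name: ctx.elementName,
      source: { name: baseName(ctx.schemaFile), path: ctx.schemaFile },
      line: ctx.line,
      column: ctx.column,
    }));
}
```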
>
>>
>> On 4/21/21 7:52 PM, Adam Rosien wrote:
>> > I've been reading up on DAP and wanted to share...
>> >
>> >> There are many areas though that are unique to Daffodil that have no
>> >> representation in the spec.  These things (like InputStream, Infoset,
>> >> PoU, different variable types, backtracking, etc) will need an
>> >> extension to DAP.  This really boils down to defining these things to
>> >> fit under the DAP BaseProtocol and enabling handling of those objects
>> >> on both the front and back ends.
>> >
>> > To me, much of the current state exposed by the (Daffodil) Debugger
>> > translates directly to a DAP Variable [1]. DAP Variables can be
>> > nested/hierarchical, so they could (potentially) model larger data like
>> > the infoset. I can imagine shoving all the current state into Variables
>> > as a proof-of-concept.
>> >
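As a proof-of-concept along those lines, a paused parser's state could be flattened into the variables of a single DAP scope. The sketch below assumes a hypothetical `DebugState` snapshot (bit position, current element, DFDL variable values); none of these names are Daffodil APIs. Structured values get a non-zero `variablesReference`, so their children are only sent when the user expands them.

```typescript
// Subset of the DAP Variable type.
interface DapVariable {
  name: string;
  value: string;
  variablesReference: number; // 0 = leaf value
  namedVariables?: number;
}

// Hypothetical snapshot of parser state at a pause; field names are illustrative.
interface DebugState {
  bitPosition: number;
  currentElement: string;
  dfdlVariables: Record<string, string>; // DFDL variable name -> value
}

// Children of structured variables, keyed by their variablesReference.
const childrenByRef: Record<number, DapVariable[]> = {};
let nextVarRef = 1;

function leaf(name: string, value: string): DapVariable {
  return { name, value, variablesReference: 0 };
}

function structured(name: string, children: DapVariable[]): DapVariable {
  const ref = nextVarRef++;
  childrenByRef[ref] = children;
  return { name, value: `{${children.length} items}`, variablesReference: ref, namedVariables: children.length };
}

// Turns the whole snapshot into the top-level variables of one scope.
function stateToVariables(state: DebugState): DapVariable[] {
  const dfdlVars = Object.keys(state.dfdlVariables).map((k) => leaf(k, state.dfdlVariables[k]));
  return [
    leaf("bitPosition", String(state.bitPosition)),
    leaf("currentElement", state.currentElement),
    structured("dfdlVariables", dfdlVars),
  ];
}
```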
>> > It also seems like the processing stack maintained by the Daffodil
>> > PState, where each item references the relevant schema element, could
>> > translate to the DAP StackFrame type [2]. That is, the path from the
>> > schema root to the currently processing schema element becomes the
>> > "call stack". (Apologies if I don't have all the Daffodil terms lined up
>> > correctly.)
>> >
>> > For displaying the input data and processing progress, I looked at a few
>> > existing VS Code extensions that provided non-builtin views, some of
>> > which interact with their DAP debugger code [3] [4] [5] [6].
>> >
>> > Finally, I took a cursory look at scala-debug-adapter [7], which, for
>> > reference, wraps Microsoft's java-debug implementation of DAP [8]. I was
>> > curious about the set of request/response and event types. Additionally,
>> > the TypeScript API to VS Code offers custom DAP requests and responses,
>> > but I couldn't find the equivalent notion in the java-debug project.
>> >
>> > .. Adam
>> >
>> > [1] https://microsoft.github.io/debug-adapter-protocol/specification#Types_Variable
>> > [2] https://microsoft.github.io/debug-adapter-protocol/specification#Types_StackFrame
>> > [3] https://github.com/scalameta/metals-vscode (provides a debugger and
>> >     non-debugger custom UI)
>> > [4] https://github.com/microsoft/vscode-cpptools (debugger + memory view)
>> > [5] https://marketplace.visualstudio.com/items?itemName=marus25.cortex-debug
>> >     (debugger + memory view,
>> >     https://github.com/Marus/cortex-debug/blob/master/src/frontend/memory_content_provider.ts)
>> > [6] https://marketplace.visualstudio.com/items?itemName=slevesque.vscode-hexdump
>> >     (extension for hexdumps that could be controlled by other extensions)
>> > [7] https://github.com/scalacenter/scala-debug-adapter
>> > [8] https://github.com/microsoft/java-debug
>> >
>> > On Tue, Apr 20, 2021 at 7:08 AM John Wass <[email protected]> wrote:
>> >
>> >>> Going to look deeper into how DAP might fit with Daffodil
>> >>
>> >> Have been looking over DAP and getting a good feeling about it. The
>> >> specification [1] seems general enough that it could be applied to
>> >> Daffodil and cover a swath of common operations (like start, stop,
>> >> break, continue, code locations, variables, etc).
>> >>
>> >> There are many areas though that are unique to Daffodil that have no
>> >> representation in the spec.  These things (like InputStream, Infoset,
>> >> PoU, different variable types, backtracking, etc) will need an
>> >> extension to DAP.  This really boils down to defining these things to
>> >> fit under the DAP BaseProtocol and enabling handling of those objects
>> >> on both the front and back ends.
>> >>
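Anything defined "under the DAP BaseProtocol", standard or Daffodil-specific, travels in the same framing: a `Content-Length` header giving the body size in bytes, a blank line (`\r\n\r\n`), and a UTF-8 JSON body. A minimal sketch of writing and incrementally reading that framing; `encodeMessage` and `MessageReader` are illustrative names, not part of an existing library.

```typescript
// Frames one DAP message: "Content-Length: <bytes>\r\n\r\n" + UTF-8 JSON.
function encodeMessage(message: object): Uint8Array {
  const body = new TextEncoder().encode(JSON.stringify(message));
  const header = new TextEncoder().encode(`Content-Length: ${body.length}\r\n\r\n`);
  const out = new Uint8Array(header.length + body.length);
  out.set(header, 0);
  out.set(body, header.length);
  return out;
}

// Index of the "\r\n\r\n" header terminator, or -1 if it has not arrived yet.
function indexOfHeaderEnd(buf: Uint8Array): number {
  for (let i = 0; i + 3 < buf.length; i++) {
    if (buf[i] === 13 && buf[i + 1] === 10 && buf[i + 2] === 13 && buf[i + 3] === 10) return i;
  }
  return -1;
}

// Splits a byte stream back into messages. Chunks may end anywhere, so any
// incomplete header or body is buffered until the rest arrives.
class MessageReader {
  private buffer = new Uint8Array(0);

  push(chunk: Uint8Array): object[] {
    const merged = new Uint8Array(this.buffer.length + chunk.length);
    merged.set(this.buffer, 0);
    merged.set(chunk, this.buffer.length);
    this.buffer = merged;

    const messages: object[] = [];
    for (;;) {
      const headerEnd = indexOfHeaderEnd(this.buffer);
      if (headerEnd < 0) break; // header not complete yet
      const header = new TextDecoder().decode(this.buffer.subarray(0, headerEnd));
      const match = /Content-Length: (\d+)/i.exec(header);
      if (match === null) throw new Error("missing Content-Length header");
      const length = parseInt(match[1], 10);
      const bodyStart = headerEnd + 4;
      if (this.buffer.length < bodyStart + length) break; // body not complete yet
      const body = this.buffer.subarray(bodyStart, bodyStart + length);
      messages.push(JSON.parse(new TextDecoder().decode(body)));
      this.buffer = this.buffer.slice(bodyStart + length);
    }
    return messages;
  }
}
```

Because the reader is transport-agnostic, the same code would sit behind stdin/stdout or a socket, which is where the choice of ZIO, Akka, or something simpler comes in.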
>> >> On the backend we need a Daffodil DAP protocol server.  Existing JVM
>> >> implementations (like Java [2], Scala [3]) are tied closely to JDI and
>> >> would bring a lot of extra baggage to work around that.  Developing a
>> >> Daffodil-specific implementation is no small task, but feasible. There
>> >> are several existing implementations on the JVM that are close and can
>> >> be looked at for reference.
>> >>
>> >> The backend implementation would look similar to what was described in
>> >> an earlier post.  We could use ZIO/Akka/etc to implement the backend
>> >> Protocol Server to enable the IO between the Daffodil process and the
>> >> DAP clients. This implementation would now be guided by the DAP
>> >> specification.
>> >>
>> >> With the protocol and backend extended to fit Daffodil, that leaves the
>> >> frontend.  In theory an existing IDE plugin should get pretty close to
>> >> being able to perform the common debug operations mentioned above.  To
>> >> support the Daffodil extensions, the extended protocol will need to be
>> >> handled in whatever views are desired/applicable.
>> >>
>> >>> Also looking into the Java Debug Interface (JDI) for comparison.
>> >>
>> >> JDI appears to be the wrong level of abstraction for what we are
>> >> talking about in debugging Daffodil for schema development.  While DAP
>> >> does do JVM debugging (through a JDI DAP impl), it also generalizes to
>> >> many other debugging scenarios.  JDI, on the other hand, is very tied
>> >> to the JVM.
>> >>
>> >> Extending the JDI appears to be more complex than dealing with DAP, and
>> >> even though the JDI API is mostly defined with interfaces, there are
>> >> choke points that limit it to JVM concepts.  For example, jdi.Value has
>> >> a finite set of JVM types that it works with; it's not clear where
>> >> Daffodil types would plug in, if that is even possible.
>> >>
>> >> The final note is that unique Daffodil features wouldn't get IDE
>> >> support any faster through JDI.  In some cases, like VS Code, you would
>> >> still need an extended DAP to support these features.
>> >>
>> >>> and depending on how it shakes out will update the example to show
>> >>> integration
>> >>
>> >> It would appear wise to investigate DAP further.  The next step is to
>> >> refine these thoughts with a prototype. I started an implementation in
>> >> the example debugger project [4] to try to run the current example on a
>> >> _minimal_ DAP implementation.
>> >>
>> >>
>> >> [1] https://microsoft.github.io/debug-adapter-protocol/specification
>> >> [2] https://github.com/Microsoft/java-debug
>> >> [3] https://github.com/scalacenter/scala-debug-adapter
>> >> [4] https://github.com/jw3/example-daffodil-debug
>> >>
>> >>
>> >> On Mon, Apr 12, 2021 at 9:58 AM John Wass <[email protected]> wrote:
>> >>
>> >>>> the code is here https://github.com/jw3/example-daffodil-debug
>> >>>
>> >>> There is now a complete console-based example for Zio that
>> >>> demonstrates controlling the debug flow while distributing the current
>> >>> state to three "displays":
>> >>> 1. infoset at current step
>> >>> 2. diff of infoset against previous step
>> >>> 3. bit position and value of data.
>> >>>
>> >>> These displays are very rudimentary but demonstrate the ability to
>> >>> asynchronously populate multiple views while synchronously controlling
>> >>> the debug loop.
>> >>>
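The "diff of infoset against previous step" display can be approximated by a line diff of the infoset rendered at two consecutive steps. This sketch assumes each step's infoset is already available as a list of lines (for example, the XML the current debugger prints) and swaps in a textbook longest-common-subsequence diff. It is quadratic in time and memory, so it suits a window of the infoset better than an entire large one.

```typescript
type DiffLine = { op: " " | "+" | "-"; text: string };

// Line diff via longest common subsequence (LCS).
function diffLines(before: string[], after: string[]): DiffLine[] {
  const n = before.length;
  const m = after.length;
  // lcs[i][j] = LCS length of before[i..] and after[j..]
  const lcs: number[][] = [];
  for (let i = 0; i <= n; i++) {
    const row: number[] = [];
    for (let j = 0; j <= m; j++) row.push(0);
    lcs.push(row);
  }
  for (let i = n - 1; i >= 0; i--) {
    for (let j = m - 1; j >= 0; j--) {
      lcs[i][j] = before[i] === after[j] ? lcs[i + 1][j + 1] + 1 : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }
  // Walk the table: keep common lines, emit removals and additions otherwise.
  const out: DiffLine[] = [];
  let i = 0;
  let j = 0;
  while (i < n && j < m) {
    if (before[i] === after[j]) {
      out.push({ op: " ", text: before[i] });
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      out.push({ op: "-", text: before[i++] });
    } else {
      out.push({ op: "+", text: after[j++] });
    }
  }
  while (i < n) out.push({ op: "-", text: before[i++] });
  while (j < m) out.push({ op: "+", text: after[j++] });
  return out;
}
```

Lines marked `-` correspond to infoset items that disappeared between steps (for example, after a backtrack), and `+` to newly added ones.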
>> >>>> - The new protocol being informed by existing debugger and DAP is key
>> >>>
>> >>> Going to look deeper into how DAP might fit with Daffodil, and
>> >>> depending on how it shakes out will update the example to show
>> >>> integration.
>> >>>
>> >>> Some interesting links to start with
>> >>> - https://github.com/scalacenter/scala-debug-adapter
>> >>> - https://scalameta.org/metals/docs/integrations/debug-adapter-protocol.html
>> >>> - https://github.com/microsoft/java-debug
>> >>>
>> >>> Also looking into the Java Debug Interface (JDI) for comparison.
>> >>>
>> >>>
>> >>> On Thu, Apr 8, 2021 at 12:36 PM John Wass <[email protected]> wrote:
>> >>>
>> >>>> Revisiting this post after doing some debugger related work and
>> >>>> thinking about debug protocol/adapters to connect external tooling to
>> >>>> the debug process.
>> >>>>
>> >>>> This comment is good
>> >>>>
>> >>>>> This also makes me wonder if an approach worth taking for the future
>> >>>>> of Daffodil schema debugging is developing a sort of "Daffodil Debug
>> >>>>> Protocol". I imagine it would be loosely based on DAP (which is
>> >>>>> essentially JSON message based) but could be targeted to the things
>> >>>>> that a DFDL schema debugger would really need. An added benefit with
>> >>>>> some sort of protocol is the debugger interface can be uncoupled from
>> >>>>> Daffodil itself, so we could implement a TUI/GUI/whatever in any
>> >>>>> language/GUI framework and just have it communicate the protocol over
>> >>>>> some form of IPC. Another benefit is that any future backends could
>> >>>>> implement this protocol and so a single debugger could hook into
>> >>>>> different backends without much issue. Unfortunately, defining such a
>> >>>>> protocol might be a large task, but we do have our existing debug
>> >>>>> infrastructure and things like DAP to guide its development/design.
>> >>>>
>> >>>> Some thoughts on this
>> >>>> - Defining the protocol will be a large task, but a minimal version
>> >>>>   should be up and round-tripping quickly with a minimal subset of the
>> >>>>   protocol.
>> >>>> - The new protocol being informed by the existing debugger and DAP is
>> >>>>   key
>> >>>> - Uncoupling from Daffodil is key
>> >>>> - Adapt the Daffodil protocol to produce DAP after the fact so as not
>> >>>>   to constrain Daffodil debugging capability
>> >>>> - We don't need to tie the protocol or adapters to a single framework;
>> >>>>   implementations of the IO layer should be simple enough to support
>> >>>>   multiple things (e.g. Akka, Zio, "basic", ...)
>> >>>> - The current debugger lives in runtime1, but can we make an abstract
>> >>>>   API that any runtime would implement?
>> >>>>
>> >>>> Maybe a solution is structured like this
>> >>>> - daffodil-debug-api:
>> >>>>   - protocol model
>> >>>>   - interfaces: debugger / IO adapter / etc
>> >>>>   - lives in daffodil repo (new subproject?)
>> >>>> - daffodil-debug-io-NAME
>> >>>>   - provides implementation of a specific IO adapter
>> >>>>   - multiple projects possible (daffodil-debugger-akka,
>> >>>>     daffodil-debugger-zio, etc)
>> >>>>   - supported ones live in their own subprojects, but others can be
>> >>>>     plugged in from external sources
>> >>>>   - ability to support multiple implementations reduces risk of lock-in
>> >>>> - debugger applications
>> >>>>   - maintained in external repositories
>> >>>>   - depending on the IO implementation, these could execute in a
>> >>>>     separate process or on a separate machine
>> >>>>   - like Steve said, could be any language / framework
>> >>>>
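To make the proposed split a bit more concrete, here is a sketch of what the protocol model and IO adapter interface in a daffodil-debug-api might look like, together with a "basic" in-process adapter. Every type and method name here is hypothetical; an Akka or ZIO adapter would implement the same `DebugIO` interface over its own transport.

```typescript
// Protocol model (hypothetical): what the debugger emits and what it accepts.
type DebugEvent =
  | { kind: "stopped"; reason: "breakpoint" | "step"; bitPosition: number }
  | { kind: "infosetChanged"; added: string[]; removed: string[] }
  | { kind: "terminated" };

type DebugCommand =
  | { kind: "continue" }
  | { kind: "step" }
  | { kind: "setBreakpoint"; element: string };

// IO adapter interface: one implementation per transport/framework.
interface DebugIO {
  publish(event: DebugEvent): void; // debugger -> displays (asynchronous fan-out)
  nextCommand(): Promise<DebugCommand>; // debugger waits here while paused
}

// "Basic" in-process adapter: events go to registered listeners, and
// commands are queued until the debugger asks for the next one.
class BasicIO implements DebugIO {
  private listeners: ((e: DebugEvent) => void)[] = [];
  private pending: DebugCommand[] = [];
  private waiting: ((c: DebugCommand) => void)[] = [];

  subscribe(listener: (e: DebugEvent) => void): void {
    this.listeners.push(listener);
  }

  publish(event: DebugEvent): void {
    this.listeners.forEach((l) => l(event));
  }

  // Called by a UI; hands the command to a waiting debugger or queues it.
  send(command: DebugCommand): void {
    const waiter = this.waiting.shift();
    if (waiter) waiter(command);
    else this.pending.push(command);
  }

  nextCommand(): Promise<DebugCommand> {
    const queued = this.pending.shift();
    if (queued) return Promise.resolve(queued);
    return new Promise<DebugCommand>((resolve) => {
      this.waiting.push(resolve);
    });
  }
}
```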
>> >>>> Three types of reference implementations / sample applications could
>> >>>> also guide the development of the API:
>> >>>>   1. a replacement for the existing TUI debugger, expected to end up
>> >>>>      with at minimum the same functionality as the current one.
>> >>>>   2. a standalone GUI (JavaFX, Scala.js, ..) debugger
>> >>>>   3. an IDE integration
>> >>>>
>> >>>> Thoughts?
>> >>>>
>> >>>> Also I'm working on some reference implementations of these concepts
>> >>>> using Akka and Zio.  Not quite ready to talk through it yet, but the
>> >>>> code is here https://github.com/jw3/example-daffodil-debug
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Wed, Jan 6, 2021 at 1:42 PM Steve Lawrence <[email protected]>
>> >>>> wrote:
>> >>>>
>> >>>>> Yep, something like that seems very reasonable for dealing with large
>> >>>>> infosets. But it still feels like we run into usability issues. For
>> >>>>> example, what if a user wants to see more? We need some configuration
>> >>>>> options to increase what we've elided. It's not big, but every new
>> >>>>> thing that needs configuration adds complexity and decreases
>> >>>>> usability.
>> >>>>>
>> >>>>> And I think the only reason we are trying to spend effort eliding
>> >>>>> things is because we're limited to this gdb-like interface where you
>> >>>>> can only print out a little information at a time.
>> >>>>>
>> >>>>> I think what would really help is to dump this gdb interface and
>> >>>>> instead use multiple windows/views. As a really close example to what
>> >>>>> I imagine, I recently came across this hex editor:
>> >>>>>
>> >>>>> https://www.synalysis.net/
>> >>>>>
>> >>>>> The screenshots are a bit small so it's not super clear, but this tool
>> >>>>> has one view for the data in hex, and one view for a tree of parsed
>> >>>>> results (which is very similar to our infoset). The "infoset" view
>> >>>>> has information like offset/length/value, and can be related back to
>> >>>>> the data view to find the actual bits.
>> >>>>>
>> >>>>> I imagine the "next generation daffodil debugger" to look much like
>> >>>>> this. As data is parsed, the infoset view fills up. This view could
>> >>>>> act like a standard GUI tree so you could collapse sections or scroll
>> >>>>> around to show just the parts you care about, and have search
>> >>>>> capabilities to quickly jump around. The advantage here is you no
>> >>>>> longer really need automated eliding or heuristics for what the user
>> >>>>> *might* care about. You just show the whole thing and let the user
>> >>>>> scroll around. As daffodil parses and backtracks, this tree grows or
>> >>>>> shrinks.
>> >>>>>
>> >>>>> I also imagine you could have a cursor moving around the hex view, so
>> >>>>> as daffodil moves around (e.g. scanning for delimiters, extracting
>> >>>>> integers), one could update this data view to show what daffodil is
>> >>>>> doing and where it is.
>> >>>>>
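A minimal sketch of a hex view with such a cursor: a classic 16-bytes-per-row dump in which the byte containing the parser's current bit position is bracketed. The `bitPosition` input is assumed to come from the debugger; how Daffodil would report it is not shown.

```typescript
// Two-digit and eight-digit lowercase hex, zero padded.
function hex2(n: number): string {
  const s = n.toString(16);
  return s.length < 2 ? "0" + s : s;
}

function hex8(n: number): string {
  let s = n.toString(16);
  while (s.length < 8) s = "0" + s;
  return s;
}

// Renders 16 bytes per row as "<offset>: xx  xx [xx] ...", bracketing the
// byte that contains the current bit position (bit 17 -> byte 2).
function hexDumpWithCursor(data: Uint8Array, bitPosition: number): string[] {
  const cursorByte = Math.floor(bitPosition / 8);
  const rows: string[] = [];
  for (let rowStart = 0; rowStart < data.length; rowStart += 16) {
    const cells: string[] = [];
    for (let i = rowStart; i < Math.min(rowStart + 16, data.length); i++) {
      const hex = hex2(data[i]);
      cells.push(i === cursorByte ? `[${hex}]` : ` ${hex} `);
    }
    rows.push(`${hex8(rowStart)}:${cells.join("")}`);
  }
  return rows;
}
```

Re-rendering only the affected row on each step would keep such a view cheap even for large inputs.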
>> >>>>> I also imagine there could be other views as well. For example, a
>> >>>>> schema view to show where in the schema daffodil is, and to add/remove
>> >>>>> breakpoints. And an information view for things like variables,
>> >>>>> in-scope delimiters, PoUs, etc.
>> >>>>>
>> >>>>> The only reason I mention a debug protocol is that it would allow this
>> >>>>> GUI to be more easily written in something other than Java/Scala to
>> >>>>> take advantage of other GUI toolkits. It's been a long while since
>> >>>>> I've done anything with Java GUIs, but they seemed pretty poor the
>> >>>>> last time I looked at them. It would even allow for a TUI, which Java
>> >>>>> has little/no support for. It also enables things like remote
>> >>>>> debugging if a socket IPC was used. Though I'm not sure all of that is
>> >>>>> necessary. Just thinking what would be ideal, and it can always be
>> >>>>> pared back.
>> >>>>>
>> >>>>>
>> >>>>> On 1/6/21 12:44 PM, Beckerle, Mike wrote:
>> >>>>>> I don't think of it as a daffodil debug protocol, but just a
>> >>>>>> separation of concerns between display of information and the
>> >>>>>> behaviors of parse/unparse that need to be points where users can
>> >>>>>> pause, and data structures available to display.
>> >>>>>>
>> >>>>>> E.g., it is 100% a display issue that the infoset (shown as XML) is
>> >>>>>> clumsy, too big, etc.  The infoset is available in the processor
>> >>>>>> state, and one can examine the current node, enclosing node, prior
>> >>>>>> sibling(s), following sibling(s), etc. One can elide contents that
>> >>>>>> are too big for hexBinary, etc.
>> >>>>>>
>> >>>>>> I think this problem, how to display the infoset with sensible
>> >>>>>> limits on sizing, is one where it is fairly easy to come up with a
>> >>>>>> design that will at least be (1) always fairly small and (2) much
>> >>>>>> more useful in more cases. It won't be perfect but can be much
>> >>>>>> better than what we do now.
>> >>>>>>
>> >>>>>> One sensible display "mode" would be that displaying the context
>> >>>>>> surrounding the current element (when parsing or unparsing) shows at
>> >>>>>> most N lines (N/2 before, N/2 after), each with a maximum length of
>> >>>>>> L characters (settable within reason?).
>> >>>>>>
>> >>>>>> Sibling and enclosing nodes would be displayed eliding their
>> >>>>>> contents to at most 1 line.
>> >>>>>>
>> >>>>>> Here's an example of what I mean. Displaying up to N=10 lines total:
>> >>>>>>
>> >>>>>> ...
>> >>>>>> <enclosingParent1>
>> >>>>>>    ...
>> >>>>>>    <priorSibling2>89ab782 ...</...>
>> >>>>>>    <priorSibling1>some text is here and some more text</...>
>> >>>>>>    <currentNode>value might be some big thing which needs to be elided ...</...>
>> >>>>>>    <followingSibling1> ... </...>
>> >>>>>>    ???
>> >>>>>> </enclosingParent1>
>> >>>>>> ???
>> >>>>>>
>> >>>>>> The </...> is just an idea to reduce XML matching end-tag clutter.
>> >>>>>>
>> >>>>>> The ... on a line alone or where element content would appear
>> >>>>>> generally means 1 or more other siblings. The way the display above
>> >>>>>> starts with ... means that this is a relative inner nest, not
>> >>>>>> starting from the absolute root.
>> >>>>>>
>> >>>>>> The ... within simple content means that content is elided to fit on
>> >>>>>> one line. It always follows some text characters, to differentiate it
>> >>>>>> from the child-element context.
>> >>>>>>
>> >>>>>> The ??? means zero or more other siblings.
>> >>>>>>
>> >>>>>> I used bold italic above to point out that the current node would
>> >>>>>> be highlighted somehow. Probably a way to do this that doesn't
>> >>>>>> require display modes would be useful. E.g., a text marker like ">>>"
>> >>>>>> as in:
>> >>>>>>
>> >>>>>>     >>> <currentNode>value .... </...>
>> >>>>>>
>> >>>>>> might be better, particularly for a trace output being dumped to a
>> >>>>>> text file.
>> >>>>>>
>> >>>>>> I made the above example an unparser kind of example by showing a
>> >>>>>> following sibling that exists after the current node.
>> >>>>>>
>> >>>>>> I think the key concept is that any sibling node is displayed in a
>> >>>>>> way that fits on one line.
>> >>>>>> E.g., even if the element name was really long, I'd suggest:
>> >>>>>>
>> >>>>>>   <hereIsAnElementWithASuperLongName...>abcd ... </...>
>> >>>>>>
>> >>>>>> Where the element name itself gets elided because it is too long.
>> >>>>>>
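The one-line rules above can be sketched as a few small functions: eliding simple content with " ..." and a `</...>` end tag, eliding overly long element names, and keeping a window of at most N lines around the current node, marked with ">>>". The function names and parameters are illustrative; nothing here is existing Daffodil code.

```typescript
// Shortens text to at most max characters, marking the cut with "...".
function elide(text: string, max: number): string {
  return text.length <= max ? text : text.slice(0, Math.max(0, max - 3)) + "...";
}

// One sibling or enclosing node on one line, e.g.
// "<priorSibling2>89ab782 ...</...>".
function oneLine(name: string, value: string, maxName: number, maxValue: number): string {
  const shownName = elide(name, maxName);
  const shownValue = value.length <= maxValue ? value : value.slice(0, maxValue) + " ...";
  return `<${shownName}>${shownValue}</...>`;
}

// Keeps at most n lines around the current node (about n/2 before and
// after, shifted at the ends) and marks the current node with ">>> " so it
// stands out even in a plain-text trace file.
function windowAround(lines: string[], current: number, n: number): string[] {
  const start = Math.max(0, Math.min(current - Math.floor(n / 2), lines.length - n));
  return lines
    .slice(start, start + n)
    .map((line, i) => (start + i === current ? ">>> " : "    ") + line);
}
```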
>> >>>>>> A thought. Note that the above presentation is shown as quasi-XML,
>> >>>>>> but there's nothing XML-specific about it. A JSON-friendly
>> >>>>>> equivalent could be done as well:
>> >>>>>>
>> >>>>>> enclosingParent1 = {
>> >>>>>>    ...
>> >>>>>>    priorSibling2 = "89ab782..."
>> >>>>>>    priorSibling1 = "some text is here and some more text"
>> >>>>>>    currentNode = "value might be some big thing which needs to be elided ..."
>> >>>>>>    followingSibling1 = { ... }
>> >>>>>>    ???
>> >>>>>> }
>> >>>>>>
>> >>>>>> That's enough for 1 email thread on this debug topic.
>> >>>>>>
>> >>>>>>
>> >>>>>> ________________________________
>> >>>>>> From: Steve Lawrence <[email protected]>
>> >>>>>> Sent: Tuesday, January 5, 2021 2:26 PM
>> >>>>>> To: [email protected] <[email protected]>
>> >>>>>> Subject: The future of the daffodil DFDL schema debugger?
>> >>>>>>
>> >>>>>>
>> >>>>>> Now that we're in a new year, I'd like to start a discussion about
>> >>>>>> the Daffodil DFDL Schema debugger and how it might be improved to be
>> >>>>>> more useful.
>> >>>>>>
>> >>>>>> Note that this is not about the capability to debug Daffodil itself
>> >>>>>> in something like Eclipse/IntelliJ, but the ability for Daffodil to
>> >>>>>> provide enough extra information during a parse/unparse so that a
>> >>>>>> schema developer can get an idea of what Daffodil is doing. This
>> >>>>>> makes it easier for users (rather than developers) to determine why a
>> >>>>>> schema isn't giving the expected parse/unparse result (either because
>> >>>>>> of bad data or a faulty schema).
>> >>>>>>
>> >>>>>> The current state of the debugger is enabled by providing the
>> >>>>>> --debug or --trace flags in the CLI. More information about that here:
>> >>>>>>
>> >>>>>> https://daffodil.apache.org/debugger/
>> >>>>>>
>> >>>>>> This enables a TUI and commands somewhat similar to GDB, providing
>> >>>>>> things like breakpoints, steps, displaying the current infoset,
>> >>>>>> displaying a dump of the data, etc.
>> >>>>>>
>> >>>>>> Although I find this tool pretty useful, it definitely has some
>> >>>>>> glaring issues.
>> >>>>>>
>> >>>>>> The most glaring to me is that it really isn't useful at all for
>> >>>>>> debugging unparse. The data dumps only include the main output
>> >>>>>> stream, so determining things like suspensions and buffered output is
>> >>>>>> impossible.
>> >>>>>>
>> >>>>>> Another issue is the infoset output. When outputting the infoset, the
>> >>>>>> debugger currently just walks the entire thing, converts it to XML,
>> >>>>>> and displays the XML. For large infosets, this is excessive and can
>> >>>>>> make it impossible to use, even with some configurations that limit
>> >>>>>> how much of that infoset is actually printed to the screen. Also,
>> >>>>>> things like large hex binary blobs create excessive and unusable
>> >>>>>> output.
>> >>>>>>
>> >>>>>> Another thing I feel is missing is a schema view. Right now it's very
>> >>>>>> difficult to know where in the schema Daffodil actually is.
>> >>>>>>
>> >>>>>> I think these issues just need some thoughtful improvement. One could
>> >>>>>> imagine a better way to stringify our unparse buffers for debug. One
>> >>>>>> could imagine a way to receive infoset state changes so the debugger
>> >>>>>> can track things like backtracks and removed infoset elements. One
>> >>>>>> could imagine a way to display the schema.
>> >>>>>>
>> >>>>>> We just need a better way to stringify the current state of the
>> >>>>>> unparse data including buffers, and we need a way for the debugger
>> >>>>>> to receive state change information about the infoset so it can
>> >>>>>> update displays rather than just constantly printing the entire
>> >>>>>> infoset.
>> >>>>>>
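Receiving infoset state changes instead of reprinting the whole infoset could look something like this sketch, in which a debugger view keeps a mirror of the tree and applies hypothetical "added"/"removed" events, with a removal standing in for backtracking. The event shape is an assumption, not an existing Daffodil interface.

```typescript
// Hypothetical change notifications a parser could emit; a removal models
// backtracking discarding an element together with its descendants.
type InfosetChange =
  | { kind: "added"; id: number; parentId: number | null; name: string }
  | { kind: "removed"; id: number };

interface ViewNode {
  id: number;
  name: string;
  children: number[];
}

// Mirror of the infoset kept by a debugger view, updated one change at a
// time so the display never has to re-walk the whole infoset.
class InfosetMirror {
  private nodes: Record<number, ViewNode> = {};
  private parentOf: Record<number, number | null> = {};
  readonly roots: number[] = [];

  apply(change: InfosetChange): void {
    if (change.kind === "added") {
      this.nodes[change.id] = { id: change.id, name: change.name, children: [] };
      this.parentOf[change.id] = change.parentId;
      if (change.parentId === null) this.roots.push(change.id);
      else this.nodes[change.parentId].children.push(change.id);
    } else {
      this.remove(change.id);
    }
  }

  // Removes a node and, recursively, everything beneath it.
  private remove(id: number): void {
    const node = this.nodes[id];
    if (!node) return;
    node.children.slice().forEach((child) => this.remove(child));
    const parent = this.parentOf[id];
    const siblings = parent === null ? this.roots : this.nodes[parent].children;
    siblings.splice(siblings.indexOf(id), 1);
    delete this.nodes[id];
    delete this.parentOf[id];
  }

  size(): number {
    return Object.keys(this.nodes).length;
  }

  childNames(id: number): string[] {
    return this.nodes[id].children.map((c) => this.nodes[c].name);
  }
}
```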
>> >>>>>> However, I think another big issue is just usability in general. I
>> >>>>>> think the CLI usage is reasonable, but it's not always user friendly,
>> >>>>>> and it is difficult to view multiple things at the same time. I think
>> >>>>>> because of this very few people even use this tool. So this seems
>> >>>>>> like perhaps something worth focusing on.
>> >>>>>>
>> >>>>>> My first thought to improving this usability issue would be to
>> >>>>>> implement the Debug Adapter Protocol (DAP)
>> >>>>>> (https://microsoft.github.io/debug-adapter-protocol/) for Daffodil,
>> >>>>>> which many IDEs implement. With this implemented, Daffodil could be
>> >>>>>> plugged in to any IDE that supports it and essentially get debugging
>> >>>>>> for free, without the need to worry about the GUI elements.
>> >>>>>>
>> >>>>>> I do have concerns that this just wouldn't have enough functionality
>> >>>>>> that we'd really need. For example, DAP really only has the ability
>> >>>>>> to show code (Daffodil's equivalent is the DFDL schema). There isn't a
>> >>>>>> way to show a live view of the infoset or data. Most DAP IDEs do have
>> >>>>>> a console output, so we could potentially make it so the console
>> >>>>>> output is a live view of infoset/data. But I'm not even sure most
>> >>>>>> DAP-friendly IDEs could support this kind of console output. Does
>> >>>>>> anyone have familiarity with DAP IDEs and what kinds of console
>> >>>>>> capabilities are available?
>> >>>>>>
>> >>>>>> I also looked into TUI libraries with the idea that we could just
>> >>>>>> extend our current debugger user interface to be a bit friendlier.
>> >>>>>> Unfortunately, there aren't too many Java/Scala TUI libraries and
>> >>>>>> those that do exist don't have Apache-friendly licenses. We also want
>> >>>>>> to be careful about increasing dependencies just for a debugger that
>> >>>>>> many people might not use, so large graphics libraries are probably
>> >>>>>> out of the question.
>> >>>>>>
>> >>>>>> This also makes me wonder if an approach worth taking for the future
>> >>>>>> of Daffodil schema debugging is developing a sort of "Daffodil Debug
>> >>>>>> Protocol". I imagine it would be loosely based on DAP (which is
>> >>>>>> essentially JSON message based) but could be targeted to the things
>> >>>>>> that a DFDL schema debugger would really need. An added benefit with
>> >>>>>> some sort of protocol is the debugger interface can be uncoupled from
>> >>>>>> Daffodil itself, so we could implement a TUI/GUI/whatever in any
>> >>>>>> language/GUI framework and just have it communicate the protocol over
>> >>>>>> some form of IPC. Another benefit is that any future backends could
>> >>>>>> implement this protocol and so a single debugger could hook into
>> >>>>>> different backends without much issue. Unfortunately, defining such a
>> >>>>>> protocol might be a large task, but we do have our existing debug
>> >>>>>> infrastructure and things like DAP to guide its development/design.
>> >>>>>>
>> >>>>>> Thoughts? Does such a Daffodil Debug Protocol seem worth it? Perhaps
>> >>>>>> we really just need the few improvements mentioned to the existing
>> >>>>>> debugger. Is that enough to make it usable? Or is an entirely
>> >>>>>> different approach needed to debugging schemas?
>> >>>>>>
>> >>>>>
>> >>>>>
>> >>
>> >
>>
>>
