Re: The future of the daffodil DFDL schema debugger?

John Wass Fri, 08 Jan 2021 13:45:27 -0800

What other features could find a nice home in an IDE integration?  Having a
single convenient entry point (the IDE) for such things would be nice, imo.


Things like...

- Rich set of actions for TDML
  - Run a single test from a TDML file
  - Debug/Run TDML
- Run/Debug a data file with a schema from the project
  - i.e., right-click on a JPG and have a context menu for Run with Daffodil ->
    pick from a list of dfdl.xsd files
...



On Fri, Jan 8, 2021 at 2:47 PM Beckerle, Mike <[email protected]>
wrote:

> Use cases or quasi-requirements. This is my summary so far.
>
> 1) capture a human-readable trace of parse/unparse information to a single
> text file (might be same as 2 if machine-readable is sufficiently human
> readable)
>
> 2) capture a machine-readable trace of parse/unparse information to a
> single text file (might be same as 1 if human readable form is also machine
> readable)
>
> 3) interactive debug from a command line - each display of information is
> requested by a specific command (1 and 2 above might be using this with a
> specific canned set of commands auto-issued to display various information,
> and capturing all to an output stream)
>
> 4) interactive debug with multi-panel display where displays are
> updated/animated automatically as debug context changes. (This is intended
> to mean more than just opening all the schema files in different editor
> windows - more than just gdb-style debug under Emacs.)
>
> 5) interactive debug time-machine - ability to backup to prior
> parser/unparser states, move forward again, or just backup and re-check
> something, but then jump forward to proceed from where one left off.
>
> 6) Non Use Case: IDE for DFDL with rich semantic model (akin to the DSOM
> object model) of the schema.
> This is here just to point out that it's really out of scope. There are
> many questions about the schema (e.g., "can I add this property to this
> element?") that are not required for the debugger. A full and powerful IDE
> is great, but that's really entirely different than our goals for debugging
> that we're trying to discuss here.
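Use case 5 above (the debug "time machine") can be pictured as a list of recorded states plus a cursor. What follows is only a minimal sketch under assumptions: `StateHistory`, its methods, and the `bitPos` snapshot field are hypothetical names for illustration, not part of Daffodil, and a real implementation would need to bound memory on large parses.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class StateHistory:
    """Hypothetical store of parser/unparser snapshots with a movable cursor."""

    snapshots: List[Any] = field(default_factory=list)
    cursor: int = -1  # index of the snapshot currently being displayed

    def record(self, snapshot: Any) -> None:
        """Append a new snapshot and point the cursor at it."""
        self.snapshots.append(snapshot)
        self.cursor = len(self.snapshots) - 1

    def _show(self) -> Any:
        if not self.snapshots:
            raise IndexError("no snapshots recorded yet")
        return self.snapshots[self.cursor]

    def back(self, steps: int = 1) -> Any:
        """Move toward older states, stopping at the first one."""
        self.cursor = max(0, self.cursor - steps)
        return self._show()

    def forward(self, steps: int = 1) -> Any:
        """Move toward newer states, stopping at the latest one."""
        self.cursor = min(len(self.snapshots) - 1, self.cursor + steps)
        return self._show()

    def resume(self) -> Any:
        """Jump to the latest state, i.e. where one left off."""
        self.cursor = len(self.snapshots) - 1
        return self._show()
```

Backing up to re-check something and then calling `resume()` corresponds to "jump forward to proceed from where one left off".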
>
>
> ________________________________
> From: Sloane, Brandon <[email protected]>
> Sent: Thursday, January 7, 2021 1:25 PM
> To: [email protected] <[email protected]>
> Subject: Re: The future of the daffodil DFDL schema debugger?
>
> We could also create a new flag for --trace that would format the trace
> output in a more machine readable manner. This should let us accomplish
> Larry's goals, and most of mine, with relatively little effort within
> Daffodil (but still all the effort on the GUI side), and would allow for
> off-site analysis in cases where it is not practical to attach a debugger
> while Daffodil is running.
> ________________________________
> From: Sloane, Brandon <[email protected]>
> Sent: Thursday, January 7, 2021 1:21 PM
> To: [email protected] <[email protected]>
> Subject: Re: The future of the daffodil DFDL schema debugger?
>
> I've been thinking about a tool along similar lines (although more
> integrated with Daffodil than post-processing the trace output).
>
> One thing to keep in mind is that, although the trace output is presented
> as a linear log (since we do not have much choice), the actual process is
> more of a tree, due to backtracking.
>
> Ideally, we would have a multi-pane window showing:
>
>
>   *   The hex/binary data
>   *   The infoset
>   *   A time-axis parse tree; with a "major" node at every point of
> uncertainty and parse error, and "minor" nodes at every parse step
>   *   A view of the DFDL schema
>   *   An interactive terminal debugger (e.g. what we currently have)
>   *   Breakpoints/variables/delimiter-stack/etc
>
> Within these panes, you ought to be able to select a given region/element,
> and highlight all the corresponding elements in the other panes.
>
> I think that exporting the necessary information from Daffodil to
> implement all of this would be relatively straightforward. The only
> potentially problematic parts I see are:
>
>   *   The interactive debugger would require some form of time-travel to
> implement (I think most of the work for this is done to support backtracking)
>   *   The memory requirements when used on large infosets
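The point that a linear trace really describes a tree can be sketched as follows. This is an illustration under assumptions, not Daffodil's trace format: the event kinds `step`, `pou` (point of uncertainty), `resolve`, and `backtrack` are hypothetical. Each step hangs under the previous one along the time axis, and a backtrack returns to the most recent open point of uncertainty so that the next step becomes a sibling branch.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    label: str
    major: bool = False  # True at points of uncertainty and parse errors
    children: List["Node"] = field(default_factory=list)


def build_tree(events: List[Tuple[str, str]]) -> Node:
    """Fold a linear list of (kind, label) events into a time-axis tree."""
    root = Node("start", major=True)
    current = root
    pou_stack: List[Node] = []  # points of uncertainty that are still open
    for kind, label in events:
        if kind == "backtrack":
            # Record the error on the failed branch, then return to the PoU.
            current.children.append(Node(label, major=True))
            current = pou_stack[-1] if pou_stack else root
        elif kind == "resolve":
            # The innermost point of uncertainty has been decided.
            if pou_stack:
                pou_stack.pop()
        else:
            node = Node(label, major=(kind == "pou"))
            current.children.append(node)
            current = node
            if kind == "pou":
                pou_stack.append(node)
    return root


def render(node: Node, depth: int = 0) -> List[str]:
    """Indented text view: '*' marks major nodes, '-' marks minor ones."""
    mark = "*" if node.major else "-"
    lines = ["  " * depth + mark + " " + node.label]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines
```

Keeping the failed branches in the tree is what would let a GUI show why a choice branch was abandoned, at the cost of the memory concern listed above.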
>
> ________________________________
> From: Larry Barber <[email protected]>
> Sent: Thursday, January 7, 2021 1:08 PM
> To: [email protected] <[email protected]>
> Subject: RE: The future of the daffodil DFDL schema debugger?
>
> When I was doing strange and unusual things with DFDL and generating a lot
> of errors, I envisioned how helpful it would be to have a tool that would
> post-process the --trace output and use it to display a dual pane window
> (like the editor referenced below) with the schema on one side and hex
> version on the other, with a slider that would allow me to flow through the
> parsing action and see pointers as to where the parser was in both the
> schema and input files. In other words just convert the information from
> the --trace into a more useful graphical display.
> Perhaps breakpoint like markers could be added to both files to quickly
> scan through and display what sections of the schema read which locations
> in the file, or vice versa.
>
> -----Original Message-----
> From: Steve Lawrence [mailto:[email protected]]
> Sent: Wednesday, January 6, 2021 1:42 PM
> To: [email protected]
> Subject: Re: The future of the daffodil DFDL schema debugger?
>
> Yep, something like that seems very reasonable for dealing with large
> infosets. But it feels like we still run into usability issues.
> For example, what if a user wants to see more? We need some configuration
> options to increase what we've elided. It's not big, but every new thing
> that needs configuration adds complexity and decreases usability.
>
> And I think the only reason we are trying to spend effort eliding things
> is because we're limited to this gdb-like interface where you can only
> print out a little information at a time.
>
> I think what would really help is to dump this gdb interface and instead use
> multiple windows/views. As a really close example to what I imagine, I
> recently came across this hex editor:
>
>
> https://www.synalysis.net/
>
> The screenshots are a bit small so it's not super clear, but this tool has
> one view for the data in hex, and one view for a tree of parsed results
> (which is very similar to our infoset). The "infoset" view has information
> like offset/length/value, and can be related back to the data view to find
> the actual bits.
>
> I imagine the "next generation daffodil debugger" to look much like this.
> As data is parsed, the infoset view fills up. This view could act like a
> standard GUI tree so you could collapse sections or scroll around to show
> just the parts you care about, and have search capabilities to quickly jump
> around. The advantage here is you no longer really need automated eliding
> or heuristics for what the user *might* care about.
> You just show the whole thing and let user scroll around. As daffodil
> parses and backtracks, this tree grows or shrinks.
>
> I also imagine you could have a cursor moving around the hex view, so as
> daffodil moves around (e.g. scanning for delimiters, extracting integers),
> one could update this data view to show what daffodil is doing and where it
> is.
>
> I also imagine there could be other views as well. For example, a schema
> view to show where in the schema daffodil is, and to add/remove
> breakpoints. And an information view for things like variables, in-scope
> delimiters, PoU's, etc.
>
> The only reason I mention a debug protocol is that it would allow this GUI to
> be more easily written in something other than Java/Scala to take advantage
> of other GUI toolkits. It's been a long while since I've done anything with
> Java GUIs, but they seemed pretty poor the last time I looked at them. It would
> even allow for a TUI, which Java has little/no support for. It also enables
> things like remote debugging if a socket IPC was used. Though I'm not sure
> all of that is necessary. Just thinking what would be ideal, and it can
> always be pared back.
>
>
> On 1/6/21 12:44 PM, Beckerle, Mike wrote:
> > I don't think of it as a daffodil debug protocol, but just a separation
> > of concerns between display of information and the behaviors of
> > parse/unparse that need to be points where users can pause, and data
> > structures available to display.
> >
> > E.g., it is 100% a display issue that the infoset (shown as XML) is
> > clumsy, too big, etc.  The infoset is available in the processor state, and
> > one can examine the current node, enclosing node, prior sibling(s),
> > following sibling(s), etc. One can elide contents that are too big for
> > hexBinary, etc.
> >
> > I think this problem, how to display the infoset with sensible limits on
> > sizing, is fairly easy to come up with some design for, that will at least
> > be (1) always fairly small (2) much more useful in more cases. It won't be
> > perfect but can be much better than what we do now.
> >
> > One sensible display "mode" should be that displaying the context
> > surrounding the current element (when parsing or unparsing) displays
> > at most N lines (N/2 before, N/2 after) with a maximum length of L
> > characters (settable within reason?)
> >
> > Sibling and enclosing nodes would be displayed eliding their contents to
> > at most 1 line.
> >
> > Here's an example of what I mean. Displaying up to N=10 lines total:
> >
> > ...
> > <enclosingParent1>
> >    ...
> >    <priorSibling2>89ab782 ...</...>
> >    <priorSibling1>some text is here and some more text</...>
> >    <currentNode>value might be some big thing which needs to be elided ...</...>
> >    <followingSibling1> ... </...>
> >    ???
> > </enclosingParent1>
> > ???
> >
> > The </...> is just an idea to reduce XML matching end-tag clutter.
> >
> > The ... on a line alone or where element content would appear generally
> > means 1 or more other siblings. The way the display above starts with ...
> > means that this is a relative inner nest, not starting from the absolute
> > root.
> >
> > The ... within simple content means that content is elided to fit on one
> > line. It always follows some text characters to differentiate from the
> > child-element context.
> >
> > The ??? means zero or more other siblings.
> >
> > I used bold italic above to point out that the current node would be
> > highlighted somehow. Probably a way to do this that doesn't require display
> > modes would be useful. E.g., a text marker like ">>>" as in:
> >
> >>>> <currentNode>value .... </...>
> >
> > might be better, particularly for a trace output being dumped to a text
> > file.
> >
> > I made the above example an unparser kind of example by showing a
> > following sibling that exists that is after the current node.
> >
> > I think the key concept is that any sibling node is displayed in a way
> > that fits on one line.
> > E.g., even if the element name was really long, I'd suggest:
> >
> >   <hereIsAnElementWithASuperLongName...>abcd ... </...>
> >
> > Where the element name itself gets elided because it is too long.
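The one-line rule described above can be sketched roughly as below. This is a minimal illustration under assumptions: the helper names `elide` and `one_line` and the default limits of 20 and 30 characters are hypothetical, and it only handles simple-content siblings, not nested children.

```python
def elide(text: str, limit: int) -> str:
    """Cut text to at most `limit` characters, ending in '...' when cut."""
    if len(text) <= limit:
        return text
    return text[: max(0, limit - 3)] + "..."


def one_line(name: str, value: str, name_limit: int = 20,
             value_limit: int = 30, current: bool = False) -> str:
    """Render one sibling as a single quasi-XML line.

    Both the element name and its simple content are elided independently,
    and the current node is flagged with a plain-text '>>>' marker so the
    output still works when dumped to a trace file.
    """
    prefix = ">>> " if current else "    "
    return f"{prefix}<{elide(name, name_limit)}>{elide(value, value_limit)}</...>"


if __name__ == "__main__":
    print(one_line("priorSibling1", "some text is here and some more text"))
    print(one_line("currentNode", "value might be some big thing", current=True))
    print(one_line("hereIsAnElementWithASuperLongNameThatGoesOn", "abcd"))
```

Because each line has a fixed upper bound, the total display stays within the N-line budget regardless of how large any single element is.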
> >
> > A thought. Note that the above presentation is shown as quasi-XML, but
> > there's nothing XML-specific about it. A JSON-friendly equivalent could be
> > done as well:
> >
> > enclosingParent1 = {
> >    ...
> >    priorSibling2 = "89ab782..."
> >    priorSibling1 = "some text is here and some more text"
> >    currentNode = "value might be some big thing which needs to be elided ..."
> >    followingSibling1 = { ... }
> >    ???
> > }
> >
> > That's enough for 1 email thread on this debug topic.
> >
> >
> > ________________________________
> > From: Steve Lawrence <[email protected]>
> > Sent: Tuesday, January 5, 2021 2:26 PM
> > To: [email protected] <[email protected]>
> > Subject: The future of the daffodil DFDL schema debugger?
> >
> >
> > Now that we're in a new year, I'd like to start a discussion about the
> > Daffodil DFDL Schema debugger and how it might be improved to be more
> > useful.
> >
> > Note that this is not the capabilities to debug Daffodil itself in
> > something like Eclipse/IntelliJ, but the ability for Daffodil to
> > provide enough extra information during a parse/unparse so that a
> > schema developer can get an idea of what Daffodil is doing. This makes
> > it easier for users (rather than developers) to determine why a schema
> > isn't giving the expected parse/unparse result (either because of bad
> > data or a faulty schema).
> >
> > The current state of the debugger is enabled by providing the --debug
> > or --trace flags in the CLI. More information about that here:
> >
> > https://daffodil.apache.org/debugger/
> >
> > This enables a TUI and commands somewhat similar to GDB, providing
> > things like breakpoints, steps, displaying the current infoset, displaying
> > a dump of the data, etc.
> >
> > Although I find this tool pretty useful, it definitely has some
> > glaring issues.
> >
> > The most glaring to me is that it really isn't useful at all for
> > debugging unparse. The data dumps only include the main output stream,
> > so determining things like suspensions and buffered output is impossible.
> >
> > Another issue is the infoset output. When outputting the infoset, the
> > debugger currently just walks the entire thing and converts it to XML
> > and displays the XML. For large infosets, this is excessive and can make
> > it impossible to use, even with some configurations that limit how much
> > of that infoset is actually printed to the screen. Also, things like
> > large hex binary blobs create excessive and unusable output.
> >
> > Another thing I feel is missing is a schema view. Right now it's very
> > difficult to know where in the schema Daffodil actually is.
> >
> > I think these issues just need some thought and improvement. One could
> > imagine a better way to stringify our unparse buffers for debug. One
> > could imagine a way to receive infoset state changes so the debugger can
> > track things like backtracks and remove infosets. One could imagine a
> > way to display the schema.
> >
> > We just need a better way to stringify the current state of the
> > unparse data including buffers, and we need a way for the debugger
> > to receive state change information about the infoset so it can update
> > displays rather than just constantly printing the entire infoset.
> >
> > However, I think another big issue is just usability in general.
> > I think the CLI usage is reasonable, but it's not always user
> > friendly, and it is difficult to view multiple things at the same time. I
> > think because of this very few people even use this tool. So this is
> > perhaps something worth focusing on.
> >
> > My first thought to improving this usability issue would be to
> > implement the Debug Adapter Protocol (DAP)
> > (https://microsoft.github.io/debug-adapter-protocol/)
> > for Daffodil, which many IDEs implement. With this implemented, Daffodil
> > could be plugged in to any IDE that supports it and essentially get
> > debugging for free, without the need to worry about the GUI elements.
> >
> > I do have concerns that this just wouldn't have enough functionality
> > that we'd really need. For example, DAP really only has the ability to show
> > code (Daffodil's equivalent is the DFDL schema). There isn't a way to
> > show a live view of the infoset or data. Most DAP IDEs do have a
> > console output, so we could potentially make it so the console output
> > is a live view of infoset/data. But I'm not even sure most DAP
> > friendly IDEs could support this kind of console output. Does anyone
> > have familiarity with DAP IDEs and what kinds of console
> > capabilities are available?
> >
> > I also looked into TUI libraries with the idea that we could just
> > extend our current debugger user interface to be a bit friendlier.
> > Unfortunately, there aren't too many Java/Scala TUI libraries and
> > those that do exist don't have Apache friendly licenses. We also want
> > to be careful about increasing dependencies just for a debugger that
> > many people might not use, so large graphics libraries are probably out
> > of the question.
> >
> > This also makes me wonder if an approach worth taking for the future
> > of Daffodil schema debugging is developing a sort of "Daffodil Debug
> > Protocol". I imagine it would be loosely based on DAP (which is
> > essentially JSON message based) but could be targeted to the things
> > that a DFDL schema debugger would really need. An added benefit with
> > some sort of protocol is the debugger interface can be uncoupled from
> > Daffodil itself, so we could implement a TUI/GUI/whatever in any
> > language/GUI framework and just have it communicate the protocol over
> > some form of IPC. Another benefit is that any future backends could
> > implement this protocol and so a single debugger could hook into
> > different backends without much issue. Unfortunately, defining such a
> > protocol might be a large task, but we do have our existing debug
> > infrastructure and things like DAP to guide its development/design.
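To make the idea of a JSON-message protocol over IPC concrete, here is a rough sketch that borrows DAP's wire framing (a `Content-Length` header, a blank line, then a UTF-8 JSON body). Only the framing comes from DAP; the `infosetChanged` event and its body fields are invented for illustration and do not exist in DAP or Daffodil.

```python
import json
from typing import Tuple


def encode_message(message: dict) -> bytes:
    """Frame one JSON message with a DAP-style Content-Length header."""
    body = json.dumps(message).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def decode_message(data: bytes) -> Tuple[dict, bytes]:
    """Decode one framed message; return it plus any unconsumed bytes.

    Raises ValueError when the buffer does not yet hold a full message.
    """
    header, sep, rest = data.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("incomplete header")
    length = None
    for line in header.split(b"\r\n"):
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("missing Content-Length")
    if len(rest) < length:
        raise ValueError("incomplete body")
    return json.loads(rest[:length].decode("utf-8")), rest[length:]


# Hypothetical event a backend might send when backtracking removes a node.
EXAMPLE_EVENT = {
    "seq": 1,
    "type": "event",
    "event": "infosetChanged",
    "body": {"change": "removed", "path": "/root/record[3]"},
}
```

With framing like this, the front end can be a TUI or GUI in any language, and the transport (pipe or socket) can change without touching the message definitions.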
> >
> > Thoughts? Does such a Daffodil Debug Protocol seem worth it? Perhaps
> > we really just need the few improvements mentioned to the existing
> > debugger. Is that enough to make it usable? Or is an entirely
> > different approach needed to debugging schemas?
> >
>
>
