RE: The future of the daffodil DFDL schema debugger?

Larry Barber Thu, 07 Jan 2021 10:08:53 -0800

When I was doing strange and unusual things with DFDL and generating a lot of 
errors, I envisioned how helpful it would be to have a tool that would 
post-process the --trace output and use it to display a dual pane window (like 
the editor referenced below) with the schema on one side and hex version on the 
other, with a slider that would allow me to flow through the parsing action and 
see pointers as to where the parser was in both the schema and input files. In 
other words, just convert the information from the --trace output into a more useful 
graphical display.
Perhaps breakpoint-like markers could be added to both files to quickly scan 
through and display what sections of the schema read which locations in the 
file, or vice versa.
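
A minimal sketch of the slider idea above, assuming the --trace output has already been converted into a list of step records. The `TraceStep` record and its fields are hypothetical; no real Daffodil trace format is parsed here.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class TraceStep:
    """One parser step (hypothetical record, not a real Daffodil type)."""
    schema_line: int   # line in the DFDL schema the parser was at
    byte_offset: int   # where in the input the step started
    byte_length: int   # how many bytes the step covered


def step_at(steps: List[TraceStep], slider: float) -> TraceStep:
    """Map a slider position in [0.0, 1.0] to the nearest recorded step."""
    if not steps:
        raise ValueError("no trace steps recorded")
    slider = min(max(slider, 0.0), 1.0)
    return steps[round(slider * (len(steps) - 1))]


def schema_to_data(steps: List[TraceStep]) -> Dict[int, List[Tuple[int, int]]]:
    """For each schema line, list the [start, end) byte ranges it read."""
    index: Dict[int, List[Tuple[int, int]]] = defaultdict(list)
    for step in steps:
        index[step.schema_line].append(
            (step.byte_offset, step.byte_offset + step.byte_length))
    return dict(index)


def data_to_schema(steps: List[TraceStep], offset: int) -> List[int]:
    """The reverse lookup: which schema lines touched a given byte offset."""
    return sorted({
        s.schema_line for s in steps
        if s.byte_offset <= offset < s.byte_offset + s.byte_length
    })
```

The two index functions correspond to the "breakpoint-like markers" in each pane: one answers which bytes a schema line read, the other which schema lines read a byte.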


-----Original Message-----
From: Steve Lawrence [mailto:[email protected]] 
Sent: Wednesday, January 6, 2021 1:42 PM
To: [email protected]
Subject: Re: The future of the daffodil DFDL schema debugger?

Yep, something like that seems very reasonable for dealing with large infosets. 
But it feels like we still run into usability issues.
For example, what if a user wants to see more? We need some configuration 
options to increase what we've elided. It's not big, but every new thing that 
needs configuration adds complexity and decreases usability.

And I think the only reason we are trying to spend effort eliding things is 
because we're limited to this gdb-like interface where you can only print out a 
little information at a time.

I think what would really help is to dump this gdb interface and instead use 
multiple windows/views. As a really close example to what I imagine, I recently 
came across this hex editor:

https://www.synalysis.net/

The screenshots are a bit small so it's not super clear, but this tool has one 
view for the data in hex, and one view for a tree of parsed results (which is 
very similar to our infoset). The "infoset" view has information like 
offset/length/value, and can be related back to the data view to find the 
actual bits.

I imagine the "next generation daffodil debugger" to look much like this. As 
data is parsed, the infoset view fills up. This view could act like a standard 
GUI tree so you could collapse sections or scroll around to show just the parts 
you care about, and have search capabilities to quickly jump around. The 
advantage here is you no longer really need automated eliding or heuristics for 
what the user *might* care about.
You just show the whole thing and let user scroll around. As daffodil parses 
and backtracks, this tree grows or shrinks.
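
A rough sketch of how such a view's model might grow and shrink, assuming the debugger receives start/end events for elements and can take a mark at a point of uncertainty. The `Node` and `InfosetView` names and their methods are hypothetical and are not part of Daffodil.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """One infoset element as the view would show it (hypothetical)."""
    name: str
    value: Optional[str] = None
    offset: Optional[int] = None
    length: Optional[int] = None
    children: List["Node"] = field(default_factory=list)


Mark = List[Tuple[Node, int]]


class InfosetView:
    def __init__(self, root_name: str) -> None:
        self.root = Node(root_name)
        self._open: List[Node] = [self.root]  # elements still being parsed

    def start_element(self, name: str, offset: Optional[int] = None) -> Node:
        node = Node(name, offset=offset)
        self._open[-1].children.append(node)
        self._open.append(node)
        return node

    def end_element(self, value: Optional[str] = None,
                    length: Optional[int] = None) -> None:
        if len(self._open) == 1:
            raise ValueError("no open element to end")
        node = self._open.pop()
        node.value = value
        node.length = length

    def mark(self) -> Mark:
        """Remember the open elements and how many children each has."""
        return [(node, len(node.children)) for node in self._open]

    def backtrack(self, mark: Mark) -> None:
        """Shrink the tree back to the state captured by mark()."""
        for node, count in mark:
            del node.children[count:]
        self._open = [node for node, _ in mark]

    def render(self) -> List[str]:
        lines: List[str] = []

        def walk(node: Node, depth: int) -> None:
            text = "  " * depth + node.name
            if node.value is not None:
                text += f" = {node.value!r}"
            if node.offset is not None:
                text += f" @{node.offset}"
            lines.append(text)
            for child in node.children:
                walk(child, depth + 1)

        walk(self.root, 0)
        return lines
```

A GUI tree widget would redraw from this model after each event, so a failed choice branch simply disappears from the view when `backtrack` is called.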

I also imagine you could have a cursor moving around the hex view, so as 
daffodil moves around (e.g. scanning for delimiters, extracting integers), one 
could update this data view to show what daffodil is doing and where it is.

I also imagine there could be other views as well. For example, a schema view to 
show where in the schema daffodil is, and to add/remove breakpoints. And an 
information view for things like variables, in-scope delimiters, PoU's, etc.

The only reason I mention a debug protocol is that it would allow this GUI to be 
more easily written in something other than Java/Scala to take advantage of 
other GUI toolkits. It's been a long while since I've done anything with Java 
GUIs, but they seemed pretty poor the last time I looked at them. It would even 
allow for a TUI, which Java has little/no support for. It also enables things like 
remote debugging if a socket IPC were used. Though I'm not sure all of that is 
necessary. Just thinking what would be ideal, and it can always be pared back.


On 1/6/21 12:44 PM, Beckerle, Mike wrote:
> I don't think of it as a daffodil debug protocol, but just a separation of 
> concerns between display of information and the behaviors of parse/unparse 
> that need to be points where users can pause, and data structures available 
> to display.
> 
> E.g., it is 100% a display issue that the infoset (shown as XML) is clumsy, 
> too big, etc.  The infoset is available in the processor state, and one can 
> examine the current node, enclosing node, prior sibling(s), following 
> sibling(s), etc. One can elide contents that are too big for hexBinary, etc.
> 
> I think this problem, how to display the infoset with sensible limits on 
> sizing, is fairly easy to come up with some design for, that will at least be 
> (1) always fairly small (2) much more useful in more cases. It won't be 
> perfect but can be much better than what we do now.
> 
> One sensible display "mode" should be that displaying the context 
> surrounding the current element (when parsing or unparsing) displays 
> at most N-lines. (N/2 before, N/2 after) with a maximum length of L 
> characters (settable within reason ?)
> 
> Sibling and enclosing nodes would be displayed eliding their contents to at 
> most 1 line.
> 
> Here's an example of what I mean. Displaying up to N=10 lines total:
> 
> ...
> <enclosingParent1>
>    ...
>    <priorSibling2>89ab782 ...</...>
>    <priorSibling1>some text is here and some more text</...>
>    <currentNode>value might be some big thing which needs to be elided ...</...>
>    <followingSibling1> ... </...>
>    ???
> </enclosingParent1>
> ???
> 
> The </...> is just an idea to reduce XML matching end-tag clutter.
> 
> The ... on a line alone or where element content would appear generally means 
> 1 or more other siblings. The way the display above starts with ... means 
> that this is a relative inner nest, not starting from the absolute root.
> 
> The ... within simple content means that content is elided to fit on one 
> line. Always follows some text characters to differentiate from the 
> child-element context.
> 
> The ??? means zero or more other siblings.
> 
> I used bold italic above to point out that the current node would be 
> highlighted somehow. Probably a way to do this that doesn't require display 
> modes would be useful. E.g., a text marker like ">>>" as in:
> 
> >>> <currentNode>value ... </...>
> 
> might be better, particularly for a trace output being dumped to a text file.
> 
> I made the above example an unparser kind of example by showing a following 
> sibling that exists that is after the current node.
> 
> I think the key concept is that any sibling node is displayed in a way that 
> fits on one line.
> E.g., even if the element name was really long, I'd suggest:
> 
>   <hereIsAnElementWithASuperLongName...>abcd ... </...>
> 
> Where the element name itself gets elided because it is too long.
> 
> A thought. Note that the above presentation is shown as quasi-XML, but 
> there's nothing XML-specific about it. A JSON-friendly equivalent could be 
> done as well:
> 
> enclosingParent1 = {
>    ...
>    priorSibling2 = "89ab782..."
>    priorSibling1 = "some text is here and some more text"
>    currentNode = "value might be some big thing which needs to be elided ..."
>    followingSibling1 = { ... }
>    ???
> }
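
A minimal sketch of the one-line elision described above, using the quasi-XML form and the ">>>" marker. All function names, parameters, and default limits here are hypothetical, and the "???" (zero or more siblings) case is not modeled; "..." is only printed when siblings are known to be hidden.

```python
from typing import List, Tuple

MARKER = " ..."


def elide(text: str, max_len: int) -> str:
    """Cut simple content to at most max_len characters, ending in ' ...'."""
    if len(text) <= max_len:
        return text
    if max_len <= len(MARKER):
        return text[:max_len]  # too short for a marker: hard truncate
    return text[: max_len - len(MARKER)].rstrip() + MARKER


def render_sibling(name: str, value: str,
                   max_name: int = 20, max_value: int = 30) -> str:
    """Render one element on one line, eliding a long name or value."""
    shown = name if len(name) <= max_name else name[:max_name] + "..."
    return f"<{shown}>{elide(value, max_value)}</...>"


def render_context(siblings: List[Tuple[str, str]], current: int,
                   half_window: int) -> List[str]:
    """Show up to half_window siblings on each side of the current node."""
    first = max(current - half_window, 0)
    last = min(current + half_window, len(siblings) - 1)
    lines: List[str] = []
    if first > 0:
        lines.append("    ...")  # one or more earlier siblings hidden
    for i in range(first, last + 1):
        name, value = siblings[i]
        prefix = ">>> " if i == current else "    "
        lines.append(prefix + render_sibling(name, value))
    if last < len(siblings) - 1:
        lines.append("    ...")  # one or more later siblings hidden
    return lines
```

Because every sibling is forced onto one line, the total output size depends only on the window size, not on how large the infoset or any hexBinary value is.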
> 
> That's enough for 1 email thread on this debug topic.
> 
> 
> ________________________________
> From: Steve Lawrence <[email protected]>
> Sent: Tuesday, January 5, 2021 2:26 PM
> To: [email protected] <[email protected]>
> Subject: The future of the daffodil DFDL schema debugger?
> 
> 
> Now that we're in a new year, I'd like to start a discussion about the 
> Daffodil DFDL Schema debugger and how it might be improved to be more 
> useful.
> 
> Note that this is not the capability to debug Daffodil itself in 
> something like Eclipse/IntelliJ, but the ability for Daffodil to 
> provide enough extra information during a parse/unparse so that a 
> schema developer can get an idea of what Daffodil is doing. This makes 
> it easier for users (rather than developers) to determine why a schema 
> isn't giving the expected parse/unparse result (either because of bad 
> data or a faulty schema).
> 
> The current state of the debugger is enabled by providing the --debug 
> or --trace flags in the CLI. More information about that here:
> 
> https://daffodil.apache.org/debugger/
> 
> This enables a TUI and commands somewhat similar to GDB, providing 
> things like breakpoints, steps, displaying the current infoset, displaying 
> a dump of the data, etc.
> 
> Although I find this tool pretty useful, it definitely has some 
> glaring issues.
> 
> The most glaring to me is that it really isn't useful at all for 
> debugging unparse. The data dumps only include the main output stream, 
> so determining things like suspensions and buffered output is impossible.
> 
> Another issue is the infoset output. When outputting the infoset, the 
> debugger currently just walks the entire thing and converts it to XML 
> and displays the XML. For large infosets, this is excessive and can make 
> it impossible to use, even with some configurations that limit how much 
> of that infoset is actually printed to the screen. Also things like 
> large hex binary blobs create excessive and unusable output.
> 
> Another thing I feel is missing is a schema view. Right now it's very 
> difficult to know where in the schema Daffodil actually is.
> 
> I think these issues just need some thought and improvement. One could 
> imagine a better way to stringify our unparse buffers for debug. One 
> could imagine a way to receive infoset state changes so the debugger can 
> track things like backtracks and remove infosets. One could imagine a 
> way to display the schema.
> 
> We just need a better way to stringify the current state of the 
> unparse data including buffers, and we need a way for the debugger 
> to receive state change information about the infoset so it can update 
> displays rather than just constantly printing the entire infoset.
> 
> However, I think another big issue is just usability in general. 
> I think the CLI usage is reasonable, but it's not always user 
> friendly, and it is difficult to view multiple things at the same time. I 
> think because of this very few people even use this tool. So this seems 
> like perhaps something worth focusing on.
> 
> My first thought to improving this usability issue would be to 
> implement the Debug Adapter Protocol (DAP)
> (https://microsoft.github.io/debug-adapter-protocol/)
> for Daffodil, which many IDEs implement. With this implemented, Daffodil 
> could be plugged in to any IDE that supports it and essentially get debugging 
> for free, without the need to worry about the GUI elements.
> 
> I do have concerns that this just wouldn't have enough functionality 
> that we'd really need. For example, DAP really only has the ability to show 
> code (Daffodil's equivalent is the DFDL schema). There isn't a way to 
> show a live view of the infoset or data. Most DAP IDEs do have a 
> console output, so we could potentially make it so the console output 
> is a live view of infoset/data. But I'm not even sure most DAP 
> friendly IDEs could support this kind of console output. Does anyone 
> have familiarity with DAP IDEs and what kinds of console 
> capabilities are available?
> 
> I also looked into TUI libraries with the idea that we could just 
> extend our current debugger user interface to be a bit friendlier.
> Unfortunately, there aren't too many Java/Scala TUI libraries and 
> those that do exist don't have Apache friendly licenses. We also want 
> to be careful about increasing dependencies just for a debugger that 
> many people might not use, so large graphics libraries are probably out of 
> the question.
> 
> This also makes me wonder if an approach worth taking for the future 
> of Daffodil schema debugging is developing a sort of "Daffodil Debug 
> Protocol". I imagine it would be loosely based on DAP (which is 
> essentially JSON message based) but could be targeted to the things 
> that a DFDL schema debugger would really need. An added benefit with 
> some sort of protocol is the debugger interface can be uncoupled from 
> Daffodil itself, so we could implement a TUI/GUI/whatever in any 
> language/GUI framework and just have it communicate the protocol over 
> some form of IPC. Another benefit is that any future backends could 
> implement this protocol and so a single debugger could hook into 
> different backends without much issue. Unfortunately, defining such a 
> protocol might be a large task, but we do have our existing debug 
> infrastructure and things like DAP to guide its development/design.
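
A small sketch of what a message in such a protocol could look like on the wire, borrowing DAP's framing (a `Content-Length` header, a blank line, then a JSON body) and its `seq`/`type`/`event`/`body` envelope. The `infosetChanged` event and its body fields are hypothetical; nothing like it is defined by DAP or by Daffodil.

```python
import json
from typing import Tuple


def encode_message(message: dict) -> bytes:
    """Frame a JSON message DAP-style: Content-Length header, blank line, body."""
    body = json.dumps(message).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def decode_message(data: bytes) -> Tuple[dict, bytes]:
    """Decode one framed message; return it plus any bytes left over."""
    header, sep, rest = data.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("incomplete header")
    length = None
    for line in header.split(b"\r\n"):
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("missing Content-Length header")
    if len(rest) < length:
        raise ValueError("incomplete body")
    return json.loads(rest[:length].decode("utf-8")), rest[length:]


def infoset_changed_event(seq: int, action: str, path: str) -> dict:
    """Hypothetical event telling a front end that the infoset grew or shrank."""
    return {
        "seq": seq,
        "type": "event",
        "event": "infosetChanged",                # assumed name, not in DAP
        "body": {"action": action, "path": path}  # assumed fields
    }
```

With framing like this, a front end written in any language can read events from a socket or pipe and update its views, which is the decoupling described above.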
> 
> Thoughts? Does such a Daffodil Debug Protocol seem worth it? Perhaps 
> we really just need the few improvements mentioned to the existing 
> debugger. Is that enough to make it usable? Or is an entirely 
> different approach needed to debugging schemas?
>
