> lives in daffodil repo (new subproject?)

Not asking a question here; I meant to snip out those parens.
The daffodil-debug-api and any daffodil-debug-io-NAME projects do represent
new subprojects. Just wanted to clarify; I never see those things until send
is hit.

On Thu, Apr 8, 2021 at 12:36 PM John Wass <[email protected]> wrote:

> Revisiting this post after doing some debugger related work and thinking
> about debug protocol/adapters to connect external tooling to the debug
> process.
>
> This comment is good
>
> > This also makes me wonder if an approach worth taking for the future of
> > Daffodil schema debugging is developing a sort of "Daffodil Debug
> > Protocol". I imagine it would be loosely based on DAP (which is
> > essentially JSON message based) but could be targeted to the things that
> > a DFDL schema debugger would really need. An added benefit with some sort
> > of protocol is the debugger interface can be uncoupled from Daffodil
> > itself, so we could implement a TUI/GUI/whatever in any language/GUI
> > framework and just have it communicate the protocol over some form of
> > IPC. Another benefit is that any future backends could implement this
> > protocol and so a single debugger could hook into different backends
> > without much issue. Unfortunately, defining such a protocol might be a
> > large task, but we do have our existing debug infrastructure and things
> > like DAP to guide its development/design.
>
> Some thoughts on this
> - Defining the full protocol will be a large task, but a minimal subset
>   should be up and round-tripping quickly.
> - The new protocol being informed by the existing debugger and DAP is key
> - Uncoupling from Daffodil is key
> - Adapt the Daffodil protocol to produce DAP after the fact so as not to
>   constrain Daffodil debugging capability
> - We don't need to tie the protocol or adapters to a single framework;
>   implementations of the IO layer should be simple enough to support
>   multiple things (e.g. Akka, Zio, "basic" ...)
> - The current debugger lives in runtime1, but can we make an abstract API
>   that any runtime would implement?
>
> Maybe a solution is structured like this
> - daffodil-debug-api:
>   - protocol model
>   - interfaces: debugger / IO adapter / etc
>   - lives in daffodil repo (new subproject?)
> - daffodil-debug-io-NAME
>   - provides implementation of a specific IO adapter
>   - multiple projects possible (daffodil-debugger-akka,
>     daffodil-debugger-zio, etc)
>   - supported ones live in their own subprojects, but others can be
>     plugged in from external sources
>   - ability to support multiple implementations reduces risk of lock-in
> - debugger applications
>   - maintained in external repositories
>   - depending on the IO implementation these could execute in a separate
>     process or on a separate machine
>   - like Steve said, could be any language / framework
>
> Three types of reference implementations / sample applications could also
> guide the development of the API
> 1. a replacement for the existing TUI debugger, expected to end up with
>    at minimum the same functionality as the current one.
> 2. a standalone GUI (JavaFX, Scala.js, ..) debugger
> 3. an IDE integration
>
> Thoughts?
>
> Also I'm working on some reference implementations of these concepts using
> Akka and Zio. Not quite ready to talk through it yet, but the code is here
> https://github.com/jw3/example-daffodil-debug
>
>
> On Wed, Jan 6, 2021 at 1:42 PM Steve Lawrence <[email protected]>
> wrote:
>
>> Yep, something like that seems very reasonable for dealing with large
>> infosets. But it still feels like we run into usability issues.
>> For example, what if a user wants to see more? We need some
>> configuration options to increase what we've elided. It's not big, but
>> every new thing that needs configuration adds complexity and decreases
>> usability.
>>
>> And I think the only reason we are trying to spend effort eliding
>> things is because we're limited to this gdb-like interface where you can
>> only print out a little information at a time.
>>
>> I think what would really help is to dump this gdb interface and instead
>> use multiple windows/views. As a really close example to what I imagine, I
>> recently came across this hex editor:
>>
>> https://www.synalysis.net/
>>
>> The screenshots are a bit small so it's not super clear, but this tool
>> has one view for the data in hex, and one view for a tree of parsed
>> results (which is very similar to our infoset). The "infoset" view has
>> information like offset/length/value, and can be related back to the
>> data view to find the actual bits.
>>
>> I imagine the "next generation daffodil debugger" to look much like
>> this. As data is parsed, the infoset view fills up. This view could act
>> like a standard GUI tree so you could collapse sections or scroll around
>> to show just the parts you care about, and have search capabilities to
>> quickly jump around. The advantage here is you no longer really need
>> automated eliding or heuristics for what the user *might* care about.
>> You just show the whole thing and let the user scroll around. As daffodil
>> parses and backtracks, this tree grows or shrinks.
>>
>> I also imagine you could have a cursor moving around the hex view, so as
>> daffodil moves around (e.g. scanning for delimiters, extracting
>> integers), one could update this data view to show what daffodil is
>> doing and where it is.
>>
>> I also imagine there could be other views as well. For example, a schema
>> view to show where in the schema daffodil is, and to add/remove
>> breakpoints. And an information view for things like variables, in-scope
>> delimiters, PoU's, etc.
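One way to picture the infoset view described above, which grows as data is parsed and shrinks on backtracking, is as a tree that a debugger front end updates from a stream of events. The following is only a minimal Python sketch: the event shape (`kind`, `parent`, `name`, `value`) and the `InfosetView` class are hypothetical names made up for illustration, not an existing Daffodil API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One element in the front end's copy of the infoset (hypothetical model)."""
    name: str
    value: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


class InfosetView:
    """Mirrors the infoset by applying hypothetical 'add'/'remove' events."""

    def __init__(self, root_name: str) -> None:
        self.root = Node(root_name)

    def _find(self, path: List[str]) -> Node:
        # Walk down from the root, taking the most recent child with each name.
        node = self.root
        for name in path:
            node = next(c for c in reversed(node.children) if c.name == name)
        return node

    def apply(self, event: dict) -> None:
        parent = self._find(event["parent"])
        if event["kind"] == "add":
            # The parser created an element: the tree grows.
            parent.children.append(Node(event["name"], event.get("value")))
        elif event["kind"] == "remove":
            # The parser backtracked: drop the most recent child with this name.
            for i in range(len(parent.children) - 1, -1, -1):
                if parent.children[i].name == event["name"]:
                    del parent.children[i]
                    break
        else:
            raise ValueError(f"unknown event kind: {event['kind']}")

    def names(self) -> List[str]:
        """Return element names in document order, indented by depth."""
        out: List[str] = []

        def walk(node: Node, depth: int) -> None:
            out.append("  " * depth + node.name)
            for child in node.children:
                walk(child, depth + 1)

        walk(self.root, 0)
        return out
```

A GUI tree widget would redraw from the same events instead of reprinting the whole infoset after every step.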
>>
>> The only reason I mention a debug protocol is that it would allow this
>> GUI to be more easily written in something other than Java/Scala to take
>> advantage of other GUI toolkits. It's been a long while since I've done
>> anything with Java GUIs, but they seemed pretty poor the last time I
>> looked at them. It would even allow for a TUI, which Java has little/no
>> support for. It also enables things like remote debugging if a socket IPC
>> were used. Though I'm not sure all of that is necessary. Just thinking
>> what would be ideal, and it can always be pared back.
>>
>>
>> On 1/6/21 12:44 PM, Beckerle, Mike wrote:
>> > I don't think of it as a daffodil debug protocol, but just a separation
>> > of concerns between display of information and the behaviors of
>> > parse/unparse that need to be points where users can pause, and data
>> > structures available to display.
>> >
>> > E.g., it is 100% a display issue that the infoset (shown as XML) is
>> > clumsy, too big, etc. The infoset is available in the processor state,
>> > and one can examine the current node, enclosing node, prior sibling(s),
>> > following sibling(s), etc. One can elide contents that are too big for
>> > hexBinary, etc.
>> >
>> > I think this problem, how to display the infoset with sensible limits
>> > on sizing, is fairly easy to come up with some design for, one that
>> > will at least be (1) always fairly small and (2) much more useful in
>> > more cases. It won't be perfect but can be much better than what we do
>> > now.
>> >
>> > One sensible display "mode" should be that displaying the context
>> > surrounding the current element (when parsing or unparsing) displays at
>> > most N lines (N/2 before, N/2 after), each with a maximum length of L
>> > characters (settable within reason?).
>> >
>> > Sibling and enclosing nodes would be displayed eliding their contents
>> > to at most 1 line.
>> >
>> > Here's an example of what I mean. Displaying up to M=10 lines total:
>> >
>> > ...
>> > <enclosingParent1>
>> >   ...
>> >   <priorSibling2>89ab782 ...</...>
>> >   <priorSibling1>some text is here and some more text</...>
>> >   <currentNode>value might be some big thing which needs to be elided ...</...>
>> >   <followingSibling1> ... </...>
>> >   ???
>> > </enclosingParent1>
>> > ???
>> >
>> > The </...> is just an idea to reduce XML matching end-tag clutter.
>> >
>> > The ... on a line alone or where element content would appear generally
>> > means 1 or more other siblings. The way the display above starts with
>> > ... means that this is a relative inner nest, not starting from the
>> > absolute root.
>> >
>> > The ... within simple content means that content is elided to fit on
>> > one line. It always follows some text characters to differentiate it
>> > from the child-element context.
>> >
>> > The ??? means zero or more other siblings.
>> >
>> > I used bold italic above to point out that the current node would be
>> > highlighted somehow. A way to do this that doesn't require display
>> > modes would probably be useful. E.g., a text marker like ">>>" as in:
>> >
>> > >>> <currentNode>value ... </...>
>> >
>> > might be better, particularly for a trace output being dumped to a text
>> > file.
>> >
>> > I made the above example an unparser kind of example by showing a
>> > following sibling that exists after the current node.
>> >
>> > I think the key concept is that any sibling node is displayed in a way
>> > that fits on one line.
>> > E.g., even if the element name was really long, I'd suggest:
>> >
>> > <hereIsAnElementWithASuperLongName...>abcd ... </...>
>> >
>> > Where the element name itself gets elided because it is too long.
>> >
>> > A thought. Note that the above presentation is shown as quasi-XML, but
>> > there's nothing XML-specific about it. A JSON-friendly equivalent could
>> > be done as well:
>> >
>> > enclosingParent1 = {
>> >   ...
>> >   priorSibling2 = "89ab782..."
>> >   priorSibling1 = "some text is here and some more text"
>> >   currentNode = "value might be some big thing which needs to be elided ..."
>> >   followingSibling1 = { ... }
>> >   ???
>> > }
>> >
>> > That's enough for 1 email thread on this debug topic.
>> >
>> >
>> > ________________________________
>> > From: Steve Lawrence <[email protected]>
>> > Sent: Tuesday, January 5, 2021 2:26 PM
>> > To: [email protected] <[email protected]>
>> > Subject: The future of the daffodil DFDL schema debugger?
>> >
>> >
>> > Now that we're in a new year, I'd like to start a discussion about the
>> > Daffodil DFDL Schema debugger and how it might be improved to be more
>> > useful.
>> >
>> > Note that this is not the capability to debug Daffodil itself in
>> > something like Eclipse/IntelliJ, but the ability for Daffodil to provide
>> > enough extra information during a parse/unparse so that a schema
>> > developer can get an idea of what Daffodil is doing. This makes it
>> > easier for users (rather than developers) to determine why a schema
>> > isn't giving the expected parse/unparse result (either because of bad
>> > data or a faulty schema).
>> >
>> > The current debugger is enabled by providing the --debug or
>> > --trace flags in the CLI. More information about that here:
>> >
>> > https://daffodil.apache.org/debugger/
>> >
>> > This enables a TUI and commands somewhat similar to GDB, providing
>> > things like breakpoints, steps, displaying the current infoset,
>> > displaying a dump of the data, etc.
>> >
>> > Although I find this tool pretty useful, it definitely has some glaring
>> > issues.
>> >
>> > The most glaring to me is that it really isn't useful at all for
>> > debugging unparse. The data dumps only include the main output stream,
>> > so determining things like suspensions and buffered output is
>> > impossible.
>> >
>> > Another issue is the infoset output.
>> > When outputting the infoset, the debugger currently just walks the
>> > entire thing, converts it to XML, and displays the XML. For large
>> > infosets, this is excessive and can make it impossible to use, even
>> > with the configuration options that limit how much of that infoset is
>> > actually printed to the screen. Also things like large hex binary blobs
>> > create excessive and unusable output.
>> >
>> > Another thing I feel is missing is a schema view. Right now it's very
>> > difficult to know where in the schema Daffodil actually is.
>> >
>> > I think these issues just need some thought and improvement. One could
>> > imagine a better way to stringify our unparse buffers for debug. One
>> > could imagine a way to receive infoset state changes so the debugger
>> > can track things like backtracks and remove infosets. One could imagine
>> > a way to display the schema.
>> >
>> > We just need a better way to stringify the current state of the unparse
>> > data including buffers, and we need a way for the debugger to receive
>> > state change information about the infoset so it can update displays
>> > rather than just constantly printing the entire infoset.
>> >
>> > However, I think another big issue is just usability in general. I
>> > think the CLI usage is reasonable, but it's not always user friendly,
>> > and it is difficult to view multiple things at the same time. I think
>> > because of this very few people even use this tool. So this is perhaps
>> > something worth focusing on.
>> >
>> > My first thought for improving this usability issue would be to
>> > implement the Debug Adapter Protocol (DAP)
>> > (https://microsoft.github.io/debug-adapter-protocol/) for Daffodil,
>> > which many IDEs implement. With this implemented, Daffodil could be
>> > plugged in to any IDE that supports it and essentially get debugging for
>> > free, without the need to worry about the GUI elements.
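For a sense of what speaking DAP involves on the wire: each DAP message is a JSON body preceded by a `Content-Length` header and a blank line. The Python sketch below frames and unframes one such message. `setBreakpoints` is a real DAP request, but the helper function names and the schema file path are assumptions made up for illustration, and the decoder assumes it is handed exactly one complete message.

```python
import json


def encode_dap_message(body: dict) -> bytes:
    """Frame a JSON body with the Content-Length header used by DAP."""
    payload = json.dumps(body).encode("utf-8")
    header = f"Content-Length: {len(payload)}\r\n\r\n".encode("ascii")
    return header + payload


def decode_dap_message(data: bytes) -> dict:
    """Parse one framed message; assumes `data` holds exactly one message."""
    header, _, rest = data.partition(b"\r\n\r\n")
    name, _, value = header.decode("ascii").partition(":")
    if name.strip().lower() != "content-length":
        raise ValueError("missing Content-Length header")
    length = int(value)
    return json.loads(rest[:length].decode("utf-8"))


# A breakpoint request where the "source" is a DFDL schema rather than code.
request = {
    "seq": 1,
    "type": "request",
    "command": "setBreakpoints",
    "arguments": {
        "source": {"path": "schema.dfdl.xsd"},  # hypothetical schema file
        "breakpoints": [{"line": 42}],
    },
}

framed = encode_dap_message(request)
```

A real adapter would read from a stream and handle partial or back-to-back messages; this only shows the framing.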
>> >
>> > I do have concerns that this just wouldn't have enough of the
>> > functionality that we'd really need. For example, DAP really only has
>> > the ability to show code (Daffodil's equivalent is the DFDL schema).
>> > There isn't a way to show a live view of the infoset or data. Most DAP
>> > IDEs do have a console output, so we could potentially make it so the
>> > console output is a live view of infoset/data. But I'm not even sure
>> > most DAP friendly IDEs could support this kind of console output. Does
>> > anyone have familiarity with DAP IDEs and what kinds of console
>> > capabilities are available?
>> >
>> > I also looked into TUI libraries with the idea that we could just extend
>> > our current debugger user interface to be a bit friendlier.
>> > Unfortunately, there aren't too many Java/Scala TUI libraries and those
>> > that do exist don't have Apache friendly licenses. We also want to be
>> > careful about increasing dependencies just for a debugger that many
>> > people might not use, so large graphics libraries are probably out of
>> > the question.
>> >
>> > This also makes me wonder if an approach worth taking for the future of
>> > Daffodil schema debugging is developing a sort of "Daffodil Debug
>> > Protocol". I imagine it would be loosely based on DAP (which is
>> > essentially JSON message based) but could be targeted to the things that
>> > a DFDL schema debugger would really need. An added benefit with some
>> > sort of protocol is the debugger interface can be uncoupled from
>> > Daffodil itself, so we could implement a TUI/GUI/whatever in any
>> > language/GUI framework and just have it communicate the protocol over
>> > some form of IPC. Another benefit is that any future backends could
>> > implement this protocol and so a single debugger could hook into
>> > different backends without much issue.
>> > Unfortunately, defining such a protocol might be a large task, but we
>> > do have our existing debug infrastructure and things like DAP to guide
>> > its development/design.
>> >
>> > Thoughts? Does such a Daffodil Debug Protocol seem worth it? Perhaps we
>> > really just need the few improvements mentioned to the existing
>> > debugger. Is that enough to make it usable? Or is an entirely different
>> > approach needed to debugging schemas?
>> >
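The one-line-per-sibling display proposed by Mike in this thread can be sketched in a few lines of Python. This is a rough illustration under assumptions, not a design: the function names, the default limits, and the representation of siblings as `(name, value)` pairs (with `None` standing for a complex element) are all hypothetical.

```python
from typing import List, Optional, Tuple

Sibling = Tuple[str, Optional[str]]  # (element name, simple value or None)


def one_line(name: str, value: Optional[str],
             max_name: int = 24, max_value: int = 40) -> str:
    """Render one element on a single line, eliding a long name or value."""
    shown_name = name if len(name) <= max_name else name[:max_name] + "..."
    if value is None:
        content = " ... "  # complex element: children are hidden
    elif len(value) <= max_value:
        content = value
    else:
        # Keep the first max_value characters, then mark the cut.
        content = value[:max_value] + " ..."
    return f"<{shown_name}>{content}</...>"


def context_window(siblings: List[Sibling], current: int, n: int = 10) -> List[str]:
    """Render at most n element lines around index `current`.

    A bare "..." line is added (on top of the n element lines) wherever
    siblings were left out; the current node is marked with ">>>".
    """
    half = n // 2
    start = max(current - half, 0)
    end = min(current + half, len(siblings))  # exclusive upper bound
    lines = []
    if start > 0:
        lines.append("    ...")
    for i in range(start, end):
        name, value = siblings[i]
        marker = ">>> " if i == current else "    "
        lines.append(marker + one_line(name, value))
    if end < len(siblings):
        lines.append("    ...")
    return lines


example = [
    ("priorSibling2", "89ab782"),
    ("priorSibling1", "some text is here and some more text"),
    ("currentNode", "x" * 100),
    ("followingSibling1", None),
]
```

Calling `"\n".join(context_window(example, current=2))` yields four lines, with the `currentNode` line prefixed by `>>>` and its value cut after 40 characters.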
