What other features could find a nice home in an IDE integration? Having a single convenient entry point (the IDE) for such things would be nice, imo.
Things like...

- Rich set of actions for TDML
- Run a single test from a TDML file
- Debug/Run TDML
- Run/Debug a data file with a schema from the project
- i.e., right-click on a JPG and have a context menu for Run with Daffodil -> pick from list of dfdl.xsd

...

On Fri, Jan 8, 2021 at 2:47 PM Beckerle, Mike <[email protected]> wrote:

> Use cases or quasi-requirements. This is my summary so far.
>
> 1) capture a human-readable trace of parse/unparse information to a single
> text file (might be same as 2 if machine-readable is sufficiently human
> readable)
>
> 2) capture a machine-readable trace of parse/unparse information to a
> single text file (might be same as 1 if human readable form is also machine
> readable)
>
> 3) interactive debug from a command line - each display of information is
> requested by a specific command (1 and 2 above might be using this with a
> specific canned set of commands auto-issued to display various information,
> and capturing all to an output stream)
>
> 4) interactive debug with multi-panel display where displays are
> updated/animated automatically as debug context changes. (This is intended
> to mean more than just opening all the schema files in different editor
> windows - more than just gdb-style debug under Emacs.)
>
> 5) interactive debug time-machine - ability to back up to prior
> parser/unparser states, move forward again, or just back up and re-check
> something, but then jump forward to proceed from where one left off.
>
> 6) Non Use Case: IDE for DFDL with rich semantic model (akin to the DSOM
> object model) of the schema.
> This is here just to point out that it's really out of scope. There are
> many questions about the schema (e.g., "can I add this property to this
> element?") that are not required for the debugger. A full and powerful IDE
> is great, but that's really entirely different from our goals for debugging
> that we're trying to discuss here.
>
>
> ________________________________
> From: Sloane, Brandon <[email protected]>
> Sent: Thursday, January 7, 2021 1:25 PM
> To: [email protected] <[email protected]>
> Subject: Re: The future of the daffodil DFDL schema debugger?
>
> We could also create a new flag for --trace that would format the trace
> output in a more machine readable manner. This should let us accomplish
> Larry's goals, and most of mine, with relatively little effort within
> Daffodil (but still all the effort on the GUI side), and would allow for
> off-site analysis in cases where it is not practical to attach a debugger
> while Daffodil is running.
> ________________________________
> From: Sloane, Brandon <[email protected]>
> Sent: Thursday, January 7, 2021 1:21 PM
> To: [email protected] <[email protected]>
> Subject: Re: The future of the daffodil DFDL schema debugger?
>
> I've been thinking about a tool along similar lines (although more
> integrated with Daffodil than post-processing the trace output).
>
> One thing to keep in mind is that, although the trace output is presented
> as a linear log (since we do not have much choice), the actual process is
> more of a tree, due to backtracking.
>
> Ideally, we would have a multi-pane window showing:
>
> * The hex/binary data
> * The infoset
> * A time-axis parse tree, with a "major" node at every point of
>   uncertainty and parse error, and "minor" nodes at every parse step
> * A view of the DFDL schema
> * An interactive terminal debugger (e.g. what we currently have)
> * Breakpoints/variables/delimiter-stack/etc.
>
> Within these panes, you ought to be able to select a given region/element,
> and highlight all the corresponding elements in the other panes.
>
> I think that exporting the necessary information from Daffodil to
> implement all of this would be relatively straightforward.
> The only potentially problematic parts I see are:
>
> * The interactive debugger would require some form of time-travel to
>   implement (I think most of the work for this is done to support
>   backtracking)
> * The memory requirements when used on large infosets
>
> ________________________________
> From: Larry Barber <[email protected]>
> Sent: Thursday, January 7, 2021 1:08 PM
> To: [email protected] <[email protected]>
> Subject: RE: The future of the daffodil DFDL schema debugger?
>
> When I was doing strange and unusual things with DFDL and generating a lot
> of errors, I envisioned how helpful it would be to have a tool that would
> post-process the --trace output and use it to display a dual pane window
> (like the editor referenced below) with the schema on one side and hex
> version on the other, with a slider that would allow me to flow through the
> parsing action and see pointers as to where the parser was in both the
> schema and input files. In other words, just convert the information from
> the --trace into a more useful graphical display.
> Perhaps breakpoint-like markers could be added to both files to quickly
> scan through and display what sections of the schema read which locations
> in the file, or vice versa.
>
> -----Original Message-----
> From: Steve Lawrence [mailto:[email protected]]
> Sent: Wednesday, January 6, 2021 1:42 PM
> To: [email protected]
> Subject: Re: The future of the daffodil DFDL schema debugger?
>
> Yep, something like that seems very reasonable for dealing with large
> infosets. But it feels like we still run into usability issues.
> For example, what if a user wants to see more? We need some configuration
> options to increase what we've elided. It's not big, but every new thing
> that needs configuration adds complexity and decreases usability.
>
> And I think the only reason we are trying to spend effort eliding things
> is because we're limited to this gdb-like interface where you can only
> print out a little information at a time.
>
> I think what would really help is to dump this gdb interface and instead
> use multiple windows/views. As a really close example to what I imagine, I
> recently came across this hex editor:
>
> https://www.synalysis.net/
>
> The screenshots are a bit small so it's not super clear, but this tool has
> one view for the data in hex, and one view for a tree of parsed results
> (which is very similar to our infoset). The "infoset" view has information
> like offset/length/value, and can be related back to the data view to find
> the actual bits.
>
> I imagine the "next generation daffodil debugger" to look much like this.
> As data is parsed, the infoset view fills up. This view could act like a
> standard GUI tree so you could collapse sections or scroll around to show
> just the parts you care about, and have search capabilities to quickly jump
> around. The advantage here is you no longer really need automated eliding
> or heuristics for what the user *might* care about.
> You just show the whole thing and let the user scroll around. As daffodil
> parses and backtracks, this tree grows or shrinks.
>
> I also imagine you could have a cursor moving around the hex view, so as
> daffodil moves around (e.g. scanning for delimiters, extracting integers),
> one could update this data view to show what daffodil is doing and where it
> is.
>
> I also imagine there could be other views as well.
> For example, a schema view to show where in the schema daffodil is, and to
> add/remove breakpoints. And an information view for things like variables,
> in-scope delimiters, PoU's, etc.
>
> The only reason I mention a debug protocol is that it would allow this GUI
> to be more easily written in something other than Java/Scala to take
> advantage of other GUI toolkits. It's been a long while since I've done
> anything with Java GUIs, but they seemed pretty poor the last I looked at
> them. It would even allow for a TUI, which Java has little/no support for.
> It also enables things like remote debugging if a socket IPC was used.
> Though I'm not sure all of that is necessary. Just thinking what would be
> ideal, and it can always be pared back.
>
>
> On 1/6/21 12:44 PM, Beckerle, Mike wrote:
> > I don't think of it as a daffodil debug protocol, but just a separation
> > of concerns between display of information and the behaviors of
> > parse/unparse that need to be points where users can pause, and data
> > structures available to display.
> >
> > E.g., it is 100% a display issue that the infoset (shown as XML) is
> > clumsy, too big, etc. The infoset is available in the processor state,
> > and one can examine the current node, enclosing node, prior sibling(s),
> > following sibling(s), etc. One can elide contents that are too big for
> > hexBinary, etc.
> >
> > I think this problem, how to display the infoset with sensible limits on
> > sizing, is fairly easy to come up with some design for, that will at
> > least be (1) always fairly small (2) much more useful in more cases. It
> > won't be perfect but can be much better than what we do now.
> >
> > One sensible display "mode" should be that displaying the context
> > surrounding the current element (when parsing or unparsing) displays
> > at most N lines (N/2 before, N/2 after) with a maximum length of L
> > characters (settable within reason?)
> >
> > Sibling and enclosing nodes would be displayed eliding their contents to
> > at most 1 line.
> >
> > Here's an example of what I mean. Displaying up to N=10 lines total:
> >
> > ...
> > <enclosingParent1>
> >   ...
> >   <priorSibling2>89ab782 ...</...>
> >   <priorSibling1>some text is here and some more text</...>
> >   <currentNode>value might be some big thing which needs to be elided ...</...>
> >   <followingSibling1> ... </...>
> >   ???
> > </enclosingParent1>
> > ???
> >
> > The </...> is just an idea to reduce XML matching end-tag clutter.
> >
> > The ... on a line alone or where element content would appear generally
> > means 1 or more other siblings. The way the display above starts with ...
> > means that this is a relative inner nest, not starting from the absolute
> > root.
> >
> > The ... within simple content means that content is elided to fit on one
> > line. It always follows some text characters to differentiate it from the
> > child-element context.
> >
> > The ??? means zero or more other siblings.
> >
> > I used bold italic above to point out that the current node would be
> > highlighted somehow. Probably a way to do this that doesn't require
> > display modes would be useful. E.g., a text marker like ">>>" as in:
> >
> > >>> <currentNode>value ... </...>
> >
> > might be better, particularly for a trace output being dumped to a text
> > file.
> >
> > I made the above example an unparser kind of example by showing a
> > following sibling that exists after the current node.
> >
> > I think the key concept is that any sibling node is displayed in a way
> > that fits on one line.
> > E.g., even if the element name was really long, I'd suggest:
> >
> > <hereIsAnElementWithASuperLongName...>abcd ... </...>
> >
> > Where the element name itself gets elided because it is too long.
> >
> > A thought. Note that the above presentation is shown as quasi-XML, but
> > there's nothing XML-specific about it.
> > A JSON-friendly equivalent could be done as well:
> >
> > enclosingParent1 = {
> >   ...
> >   priorSibling2 = "89ab782..."
> >   priorSibling1 = "some text is here and some more text"
> >   currentNode = "value might be some big thing which needs to be elided ..."
> >   followingSibling1 = { ... }
> >   ???
> > }
> >
> > That's enough for 1 email thread on this debug topic.
> >
> >
> > ________________________________
> > From: Steve Lawrence <[email protected]>
> > Sent: Tuesday, January 5, 2021 2:26 PM
> > To: [email protected] <[email protected]>
> > Subject: The future of the daffodil DFDL schema debugger?
> >
> >
> > Now that we're in a new year, I'd like to start a discussion about the
> > Daffodil DFDL Schema debugger and how it might be improved to be more
> > useful.
> >
> > Note that this is not the capability to debug Daffodil itself in
> > something like Eclipse/IntelliJ, but the ability for Daffodil to
> > provide enough extra information during a parse/unparse so that a
> > schema developer can get an idea of what Daffodil is doing. This makes
> > it easier for users (rather than developers) to determine why a schema
> > isn't giving the expected parse/unparse result (either because of bad
> > data or a faulty schema).
> >
> > The current state of the debugger is enabled by providing the --debug
> > or --trace flags in the CLI.
> > More information about that here:
> >
> > https://daffodil.apache.org/debugger/
> >
> > This enables a TUI and commands somewhat similar to GDB, providing
> > things like breakpoints, steps, displaying the current infoset,
> > displaying a dump of the data, etc.
> >
> > Although I find this tool pretty useful, it definitely has some
> > glaring issues.
> >
> > The most glaring to me is that it really isn't useful at all for
> > debugging unparse. The data dumps only include the main output stream,
> > so determining things like suspensions and buffered output is impossible.
> >
> > Another issue is the infoset output. When outputting the infoset, the
> > debugger currently just walks the entire thing and converts it to XML
> > and displays the XML. For large infosets, this is excessive and can make
> > it impossible to use, even with some configurations that limit how much
> > of that infoset is actually printed to the screen. Also things like
> > large hex binary blobs create excessive and unusable output.
> >
> > Another thing I feel is missing is a schema view. Right now it's very
> > difficult to know where in the schema Daffodil actually is.
> >
> > I think these issues just need some thought and improvement. One could
> > imagine a better way to stringify our unparse buffers for debug. One
> > could imagine a way to receive infoset state changes so the debugger can
> > track things like backtracks and remove infosets.
> > One could imagine a way to display the schema.
> >
> > We just need a better way to stringify the current state of the
> > unparse data including buffers, and we need a way for the debugger
> > to receive state change information about the infoset so it can update
> > displays rather than just constantly printing the entire infoset.
> >
> > However, I think another big issue is just usability in general.
> > I think the CLI usage is reasonable, but it's not always user
> > friendly, and it is difficult to view multiple things at the same time.
> > I think because of this very few people even use this tool. So this is
> > perhaps something worth focusing on.
> >
> > My first thought for improving this usability issue would be to
> > implement the Debug Adapter Protocol (DAP)
> > (https://microsoft.github.io/debug-adapter-protocol/)
> > for Daffodil, which many IDEs implement. With this implemented, Daffodil
> > could be plugged in to any IDE that supports it and essentially get
> > debugging for free, without the need to worry about the GUI elements.
> >
> > I do have concerns that this just wouldn't have enough functionality
> > that we'd really need. For example, DAP really only has the ability to
> > show code (Daffodil's equivalent is the DFDL schema). There isn't a way
> > to show a live view of the infoset or data. Most DAP IDEs do have a
> > console output, so we could potentially make it so the console output
> > is a live view of infoset/data. But I'm not even sure most DAP
> > friendly IDEs could support this kind of console output.
> > Does anyone have familiarity with DAP IDEs and what kinds of console
> > capabilities are available?
> >
> > I also looked into TUI libraries with the idea that we could just
> > extend our current debugger user interface to be a bit friendlier.
> > Unfortunately, there aren't too many Java/Scala TUI libraries and
> > those that do exist don't have Apache friendly licenses. We also want
> > to be careful about increasing dependencies just for a debugger that
> > many people might not use, so large graphics libraries are probably out
> > of the question.
> >
> > This also makes me wonder if an approach worth taking for the future
> > of Daffodil schema debugging is developing a sort of "Daffodil Debug
> > Protocol". I imagine it would be loosely based on DAP (which is
> > essentially JSON message based) but could be targeted to the things
> > that a DFDL schema debugger would really need. An added benefit with
> > some sort of protocol is the debugger interface can be uncoupled from
> > Daffodil itself, so we could implement a TUI/GUI/whatever in any
> > language/GUI framework and just have it communicate the protocol over
> > some form of IPC. Another benefit is that any future backends could
> > implement this protocol and so a single debugger could hook into
> > different backends without much issue. Unfortunately, defining such a
> > protocol might be a large task, but we do have our existing debug
> > infrastructure and things like DAP to guide its development/design.
> >
> > Thoughts? Does such a Daffodil Debug Protocol seem worth it? Perhaps
> > we really just need the few improvements mentioned to the existing
> > debugger. Is that enough to make it usable? Or is an entirely
> > different approach needed to debugging schemas?
> >
> >
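
For reference, the one-line-per-sibling display described in the quoted thread (at most N lines around the current node, each line capped at L characters, ">>>" marking the current node, "</...>" as the end tag) could be sketched roughly like this. This is only an illustration in Python, not Daffodil code: the function names, the (name, value) pair representation of siblings, and the exact way the character budget is split between element name and content are all assumptions made up for the example.

```python
def elide(text, limit):
    """Shorten text to at most `limit` characters, ending in '...' if cut."""
    if len(text) <= limit:
        return text
    return text[: limit - 3] + "..."


def render_line(name, value, is_current, max_len):
    """Render one element on a single line, using </...> as the end tag."""
    marker = ">>> " if is_current else "    "
    overhead = len(">>> <></...>")           # marker, brackets, end tag
    room = max(8, max_len - overhead)        # characters left for name + value
    shown_name = elide(name, room // 2)      # long names are elided too
    shown_value = elide(value, room - len(shown_name))
    return f"{marker}<{shown_name}>{shown_value}</...>"


def render_context(parent, children, current, max_lines=10, max_len=60):
    """Show the current child and nearby siblings inside the enclosing parent.

    `children` is a list of (name, value) string pairs (hypothetical stand-in
    for infoset nodes) and `current` is an index into that list. Siblings that
    fall outside the window are replaced by a single '...' line.
    """
    # Reserve four lines: parent start tag, parent end tag, two '...' lines.
    budget = max(1, max_lines - 4)
    first = max(0, current - budget // 2)
    last = min(len(children), first + budget)

    lines = [f"<{elide(parent, max_len - 2)}>"]
    if first > 0:
        lines.append("    ...")
    for i in range(first, last):
        name, value = children[i]
        lines.append(render_line(name, value, i == current, max_len))
    if last < len(children):
        lines.append("    ...")
    lines.append(f"</{elide(parent, max_len - 3)}>")
    return lines


if __name__ == "__main__":
    siblings = [(f"item{i}", "x" * 100) for i in range(20)]
    print("\n".join(render_context("enclosingParent1", siblings, current=10)))
```

The same windowing would apply to the JSON-like rendering; only `render_line` would change.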
