Re: The future of the daffodil DFDL schema debugger?

2021-05-26 Thread Beckerle, Mike
I think the point was that to understand debugging in Daffodil, one must
understand, and potentially display, the data structures that the runtime
maintains.

Furthermore, some of the actions the parser/unparser takes are universal, like 
invoking a parser. Others require finer detail than that - e.g., delimiter 
scanning certainly needs more detailed treatment from the debugger.

But as a first approximation, there should be some way to display, inspect,
and potentially manipulate each piece of state.
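
To make "display each piece of state, and what a step changed" concrete, here
is a minimal sketch. It assumes a debugger can take an immutable snapshot of
selected state fields before and after a step. The class and field names are
hypothetical and are not taken from PState or UState.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ParserStateSnapshot:
    """Hypothetical, simplified stand-in for a few pieces of parser state."""
    bit_position: int
    current_element: str
    delimiter_stack_depth: int


def changed_fields(before, after):
    """Return {field_name: (old, new)} for every field that a step changed."""
    old, new = asdict(before), asdict(after)
    return {name: (old[name], new[name]) for name in old if old[name] != new[name]}


# One imaginary parse step that consumed 16 bits and changed nothing else.
before = ParserStateSnapshot(bit_position=0, current_element="record",
                             delimiter_stack_depth=1)
after = ParserStateSnapshot(bit_position=16, current_element="record",
                            delimiter_stack_depth=1)
print(changed_fields(before, after))  # {'bit_position': (0, 16)}
```

A display could then highlight only the fields returned by changed_fields
rather than redrawing every piece of state after each step.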


Re: The future of the daffodil DFDL schema debugger?

2021-05-26 Thread John Wass
> Some thoughts re: data format debugger
> I suggest we enumerate

Mike, are you saying there is some groundwork to lay for this in Daffodil
itself, or are these things which the debugger needs to model after
existing concepts?


On Mon, May 24, 2021 at 12:48 PM Beckerle, Mike <
mbecke...@owlcyberdefense.com> wrote:

> Some thoughts re: data format debugger
>
> I suggest we enumerate
>
>   *   every single piece of state of the parser,
>   *   every single piece of state of the unparser,
>   *   each action/step of the parser (every parse combinator or
> primitive, and their subactions),
>   *   and each action/step of the unparser (every unparse combinator,
> primitive, suspension, ...)
>
> and wire-frame/mock-up some display for each piece of state, and how, if
> changed by a step, the change to that piece of state would be displayed.
>
> We can write down the nuances associated with these data items/actions
> that impact debugger display.
>
> Some of these states/actions will be analogous to things in conventional
> debuggers (e.g., looking at the values of variables). Others will be
> specific to DFDL needs (e.g., looking at layers in the data stream,
> visualizing delimiter scanning success/failure, backtracking).
>
> Core concepts a debugger needs are framing vs. content vs. value, and the
> "regions" in the data stream that make these up. The framing includes
> initiators, terminators, separators, alignment regions, prefix-length
> regions, leading/trailing skip regions, unused regions. Those surround the
> content region, and when padding/filling is involved (for simple types that
> are textual) the content region contains leading pad and trailing pad
> regions, surrounding the value region.
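
The nesting described above (framing around content, content around pads and
value) could be modeled as a small tree of regions. This is only a sketch: the
Region class, the region names, and the bit offsets are made up for
illustration and do not come from Daffodil.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Region:
    kind: str        # e.g. "initiator", "content", "value"
    start_bit: int   # inclusive, 0-based
    end_bit: int     # exclusive
    children: List["Region"] = field(default_factory=list)


def render(region, depth=0):
    """Return one indented text line per region, outermost first."""
    lines = [f"{'  ' * depth}{region.kind} [{region.start_bit}, {region.end_bit})"]
    for child in region.children:
        lines.extend(render(child, depth + 1))
    return lines


# A padded textual element with an initiator and a terminator.
element = Region("element", 0, 56, [
    Region("initiator", 0, 8),
    Region("content", 8, 48, [
        Region("leading_pad", 8, 16),
        Region("value", 16, 40),
        Region("trailing_pad", 40, 48),
    ]),
    Region("terminator", 48, 56),
])
print("\n".join(render(element)))
```

A graphical debugger could draw the same tree as nested boxes over the data
stream; the indented text is just the cheapest possible rendering.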
>
> An example of graphical nested box representation of these regions is here
> in a design note about Daffodil:
>
>
> https://daffodil.apache.org/dev/design-notes/term-sharing-in-schema-compiler/
> (see section "Details of Unique and Shared Regions")
>
> The way to start this effort is to look at the UState and PState classes.
> These are the state blocks. Every piece of these is potentially important
> to the debugger.
>
> Lastly, an important aspect of Daffodil is the streaming behavior of the
> parser and unparser. While I believe it is more important to get something
> working than for it to cover every feature, this is an area where not
> anticipating how it needs to work is likely to lock one out of a future
> scenario that accommodates it.
>
> So the parser doesn't produce an infoset. It produces a stream of infoset
> events, or callbacks to be exact.
> Due to backtracking in the parser, these events can be held up for
> substantial time while the parser continues. So we can't assume that there
> is any sort of correlation between parser activity and the producing of
> events.
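
One way to picture why events lag behind parser activity is a buffer that
holds events while any point of uncertainty is still open. The sketch below is
an assumption about the general technique, not a description of how Daffodil
implements it; EventBuffer and its method names are hypothetical.

```python
class EventBuffer:
    """Holds infoset events until no point of uncertainty remains open."""

    def __init__(self, sink):
        self._sink = sink    # callback that receives released events
        self._pending = []   # events not yet safe to release
        self._marks = []     # one index into _pending per open point of uncertainty

    def emit(self, event):
        self._pending.append(event)
        self._flush_if_certain()

    def mark(self):
        """Open a point of uncertainty; later events may be discarded."""
        self._marks.append(len(self._pending))

    def backtrack(self):
        """Discard every event emitted since the most recent mark."""
        del self._pending[self._marks.pop():]

    def resolve(self):
        """Close the most recent point of uncertainty, keeping its events."""
        self._marks.pop()
        self._flush_if_certain()

    def _flush_if_certain(self):
        if not self._marks:
            for event in self._pending:
                self._sink(event)
            self._pending.clear()


released = []
buf = EventBuffer(released.append)
buf.emit("start record")   # nothing is uncertain, so this is released at once
buf.mark()
buf.emit("start a")        # held back: the branch may fail
buf.backtrack()            # the branch failed, so "start a" is dropped
buf.mark()
buf.emit("start b")
buf.resolve()              # the branch succeeded, so "start b" is released
print(released)            # ['start record', 'start b']
```

The consequence for a debugger is that stepping the parser and observing
infoset events are separate timelines, so the display cannot assume that each
step yields an event.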
>
> The unparser doesn't consume an infoset. It consumes a stream of infoset
> events. Specifically, the unparser is the callback-handler for unparse
> infoset events.
>
> The infoset gets trimmed so that we needn't build up the complete infoset
> tree in memory. As parse-events are produced, no-longer necessary parts of
> the infoset are pruned away. Similarly, when unparsing, once a part of the
> infoset has been unparsed, that part of the infoset tree is pruned away if
> no longer needed.
>
>
> 
> From: Steve Lawrence 
> Sent: Thursday, April 22, 2021 9:32 AM
> To: dev@daffodil.apache.org 
> Subject: Re: The future of the daffodil DFDL schema debugger?
>
> Some thoughts related to showing the infoset as if it were a variable as
> this is prototyped
>
> 1) How do DAP/IDEs represent very large hierarchical data? Infosets can
> be huge, and most of the time a user only cares about the most recent
> infoset item. So some way to follow and show just the most recent part of
> the infoset is important. The current Daffodil debugger has an
> "infosetLines" setting so that it only shows the most recent X number of
> lines, which is almost all a user cares about when stepping through a parse.
>
> 2) Infoset items are added and removed very frequently during a parse.
> Currently, when the Daffodil debugger shows the infoset it just converts
> the entire thing to XML and displays that. This doesn't work at all for
> large infosets since this can take a long time. I was hoping this issue
> would get resolved with this new debugging infrastructure. When the
> infoset is modified, we ideally want a way to specify via DAP that parts
> of the variable hierarchy were added/removed rather than having to send
> the entire infoset during every variable update.
>
> 3) I can imagine a feature where a user would want to select an infoset
> item and jump to the associated schema element, or query information
> about that infoset item (e.g., what bit position did it start at, what
> was the length). We don't have this right now, but it would be really nice
> to have. This suggests that we need metadata associated with each of the
> variables.
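
As a rough illustration of points 2 and 3, the sketch below computes which
infoset items were added or removed between two steps, and attaches metadata
to each item. Everything here is hypothetical: the InfosetItem fields, the
path syntax, and the schema location string are assumptions for illustration,
and nothing in it is a DAP or Daffodil API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InfosetItem:
    path: str             # e.g. "/record/b"
    value: str
    schema_location: str  # hypothetical "file:line" of the schema element
    start_bit: int
    length_bits: int


def infoset_delta(previous, current):
    """Compare two {path: InfosetItem} snapshots; return (added, removed) paths."""
    added = sorted(p for p in current if p not in previous)
    removed = sorted(p for p in previous if p not in current)
    return added, removed


a = InfosetItem("/record/a", "1", "example.dfdl.xsd:10", 0, 8)
b = InfosetItem("/record/b", "2", "example.dfdl.xsd:11", 8, 8)

step1 = {a.path: a}
step2 = {b.path: b}   # the parser backtracked over "a" and then added "b"

added, removed = infoset_delta(step1, step2)
print(added, removed)  # ['/record/b'] ['/record/a']
```

Only the two path lists would need to cross the wire on each update, and the
per-item metadata is what a "jump to schema element" or "show bit position"
action would read.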