Re: Output SVRL from Schematron Validator
Thanks.

> Do we need API-level access to this? E.g. in SAPI/JAPI? I would imagine so.

Yeah, good call, I'll add it.

On Mon, Apr 5, 2021 at 1:31 PM Beckerle, Mike wrote:

> I looked at the PR for this feature. I think it's fine to have the CLI
> provide an option with a file to write it to, and API-wise, if we decide
> we have to expose this, then a parseResult.validationResult.raw member,
> or something like that, is fine with me.
>
> Do we need API-level access to this? E.g. in SAPI/JAPI? I would imagine so.
Re: Output SVRL from Schematron Validator
I looked at the PR for this feature. I think it's fine to have the CLI provide an option with a file to write it to, and API-wise, if we decide we have to expose this, then a parseResult.validationResult.raw member, or something like that, is fine with me.

Do we need API-level access to this? E.g. in SAPI/JAPI? I would imagine so.

From: John Wass
Sent: Monday, March 29, 2021 1:55 PM
To: dev@daffodil.apache.org
Subject: Re: Output SVRL from Schematron Validator

The thought with the OutputStream was that it would be dumped directly to a file, a log, or stdout, definitely more of a logging effect than something for further processing, since the structured results from a validator are already returned as ValidationResult. That idea looks and sounds worse today than it did initially.

> What about if each ParseResult has a member

Ah, what if the ParseResult hangs on to the ValidationResult and makes it accessible that way?

    def validationResult(): Option[ValidationResult]

To support this, ValidationResult would become a trait, which lets validator implementations attach custom data and interfaces to the result. Clients can get to these through the ParseResult accessor.

Something like this:
https://github.com/jw3/daffodil/tree/validator_result_refactor

Thoughts?

On Fri, Mar 26, 2021 at 10:30 AM Steve Lawrence wrote:

> What about if each ParseResult has a member that's something like
>
>     val validationData: Option[AnyRef]
>
> Each validator can optionally return some validation data, which is then
> stored in this member. The user could then access this validation data
> through the ParseResult and cast it to what it should be, as documented
> by the validator.
>
> This gives each validator a way to provide whatever additional data it
> wants, in whatever format makes the most sense for it.
>
> There's the downside that a user needs to know how to cast this AnyRef
> based on which validator was used. But a similar issue exists if this is
> just an InputStream: you still need to know how to interpret that
> InputStream data. With this approach, though, a Validator can return
> complex structures that provide richer information than an InputStream
> could.
>
> On 3/26/21 10:16 AM, John Wass wrote:
> > Reference implementation here:
> > https://github.com/jw3/daffodil/tree/validator_outputstream
> >
> > It currently has changes sketched in from the parse result on down.
> > I still need to wire things in through DP and CLI.
> >
> > I haven't thought of an alternative that works yet.
> >
> > On Tue, Mar 23, 2021 at 12:59 PM John Wass wrote:
> >
> >> Looking at DAFFODIL-2482 [1], which came up due to a gap that's
> >> blocking integration of the schematron validation functionality into
> >> some workflows that require the full SVRL output, not just the
> >> pass/fail status.
> >>
> >> So what needs to happen here is that the SVRL, which we currently
> >> just parse for errors and discard, needs to be output in a
> >> predictable way. I've tried a couple of things intent on minimizing
> >> the footprint of the impl, but keep coming up empty, mainly due to
> >> violating the reusable validator principle.
> >>
> >> So another, unminimized approach would be to provide an additional
> >> stream to all validators for raw output to be written to; the
> >> implementation of that stream is determined by configuration from
> >> the DataProcessor. The new output stream is passed at validation
> >> time, which requires changing the signature of the validate call to
> >> accept this output stream in addition to the existing input stream
> >> (or we could add another interface, but I'm not convinced of the
> >> usefulness of that currently).
> >>
> >> Looking for some thoughts on this approach.
> >>
> >> [1] https://issues.apache.org/jira/browse/DAFFODIL-2482
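
For readers comparing the two proposals in this thread, here is a minimal Scala sketch of how each might look to a client. It is not taken from either linked branch. Only ValidationResult, ParseResult, validationResult() and validationData are names from the messages above; SchematronResult, svrl, UntypedParseResult, and the warnings/errors members are hypothetical and chosen purely for illustration.

```scala
object ValidationResultSketch {
  // Trait form proposed in the thread: the common structured result that
  // every validator returns. The two members here are assumptions.
  trait ValidationResult {
    def warnings: Seq[String]
    def errors: Seq[String]
  }

  // Hypothetical schematron-specific result that also carries the raw SVRL.
  final case class SchematronResult(
      warnings: Seq[String],
      errors: Seq[String],
      svrl: String
  ) extends ValidationResult

  // Hypothetical stand-in for the real parse result; only the accessor
  // named in the thread matters for this sketch.
  final case class ParseResult(result: Option[ValidationResult]) {
    def validationResult(): Option[ValidationResult] = result
  }

  // Typed approach: the client recovers validator-specific data with a
  // pattern match on the concrete result type.
  def rawSvrl(pr: ParseResult): Option[String] =
    pr.validationResult().collect { case s: SchematronResult => s.svrl }

  // Untyped alternative from the thread: the client must know, from the
  // validator's documentation, what the AnyRef actually is.
  final case class UntypedParseResult(validationData: Option[AnyRef])

  def rawSvrlUntyped(pr: UntypedParseResult): Option[String] =
    pr.validationData.collect { case s: String => s }
}
```

In both shapes the client still needs validator-specific knowledge. The difference is that the trait form keeps the common warnings and errors available without any cast.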
Re: XML String in Binary Data Question
I will create the test case as you suggest, illustrating the whole situation and what Daffodil does today.

What I'm seeking is a way for the string <foo>bar</foo> to be rendered in the XML infoset as exactly those characters, unescaped, so that we *fool* a subsequent XML validator into treating the string contents as a tree of well-formed XML elements. An XML schema for the resulting data would not have type xs:string for the myString element, but a complex type containing a "foo" child element. XPaths like myString/foo would be meaningful in this data.

Arguably, DFDL should not do this; rather, a post-processor of the XML-rendered infoset should do this XML-specific transformation.

The analogous situation also occurs for JSON (though nobody has asked for this as yet). The string { "foo" : "bar" } as a string value of a JSON field named "myString" would require a bunch of escaping. E.g., perhaps (I don't know JSON so well) like

    "myString" : "{ \"foo\" : \"bar\" }"

This will be interesting to test.

From: Interrante, John A (GE Research, US)
Sent: Monday, April 5, 2021 7:36 AM
To: dev@daffodil.apache.org
Subject: RE: XML String in Binary Data Question

I was waiting for someone to offer an opinion but it seems it's up to me.

First of all, please write an actual test case of binary data with a well-formed piece of XML data inside a string. Please round trip it through Daffodil so we can actually find out how well both the parser and the unparser handle this data. I find it hard to believe Daffodil doesn't already use some escaping or quoting mechanism to handle this kind of situation, where the infoset (represented as XML) contains an element whose body looks like well-formed XML elements in its own right.

Even if this situation causes the Daffodil unparser to explode, what's to stop you from telling Daffodil to represent the infoset as JSON rather than XML? Surely the Daffodil unparser wouldn't have a problem unparsing the JSON representation with XML elements inside a string element? I would also be curious to find out whether the infoset's JSON representation has a similar problem handling an actual test case of binary data with a well-formed piece of JSON data inside a string.

Once we know what really happens (and we can also run the same JSON/XML test cases through IBM's DFDL processor to get more data points), we can start to discuss the best way to handle this kind of situation automatically for both JSON and XML infoset representations.

John
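
To make the JSON escaping question in this reply concrete, here is a small Scala sketch that uses no JSON library. It is a hand-written escaper covering the characters the JSON grammar requires to be escaped inside a string literal, not anything Daffodil itself does. The names JsonStringSketch, escape and field are hypothetical.

```scala
object JsonStringSketch {
  // Escape a value for use inside a JSON string literal. Only the double
  // quote, the backslash, and control characters need escaping; braces,
  // colons, and angle brackets are left alone.
  def escape(s: String): String = {
    val sb = new StringBuilder
    s.foreach {
      case '"'  => sb.append("\\\"")
      case '\\' => sb.append("\\\\")
      case '\n' => sb.append("\\n")
      case '\r' => sb.append("\\r")
      case '\t' => sb.append("\\t")
      case c if c < ' ' => sb.append("\\" + "u%04x".format(c.toInt))
      case c => sb.append(c)
    }
    sb.toString
  }

  // Render a single "name": "value" member with both parts escaped.
  def field(name: String, value: String): String =
    "\"" + escape(name) + "\": \"" + escape(value) + "\""
}
```

With this escaper, field("myString", "{ \"foo\" : \"bar\" }") escapes only the inner double quotes, while a value such as <foo>bar</foo> passes through unchanged because JSON gives angle brackets no special meaning.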
RE: XML String in Binary Data Question
I was waiting for someone to offer an opinion but it seems it's up to me.

First of all, please write an actual test case of binary data with a well-formed piece of XML data inside a string. Please round trip it through Daffodil so we can actually find out how well both the parser and the unparser handle this data. I find it hard to believe Daffodil doesn't already use some escaping or quoting mechanism to handle this kind of situation, where the infoset (represented as XML) contains an element whose body looks like well-formed XML elements in its own right.

Even if this situation causes the Daffodil unparser to explode, what's to stop you from telling Daffodil to represent the infoset as JSON rather than XML? Surely the Daffodil unparser wouldn't have a problem unparsing the JSON representation with XML elements inside a string element? I would also be curious to find out whether the infoset's JSON representation has a similar problem handling an actual test case of binary data with a well-formed piece of JSON data inside a string.

Once we know what really happens (and we can also run the same JSON/XML test cases through IBM's DFDL processor to get more data points), we can start to discuss the best way to handle this kind of situation automatically for both JSON and XML infoset representations.

John

-----Original Message-----
From: Beckerle, Mike
Sent: Friday, April 2, 2021 12:50 PM
To: dev@daffodil.apache.org
Subject: EXT: XML String in Binary Data Question

I've started running into binary data containing XML strings.

Suppose Daffodil is unparsing a piece of XML like this:

    <bodyString><ns:well formed="piece">of arbitrary xml</ns:well></bodyString>

Suppose the DFDL schema declares bodyString as a string element. So the notion here is that the data contains a string, which is a well-formed piece of XML. For example, the overall format may be binary data that just happens to contain this string of XML in it.

I suspect that the Daffodil unparser is just going to explode on this, because it will be fed element events for the string contents. I.e., unparsing converts the incoming XML text to infoset events by first parsing it as XML, and that process is schema-unaware, so it has no notion that the XML parse should NOT parse the parts of the body string as XML elements.

Does it make sense for Daffodil's XML-text infoset importer (used by unparsing) to recognize this case, and convert the

    <ns:well formed="piece">of arbitrary xml</ns:well>

into an escapified XML string like:

    &lt;ns:well formed=&quot;piece&quot;&gt;of arbitrary xml&lt;/ns:well&gt;

and then unparse it as if that string had arrived as this XML event to the unparser/XML-text infoset inputter:

    &lt;ns:well formed=&quot;piece&quot;&gt;of arbitrary xml&lt;/ns:well&gt;

So would an option to have this behavior be a reasonable thing to add to Daffodil?

The corresponding parse feature would be to emit the string not as escapified XML, but just as a string of text of well-formed XML.

I guess the notion is that strings are escapified because their contents may not be well-formed XML. In this case, since they ARE well-formed pieces of XML, when a string is required we can emit unescapified XML, and also consume the same for unparsing and convert it into strings.

Thoughts?
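
The difference between the escaped and unescaped forms discussed above can be checked with plain JDK XML APIs. The following Scala sketch is independent of Daffodil and makes no claim about what its infoset inputters or outputters do. XmlStringSketch, writeElement and readElement are hypothetical names, and the element name bodyString is reused from the message only as sample data.

```scala
import java.io.{StringReader, StringWriter}
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.stream.XMLOutputFactory
import org.xml.sax.InputSource

object XmlStringSketch {
  // Write <name>value</name>, letting the StAX writer escape any markup
  // characters that occur in the value.
  def writeElement(name: String, value: String): String = {
    val out = new StringWriter
    val w = XMLOutputFactory.newInstance().createXMLStreamWriter(out)
    w.writeStartElement(name)
    w.writeCharacters(value)
    w.writeEndElement()
    w.close()
    out.toString
  }

  // Parse a document and report the root's text content together with the
  // number of element descendants under the root.
  def readElement(xml: String): (String, Int) = {
    val doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
      .parse(new InputSource(new StringReader(xml)))
    val root = doc.getDocumentElement
    (root.getTextContent, root.getElementsByTagName("*").getLength)
  }
}
```

Reading back writeElement("bodyString", "<foo>bar</foo>") yields the original string and zero child elements, so a schema-unaware XML parser sees a plain string. Reading the unescaped <bodyString><foo>bar</foo></bodyString> yields the text bar and one child element, which is the case the original message expects to trip up unparsing.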