I've started running into binary data containing XML strings.

Suppose Daffodil is unparsing a piece of XML like this:

<bodyString><ns:well formed="piece">of arbitrary xml</ns:well></bodyString>

Suppose the DFDL schema for bodyString is:

<element name="bodyString" type="xs:string" dfdl:lengthKind="explicit" 
dfdl:length="{....}"/>

So the notion here is that the data contains a string, which is a well-formed 
piece of XML.
For example, the overall format may be binary data that just happens to contain 
this string of XML in it.

I suspect that the Daffodil unparser is just going to explode on this, because 
it will be fed element events for the string contents. I.e., unparsing converts 
the incoming XML text to infoset events by first parsing it as XML, and that 
process is schema-unaware, so it has no notion that the XML parse should NOT 
parse the contents of the body string as XML elements.
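
To illustrate the concern (this is not Daffodil code, just a minimal sketch 
using Python's standard expat binding as a stand-in for any schema-unaware XML 
parse), here is what the event stream looks like for the example above. The 
helper name xml_events is made up for this illustration.

```python
import xml.parsers.expat

def xml_events(text):
    """Collect the events a schema-unaware XML parse produces for text."""
    events = []
    # No namespace processing, so the undeclared "ns" prefix is accepted as-is.
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True  # deliver contiguous character data in one event
    parser.StartElementHandler = lambda name, attrs: events.append(("start", name))
    parser.EndElementHandler = lambda name: events.append(("end", name))
    parser.CharacterDataHandler = lambda data: events.append(("text", data))
    parser.Parse(text, True)
    return events

doc = '<bodyString><ns:well formed="piece">of arbitrary xml</ns:well></bodyString>'
for event in xml_events(doc):
    print(event)
```

The nested ns:well element shows up as its own start and end events, rather 
than as character data belonging to bodyString, which is what a string-typed 
element would need.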

Does it make sense for Daffodil's XML-text infoset importer (used by unparsing) 
to recognize this case, and convert the <ns:well formed="piece">of arbitrary 
xml</ns:well> into an escapified XML string like:

&lt;ns:well formed=&quot;piece&quot;&gt;of arbitrary xml&lt;/ns:well&gt;

and then unparse it as if that string had arrived as this XML event to the 
unparser/XML-text Infoset inputter:

<bodyString>&lt;ns:well formed=&quot;piece&quot;&gt;of arbitrary 
xml&lt;/ns:well&gt;</bodyString>
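
As a rough sketch of the conversion itself (again in Python rather than in 
Daffodil, and assuming the fragment has already been recovered as text), the 
standard library's xml.sax.saxutils can produce the escapified form and 
reverse it. The variable names here are only for illustration.

```python
from xml.sax.saxutils import escape, unescape

fragment = '<ns:well formed="piece">of arbitrary xml</ns:well>'

# escape() handles &, < and > by default; the extra mapping also escapes quotes.
escaped = escape(fragment, {'"': "&quot;"})
print(escaped)

# The reverse direction, as the parse side would need it.
restored = unescape(escaped, {"&quot;": '"'})
print(restored == fragment)
```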

So would an option to have this behavior be a reasonable thing to add to 
Daffodil?

The corresponding parse feature would be to emit the string not as escapified 
XML, but just as a string of text of well-formed XML.

I guess the notion is that strings are escapified because their contents may 
not be well-formed XML, but in this case, since they ARE well-formed pieces of 
XML, when a string is required we can emit unescapified XML, and also consume 
the same for unparsing and convert it back into a string.
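
One possible shape for the parse-side decision, sketched in Python under 
stated assumptions: the functions is_well_formed_fragment and 
string_to_xml_text and the wrapper element are hypothetical, not anything 
Daffodil provides. The idea is to emit the string raw only when it parses as 
well-formed XML content, and to fall back to escaping otherwise.

```python
import xml.parsers.expat
from xml.sax.saxutils import escape

def is_well_formed_fragment(value):
    """True if value parses as XML content inside a throwaway wrapper element."""
    # Namespace processing is off, so undeclared prefixes do not cause errors.
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse("<wrapper>" + value + "</wrapper>", True)
        return True
    except xml.parsers.expat.ExpatError:
        return False

def string_to_xml_text(value):
    """Emit value unescaped when it is well-formed XML, escaped otherwise."""
    if is_well_formed_fragment(value):
        return value
    return escape(value, {'"': "&quot;"})

print(string_to_xml_text('<ns:well formed="piece">of arbitrary xml</ns:well>'))
print(string_to_xml_text('a < b'))
```

This check is deliberately simplistic. A real option would presumably be 
driven by the schema or a setting rather than by sniffing each string, since 
sniffing cannot tell a string that is meant to contain XML from one that only 
happens to look like it.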

Thoughts?
