[
https://issues.apache.org/jira/browse/DAFFODIL-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17956339#comment-17956339
]
Steve Lawrence commented on DAFFODIL-1559:
------------------------------------------
Seems reasonable to me.
The only thing that could be a little confusing is that some remappings are
sort of infoset-type centric. For example, if you have
dfdlx:infosetStringRemap="Xml1.0IllegalRemapDropCR", but you want to use a JSON
infoset, you're going to drop CRs and remap characters to PUA, even though those
characters work perfectly fine and losslessly with a JSON infoset. So setting
this property value in a schema sort of implies the schema is designed to be
used with an XML infoset. And if you want to use it with a different infoset,
like JSON, to get the best results you would probably want to change the schema
so this property matches the infoset type.
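To make the concern concrete, here is a rough Python sketch (not Daffodil
code) of what a remap like "Xml1.0IllegalRemapDropCR" might do to a string,
next to what a JSON infoset could do with the same string. The function names
are hypothetical, the 0xE000 PUA offset is an assumption for illustration, and
only C0 control characters are treated as illegal to keep it short.

```python
import json

# Assumed for illustration: illegal characters are shifted into the
# Private Use Area by adding this offset to their code point.
PUA_OFFSET = 0xE000


def is_xml10_illegal_c0(ch):
    """True for C0 controls that XML 1.0 forbids (tab, LF, CR are allowed)."""
    cp = ord(ch)
    return cp < 0x20 and cp not in (0x09, 0x0A, 0x0D)


def remap_drop_cr(text):
    """Hypothetical XML-centric remap: drop every CR, move illegal chars to PUA."""
    out = []
    for ch in text:
        if ch == "\r":
            continue  # the CR is lost here
        if is_xml10_illegal_c0(ch):
            out.append(chr(ord(ch) + PUA_OFFSET))
        else:
            out.append(ch)
    return "".join(out)


original = "a\r\nb\x00c"

# XML-centric remap: the result no longer matches the original data.
remapped = remap_drop_cr(original)

# JSON escaping alone round-trips the same string without any remapping.
json_round_trip = json.loads(json.dumps(original))
```

In this sketch the remapped string differs from the original (the CR is gone
and NUL has become a PUA character), while the JSON round trip gives back the
original string unchanged.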
A possible alternative, similar to a new property, is to use the
dfdlx:runtimeProperties extension and add a new runtime property that says
what kind of remapping to do. This moves the remap logic back into infoset
inputters/outputters, so we do lose some of the mentioned benefits, but it does
mean infoset outputters get to control how remapping is done. So the JSON
infoset outputter could just ignore this property and give you a result that
JSON users would expect, since JSON has reasonable escaping rules.
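A minimal sketch of that idea, in Python rather than Daffodil's actual API:
each outputter receives the runtime properties and decides for itself whether
to honor the remap. The class names, the string_value method, and the use of
"infosetStringRemap" as a runtime property key are all hypothetical.

```python
import json


class XmlOutputterSketch:
    """Hypothetical XML outputter that honors the remap runtime property."""

    def __init__(self, runtime_properties):
        self.remap = runtime_properties.get("infosetStringRemap", "none")

    def string_value(self, text):
        if self.remap == "Xml1.0IllegalRemapDropCR":
            # Only the CR-dropping part is shown; PUA remapping is omitted.
            text = text.replace("\r", "")
        return text


class JsonOutputterSketch:
    """Hypothetical JSON outputter that ignores the remap property."""

    def __init__(self, runtime_properties):
        # Properties are accepted but deliberately unused.
        self.runtime_properties = runtime_properties

    def string_value(self, text):
        # JSON string escaping already preserves CR and control characters.
        return json.dumps(text)


props = {"infosetStringRemap": "Xml1.0IllegalRemapDropCR"}
xml_value = XmlOutputterSketch(props).string_value("a\r\nb")
json_value = JsonOutputterSketch(props).string_value("a\r\nb")
```

With the same property set, the XML sketch drops the CR while the JSON sketch
emits an escaped string that decodes back to the original data.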
It's sort of like how stringAsXML works. If that is set and the infoset
outputter is XML, then we convert it to XML nodes. But if the infoset outputter
is JSON, then we just keep it as a normal JSON string.
> Add option to disable CRLF to LF XML canonicalization
> -----------------------------------------------------
>
> Key: DAFFODIL-1559
> URL: https://issues.apache.org/jira/browse/DAFFODIL-1559
> Project: Daffodil
> Issue Type: Improvement
> Components: API
> Reporter: Steve Lawrence
> Priority: Major
> Labels: beginner
> Fix For: 4.0.0
>
>
> See the review for more details. The short of it is that when converting parse
> results to XML, we convert CR to LF, and we convert CRLF to LF. This means
> that we lose the information that the data used to contain CRLF. This is
> similar to how we lose that information with delimiters if someone uses NL,
> but it's slightly different since it is actual data. However, it's most user
> friendly and consistent with other XML technologies to have this behavior.
> Perhaps we need an option to convert CRLF to somewhere in PUA so that this
> information can be maintained if someone needs it.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)