[jira] [Commented] (DAFFODIL-1559) Add option to disable CRLF to LF XML canonicalization

Steve Lawrence (Jira) Thu, 05 Jun 2025 08:11:09 -0700


    [ 
https://issues.apache.org/jira/browse/DAFFODIL-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17956339#comment-17956339
 ]


Steve Lawrence commented on DAFFODIL-1559:
------------------------------------------

Seems reasonable to me.

The only thing that could be a little confusing is that some remappings are 
sort of infoset-type centric. For example, if you have 
dfdlx:infosetStringRemap="Xml1.0IllegalRemapDropCR" but you want to use a JSON 
infoset, you're going to drop CRs and remap characters to PUA, even though 
those characters work perfectly fine and losslessly with a JSON infoset. So 
setting this property value in a schema sort of implies the schema is designed 
to be used with an XML infoset. And if you want to use it with a different 
infoset, like JSON, you would probably want to change the schema so this 
property matches the infoset type to get the best results.
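
To make the concern concrete, here is a rough Python sketch (not Daffodil 
code). The function names and the 0xE000 PUA offset are assumptions made for 
illustration, and only C0 control characters are handled. It shows a 
drop-CR/remap-to-PUA pass changing a string that JSON escaping alone would 
round-trip unchanged.

```python
import json

# Assumed PUA offset, chosen for illustration only.
PUA_BASE = 0xE000


def is_xml10_illegal(cp: int) -> bool:
    """True for C0 controls that XML 1.0 forbids (all except TAB, LF, CR)."""
    return cp < 0x20 and cp not in (0x09, 0x0A, 0x0D)


def remap_drop_cr(s: str) -> str:
    """Hypothetical stand-in for an 'illegal remap, drop CR' style remapping."""
    out = []
    for ch in s:
        cp = ord(ch)
        if cp == 0x0D:
            continue  # drop every CR, so CRLF becomes LF
        if is_xml10_illegal(cp):
            out.append(chr(PUA_BASE + cp))  # move into the Private Use Area
        else:
            out.append(ch)
    return "".join(out)


data = "a\r\nb\x00c"

# XML-oriented remapping: the CR is gone and NUL has become a PUA character.
xml_safe = remap_drop_cr(data)

# JSON escaping alone: the original string survives a round trip.
json_round_trip = json.loads(json.dumps(data))
```

With the remapping applied, the JSON infoset would carry the altered string 
even though JSON could have represented the original exactly.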

A possible alternative, similar to a new property, is to use the 
dfdlx:runtimeProperties extension and add a new runtime property that says 
what kind of remapping to do. This moves the remap logic back into the infoset 
inputters/outputters, so we do lose some of the mentioned benefits, but it does 
mean infoset outputters get to control how remapping is done. So the JSON 
infoset outputter could just ignore this property and give you a result that 
JSON users would expect, since JSON has reasonable escaping rules.
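
As a rough sketch of that idea in Python (not the Daffodil API): the class 
names, the "stringRemap" key, and its "dropCR" value are all hypothetical. Each 
outputter receives the same runtime properties and decides for itself whether 
to act on them.

```python
import json
from xml.sax.saxutils import escape


class XmlOutputter:
    """Hypothetical XML outputter that honors the remap runtime property."""

    def __init__(self, runtime_properties: dict):
        # "stringRemap" is an invented property name for this sketch.
        self.remap = runtime_properties.get("stringRemap", "none")

    def string_value(self, name: str, value: str) -> str:
        if self.remap == "dropCR":
            value = value.replace("\r", "")
        return f"<{name}>{escape(value)}</{name}>"


class JsonOutputter:
    """Hypothetical JSON outputter that ignores the remap property."""

    def __init__(self, runtime_properties: dict):
        pass  # JSON escaping can represent the string as-is

    def string_value(self, name: str, value: str) -> str:
        return json.dumps({name: value})


props = {"stringRemap": "dropCR"}
xml_out = XmlOutputter(props).string_value("f", "a\r\nb")
json_out = JsonOutputter(props).string_value("f", "a\r\nb")
```

The schema stays the same for both infoset types; only the outputter's 
handling of the property differs.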

It's sort of like how stringAsXML works. If that is set and the infoset 
outputter is XML, then we convert the string to XML nodes. But if the infoset 
outputter is JSON, then we just keep it as a normal JSON string.

> Add option to disable CRLF to LF XML canonicalization
> -----------------------------------------------------
>
>                 Key: DAFFODIL-1559
>                 URL: https://issues.apache.org/jira/browse/DAFFODIL-1559
>             Project: Daffodil
>          Issue Type: Improvement
>          Components: API
>            Reporter: Steve Lawrence
>            Priority: Major
>              Labels: beginner
>             Fix For: 4.0.0
>
>
> See the review for more details. The short of it is that when converting parse 
> results to XML, we convert CR to LF, and we convert CRLF to LF. This means 
> that we lose the information that the data used to contain CRLF. This is 
> similar to how we lose that information with delimiters if someone uses NL, 
> but it's slightly different since it is actual data. However, it's the most 
> user-friendly and consistent with other XML technologies to have this behavior.
> Perhaps we need an option to convert CRLF to somewhere in PUA so that this 
> information can be maintained if someone needs it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
