Very undesirable to have a numeric character reference (the escaped form of U+00A0) instead of the literal character.
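To make the distinction concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It is not Daffodil code and says nothing about where Daffodil produces the reference; the element name `value` and the sample string are made up for the example. It only shows that a parser returns the same text for both spellings, while a serializer's target encoding decides which spelling gets written.

```python
import xml.etree.ElementTree as ET

NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE


def parse_text(xml_source: str) -> str:
    """Return the text content of the root element."""
    return ET.fromstring(xml_source).text


# The same content written with a character reference and with the literal character.
escaped = "<value>ADP&#xA0;</value>"
literal = f"<value>ADP{NBSP}</value>"

# The parser hands back identical text for both spellings.
same_text = parse_text(escaped) == parse_text(literal)

# Which spelling a serializer emits depends on its target encoding.
elem = ET.fromstring(literal)
as_ascii = ET.tostring(elem)                        # bytes; default encoding is US-ASCII
as_unicode = ET.tostring(elem, encoding="unicode")  # str; no byte encoding applied

print(same_text)
print(as_ascii)
print(as_unicode == literal)
```

In this Python case the reference appears only because the output encoding cannot represent U+00A0 directly; whether something comparable is happening in the Daffodil outputters is an open question, not something this sketch establishes.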
I'm not seeing where in Daffodil this would be happening. We do remap certain characters to the private use area, namely the characters that are illegal in XML. But there is nothing illegal about U+00A0. It's just a character.

I am puzzled. Some of the infoset outputters call scala.xml.Utility.escape(...) and others don't, which is itself an issue, but I tested this, and that call doesn't convert U+00A0 into the character reference that you are observing. Nor does our remapping call.

________________________________
From: Sloane, Brandon <bslo...@tresys.com>
Sent: Monday, June 24, 2019 10:38:10 AM
To: dev@daffodil.apache.org
Subject: Re: Character Encodings - No Statement

Slightly different issue from what I was expecting. Daffodil appears to output U+00A0 as a numeric character reference instead of as a literal character. This is not wrong, and I believe a compliant XML processor should not notice the difference, but is this desirable?

Additionally, it appears not to be simply a padding character. In my test data, I observed the string: "ADP ADP ".

________________________________
From: Beckerle, Mike <mbecke...@tresys.com>
Sent: Tuesday, June 18, 2019 9:28:07 AM
To: dev@daffodil.apache.org
Subject: Re: Character Encodings - No Statement

One other possible mechanism: nillable elements with dfdl:nilKind="literalCharacter".

This is a mechanism designed to handle fixed-length data where the "storage" for the data is filled with a character/byte and the parts of it that are in use are overwritten with actual data. The unwritten data is then recognized as nilled based on the appearance of the literalCharacter throughout the data field.

The only thing that bugs me about this is that XSD doesn't allow nillable="true" as part of a type definition; you have to put it on an element declaration, which means you can't abstract over it without committing to some element name. I have the same complaint about dimensionality: it is tied to elements and therefore to element names.
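The nil-recognition idea described above can be sketched in a few lines. This is a simplified model of the concept, not Daffodil's implementation; the function name `parse_fixed_field` and its parameters are hypothetical, and the fill character is passed in rather than taken from a schema.

```python
from typing import Optional


def parse_fixed_field(raw: str, nil_char: str) -> Optional[str]:
    """Return None (nil) when the whole field still holds only the fill character.

    Simplified model: a field that was never overwritten consists entirely
    of nil_char; any other content is returned unchanged.
    """
    if len(nil_char) != 1:
        raise ValueError("nil_char must be a single character")
    if raw and all(ch == nil_char for ch in raw):
        return None  # the entire field is fill, so treat it as nilled
    return raw


NBSP = "\u00a0"
print(parse_fixed_field(NBSP * 4, NBSP))       # untouched field -> None
print(parse_fixed_field("ADP" + NBSP, NBSP))   # partially written field -> kept as-is
```

Note that a partially written field keeps its trailing fill characters here; stripping them would be a padding concern, separate from nil detection.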
________________________________
From: Sloane, Brandon <bslo...@tresys.com>
Sent: Monday, June 17, 2019 5:43:23 PM
To: dev@daffodil.apache.org
Subject: Re: Character Encodings - No Statement

The field it occurs in is fixed-length, so a padding character makes sense. I am a bit concerned about the implications of using a character that looks like a space. This type of look-alike character (a homoglyph) seems like a potential source of errors for people using the schema.

Assuming we are correct that this character is intended as padding, we can probably avoid this issue by advising schema writers to specify U+00A0 as a padding character, so it doesn't actually make it into the infoset.

________________________________
From: Beckerle, Mike <mbecke...@tresys.com>
Sent: Monday, June 17, 2019 5:17:20 PM
To: dev@daffodil.apache.org
Subject: Re: Character Encodings - No Statement

This sounds like fixed-length data fields, or min-length data fields. So the character to use wants to be similar in concept to the pad character, i.e., it is used to add length to a fixed-length field but has no significance.

I suggest using U+00A0, which is "No-Break Space". This is a space for all practical purposes, differing only in how it is treated by line-breaking algorithms. Using this instead of a regular space will allow this data to round-trip. This character should render like a space in every Unicode-aware context.

________________________________
From: Sloane, Brandon <bslo...@tresys.com>
Sent: Monday, June 17, 2019 4:55:09 PM
To: dev@daffodil.apache.org
Subject: Character Encodings - No Statement

I am going through Link 16 (MIL-STD-6016E, not publicly available) to add support for some of the special character encodings to Daffodil (similar to dfi264:dui001, which has already been added). While doing so, I came across DFI 311 DUI 002. Several bitcodes are "UNDEFINED", which I intend to translate into U+FFFD ('�', the replacement character), which is what we are doing for 264:001.
However, there is also an explicit coding for a NO STATEMENT character. Any insight into what a reasonable choice for translating NO STATEMENT to Unicode is?

Regards,
Brandon T. Sloane
Associate, Services
bslo...@tresys.com | tresys.com
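For readers following the thread, the translation being discussed can be sketched as a small lookup table. The code values below are invented for illustration, since the real DFI 311 DUI 002 table is in MIL-STD-6016E and is not reproduced here. The sketch assumes NO STATEMENT maps to U+00A0, as suggested in the replies, and that undefined bitcodes map to U+FFFD.

```python
REPLACEMENT = "\ufffd"    # U+FFFD, used for UNDEFINED bitcodes
NO_STATEMENT = "\u00a0"   # U+00A0, the choice suggested in the replies (an assumption)

# Hypothetical code table: these numeric values are NOT taken from the standard.
CODE_TABLE = {0: NO_STATEMENT, 1: "A", 2: "B", 3: "C"}
ENCODE_TABLE = {ch: code for code, ch in CODE_TABLE.items()}


def decode(codes):
    """Translate bitcodes to text; anything not in the table becomes U+FFFD."""
    return "".join(CODE_TABLE.get(code, REPLACEMENT) for code in codes)


def encode(text):
    """Translate text back to bitcodes; U+FFFD has no code, so it raises KeyError."""
    return [ENCODE_TABLE[ch] for ch in text]


print(encode(decode([1, 0, 3])) == [1, 0, 3])   # defined codes round-trip
print(decode([7]) == REPLACEMENT)               # undefined code is replaced
```

Because NO STATEMENT gets its own distinct character, it survives a decode/encode round trip, whereas undefined bitcodes collapse into U+FFFD and cannot be recovered.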