While working on a new xml implementation I came across "control characters (CC)" [1]. When trying to validate/convert a utf string, these lead to exceptions because they are not valid utf characters. Unfortunately, some of these characters are allowed to appear in valid xml 1.* documents.

I currently see two options for how to go about it:

1. Do not allow CCs that do not work with existing functionality.
1.Pros
  * easy
1.Cons
  * the resulting xml implementation will not be xml 1.* complete

2. Add special cases to the existing functionality to handle CCs that are allowed in 1.0.
2.Pros
  * the resulting xml implementation will be xml 1.* complete
2.Cons
  * will make utf de/encoding slower, as I would need to add additional logic
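
To make option 2 a bit more concrete, here is a minimal sketch of what the extra check might look like. It is written in Python purely for illustration, not in the language of the implementation, and the function names are hypothetical. The ranges are taken from the Char productions of the XML 1.0 and XML 1.1 recommendations as I read them, so please double-check them against the specs.

```python
# Illustrative sketch only; all names here are made up for this example.

def is_xml10_char(cp: int) -> bool:
    """True if code point cp matches the Char production of XML 1.0."""
    return (
        cp in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF        # includes the C1 range 0x80-0x9F
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_xml11_char(cp: int) -> bool:
    """True if cp matches the Char production of XML 1.1.

    Note: XML 1.1 additionally requires most control characters
    (its RestrictedChar set) to be written as character references;
    this function does not check that.
    """
    return (
        0x1 <= cp <= 0xD7FF            # everything except NUL and surrogates
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def first_disallowed(text: str, allowed=is_xml10_char) -> int:
    """Index of the first character rejected by `allowed`, or -1 if none."""
    for i, ch in enumerate(text):
        if not allowed(ord(ch)):
            return i
    return -1
```

The per-character cost is a handful of integer comparisons, so the slowdown mentioned above would mostly come from running this on every decoded code point rather than from the check itself.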

Any other ideas, feedback?




[1] https://en.wikipedia.org/wiki/C0_and_C1_control_codes
