Re: [jclouds/jclouds] Consume Unicode byte order mark in XML parser (#1124)

The change surprised me; I expected Java to handle the BOM. After 
digging deeper I found 
[[1]](https://stackoverflow.com/questions/4897876/reading-utf-8-bom-marker), 
which points to two JDK bugs, 
[[2]](http://bugs.java.com/view_bug.do?bug_id=4508058) and 
[[3]](http://bugs.java.com/view_bug.do?bug_id=6378911). It turns out the JDK 
was changed at some point to consume the BOM, but the change was reverted 
because it broke backwards compatibility.
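
To illustrate the behaviour described above, here is a minimal sketch (not the code from this PR) showing that an `InputStreamReader` decoding UTF-8 hands the BOM to the caller as a leading `U+FEFF` character, and one way to drop it. The class `BomSkipSketch` and the helpers `skipBom` and `readAll` are hypothetical names introduced only for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PushbackReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class BomSkipSketch {
    static final char BOM = '\uFEFF';

    // Hypothetical helper: wraps a reader and drops one leading U+FEFF, if present.
    static Reader skipBom(Reader in) throws IOException {
        PushbackReader reader = new PushbackReader(in, 1);
        int first = reader.read();
        if (first != -1 && first != BOM) {
            reader.unread(first); // not a BOM, so put the character back
        }
        return reader;
    }

    // Hypothetical helper: reads the whole reader into a String.
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            sb.append((char) c);
        }
        return sb.toString();
    }

    // Decodes the bytes as UTF-8, optionally stripping a leading BOM.
    static String decode(byte[] bytes, boolean stripBom) throws IOException {
        Reader reader = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8);
        return readAll(stripBom ? skipBom(reader) : reader);
    }

    public static void main(String[] args) throws IOException {
        // UTF-8 BOM followed by a tiny XML document.
        byte[] bytes = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '/', '>'};
        System.out.println(decode(bytes, false).charAt(0) == BOM); // BOM survives decoding
        System.out.println(decode(bytes, true));                    // BOM removed
    }
}
```

An XML parser fed the unstripped reader sees `U+FEFF` before the first `<`, which is presumably why the parser needs to consume it explicitly.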


Also of interest: the Unicode character `U+FEFF` is serialized as the byte 
sequence `EF BB BF` in UTF-8 [[4]](http://www.unicode.org/faq/utf_bom.html#BOM).
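
The encoding can be checked with a few lines of Java. This is only an illustrative sketch; the class `BomBytesSketch` and its `hex` helper are hypothetical names, not part of jclouds.

```java
import java.nio.charset.StandardCharsets;

public class BomBytesSketch {
    // Hypothetical helper: formats bytes as space-separated uppercase hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Encode the single character U+FEFF as UTF-8 and print its bytes.
        System.out.println(hex("\uFEFF".getBytes(StandardCharsets.UTF_8)));
    }
}
```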

\[1\] https://stackoverflow.com/questions/4897876/reading-utf-8-bom-marker
\[2\] http://bugs.java.com/view_bug.do?bug_id=4508058
\[3\] http://bugs.java.com/view_bug.do?bug_id=6378911
\[4\] http://www.unicode.org/faq/utf_bom.html#BOM

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/1124#issuecomment-318299404
