[ https://issues.apache.org/jira/browse/LUCENE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12563743#action_12563743 ]
Steven Rowe commented on LUCENE-1157:
-------------------------------------

bq. there are unidentifiable characters in Changes.html. They are also in CHANGES.txt. I'm sure I read something about why they are added but cannot find it now.

The first three bytes of CHANGES.txt (0xEF 0xBB 0xBF) are a UTF-8 BOM (byte-order mark).

In Unicode encodings whose code units span more than one byte, e.g. UTF-16 and UTF-32, the character U+FEFF is placed at the beginning of a stream to denote the endianness of the character serialization. UTF-8 has no endianness (a given character always serializes to the same byte sequence), so the BOM in UTF-8, where it is serialized as three bytes, serves solely to indicate that the encoding of the stream is UTF-8.

Microsoft's tools often put BOMs at the beginnings of UTF-8 encoded files.

> Formatable changes log (CHANGES.txt is easy to edit but not so friendly to read by Lucene users)
> -------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1157
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1157
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Website
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>             Fix For: 2.4
>
>         Attachments: lucene-1157-take2.patch, lucene-1157-take3.patch, lucene-1157.patch
>
>
> Background in http://www.nabble.com/formatable-changes-log-tt15078749.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
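
For anyone who wants to check a file for the BOM described in the comment above, here is a minimal Python sketch. It is only an illustration, not part of any attached patch; the helper names `has_utf8_bom` and `strip_utf8_bom` are hypothetical, and it relies only on the standard `codecs` module.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the byte string without a leading UTF-8 BOM, if one is present."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data


# U+FEFF serializes differently depending on the encoding:
bom_utf8 = "\ufeff".encode("utf-8")         # three bytes, no byte-order meaning
bom_utf16_be = "\ufeff".encode("utf-16-be")  # big-endian byte order
bom_utf16_le = "\ufeff".encode("utf-16-le")  # little-endian byte order

# Hypothetical file content standing in for the start of CHANGES.txt.
sample = codecs.BOM_UTF8 + b"Lucene Change Log\n"
cleaned = strip_utf8_bom(sample)

# The "utf-8-sig" codec drops a leading BOM while decoding.
decoded = sample.decode("utf-8-sig")
```

When reading a file directly, passing `encoding="utf-8-sig"` to `open()` has the same effect as the decode call above: a leading BOM is skipped if present, and files without one are read unchanged.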