[ 
https://issues.apache.org/jira/browse/LUCENE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12563743#action_12563743
 ] 

Steven Rowe commented on LUCENE-1157:
-------------------------------------

bq. there are unidentifiable characters in Changes.html. They are also in 
CHANGES.txt. I'm sure I read something about why they are added but cannot find 
it now.

The first three bytes of CHANGES.txt are a UTF-8 BOM (byte-order mark).  In 
Unicode encodings whose code units span more than one byte, e.g. UTF-16 and 
UTF-32, the character U+FEFF is placed at the beginning of a stream to denote 
the endianness of the character serialization.

UTF-8 has no byte-order ambiguity (its code units are single bytes, so a given 
character always serializes to the same byte sequence); the BOM in UTF-8, where 
it is serialized as the three bytes EF BB BF, serves solely to indicate that 
the encoding of the stream is UTF-8.
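
As a rough illustration of the point above, this Python sketch serializes 
U+FEFF in a few encodings.  The helper name bom_bytes is made up for this 
example and is not part of Lucene or the patch.

```python
# Illustrative sketch: how the single character U+FEFF is serialized.
# bom_bytes is a hypothetical helper name chosen for this example.
BOM_CHAR = "\ufeff"


def bom_bytes(encoding: str) -> bytes:
    """Return the bytes produced by encoding U+FEFF in the given encoding."""
    return BOM_CHAR.encode(encoding)


# In UTF-16 the byte order differs with endianness, which is what the
# BOM lets a reader detect.
print(bom_bytes("utf-16-be").hex())  # feff
print(bom_bytes("utf-16-le").hex())  # fffe

# In UTF-8 there is only one possible serialization, three bytes long.
print(bom_bytes("utf-8").hex())      # efbbbf
```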

Microsoft tools, such as Notepad, commonly put a BOM at the beginning of UTF-8 
encoded files.
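
If the leading bytes are unwanted, one possible way to detect and drop them is 
sketched below in Python.  This is only an assumption about how one might clean 
such a file, not what the attached patches do; strip_utf8_bom is a 
hypothetical name.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Sample bytes standing in for the start of a file that begins with a BOM.
raw = codecs.BOM_UTF8 + b"Lucene Change Log"
print(strip_utf8_bom(raw))

# Alternatively, the "utf-8-sig" codec skips a leading BOM while decoding.
print(raw.decode("utf-8-sig"))
```

Input without a BOM passes through both approaches unchanged, so the check is 
safe to apply unconditionally.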

> Formatable changes log  (CHANGES.txt is easy to edit but not so friendly to 
> read by Lucene users)
> -------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1157
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1157
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Website
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>             Fix For: 2.4
>
>         Attachments: lucene-1157-take2.patch, lucene-1157-take3.patch, 
> lucene-1157.patch
>
>
> Background in http://www.nabble.com/formatable-changes-log-tt15078749.html

