[ 
https://issues.apache.org/jira/browse/SOLR-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12865336#action_12865336
 ] 

Hoss Man commented on SOLR-1865:
--------------------------------

Robert: based on my limited understanding, aren't there different BOMs for 
different encodings? ...

http://unicode.org/faq/utf_bom.html#bom4

The getLines method modified in your patch could (conceivably) be used to open 
files in other encodings, so do we need to worry about those possibilities 
as well? (Or does InputStreamReader take care of that for us?)
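
To make the question concrete, here is a rough sketch (not the code from the 
patch) of one encoding-independent way to handle it. It relies on the fact 
that the BOM byte sequences differ per encoding (EF BB BF for UTF-8, FE FF for 
UTF-16BE, FF FE for UTF-16LE), but all decode to the single character U+FEFF 
when the matching charset is used. As far as I know, Java's "UTF-16" decoder 
consumes the BOM itself, while the "UTF-8", "UTF-16LE" and "UTF-16BE" decoders 
pass it through as a character. The class and method names below are 
hypothetical, and the sketch assumes the caller passes the correct charset.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Solr: reads lines and drops a leading BOM.
final class BomSkippingLines {

    // U+FEFF is what a BOM looks like after decoding, whatever the encoding.
    private static final char BOM = '\uFEFF';

    static List<String> readLines(InputStream in, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            boolean first = true;
            while ((line = reader.readLine()) != null) {
                // Only the very first character of the stream can be a BOM.
                if (first && !line.isEmpty() && line.charAt(0) == BOM) {
                    line = line.substring(1);
                }
                first = false;
                lines.add(line);
            }
        }
        return lines;
    }
}
```

Checking the decoded character rather than the raw bytes means there is only 
one value to test for, but it does nothing to detect a file whose actual 
encoding differs from the charset that was requested.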

> ignore byte-order markers in SolrResourceLoader
> -----------------------------------------------
>
>                 Key: SOLR-1865
>                 URL: https://issues.apache.org/jira/browse/SOLR-1865
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Robert Muir
>            Priority: Minor
>             Fix For: 3.1
>
>         Attachments: SOLR-1865.patch
>
>
> If you create say a stopwords list with windows notepad or other editors and 
> save as UTF-8, 
> some of these editors will insert a byte-order marker (zero-width no-break 
> space) as the first 
> character of the file.
> http://www.lucidimagination.com/search/document/5101871231fc95af/is_this_a_bug_of_the_ressourceloader

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

