[ 
https://issues.apache.org/jira/browse/SOLR-9295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15371985#comment-15371985
 ] 

Steve Rowe commented on SOLR-9295:
----------------------------------

Adding the following pattern to the {{invalidPatterns}} array in the 
{{source-patterns}} groovy script, which is used by the top-level ant target 
{{-validate-source-patterns}}, flagged both files that [~elyograg] 
found (after I checked out the commit just before his):

{{(~$/^\uFEFF/$) : 'UTF-8 byte order mark'}}

Here's the output:

{noformat}
-validate-source-patterns:
[source-patterns] UTF-8 byte order mark: solr/bin/solr.cmd
[source-patterns] UTF-8 byte order mark: solr/webapp/web/js/lib/jquery.blockUI.js

BUILD FAILED
/Users/sarowe/git/lucene-solr-3/build.xml:132: Found 2 violations in source files (UTF-8 byte order mark).
{noformat}

I tested that it only flags beginning-of-file occurrences by prepending junk to 
{{solr/bin/solr.cmd}} - {{od -c}} shows the BOM was still present, just not at 
the beginning - and {{-validate-source-patterns}} didn't complain.
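
For anyone who wants to reproduce the beginning-of-file behavior outside the build, here is a minimal Java sketch of the same regex. It assumes the pattern is applied with {{find()}} to the whole file decoded as UTF-8, which I have not confirmed against the groovy script itself; the class name {{BomCheck}} and method {{startsWithBom}} are made up for illustration and are not part of the build.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of the Lucene/Solr build.
class BomCheck {
    // Same regex as the pattern in the comment above. Without the MULTILINE
    // flag, ^ anchors only at the very start of the input, so a U+FEFF
    // later in the text is not matched.
    static final Pattern BOM_AT_START = Pattern.compile("^\\uFEFF");

    // Assumption: the file content is decoded as UTF-8 before matching.
    // Java's UTF-8 decoder keeps a leading BOM as the character U+FEFF.
    static boolean startsWithBom(byte[] fileBytes) {
        String text = new String(fileBytes, StandardCharsets.UTF_8);
        return BOM_AT_START.matcher(text).find();
    }

    public static void main(String[] args) {
        // BOM as the first character, as in the original solr.cmd
        byte[] bomFirst = "\uFEFF@REM".getBytes(StandardCharsets.UTF_8);
        // BOM still present, but pushed off the start by prepended junk
        byte[] bomLater = "junk\n\uFEFF@REM".getBytes(StandardCharsets.UTF_8);

        System.out.println(startsWithBom(bomFirst));  // true
        System.out.println(startsWithBom(bomLater));  // false
    }
}
```

The second case mirrors the junk-prepending test: the BOM bytes are still in the file, but because they no longer sit at offset zero, the check does not fire.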

Committing shortly.

> Remove Unicode BOM (U+FEFF) from text files in codebase
> -------------------------------------------------------
>
>                 Key: SOLR-9295
>                 URL: https://issues.apache.org/jira/browse/SOLR-9295
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: scripts and tools
>    Affects Versions: master (7.0)
>            Reporter: Shawn Heisey
>            Priority: Trivial
>             Fix For: 6.2, master (7.0)
>
>
> When starting Solr built from the master branch on Windows, this is what you 
> see:
> {noformat}
> C:\Users\elyograg\git\lucene-solr\solr>bin\solr start
> C:\Users\elyograg\git\lucene-solr\solr>@REM
> '@REM' is not recognized as an internal or external command,
> operable program or batch file.
> {noformat}
> The three extra characters, found at the very beginning of the solr.cmd 
> script, are a Unicode BOM, and are invisible to vim, notepad, and notepad++.  
> The problem does not exist in 6.1.0, but IS present in branch_6x and master.
> Using grep to find this character in the entire codebase, I found one other 
> relevant file with a BOM.  All others were binary (images, jars, git pack 
> files, etc):
> ./solr/webapp/web/js/lib/jquery.blockUI.js



