ottlinger commented on code in PR #233:
URL: https://github.com/apache/creadur-rat/pull/233#discussion_r1555621867


##########
apache-rat-core/src/main/java/org/apache/rat/analysis/HeaderCheckWorker.java:
##########
@@ -47,98 +53,103 @@ class HeaderCheckWorker {
 
     private final int numberOfRetainedHeaderLines;
     private final BufferedReader reader;
-    private final ILicense license;
+    private final Collection<ILicense> licenses;
     private final Document document;
 
-    private int headerLinesToRead;
-    private boolean finished = false;
+    /**
+     * Read the input and perform the header check.
+     *
+     * The number of lines indicates how many lines from the top of the file will be read for processing
+     * 
+     * @param reader The reader for the document.
+     * @param numberOfLines the number of lines to read from the header.
+     * @return The IHeaders instance for the header.  
+     * @throws IOException on input failure
+     */
+    public static IHeaders readHeader(BufferedReader reader, int numberOfLines) throws IOException {
+        final StringBuilder headers = new StringBuilder();
+        int headerLinesRead = 0;
+        String line;
+
+        while (headerLinesRead < numberOfLines && (line = reader.readLine()) != null) {
+            headers.append(line).append("\n");
+        }
+        final String raw = headers.toString();
+        final String pruned = FullTextMatcher.prune(raw).toLowerCase(Locale.ENGLISH);

Review Comment:
   Will this cause problems with non-English content? I remember there is a
bug in the backlog where a Chinese document is detected as binary ... just
wondering.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
