ottlinger commented on code in PR #395:
URL: https://github.com/apache/creadur-rat/pull/395#discussion_r1845249719
##########
apache-rat-core/src/main/java/org/apache/rat/analysis/TikaProcessor.java:
##########
@@ -133,6 +148,31 @@ public static String process(final Document document) throws RatDocumentAnalysis
}
}
+ /**
+ * Determine the character set for the input stream. Input stream must implement mark.
+ * @param stream the stream to check.
+ * @return the detected character set or null if not detectable.
+ * @throws IOException on IO error.
+ */
+ private static Charset detectCharset(final InputStream stream) throws IOException {
+ CharsetDetector encodingDetector = new CharsetDetector();
+ encodingDetector.setText(stream);
+ CharsetMatch charsetMatch = encodingDetector.detect();
+ if (charsetMatch != null) {
+ try {
+ return Charset.forName(charsetMatch.getName());
+ } catch (UnsupportedCharsetException e) {
+ // do nothing
Review Comment:
Should we at least log the exception here in order to debug "unexpected
behaviour" when RAT browses the user's file tree?
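
A minimal sketch of what logging in that catch block could look like. This is not the PR's code: `CharsetLookup` and `toCharsetOrNull` are hypothetical names, and `java.util.logging` is used only as a stand-in for whatever logging facility RAT prefers. The sketch also catches `IllegalCharsetNameException`, which `Charset.forName` can throw for a syntactically invalid name; whether the detector can ever return such a name is an assumption, not something shown in the diff.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical helper for illustration only; not part of the PR. */
final class CharsetLookup {
    // Assumption: java.util.logging stands in for the project's own logger.
    private static final Logger LOG = Logger.getLogger(CharsetLookup.class.getName());

    private CharsetLookup() {
        // static helper, no instances
    }

    /**
     * Resolves a detected charset name to a Charset.
     * Returns null (and logs the cause) if the name cannot be resolved on this JVM.
     */
    static Charset toCharsetOrNull(final String detectedName) {
        if (detectedName == null) {
            return null;
        }
        try {
            return Charset.forName(detectedName);
        } catch (UnsupportedCharsetException | IllegalCharsetNameException e) {
            // Log instead of swallowing, so an unexpected fallback can be traced later.
            LOG.log(Level.FINE, "Cannot resolve detected charset '" + detectedName + "'", e);
            return null;
        }
    }
}
```

With something like this, `detectCharset` could pass `charsetMatch.getName()` to the helper and keep its current "null if not detectable" contract, while still leaving a trace for debugging.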