[
https://issues.apache.org/jira/browse/RAT-81?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17898881#comment-17898881
]
Claude Warren commented on RAT-81:
----------------------------------
I reproduced the issue in an integration test in V0.17 and applied a fix. We
now detect the encoding (in most cases) and read each file with that encoding.
As a result, such files are no longer detected as binary, and they are no
longer reported as lacking a license, provided we can determine the encoding
and the file contains a valid license.
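
For readers who want a concrete picture of "detect the encoding, then read
with it", here is a minimal sketch using only the JDK. It is not the code
that went into Rat: the class BomSniffer and its methods are hypothetical
names made up for this illustration, and it only recognises UTF-8 and UTF-16
byte-order marks, falling back to a caller-supplied charset otherwise.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of the Rat code base.
final class BomSniffer {

    /** True if data begins with the given byte values. */
    private static boolean startsWith(byte[] data, int... bom) {
        if (data.length < bom.length) {
            return false;
        }
        for (int i = 0; i < bom.length; i++) {
            if ((data[i] & 0xFF) != bom[i]) {
                return false;
            }
        }
        return true;
    }

    /** Returns the charset implied by a leading byte-order mark, else the fallback. */
    static Charset detect(byte[] head, Charset fallback) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) {
            return StandardCharsets.UTF_8;
        }
        // StandardCharsets.UTF_16 reads the BOM itself to pick the byte order.
        if (startsWith(head, 0xFE, 0xFF) || startsWith(head, 0xFF, 0xFE)) {
            return StandardCharsets.UTF_16;
        }
        return fallback;
    }

    /** Reads the first line of data using the detected charset. */
    static String firstLine(byte[] data, Charset fallback) {
        Charset charset = detect(data, fallback);
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(data), charset))) {
            String line = reader.readLine();
            // Java's UTF-8 decoder keeps the BOM as U+FEFF, so drop it by hand.
            if (line != null && line.startsWith("\uFEFF")) {
                line = line.substring(1);
            }
            return line;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Java's UTF-16 encoder writes a big-endian BOM in front of the text.
        byte[] utf16 = "<?xml version=\"1.0\"?>\n<root/>".getBytes(StandardCharsets.UTF_16);
        System.out.println(detect(utf16, StandardCharsets.UTF_8));
        System.out.println(firstLine(utf16, StandardCharsets.UTF_8));
    }
}
```

A file without a byte-order mark still ends up on the fallback charset here,
which is one reason detection can only work "in most cases".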
> MalformedInputException thrown when RAT tries reading file
> ----------------------------------------------------------
>
> Key: RAT-81
> URL: https://issues.apache.org/jira/browse/RAT-81
> Project: Apache Rat
> Issue Type: Bug
> Components: core engine
> Affects Versions: 0.6, 0.7, 0.11
> Environment: Linux (Ubuntu) on x86, running with "default" file
> encoding set to UTF-8
> Reporter: Marshall Schor
> Assignee: Claude Warren
> Priority: Minor
> Fix For: 0.17
>
>
> To reproduce, set the platform default locale to one that implies UTF-8
> file encoding.
> This causes code such as org.apache.rat.document.impl.FileDocument, which
> returns a FileReader, to set up RAT with a reader that uses the platform
> default character encoding (in this case UTF-8).
> If the file being processed is not encoded in that encoding, the reader may
> encounter byte sequences that are invalid UTF-8, which causes it to throw a
> MalformedInputException.
> One case we found:
> The file being examined had invalid UTF-8 encodings. First, Rat ran the
> BinaryGuesser - but that returned false because it attempted to read the
> first 100 or so chars, and got a "MalformedInputException" instead, so the
> try/catch block just ended up returning "false" (not binary). Then the
> HeaderChecker tried to read the file to check the header, and got this same
> exception - but this time, it made RAT fail.
> Here's the last part of the stack trace:
> Caused by: org.apache.rat.report.RatReportFailedException: Analysis failed
> at org.apache.rat.report.xml.XmlReport.report(XmlReport.java:66)
> at org.apache.rat.mp.FilesReportable.run(FilesReportable.java:69)
> at org.apache.rat.Report.report(Report.java:292)
> at org.apache.rat.Report.report(Report.java:272)
> at org.apache.rat.mp.AbstractRatMojo.createReport(AbstractRatMojo.java:341)
> ... 23 more
> Caused by: org.apache.rat.document.RatDocumentAnalysisException: Cannot analyse header
> at org.apache.rat.report.analyser.DocumentHeaderAnalyser.analyse(DocumentHeaderAnalyser.java:54)
> at org.apache.rat.document.impl.util.DocumentAnalyserMultiplexer.analyse(DocumentAnalyserMultiplexer.java:37)
> at org.apache.rat.document.impl.util.ConditionalAnalyser.matches(ConditionalAnalyser.java:44)
> at org.apache.rat.document.impl.util.ConditionalAnalyser.analyse(ConditionalAnalyser.java:50)
> at org.apache.rat.report.xml.XmlReport.report(XmlReport.java:64)
> ... 27 more
> Caused by: org.apache.rat.analysis.RatHeaderAnalysisException: Cannot read header for /home/tgoetz/tmp/uimaj-2.3.1/uimaj-core/src/test/resources/pearTests/encodingTests/UTF16_with_signature.xml
> at org.apache.rat.report.analyser.HeaderCheckWorker.read(HeaderCheckWorker.java:96)
> at org.apache.rat.report.analyser.DocumentHeaderAnalyser.analyse(DocumentHeaderAnalyser.java:50)
> ... 31 more
> Caused by: sun.io.MalformedInputException
> at sun.io.ByteToCharUTF8.convert(ByteToCharUTF8.java:294)
> at sun.nio.cs.StreamDecoder$ConverterSD.convertInto(StreamDecoder.java:316)
> at sun.nio.cs.StreamDecoder$ConverterSD.implRead(StreamDecoder.java:366)
> at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:252)
> at java.io.InputStreamReader.read(InputStreamReader.java:212)
> at java.io.BufferedReader.fill(BufferedReader.java:157)
> at java.io.BufferedReader.readLine(BufferedReader.java:320)
> at java.io.BufferedReader.readLine(BufferedReader.java:383)
> at org.apache.rat.report.analyser.HeaderCheckWorker.readLine(HeaderCheckWorker.java:111)
> at org.apache.rat.report.analyser.HeaderCheckWorker.read(HeaderCheckWorker.java:89)
> ... 32 more
> Work-around: mark these files for explicit exclusion.
> Fix: change the BinaryGuesser to read the files as raw bytes (not assuming
> any character encoding) and operate on that data.
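
The fix proposed in the description (guess "binary or not" from raw bytes
instead of decoded characters) could look roughly like the sketch below. It
is an assumption-laden illustration, not Rat's BinaryGuesser: the class
RawByteSniffer, its method names, and the 30% threshold are all hypothetical.
It also shows why decoding fails in the first place: a strict UTF-8 decoder
rejects the byte-order mark of a UTF-16 file.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not Rat's BinaryGuesser.
final class RawByteSniffer {

    /** True if the bytes are well-formed UTF-8 when decoded strictly. */
    static boolean decodesAsUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            // MalformedInputException is a subclass of CharacterCodingException.
            return false;
        }
    }

    /**
     * Byte-level heuristic that never decodes and therefore cannot throw a
     * decoding error: a NUL byte, or more than 30% control bytes (an
     * arbitrary threshold chosen for this sketch), suggests binary content.
     */
    static boolean looksBinary(byte[] sample) {
        if (sample.length == 0) {
            return false;
        }
        int control = 0;
        for (byte b : sample) {
            int v = b & 0xFF;
            if (v == 0) {
                return true;
            }
            // Tab, LF, VT, FF and CR (0x09..0x0D) are ordinary in text files.
            if (v < 0x09 || (v > 0x0D && v < 0x20)) {
                control++;
            }
        }
        return control * 100 / sample.length > 30;
    }

    public static void main(String[] args) {
        byte[] utf16 = "<root/>".getBytes(StandardCharsets.UTF_16);
        System.out.println(decodesAsUtf8(utf16));
        System.out.println(looksBinary(utf16));
    }
}
```

Note the trade-off: UTF-16 text is full of NUL bytes, so a purely byte-level
heuristic like this one classifies it as binary unless the encoding is
checked first.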