MalformedInputException thrown when RAT tries reading file
----------------------------------------------------------
Key: RAT-81
URL: https://issues.apache.org/jira/browse/RAT-81
Project: RAT
Issue Type: Bug
Affects Versions: 0.6
Environment: Linux (Ubuntu) on x86, running with "default" file
encoding set to UTF-8
Reporter: Marshall Schor
Priority: Minor
To reproduce, set the platform default locale to something that indicates UTF-8
file encoding.
This causes code that returns a FileReader (for example, in
org.apache.rat.document.impl.FileDocument) to give RAT a reader that uses the
platform default character encoding (in this case UTF-8).
If the file being processed is not encoded in UTF-8, the reader may encounter
byte sequences that are invalid UTF-8, which causes it to throw a
MalformedInputException.
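The failure mode described above can be illustrated with a minimal sketch. It
is not RAT code: the class and method names are hypothetical. It decodes bytes
with a strict UTF-8 decoder, because on newer JDKs a plain FileReader may
substitute replacement characters instead of throwing, which would hide the
problem that the stack trace in this report shows.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of RAT.
final class StrictUtf8Demo {

    // Returns true if the bytes decode cleanly as UTF-8, false if the strict
    // decoder reports malformed or unmappable input.
    static boolean isValidUtf8(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            // java.nio.charset.MalformedInputException is a subclass of this.
            return false;
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?>";
        // UTF_16 encoding writes a byte-order mark, which is not valid UTF-8.
        byte[] utf16WithBom = xml.getBytes(StandardCharsets.UTF_16);
        byte[] utf8 = xml.getBytes(StandardCharsets.UTF_8);
        System.out.println(isValidUtf8(utf16WithBom)); // prints false
        System.out.println(isValidUtf8(utf8));         // prints true
    }
}
```

A file such as UTF16_with_signature.xml would presumably fall into the first
case when read with a UTF-8 default encoding.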
One case we found:
The file being examined contained byte sequences that are invalid UTF-8. First,
RAT ran the BinaryGuesser, which returned false: it attempted to read the first
100 or so chars, got a "MalformedInputException" instead, and its try/catch
block just ended up returning "false" (not binary). Then the HeaderChecker
tried to read the file to check the header and got the same exception, but
this time it made RAT fail.
Here's the last part of the stack trace:
Caused by: org.apache.rat.report.RatReportFailedException: Analysis failed
at org.apache.rat.report.xml.XmlReport.report(XmlReport.java:66)
at org.apache.rat.mp.FilesReportable.run(FilesReportable.java:69)
at org.apache.rat.Report.report(Report.java:292)
at org.apache.rat.Report.report(Report.java:272)
at org.apache.rat.mp.AbstractRatMojo.createReport(AbstractRatMojo.java:341)
... 23 more
Caused by: org.apache.rat.document.RatDocumentAnalysisException: Cannot analyse header
at org.apache.rat.report.analyser.DocumentHeaderAnalyser.analyse(DocumentHeaderAnalyser.java:54)
at org.apache.rat.document.impl.util.DocumentAnalyserMultiplexer.analyse(DocumentAnalyserMultiplexer.java:37)
at org.apache.rat.document.impl.util.ConditionalAnalyser.matches(ConditionalAnalyser.java:44)
at org.apache.rat.document.impl.util.ConditionalAnalyser.analyse(ConditionalAnalyser.java:50)
at org.apache.rat.report.xml.XmlReport.report(XmlReport.java:64)
... 27 more
Caused by: org.apache.rat.analysis.RatHeaderAnalysisException: Cannot read header for /home/tgoetz/tmp/uimaj-2.3.1/uimaj-core/src/test/resources/pearTests/encodingTests/UTF16_with_signature.xml
at org.apache.rat.report.analyser.HeaderCheckWorker.read(HeaderCheckWorker.java:96)
at org.apache.rat.report.analyser.DocumentHeaderAnalyser.analyse(DocumentHeaderAnalyser.java:50)
... 31 more
Caused by: sun.io.MalformedInputException
at sun.io.ByteToCharUTF8.convert(ByteToCharUTF8.java:294)
at sun.nio.cs.StreamDecoder$ConverterSD.convertInto(StreamDecoder.java:316)
at sun.nio.cs.StreamDecoder$ConverterSD.implRead(StreamDecoder.java:366)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:252)
at java.io.InputStreamReader.read(InputStreamReader.java:212)
at java.io.BufferedReader.fill(BufferedReader.java:157)
at java.io.BufferedReader.readLine(BufferedReader.java:320)
at java.io.BufferedReader.readLine(BufferedReader.java:383)
at org.apache.rat.report.analyser.HeaderCheckWorker.readLine(HeaderCheckWorker.java:111)
at org.apache.rat.report.analyser.HeaderCheckWorker.read(HeaderCheckWorker.java:89)
... 32 more
Work-around: mark these files for explicit exclusion.
Fix: change the BinaryGuesser to read files as raw bytes (not assuming any
character encoding) and make its decision from that data.
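A rough sketch of what the proposed fix might look like follows. It is not
RAT's actual BinaryGuesser: the class name, the 100-byte sample size, and the
25% threshold are all assumptions chosen for illustration. The point is that
the sample is read through an InputStream, so no character decoding takes
place and no MalformedInputException can occur.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch; not RAT's BinaryGuesser.
final class ByteBinaryGuess {

    // Assumed sample size, mirroring the "first 100 or so" mentioned above.
    static final int SAMPLE_SIZE = 100;

    // Reads up to SAMPLE_SIZE raw bytes and applies the heuristic to them.
    static boolean looksBinary(InputStream in) throws IOException {
        byte[] sample = new byte[SAMPLE_SIZE];
        int total = 0;
        int n;
        while (total < SAMPLE_SIZE
                && (n = in.read(sample, total, SAMPLE_SIZE - total)) != -1) {
            total += n;
        }
        return looksBinary(sample, total);
    }

    // Counts control bytes other than tab, LF, FF and CR in the first
    // 'length' bytes. Bytes >= 0x80 are not counted, so non-ASCII text in
    // an 8-bit or multi-byte encoding is not flagged by itself.
    static boolean looksBinary(byte[] data, int length) {
        int suspicious = 0;
        for (int i = 0; i < length; i++) {
            int b = data[i] & 0xFF;
            boolean whitespace = b == '\t' || b == '\n' || b == '\f' || b == '\r';
            if ((b < 0x20 && !whitespace) || b == 0x7F) {
                suspicious++;
            }
        }
        // Assumed threshold: more than 25% suspicious bytes means "binary".
        return length > 0 && suspicious * 4 > length;
    }
}
```

With this heuristic, UTF-16 text is reported as binary because roughly every
other byte is 0x00. Whether such files should be skipped as binary or decoded
with the correct charset is a separate design decision for RAT.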