[ 
https://issues.apache.org/jira/browse/TIKA-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095798#comment-14095798
 ] 

Uwe Schindler commented on TIKA-1387:
-------------------------------------

I think that for "messages" written in English (like those written to 
logs), Locale.ENGLISH is "more" correct. But it does not really matter.
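
To illustrate why the checker flags the no-argument case conversions at all, here is a minimal sketch. The class and method names are made up for this example and are not Tika code; the Turkish locale is just one well-known case where the default-locale variant gives a different result.

```java
import java.util.Locale;

public class LocaleCaseDemo {

    // Hypothetical helper: upper-cases with an explicit locale, so the
    // result does not depend on the JVM's default locale.
    static String upper(String s) {
        return s.toUpperCase(Locale.ENGLISH);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");

        // Under Turkish rules 'i' upper-cases to dotted capital I (U+0130),
        // which is what String#toUpperCase() would do on a JVM whose
        // default locale is Turkish.
        System.out.println("title".toUpperCase(turkish).equals("TITLE")); // false

        // With an explicit English locale the result is the same everywhere.
        System.out.println(upper("title").equals("TITLE")); // true
    }
}
```

Locale.ROOT would behave the same way for this input; the choice between the two only affects how the intent reads.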

About the charsets:
I would define a constant in IOUtils, {{public static final Charset UTF_8 = 
Charset.forName("UTF-8");}}, and then pass it to all methods that accept a 
Charset (Readers, String, ...). This is also faster than the synchronized 
lookup by name that happens on every conversion when you rely on the default 
charset or pass the charset name as a String.
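
A minimal sketch of that idea, assuming a stand-alone demo class in place of IOUtils (the class name and the helper methods are hypothetical; only the constant itself is what is proposed above):

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

public class CharsetConstantDemo {

    // Resolved once when the class is initialized; stands in for the
    // constant proposed for IOUtils.
    public static final Charset UTF_8 = Charset.forName("UTF-8");

    // Replaces String#getBytes(), which uses the default charset.
    static byte[] encode(String s) {
        return s.getBytes(UTF_8);
    }

    // Replaces new String(byte[]), which uses the default charset.
    static String decode(byte[] bytes) {
        return new String(bytes, UTF_8);
    }

    // Replaces new InputStreamReader(InputStream), which uses the default charset.
    static Reader reader(InputStream in) {
        return new InputStreamReader(in, UTF_8);
    }

    public static void main(String[] args) {
        String text = "Gr\u00fc\u00dfe"; // contains two non-ASCII characters
        byte[] bytes = encode(text);
        System.out.println(bytes.length);               // 7
        System.out.println(decode(bytes).equals(text)); // true
        Reader r = reader(new ByteArrayInputStream(bytes));
        System.out.println(r != null);                  // true
    }
}
```

All three overloads taking a Charset are available in Java 6, so this does not depend on anything from Java 7.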

Java 7 has StandardCharsets.UTF_8, but we cannot use it at the moment. It is 
defined like the one I propose for IOUtils, though.

> Add forbidden-apis checker to TIKA build
> ----------------------------------------
>
>                 Key: TIKA-1387
>                 URL: https://issues.apache.org/jira/browse/TIKA-1387
>             Project: Tika
>          Issue Type: Improvement
>          Components: general
>            Reporter: Uwe Schindler
>            Assignee: Tyler Palsulich
>             Fix For: 1.7
>
>         Attachments: TIKA-1387.palsulich.080614.patch, TIKA-1387.patch, 
> TIKA-1387.patch, TIKA-1387.patch
>
>
> Lucene and many other projects already use the forbidden-apis checker to 
> prevent use of some broken classes/signatures from the JDK. These are 
> especially things using default character sets or default locales. The 
> forbidden-api checker can also be used to explicitly disallow specific 
> methods, if they have security issues (e.g., creating XML parsers without 
> disabling external entity support).
> The attached patch adds the forbidden-api checker to the tika-parent pom file 
> with default configuration.
> Running it fails with many errors in TIKA core already:
> {noformat}
> [INFO] --- forbiddenapis:1.6.1:check (default) @ tika-core ---
> [INFO] Scanning for classes to check...
> [INFO] Reading bundled API signatures: jdk-unsafe
> [INFO] Reading bundled API signatures: jdk-deprecated
> [INFO] Loading classes to check...
> [INFO] Scanning for API signatures and dependencies...
> [ERROR] Forbidden method invocation: java.lang.String#getBytes() [Uses 
> default charset]
> [ERROR]   in org.apache.tika.language.LanguageProfilerBuilder 
> (LanguageProfilerBuilder.java:407)
> [ERROR] Forbidden method invocation: java.lang.String#toUpperCase() [Uses 
> default locale]
> [ERROR]   in org.apache.tika.io.FilenameUtils (FilenameUtils.java:68)
> [ERROR] Forbidden method invocation: java.lang.String#getBytes() [Uses 
> default charset]
> [ERROR]   in org.apache.tika.io.IOUtils (IOUtils.java:257)
> [ERROR] Forbidden method invocation: java.lang.String#<init>(byte[]) [Uses 
> default charset]
> [ERROR]   in org.apache.tika.io.IOUtils (IOUtils.java:395)
> [ERROR] Forbidden method invocation: java.lang.String#<init>(byte[]) [Uses 
> default charset]
> [ERROR]   in org.apache.tika.io.IOUtils (IOUtils.java:416)
> [ERROR] Forbidden method invocation: 
> java.io.InputStreamReader#<init>(java.io.InputStream) [Uses default charset]
> [ERROR]   in org.apache.tika.io.IOUtils (IOUtils.java:438)
> [ERROR] Forbidden method invocation: java.lang.String#getBytes() [Uses 
> default charset]
> [ERROR]   in org.apache.tika.io.IOUtils (IOUtils.java:532)
> [ERROR] Forbidden method invocation: java.lang.String#getBytes() [Uses 
> default charset]
> [ERROR]   in org.apache.tika.io.IOUtils (IOUtils.java:550)
> [ERROR] Forbidden method invocation: java.lang.String#<init>(byte[]) [Uses 
> default charset]
> [ERROR]   in org.apache.tika.io.IOUtils (IOUtils.java:588)
> [ERROR] Forbidden method invocation: java.lang.String#getBytes() [Uses 
> default charset]
> [ERROR]   in org.apache.tika.io.IOUtils (IOUtils.java:656)
> [ERROR] Forbidden method invocation: java.lang.String#getBytes() [Uses 
> default charset]
> [ERROR]   in org.apache.tika.io.IOUtils (IOUtils.java:782)
> [ERROR] Forbidden method invocation: java.lang.String#getBytes() [Uses 
> default charset]
> [ERROR]   in org.apache.tika.io.IOUtils (IOUtils.java:851)
> [ERROR] Forbidden method invocation: 
> java.io.InputStreamReader#<init>(java.io.InputStream) [Uses default charset]
> [ERROR]   in org.apache.tika.io.IOUtils (IOUtils.java:957)
> [ERROR] Forbidden method invocation: 
> java.io.OutputStreamWriter#<init>(java.io.OutputStream) [Uses default charset]
> [ERROR]   in org.apache.tika.io.IOUtils (IOUtils.java:1064)
> [ERROR] Forbidden method invocation: 
> java.io.OutputStreamWriter#<init>(java.io.OutputStream) [Uses default charset]
> [ERROR]   in org.apache.tika.sax.WriteOutContentHandler 
> (WriteOutContentHandler.java:93)
> [ERROR] Forbidden method invocation: 
> java.io.InputStreamReader#<init>(java.io.InputStream) [Uses default charset]
> [ERROR]   in org.apache.tika.parser.external.ExternalParser 
> (ExternalParser.java:234)
> [ERROR] Forbidden method invocation: 
> java.io.InputStreamReader#<init>(java.io.InputStream) [Uses default charset]
> [ERROR]   in org.apache.tika.parser.external.ExternalParser$3 
> (ExternalParser.java:294)
> [ERROR] Forbidden method invocation: 
> java.util.Calendar#getInstance(java.util.Locale) [Uses default locale or time 
> zone]
> [ERROR]   in org.apache.tika.utils.DateUtils (DateUtils.java:83)
> [ERROR] Forbidden method invocation: 
> java.lang.String#format(java.lang.String,java.lang.Object[]) [Uses default 
> locale]
> [ERROR]   in org.apache.tika.utils.DateUtils (DateUtils.java:91)
> [ERROR] Forbidden method invocation: java.lang.String#toLowerCase() [Uses 
> default locale]
> [ERROR]   in org.apache.tika.detect.MagicDetector (MagicDetector.java:98)
> [ERROR] Forbidden method invocation: java.lang.String#getBytes() [Uses 
> default charset]
> [ERROR]   in org.apache.tika.detect.MagicDetector (MagicDetector.java:100)
> [ERROR] Forbidden method invocation: java.lang.String#<init>(byte[]) [Uses 
> default charset]
> [ERROR]   in org.apache.tika.detect.MagicDetector (MagicDetector.java:396)
> [ERROR] Forbidden method invocation: 
> java.io.OutputStreamWriter#<init>(java.io.OutputStream) [Uses default charset]
> [ERROR]   in org.apache.tika.sax.ToTextContentHandler 
> (ToTextContentHandler.java:60)
> [ERROR] Scanned 225 (and 356 related) class file(s) for forbidden API 
> invocations (in 0.42s), 23 error(s).
> {noformat}
> We should fix those problems.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
