[ https://issues.apache.org/jira/browse/TIKA-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14287824#comment-14287824 ]
Uwe Schindler commented on TIKA-1526:
-------------------------------------

Tim: Linux does not use posix_spawn. You need MacOSX or Solaris. Oracle has a completely different implementation for spawning processes on Linux.

> ExternalParser should trap/ignore/workarround JDK-8047340 & JDK-8055301 so Turkish Tika users can still use non-external parsers
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: TIKA-1526
>                 URL: https://issues.apache.org/jira/browse/TIKA-1526
>             Project: Tika
>          Issue Type: Wish
>            Reporter: Hoss Man
>
> the JDK has numerous pain points regarding the Turkish locale, "posix_spawn"
> lowercasing being one of them...
> https://bugs.openjdk.java.net/browse/JDK-8047340
> https://bugs.openjdk.java.net/browse/JDK-8055301
> As of Tika 1.7, the TesseractOCRParser (which is an ExternalParser) is
> enabled & configured by default in Tika, and uses ExternalParser.check to see
> if tesseract is available -- but because of the JDK bug, this means that Tika
> fails fast for Turkish users on BSD/UNIX variants (including MacOSX) like
> so...
> {noformat}
> [junit4] > Throwable #1: java.lang.Error: posix_spawn is not a supported process launch mechanism on this platform.
> [junit4] >    at java.lang.UNIXProcess$1.run(UNIXProcess.java:105)
> [junit4] >    at java.lang.UNIXProcess$1.run(UNIXProcess.java:94)
> [junit4] >    at java.security.AccessController.doPrivileged(Native Method)
> [junit4] >    at java.lang.UNIXProcess.<clinit>(UNIXProcess.java:92)
> [junit4] >    at java.lang.ProcessImpl.start(ProcessImpl.java:130)
> [junit4] >    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
> [junit4] >    at java.lang.Runtime.exec(Runtime.java:620)
> [junit4] >    at java.lang.Runtime.exec(Runtime.java:485)
> [junit4] >    at org.apache.tika.parser.external.ExternalParser.check(ExternalParser.java:344)
> [junit4] >    at org.apache.tika.parser.ocr.TesseractOCRParser.hasTesseract(TesseractOCRParser.java:117)
> [junit4] >    at org.apache.tika.parser.ocr.TesseractOCRParser.getSupportedTypes(TesseractOCRParser.java:90)
> [junit4] >    at org.apache.tika.parser.CompositeParser.getParsers(CompositeParser.java:81)
> [junit4] >    at org.apache.tika.parser.DefaultParser.getParsers(DefaultParser.java:95)
> [junit4] >    at org.apache.tika.parser.CompositeParser.getSupportedTypes(CompositeParser.java:229)
> [junit4] >    at org.apache.tika.parser.CompositeParser.getParsers(CompositeParser.java:81)
> [junit4] >    at org.apache.tika.parser.CompositeParser.getParser(CompositeParser.java:209)
> [junit4] >    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
> [junit4] >    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> {noformat}
> ...unless they go out of their way to whitelist only the parsers they
> need/want so TesseractOCRParser (and any other ExternalParsers) will never
> even be check()ed.
> It would be nice if Tika's ExternalParser class added a similar
> hack/workaround to what was done in SOLR-6387 to trap these types of errors.
> In Solr we just propagate a better error explaining why Java hates the
> Turkish language...
> {code}
> } catch (Error err) {
>   if (err.getMessage() != null && (err.getMessage().contains("posix_spawn")
>       || err.getMessage().contains("UNIXProcess"))) {
>     log.warn("Error forking command due to JVM locale bug (see https://issues.apache.org/jira/browse/SOLR-6387): " + err.getMessage());
>     return "(error executing: " + cmd + ")";
>   }
> }
> {code}
> ...but with Tika, it might be better for all ExternalParsers to just "opt
> out" as if they don't recognize the filetype when they detect this type of
> error from the check method (or perhaps it would be better if
> AutoDetectParser handled this? ... I'm not really sure how it would best fit
> into Tika's architecture)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
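
The "opt out" behaviour suggested in the issue description could look roughly like the sketch below. `SpawnGuard`, `isLocaleSpawnBug` and `isAvailable` are hypothetical names used only for illustration; they are not Tika or Solr APIs. The substring test is copied from the SOLR-6387 snippet quoted above, and treating a failed launch as "tool not available" is an assumption about how ExternalParser.check might want to behave, not a description of what it does.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper for illustration only; not part of Tika or Solr. */
final class SpawnGuard {

    private SpawnGuard() {}

    /**
     * Returns true if the throwable's message matches the same substrings
     * that the SOLR-6387 workaround looks for.
     */
    static boolean isLocaleSpawnBug(Throwable t) {
        String msg = t.getMessage();
        return msg != null
                && (msg.contains("posix_spawn") || msg.contains("UNIXProcess"));
    }

    /**
     * Runs the launcher and reports whether the external tool looks usable.
     * Assumption: a launch failure caused by the locale bug is treated as
     * "tool not available" instead of propagating as an Error.
     */
    static boolean isAvailable(Callable<Process> launcher) {
        try {
            Process p = launcher.call();
            return p.waitFor() == 0;
        } catch (InterruptedException e) {
            // Preserve the interrupt flag for the caller.
            Thread.currentThread().interrupt();
            return false;
        } catch (Exception e) {
            // For example an IOException because the binary is missing.
            return false;
        } catch (Error err) {
            if (isLocaleSpawnBug(err)) {
                return false; // opt out quietly
            }
            throw err; // unrelated Errors must not be swallowed
        }
    }
}
```

A caller could pass something like `() -> new ProcessBuilder("tesseract").start()` as the launcher. Unrelated Errors are rethrown on purpose, so only the locale-specific failure is turned into an opt-out.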