[ https://issues.apache.org/jira/browse/TIKA-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288963#comment-14288963 ]

Uwe Schindler commented on TIKA-1526:
-------------------------------------

I tried it with Maven, and this is all too funny: the bug also affects Maven itself...

{noformat}
[uschindler@lucene ~]$ export MAVEN_OPTS=-Duser.language=tr
[uschindler@lucene ~]$ mvn
---------------------------------------------------
constituent[0]: file:/usr/local/share/java/maven3/lib/aether-connector-wagon-1.13.1.jar
constituent[1]: file:/usr/local/share/java/maven3/lib/maven-repository-metadata-3.0.4.jar
constituent[2]: file:/usr/local/share/java/maven3/lib/plexus-sec-dispatcher-1.3.jar
constituent[3]: file:/usr/local/share/java/maven3/lib/aether-spi-1.13.1.jar
constituent[4]: file:/usr/local/share/java/maven3/lib/maven-compat-3.0.4.jar
constituent[5]: file:/usr/local/share/java/maven3/lib/plexus-component-annotations-1.5.5.jar
constituent[6]: file:/usr/local/share/java/maven3/lib/plexus-cipher-1.7.jar
constituent[7]: file:/usr/local/share/java/maven3/lib/sisu-guava-0.9.9.jar
constituent[8]: file:/usr/local/share/java/maven3/lib/maven-core-3.0.4.jar
constituent[9]: file:/usr/local/share/java/maven3/lib/plexus-utils-2.0.6.jar
constituent[10]: file:/usr/local/share/java/maven3/lib/wagon-provider-api-2.2.jar
constituent[11]: file:/usr/local/share/java/maven3/lib/maven-plugin-api-3.0.4.jar
constituent[12]: file:/usr/local/share/java/maven3/lib/maven-model-builder-3.0.4.jar
constituent[13]: file:/usr/local/share/java/maven3/lib/maven-settings-3.0.4.jar
constituent[14]: file:/usr/local/share/java/maven3/lib/sisu-inject-bean-2.3.0.jar
constituent[15]: file:/usr/local/share/java/maven3/lib/wagon-http-2.2-shaded.jar
constituent[16]: file:/usr/local/share/java/maven3/lib/maven-aether-provider-3.0.4.jar
constituent[17]: file:/usr/local/share/java/maven3/lib/sisu-inject-plexus-2.3.0.jar
constituent[18]: file:/usr/local/share/java/maven3/lib/maven-artifact-3.0.4.jar
constituent[19]: file:/usr/local/share/java/maven3/lib/maven-model-3.0.4.jar
constituent[20]: file:/usr/local/share/java/maven3/lib/wagon-file-2.2.jar
constituent[21]: file:/usr/local/share/java/maven3/lib/maven-embedder-3.0.4.jar
constituent[22]: file:/usr/local/share/java/maven3/lib/sisu-guice-3.1.0-no_aop.jar
constituent[23]: file:/usr/local/share/java/maven3/lib/maven-settings-builder-3.0.4.jar
constituent[24]: file:/usr/local/share/java/maven3/lib/plexus-interpolation-1.14.jar
constituent[25]: file:/usr/local/share/java/maven3/lib/aether-impl-1.13.1.jar
constituent[26]: file:/usr/local/share/java/maven3/lib/aether-api-1.13.1.jar
constituent[27]: file:/usr/local/share/java/maven3/lib/aether-util-1.13.1.jar
constituent[28]: file:/usr/local/share/java/maven3/lib/commons-cli-1.2.jar
---------------------------------------------------
Exception in thread "main" java.lang.Error: posix_spawn is not a supported process launch mechanism on this platform.
        at java.lang.UNIXProcess$1.run(UNIXProcess.java:111)
        at java.lang.UNIXProcess$1.run(UNIXProcess.java:93)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.lang.UNIXProcess.<clinit>(UNIXProcess.java:91)
        at java.lang.ProcessImpl.start(ProcessImpl.java:130)
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028)
        at java.lang.Runtime.exec(Runtime.java:617)
        at java.lang.Runtime.exec(Runtime.java:450)
        at java.lang.Runtime.exec(Runtime.java:347)
        at org.codehaus.plexus.interpolation.os.OperatingSystemUtils.getSystemEnvVars(OperatingSystemUtils.java:86)
        at org.codehaus.plexus.interpolation.EnvarBasedValueSource.getEnvars(EnvarBasedValueSource.java:74)
        at org.codehaus.plexus.interpolation.EnvarBasedValueSource.<init>(EnvarBasedValueSource.java:64)
        at org.codehaus.plexus.interpolation.EnvarBasedValueSource.<init>(EnvarBasedValueSource.java:50)
        at org.apache.maven.settings.building.DefaultSettingsBuilder.interpolate(DefaultSettingsBuilder.java:222)
        at org.apache.maven.settings.building.DefaultSettingsBuilder.build(DefaultSettingsBuilder.java:101)
        at org.apache.maven.cli.MavenCli.settings(MavenCli.java:725)
        at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:193)
        at org.apache.maven.cli.MavenCli.main(MavenCli.java:141)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:290)
        at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:230)
        at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:409)
        at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:352)
        at org.codehaus.classworlds.Launcher.main(Launcher.java:47)
{noformat}

So, for the same reason, you cannot run Maven at all under a Turkish locale :-)
This bug is insane!
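Both traces fail inside the static initializer of java.lang.UNIXProcess. The root cause can be reproduced without launching any process. The sketch below assumes (our reading of the referenced JDK bugs, not verified against every JDK build) that the JDK upper-cases the default "posix_spawn" launch-mechanism name using the default locale and then looks it up with an enum's valueOf. `TurkishLocaleDemo`, `LaunchMechanism`, and `lookup` are hypothetical stand-ins, not JDK classes:

```java
import java.util.Locale;

// Simplified, hypothetical mirror of the JDK's launch-mechanism lookup.
// Assumption: the property value is upper-cased with a locale and passed to valueOf.
final class TurkishLocaleDemo {

    // Hypothetical stand-in for the JDK's private launch-mechanism enum.
    enum LaunchMechanism { FORK, POSIX_SPAWN, VFORK }

    static LaunchMechanism lookup(String name, Locale locale) {
        try {
            return LaunchMechanism.valueOf(name.toUpperCase(locale));
        } catch (IllegalArgumentException e) {
            // Same message shape as the java.lang.Error in the stack traces.
            throw new Error(name + " is not a supported process launch mechanism on this platform.");
        }
    }

    public static void main(String[] args) {
        Locale tr = Locale.forLanguageTag("tr-TR");
        // In Turkish, lower-case 'i' upper-cases to the dotted capital U+0130.
        System.out.println("posix_spawn".toUpperCase(tr));
        // A fixed locale yields the expected enum constant name.
        System.out.println(lookup("posix_spawn", Locale.ROOT));
        try {
            lookup("posix_spawn", tr);
        } catch (Error err) {
            System.out.println(err.getMessage());
        }
    }
}
```

Under the Turkish locale the upper-cased name contains U+0130 instead of "I", so it matches no enum constant; upper-casing with a fixed locale such as Locale.ROOT (or Locale.ENGLISH) avoids the problem, which is why setting -Duser.language=tr alone is enough to trigger it.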

> ExternalParser should trap/ignore/workarround JDK-8047340 & JDK-8055301 so Turkish Tika users can still use non-external parsers
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: TIKA-1526
>                 URL: https://issues.apache.org/jira/browse/TIKA-1526
>             Project: Tika
>          Issue Type: Wish
>            Reporter: Hoss Man
>
> The JDK has numerous pain points regarding the Turkish locale, case conversion of "posix_spawn" being one of them...
> https://bugs.openjdk.java.net/browse/JDK-8047340
> https://bugs.openjdk.java.net/browse/JDK-8055301
> As of Tika 1.7, the TesseractOCRParser (which is an ExternalParser) is enabled and configured by default in Tika, and it uses ExternalParser.check to see if tesseract is available. Because of the JDK bug, however, Tika fails fast for Turkish users on BSD/UNIX variants (including Mac OS X), like so...
> {noformat}
>   [junit4]    > Throwable #1: java.lang.Error: posix_spawn is not a supported process launch mechanism on this platform.
>   [junit4]    >       at java.lang.UNIXProcess$1.run(UNIXProcess.java:105)
>   [junit4]    >       at java.lang.UNIXProcess$1.run(UNIXProcess.java:94)
>   [junit4]    >       at java.security.AccessController.doPrivileged(Native Method)
>   [junit4]    >       at java.lang.UNIXProcess.<clinit>(UNIXProcess.java:92)
>   [junit4]    >       at java.lang.ProcessImpl.start(ProcessImpl.java:130)
>   [junit4]    >       at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
>   [junit4]    >       at java.lang.Runtime.exec(Runtime.java:620)
>   [junit4]    >       at java.lang.Runtime.exec(Runtime.java:485)
>   [junit4]    >       at org.apache.tika.parser.external.ExternalParser.check(ExternalParser.java:344)
>   [junit4]    >       at org.apache.tika.parser.ocr.TesseractOCRParser.hasTesseract(TesseractOCRParser.java:117)
>   [junit4]    >       at org.apache.tika.parser.ocr.TesseractOCRParser.getSupportedTypes(TesseractOCRParser.java:90)
>   [junit4]    >       at org.apache.tika.parser.CompositeParser.getParsers(CompositeParser.java:81)
>   [junit4]    >       at org.apache.tika.parser.DefaultParser.getParsers(DefaultParser.java:95)
>   [junit4]    >       at org.apache.tika.parser.CompositeParser.getSupportedTypes(CompositeParser.java:229)
>   [junit4]    >       at org.apache.tika.parser.CompositeParser.getParsers(CompositeParser.java:81)
>   [junit4]    >       at org.apache.tika.parser.CompositeParser.getParser(CompositeParser.java:209)
>   [junit4]    >       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
>   [junit4]    >       at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> {noformat}
> ...unless they go out of their way to whitelist only the parsers they need, so that TesseractOCRParser (and any other ExternalParser) is never check()ed at all.
> It would be nice if Tika's ExternalParser class added a hack/workaround similar to what was done in SOLR-6387 to trap these types of errors.
> In Solr we just propagate a better error explaining why Java hates the Turkish language...
> {code}
> } catch (Error err) {
>   if (err.getMessage() != null && (err.getMessage().contains("posix_spawn") || err.getMessage().contains("UNIXProcess"))) {
>     log.warn("Error forking command due to JVM locale bug (see https://issues.apache.org/jira/browse/SOLR-6387): " + err.getMessage());
>     return "(error executing: " + cmd + ")";
>   }
>   throw err; // rethrow any Error unrelated to the locale bug
> }
> {code}
> ...but with Tika, it might be better for all ExternalParsers to just "opt out", as if they don't recognize the file type, when they detect this type of error from the check method (or perhaps it would be better if AutoDetectParser handled this? ... I'm not really sure how it would best fit into Tika's architecture)
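The "opt out" idea can be sketched in plain Java. This is a hypothetical helper, not Tika API: `LocaleSafeExternalCheck`, `isJdkLocaleBug`, `canRun`, and `supportedTypes` are illustrative names, and media types are plain strings to keep the example self-contained. It reuses the message test from the Solr snippet, so a launch failure caused by the locale bug makes the check return false and the caller reports no supported types instead of aborting the whole parse. Matching on "UNIXProcess" presumably covers later launch attempts, which, once the static initializer has failed, surface as a NoClassDefFoundError mentioning java.lang.UNIXProcess.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Collections;
import java.util.Set;

// Hypothetical helper, NOT part of Tika: treats the JDK locale launch
// failure as "external tool not available" instead of a fatal Error.
final class LocaleSafeExternalCheck {

    /** True if the throwable looks like the Turkish-locale process-launch bug. */
    static boolean isJdkLocaleBug(Throwable t) {
        String msg = t.getMessage();
        return msg != null && (msg.contains("posix_spawn") || msg.contains("UNIXProcess"));
    }

    /** True if cmd could be launched and exited with status 0. */
    static boolean canRun(String... cmd) {
        try {
            Process p = new ProcessBuilder(cmd).redirectErrorStream(true).start();
            // Drain (and discard) output so the child cannot block on a full pipe.
            try (InputStream in = p.getInputStream()) {
                byte[] buf = new byte[4096];
                while (in.read(buf) != -1) {
                    // discard
                }
            }
            return p.waitFor() == 0;
        } catch (IOException e) {
            return false; // e.g. the program is not installed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        } catch (Error err) {
            if (isJdkLocaleBug(err)) {
                return false; // opt out instead of failing fast
            }
            throw err; // any other Error is a genuine JVM problem
        }
    }

    /** Opt out: report no supported types when the external tool cannot run. */
    static Set<String> supportedTypes(Set<String> types, String... checkCmd) {
        return canRun(checkCmd) ? types : Collections.<String>emptySet();
    }
}
```

Returning an empty set mirrors the proposal that ExternalParsers behave as if they do not recognize the file type; whether that logic belongs in ExternalParser itself or in AutoDetectParser remains the open design question raised above.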



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
