[ https://issues.apache.org/jira/browse/TIKA-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14999085#comment-14999085 ]
Thamme Gowda N commented on TIKA-1791:
--------------------------------------

Thanks for the feedback.

* The non-hierarchical URI problem is fixed by using a URL instead of a URI and a path string. (I learned that a file inside a ZIP archive can be addressed by a URL, but its URI cannot be converted to a java.io.File.)

While modifying the NER model loading code to make the above change possible, I also made these changes:

* The NER model was previously reloaded on every `parse()` call. It is now loaded once and reused through a state variable.
* The `isAvailable()` function previously launched an external process on every call to figure out whether the 'lucene-geo-gazetteer' command is available (it is invoked in `parse()`). It now uses a state variable.
* The model is loaded on the first call to `parse()` or `isAvailable()`, via lazy initialization.

My tests showed that the change is backward compatible.

UPDATE: The test case is now unaltered. I was just trying to see whether the test cases were passing a different parse context. The lazy initialization of the name extractor is guaranteed to work and thus shouldn't break existing usages.

{code}GeoParserConfig.setNERModelPath(String){code} is also preserved for users who already use it to supply the model path. However, {code}GeoParserConfig.getNERPath(){code} has been replaced with a getter that returns a URL.

> URI is not hierarchical exception when location model resource is inside a jar in classpath
> -------------------------------------------------------------------------------------------
>
>                 Key: TIKA-1791
>                 URL: https://issues.apache.org/jira/browse/TIKA-1791
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.11
>         Environment: location model file is placed inside a fat Jar (with all the dependencies)
>            Reporter: Thamme Gowda N
>
> The following error happens when the location NER model resource is packaged inside a jar and GeoTopicParser is enabled.
> {code:title=Stacktrace|borderStyle=solid}
> Caused by: java.lang.IllegalArgumentException: URI is not hierarchical
>       at java.io.File.<init>(File.java:418)
>       at org.apache.tika.parser.geo.topic.GeoParserConfig.<init>(GeoParserConfig.java:33)
>       at org.apache.tika.parser.geo.topic.GeoParser.<init>(GeoParser.java:54)
>       at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>       at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>       at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>       at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>       at java.lang.Class.newInstance(Class.java:442)
>       at org.apache.tika.config.TikaConfig$XmlLoader.loadOne(TikaConfig.java:559)
>       at org.apache.tika.config.TikaConfig$XmlLoader.loadOverall(TikaConfig.java:492)
>       at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:166)
>       at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:149)
>       at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:142)
>       at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:138)
>       at edu.usc.cs.ir.cwork.tika.Parser.<init>(Parser.java:45)
> {code}
> References:
> http://stackoverflow.com/questions/18055189/why-my-uri-is-not-hierarchical

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
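
For readers following the thread, here is a minimal sketch of the two ideas in the comment: why `new File(uri)` fails for a resource inside a jar, and how a URL plus a lazily initialized state variable avoids both the exception and repeated loading. This is not the actual TIKA-1791 patch. The class `LazyModelLoader`, its methods, and the path `/tmp/app.jar!/model.bin` are hypothetical names chosen for illustration, and the "model" is just raw bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration only; not a class from Apache Tika.
public class LazyModelLoader {
    private final URL modelUrl; // a URL can point at file: and jar: locations alike
    private byte[] model;       // state variable: stays null until first use
    private int loadCount;      // counts real loads, kept only for the demo

    public LazyModelLoader(URL modelUrl) {
        this.modelUrl = modelUrl;
    }

    // Loads the model on the first call and returns the cached copy afterwards.
    public synchronized byte[] getModel() {
        if (model == null) {
            try (InputStream in = modelUrl.openStream()) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                model = out.toByteArray();
                loadCount++;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return model;
    }

    public synchronized int getLoadCount() {
        return loadCount;
    }

    // Returns true when java.io.File refuses the URI, as it does for opaque jar: URIs.
    static boolean fileConstructorRejects(URI uri) {
        try {
            new File(uri);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    // Demo helper: writes the bytes to a temporary file and returns its URL.
    static URL tempModelUrl(byte[] content) {
        try {
            Path p = Files.createTempFile("ner-model", ".bin");
            Files.write(p, content);
            p.toFile().deleteOnExit();
            return p.toUri().toURL();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A jar: URI is opaque, so File(URI) throws "URI is not hierarchical".
        URI inJar = URI.create("jar:file:/tmp/app.jar!/model.bin");
        System.out.println("File rejects jar URI: " + fileConstructorRejects(inJar));

        LazyModelLoader loader = new LazyModelLoader(tempModelUrl(new byte[] {1, 2, 3}));
        loader.getModel();
        loader.getModel(); // second call reuses the cached bytes
        System.out.println("Loads performed: " + loader.getLoadCount());
    }
}
```

Reading through `URL.openStream()` rather than `java.io.File` is what lets the same code path serve a model on disk and a model packaged in a fat jar. The `synchronized` accessor is one simple way to keep the lazy initialization safe if several threads call it at once.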