[ 
https://issues.apache.org/jira/browse/TIKA-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14999085#comment-14999085
 ] 

Thamme Gowda N commented on TIKA-1791:
--------------------------------------

Thanks for the feedback. 

* The non-hierarchical URI error is fixed by using a URL instead of a URI and 
path string. (I learned that a URL can point to a file inside a ZIP archive, 
but the URI of such a file cannot be converted to a File.)
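
To illustrate the point above, here is a minimal, self-contained sketch. It is 
not the actual Tika patch; the class and method names (`JarResourceDemo`, 
`makeArchiveEntryUrl`, `fileFromUriFails`, `readViaUrl`) and the entry name are 
made up for the demo. It builds a small archive, shows that `new File(URI)` 
rejects the `jar:` URI, and reads the same entry through the URL stream.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class; not part of Tika.
class JarResourceDemo {

    /** Writes a one-entry archive to a temp file and returns a jar: URL to that entry. */
    static URL makeArchiveEntryUrl(String entryName, String content) throws IOException {
        Path archive = Files.createTempFile("model-demo", ".jar");
        archive.toFile().deleteOnExit();
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(archive))) {
            out.putNextEntry(new ZipEntry(entryName));
            out.write(content.getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
        // Same shape as what a class loader returns for a resource packed in a jar.
        return new URL("jar:" + archive.toUri() + "!/" + entryName);
    }

    /** Returns true if new File(URI) rejects the URI, as in the reported stack trace. */
    static boolean fileFromUriFails(URL url) throws Exception {
        URI uri = url.toURI(); // opaque URI, e.g. jar:file:/tmp/x.jar!/entry
        try {
            new File(uri);
            return false;
        } catch (IllegalArgumentException e) {
            return true; // message: "URI is not hierarchical"
        }
    }

    /** Reads the entry through the URL, which works for jar: and file: alike. */
    static String readViaUrl(URL url) throws IOException {
        try (InputStream in = url.openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws Exception {
        URL url = makeArchiveEntryUrl("en-ner-location.bin", "dummy model bytes");
        System.out.println("File(URI) fails: " + fileFromUriFails(url));
        System.out.println("URL stream content: " + readViaUrl(url));
    }
}
```

Reading through the URL stream keeps the loading code independent of whether 
the model sits on disk or inside an archive.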

While modifying the NER model loading code to make the above change possible, 
I also made these changes:

* The NER model was previously reloaded for every `parse()` call. It is now 
loaded once and kept in a state variable for reuse.
* The `isAvailable()` function previously launched an external process on 
every call to figure out whether the 'lucene-geo-gazetteer' command is 
available (it is invoked in `parse()`). The result is now cached in a state 
variable.
* The model is loaded on the first call to `parse()` or `isAvailable()`, via 
lazy initialization. My tests showed that it is backward compatible. 
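
The three changes above follow one pattern, which can be sketched as below. 
This is only an illustration under assumed names: `LazyResource` and 
`LazyParserSketch` are hypothetical, and the loaders are stand-ins for the real 
model loading and the external command probe, not the code in the patch.

```java
import java.util.function.Supplier;

// Hypothetical helper: runs the loader at most once and remembers the result.
class LazyResource<T> {
    private final Supplier<T> loader;
    private T value;
    private boolean initialized; // the "state variable"
    private int loadCount;       // kept only so the demo can show how often we loaded

    LazyResource(Supplier<T> loader) {
        this.loader = loader;
    }

    synchronized T get() {
        if (!initialized) {
            value = loader.get();
            initialized = true;
            loadCount++;
        }
        return value;
    }

    synchronized int loadCount() {
        return loadCount;
    }
}

// Hypothetical parser showing where the cached values are consulted.
class LazyParserSketch {
    // Stand-in for the expensive NER model load.
    final LazyResource<String> model = new LazyResource<>(() -> "ner-model");
    // Stand-in for launching an external process to probe for a command.
    final LazyResource<Boolean> commandAvailable = new LazyResource<>(() -> Boolean.TRUE);

    boolean isAvailable() {
        return commandAvailable.get() && model.get() != null;
    }

    String parse(String text) {
        if (!isAvailable()) {
            return "";
        }
        return model.get() + ":" + text;
    }

    public static void main(String[] args) {
        LazyParserSketch parser = new LazyParserSketch();
        parser.parse("first document");
        parser.parse("second document");
        System.out.println("model loads: " + parser.model.loadCount());
        System.out.println("availability probes: " + parser.commandAvailable.loadCount());
    }
}
```

Nothing is loaded at construction time, so callers that never invoke `parse()` 
or `isAvailable()` pay no loading cost.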

UPDATE:
The test case is now unaltered. I was only trying to see whether the test 
cases pass a different parse context. The lazy initialization of the name 
extractor is guaranteed to work and thus shouldn't break existing usages. 
{code}GeoParserConfig.setNERModelPath(String){code} is also preserved for 
users who already use it to supply the model path. However, 
{code}GeoParserConfig.getNERPath(){code} has been replaced with a getter that 
returns a URL.


> URI is not hierarchical exception when location model resource is inside a 
> jar in classpath
> -------------------------------------------------------------------------------------------
>
>                 Key: TIKA-1791
>                 URL: https://issues.apache.org/jira/browse/TIKA-1791
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.11
>         Environment: location model  file is placed inside a fat Jar (with 
> all the dependencies)
>            Reporter: Thamme Gowda N
>
> The following error happens when location NER model resource is packaged 
> inside a jar and GeoTopicParser is enabled.
> {code:title=Stacktrace|borderStyle=solid}
> Caused by: java.lang.IllegalArgumentException: URI is not hierarchical
>       at java.io.File.<init>(File.java:418)
>       at 
> org.apache.tika.parser.geo.topic.GeoParserConfig.<init>(GeoParserConfig.java:33)
>       at org.apache.tika.parser.geo.topic.GeoParser.<init>(GeoParser.java:54)
>       at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>       at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>       at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>       at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>       at java.lang.Class.newInstance(Class.java:442)
>       at 
> org.apache.tika.config.TikaConfig$XmlLoader.loadOne(TikaConfig.java:559)
>       at 
> org.apache.tika.config.TikaConfig$XmlLoader.loadOverall(TikaConfig.java:492)
>       at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:166)
>       at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:149)
>       at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:142)
>       at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:138)
>       at edu.usc.cs.ir.cwork.tika.Parser.<init>(Parser.java:45)
> {code}
> References:
> http://stackoverflow.com/questions/18055189/why-my-uri-is-not-hierarchical



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
