[jira] [Commented] (TIKA-1791) URI is not hierarchical exception when location model resource is inside a jar in classpath

2015-11-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15006009#comment-15006009
 ] 

ASF GitHub Bot commented on TIKA-1791:
--

Github user asfgit closed the pull request at:

https://github.com/apache/tika/pull/63


> URI is not hierarchical exception when location model resource is inside a 
> jar in classpath
> ---
>
> Key: TIKA-1791
> URL: https://issues.apache.org/jira/browse/TIKA-1791
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 1.11
> Environment: location model  file is placed inside a fat Jar (with 
> all the dependencies)
>Reporter: Thamme Gowda N
>
> {code:title=Stacktrace|borderStyle=solid}
> The following error happens when location NER model resource is packaged 
> inside a jar and GeoTopicParser is enabled.
> Caused by: java.lang.IllegalArgumentException: URI is not hierarchical
>   at java.io.File.<init>(File.java:418)
>   at org.apache.tika.parser.geo.topic.GeoParserConfig.<init>(GeoParserConfig.java:33)
>   at org.apache.tika.parser.geo.topic.GeoParser.<init>(GeoParser.java:54)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>   at java.lang.Class.newInstance(Class.java:442)
>   at org.apache.tika.config.TikaConfig$XmlLoader.loadOne(TikaConfig.java:559)
>   at org.apache.tika.config.TikaConfig$XmlLoader.loadOverall(TikaConfig.java:492)
>   at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:166)
>   at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:149)
>   at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:142)
>   at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:138)
>   at edu.usc.cs.ir.cwork.tika.Parser.<init>(Parser.java:45)
> {code}
> References:
> http://stackoverflow.com/questions/18055189/why-my-uri-is-not-hierarchical



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TIKA-1791) URI is not hierarchical exception when location model resource is inside a jar in classpath

2015-11-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15006024#comment-15006024
 ] 

Hudson commented on TIKA-1791:
--

SUCCESS: Integrated in tika-trunk-jdk1.7 #885 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.7/885/])
TIKA-1791 Comments and logging (nick: 
[http://svn.apache.org/viewvc/tika/trunk/?view=rev=1714494])
* 
trunk/tika-parsers/src/main/java/org/apache/tika/parser/geo/topic/GeoParser.java
TIKA-1791 GeoParser fix for models in a jar file, from Thamme Gowda N. This 
closes #63 from GitHub (nick: 
[http://svn.apache.org/viewvc/tika/trunk/?view=rev=1714492])
* 
trunk/tika-parsers/src/main/java/org/apache/tika/parser/geo/topic/GeoParser.java
* 
trunk/tika-parsers/src/main/java/org/apache/tika/parser/geo/topic/GeoParserConfig.java
* 
trunk/tika-parsers/src/main/java/org/apache/tika/parser/geo/topic/NameEntityExtractor.java


> URI is not hierarchical exception when location model resource is inside a 
> jar in classpath
> ---
>
> Key: TIKA-1791
> URL: https://issues.apache.org/jira/browse/TIKA-1791
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 1.11
> Environment: location model  file is placed inside a fat Jar (with 
> all the dependencies)
>Reporter: Thamme Gowda N
> Fix For: 1.12


[jira] [Commented] (TIKA-1791) URI is not hierarchical exception when location model resource is inside a jar in classpath

2015-11-15 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15006116#comment-15006116
 ] 

Chris A. Mattmann commented on TIKA-1791:
-

Thanks for committing this, Nick Burch, and for the work, Thamme!



[jira] [Commented] (TIKA-1791) URI is not hierarchical exception when location model resource is inside a jar in classpath

2015-11-14 Thread Nick Burch (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15005618#comment-15005618
 ] 

Nick Burch commented on TIKA-1791:
--

Longer term, we want to move instance-specific config out of Parse Context 
objects and onto constructors etc. (including Config XML support).

The contract for the Parse Context objects is that they're used for 
document-specific changes/configuration. There are cases, like this one, 
where they also get (ab)used for instance config (e.g. normally static paths). 
While that config lives on the Parse Context, we need to handle the case of 
it changing between calls.



[jira] [Commented] (TIKA-1791) URI is not hierarchical exception when location model resource is inside a jar in classpath

2015-11-14 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15005623#comment-15005623
 ] 

Chris A. Mattmann commented on TIKA-1791:
-

Config XML, in my mind, is just one option for loading or setting properties. 
The ParseContext is supposed to be a more dynamic way of setting properties on 
a per-parse-invocation basis. The problem with moving static paths into 
constructors is that we have too many places in Tika (starting with 
ServiceLoading as the most glaring) that expect zero-arg constructors, so we 
can't simply move these types of properties into the constructors until we 
figure out something better later.



[jira] [Commented] (TIKA-1791) URI is not hierarchical exception when location model resource is inside a jar in classpath

2015-11-11 Thread Nick Burch (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000462#comment-15000462
 ] 

Nick Burch commented on TIKA-1791:
--

Thanks for the explanation

Next question - what happens if two calls to {{GeoParser}} use different NER 
paths? eg
{code}
GeoParser parser = new GeoParser();

ParseContext context = new ParseContext();
GeoParserConfig config = new GeoParserConfig();
context.set(GeoParserConfig.class, config);

config.setNERModelPath("/usr/bin");
parser.parse(inputA, metadata, handler, context);

config.setNERModelPath("/usr/local/bin");
parser.parse(inputB, metadata, handler, context);
{code}

Same parser each time, but different paths on the config. At first glance, it 
looks like your code would cause the second parse to use the config from the 
first?



[jira] [Commented] (TIKA-1791) URI is not hierarchical exception when location model resource is inside a jar in classpath

2015-11-11 Thread Thamme Gowda N (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000975#comment-15000975
 ] 

Thamme Gowda N commented on TIKA-1791:
--

Thanks for pointing out the issue.
I didn't anticipate changes to configurations after the parser started to run. 

It's now handled in `initialize()`:
{code}
if (this.modelUrl != null && this.modelUrl.equals(modelUrl)) {
    // previously initialized for the same URL
    return;
}
{code}

If Tika's environments are so dynamic that the files the URLs point to are 
frequently updated or deleted, then cached state probably shouldn't be used. 
However, as you can see, dropping the state costs performance. If that is the 
case, I can revert to the older approach.
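
For illustration only, the check quoted above can be sketched in isolation 
as a small holder that loads a model lazily and reloads it only when the 
URL changes. This is not the code from the pull request: the names 
LazyModelHolder, url() and loadCount(), and the loader function, are 
hypothetical and exist only for this sketch. It compares URLs by their 
external string form, which is an assumption made here because 
java.net.URL.equals may perform host name resolution.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.function.Function;

// Hypothetical sketch, not the Tika implementation.
final class LazyModelHolder<M> {
    private final Function<URL, M> loader; // how to build a model from a URL
    private String loadedFrom;             // external form of the URL last loaded
    private M model;
    private int loadCount;                 // only here to make reuse observable

    LazyModelHolder(Function<URL, M> loader) {
        this.loader = loader;
    }

    // Returns the cached model, reloading only when the URL differs.
    synchronized M get(URL modelUrl) {
        String key = modelUrl.toExternalForm();
        if (!key.equals(loadedFrom)) {
            model = loader.apply(modelUrl);
            loadedFrom = key;
            loadCount++;
        }
        return model;
    }

    synchronized int loadCount() {
        return loadCount;
    }

    // Helper that turns a string into a URL without a checked exception.
    static URL url(String spec) {
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        LazyModelHolder<String> holder = new LazyModelHolder<>(URL::getPath);
        holder.get(url("file:/models/a.bin")); // loads
        holder.get(url("file:/models/a.bin")); // reuses
        holder.get(url("file:/models/b.bin")); // URL changed, reloads
        System.out.println(holder.loadCount()); // prints 2
    }
}
```

Note that a holder like this only notices a changed URL. It would not 
notice the file behind an unchanged URL being replaced, which is the 
tradeoff described above.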





[jira] [Commented] (TIKA-1791) URI is not hierarchical exception when location model resource is inside a jar in classpath

2015-11-10 Thread Nick Burch (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14998814#comment-14998814
 ] 

Nick Burch commented on TIKA-1791:
--

There seem to be quite a few changes in the patch, not just a simple String to 
URL swap. Would you be able to explain a bit more about why you needed to make 
the additional changes, and why you took the approach you did to refactor 
things for the change?

I'm also a little worried about the {{geoparser.initialize(context);}} lines in 
the test - does that mean the parser stops working for people who don't add 
this additional step? If so, it's a no-go as most people will probably be using 
it via one of the facades like {{AutoDetectParser}} or {{DefaultParser}} so 
won't know to do things like that. 



[jira] [Commented] (TIKA-1791) URI is not hierarchical exception when location model resource is inside a jar in classpath

2015-11-10 Thread Thamme Gowda N (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14999085#comment-14999085
 ] 

Thamme Gowda N commented on TIKA-1791:
--

Thanks for the feedback. 

* The fix for the non-hierarchical URI is to use a URL instead of a URI and 
path string. (I learned that a file inside a ZIP archive can be addressed by a 
URL, but its URI is not hierarchical, so it cannot be turned into a File.)
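
To make the failure and the fix concrete, here is a minimal, self-contained 
sketch. It is not the Tika patch: the class JarResourceDemo and its methods 
urlOfEntryInTempJar(), tryAsFile() and readViaUrl() are hypothetical names, 
and the jar is a throwaway file written at runtime. It only relies on 
standard JDK behaviour: java.io.File(URI) rejects the opaque jar: URI, 
while URL.openStream() can read the entry.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

// Hypothetical sketch, not the Tika implementation.
final class JarResourceDemo {

    // Writes a temporary jar with one entry and returns a jar: URL to that entry.
    static URL urlOfEntryInTempJar(String entryName, byte[] content) {
        try {
            Path jar = Files.createTempFile("demo", ".jar");
            jar.toFile().deleteOnExit();
            try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(jar))) {
                out.putNextEntry(new JarEntry(entryName));
                out.write(content);
                out.closeEntry();
            }
            return URI.create("jar:" + jar.toUri() + "!/" + entryName).toURL();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The failing pattern: converting the resource URL to a File.
    // Returns "ok" on success, otherwise the exception message.
    static String tryAsFile(URL resource) {
        try {
            new File(resource.toURI());
            return "ok";
        } catch (IllegalArgumentException | URISyntaxException e) {
            return e.getMessage();
        }
    }

    // The working pattern: read the resource through the URL itself.
    static byte[] readViaUrl(URL resource) {
        try (InputStream in = resource.openStream()) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        URL model = urlOfEntryInTempJar("model.bin", new byte[] {1, 2, 3});
        System.out.println(tryAsFile(model));         // prints "URI is not hierarchical"
        System.out.println(readViaUrl(model).length); // prints 3
    }
}
```

The same URL-based read also works for a plain file: URL, so code written 
this way does not need to care whether the model sits on disk or in a jar.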

While modifying the NER model loading code to make the above change possible, 
I also made these changes:

* The NER model was previously reloaded on every `parse()` call. It is now 
kept in a state variable and reused.
* The `isAvailable()` function previously launched an external process on 
every call to figure out whether the 'lucene-geo-gazetteer' command is 
available (it is invoked in `parse()`). It now uses a state variable.
* The model is loaded on the first call to `parse()` or `isAvailable()`, via 
lazy initialization. My tests showed that this is backward compatible. 

UPDATE: 
The test case is now unaltered. I was just trying to see whether the test 
cases pass different parse contexts. The lazy initialization of the name 
extractor is guaranteed to work and thus shouldn't break existing usages. 
`GeoParserConfig.setNERModelPath(String)` is also preserved for users who 
already use it to supply the model path. However, 
`GeoParserConfig.getNERPath()` has been replaced with a URL getter.




[jira] [Commented] (TIKA-1791) URI is not hierarchical exception when location model resource is inside a jar in classpath

2015-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997841#comment-14997841
 ] 

ASF GitHub Bot commented on TIKA-1791:
--

GitHub user thammegowda opened a pull request:

https://github.com/apache/tika/pull/63

TIKA-1791 fix : non hierarchical URI exception when NER model is inside jar 
file

Improvement : Model is loaded once and NameFinder is reused 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/thammegowda/tika fix-1791

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/tika/pull/63.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #63


commit 4c02c9e94adfde2163a2e2b4fac9425e5485a583
Author: Thamme Gowda 
Date:   2015-11-10T01:39:44Z

TIKA-1791 fix : non hierarchical URI for NER model






[jira] [Commented] (TIKA-1791) URI is not hierarchical exception when location model resource is inside a jar in classpath

2015-11-09 Thread Thamme Gowda N (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997843#comment-14997843
 ] 

Thamme Gowda N commented on TIKA-1791:
--

Resolved; a pull request has been created on GitHub: 
https://github.com/apache/tika/pull/63
