Thank you, Chris and Thamme! I've downloaded the necessary models manually, and all is working...but it might annoy others.
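For anyone else stuck behind a proxy, here is a minimal sketch of the manual-download workaround. It is not what ModelGetter.groovy actually does. It assumes the proxy is supplied through the standard Java networking properties http.proxyHost and http.proxyPort (for example as -D options on the mvn command line), and the class and method names (ProxyAwareModelFetcher, proxyFrom, download) are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Properties;

// Hypothetical helper, not part of Tika: fetches one file through an HTTP proxy.
final class ProxyAwareModelFetcher {

    // Builds a Proxy from http.proxyHost / http.proxyPort, or NO_PROXY if no host is set.
    static Proxy proxyFrom(Properties props) {
        String host = props.getProperty("http.proxyHost");
        if (host == null || host.trim().isEmpty()) {
            return Proxy.NO_PROXY;
        }
        // 80 is the documented default for http.proxyPort.
        int port = Integer.parseInt(props.getProperty("http.proxyPort", "80").trim());
        // Unresolved address: the host name is looked up only when a connection is opened.
        return new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(host.trim(), port));
    }

    // Downloads url to target through the given proxy, creating parent directories as needed.
    static void download(URL url, Path target, Proxy proxy) throws IOException {
        URLConnection conn = url.openConnection(proxy);
        conn.setConnectTimeout(30_000);
        conn.setReadTimeout(30_000);
        Path parent = target.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        try (InputStream in = conn.getInputStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: ProxyAwareModelFetcher <url> <target-file>");
            return;
        }
        download(new URL(args[0]), Paths.get(args[1]), proxyFrom(System.getProperties()));
    }
}
```

Run it once per model file, e.g. with http://opennlp.sourceforge.net/models-1.5/en-ner-person.bin as the URL and src/test/resources/org/apache/tika/parser/ner/opennlp/ner-person.bin under tika-parsers as the target. Whether the build then skips its own download when the file is already present is something I have not checked in the script.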

-----Original Message-----
From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov]
Sent: Tuesday, November 24, 2015 10:20 AM
To: dev@tika.apache.org
Cc: ThammeGowda Narayanaswamy <thammegowd...@usc.edu>
Subject: Re: NER Parser tests behind proxy?

Gotcha Tim, OK that helps. Thamme, can you try and test this behind a
proxy so that we can try and replicate what Tim is seeing?

As for packaging the models, Stanford NER may be difficult to do that,
not only b/c of the license (GPLv3 [1], which is why we did it as a
runtime dependency, and optional, since we also did Apache OpenNLP), but
b/c of the size of the models. Apache OpenNLP models are there and freely
available, but no Maven packaging exists for them.

We’ll get this figured out Tim.

Cheers,
Chris

[1] http://nlp.stanford.edu/software/corenlp.shtml

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: "Allison, Timothy B." <talli...@mitre.org>
Reply-To: "dev@tika.apache.org" <dev@tika.apache.org>
Date: Tuesday, November 24, 2015 at 6:07 AM
To: "dev@tika.apache.org" <dev@tika.apache.org>
Cc: ThammeGowda Narayanaswamy <thammegowd...@usc.edu>
Subject: RE: NER Parser tests behind proxy?

>Y, you do, but you (or I) can set the proxy for Maven correctly and
>(without the NER requirement) the build works fine.
>
>***WARNING, what I'm running into might very well just be user error in
>not telling Maven to pass the proxy info to Groovy...this is why I
>didn't open an issue :) I've done some googling, but haven't found an
>answer to this.***
>
>In response to Thamme's questions:
>>> Which is better?
>>> 1. List 'access to opennlp.sourceforge.net' as a requirement
>I have access without a problem via regular means, the problem is that
>Maven isn't passing proxy information into Groovy when it tries to make
>the call to get the document (I confirmed this by dumping system props
>within ModelGetter). Perhaps we just document that you need to
>download the four model files manually and stick them in the right
>subdirectory if you are behind a proxy (ugly solution, but would probably
>work)?
>
>
>>>2. Package and deploy models as a maven artifact
>Are there licensing issues for the current models? Are the current
>models ASLv2.0? Would we need all four full models? And, y, my
>suggestion was to build a very small model and push it to source
>control in the resources directory.
>
>All this said, 1) again, this could be user error and 2) the addition
>of Stanford NER is fantastic...Thank you for this addition!
>
>
>-----Original Message-----
>From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov]
>Sent: Monday, November 23, 2015 11:12 AM
>To: dev@tika.apache.org
>Cc: ThammeGowda Narayanaswamy <thammegowd...@usc.edu>
>Subject: Re: NER Parser tests behind proxy?
>
>Hey Tim,
>
>Why shouldn’t we have to worry
>about connectivity outside of the Maven stuff? I mean clearly, if I
>install Tika on a new system today without a Maven repo, I must be
>connected to the internet, right?
>
>Cheers,
>Chris
>
>
>
>-----Original Message-----
>From: "Allison, Timothy B." <talli...@mitre.org>
>Reply-To: "dev@tika.apache.org" <dev@tika.apache.org>
>Date: Monday, November 23, 2015 at 8:03 AM
>To: "dev@tika.apache.org" <dev@tika.apache.org>
>Cc: ThammeGowda Narayanaswamy <thammegowd...@usc.edu>
>Subject: RE: NER Parser tests behind proxy?
>
>>The problem comes down to: ModelGetter.groovy which is trying to grab:
>>${basedir}/src/test/resources/org/apache/tika/parser/ner/opennlp/ner-person.bin
>>
>>If we could build a small model (and I mean really small) and package
>>it with Tika, we wouldn't have to worry about http connectivity
>>outside of the usual maven stuff.
>>
>>-----Original Message-----
>>From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov]
>>Sent: Monday, November 23, 2015 10:52 AM
>>To: dev@tika.apache.org
>>Cc: ThammeGowda Narayanaswamy <thammegowd...@usc.edu>
>>Subject: Re: NER Parser tests behind proxy?
>>
>>Hey Tim,
>>
>>I’m not seeing these of course b/c I’m not behind a proxy. Thamme, any
>>ideas?
>>
>>Cheers,
>>Chris
>>
>>
>>-----Original Message-----
>>From: "Allison, Timothy B." <talli...@mitre.org>
>>Reply-To: "dev@tika.apache.org" <dev@tika.apache.org>
>>Date: Thursday, November 19, 2015 at 5:36 PM
>>To: "dev@tika.apache.org" <dev@tika.apache.org>
>>Subject: NER Parser tests behind proxy?
>>
>>>My proxy is configured for git/maven/etc, but how do I configure it
>>>within the test so that I don't get this?
>>>
>>>GET : http://opennlp.sourceforge.net/models-1.5/en-ner-person.bin ->
>>>tika-parsers\src\test\resources\org\apache\tika\parser\ner\opennlp\ner-person.bin
>>>[INFO] ------------------------------------------------------------------------
>>>[INFO] Reactor Summary:
>>>[INFO]
>>>[INFO] Apache Tika parent ................................ SUCCESS [3.264s]
>>>[INFO] Apache Tika core .................................. SUCCESS [44.470s]
>>>[INFO] Apache Tika parsers ............................... FAILURE [1:56.462s]
>>>[INFO] Apache Tika XMP ................................... SKIPPED
>>>[INFO] Apache Tika serialization ......................... SKIPPED
>>>[INFO] Apache Tika batch ................................. SKIPPED
>>>[INFO] Apache Tika application ........................... SKIPPED
>>>[INFO] Apache Tika OSGi bundle ........................... SKIPPED
>>>[INFO] Apache Tika translate ............................. SKIPPED
>>>[INFO] Apache Tika server ................................ SKIPPED
>>>[INFO] Apache Tika examples .............................. SKIPPED
>>>[INFO] Apache Tika Java-7 Components ..................... SKIPPED
>>>[INFO] Apache Tika ....................................... SKIPPED
>>>[INFO] ------------------------------------------------------------------------
>>>[INFO] BUILD FAILURE
>>>[INFO] ------------------------------------------------------------------------
>>>[INFO] Total time: 2:45.245s
>>>[INFO] Finished at: Thu Nov 19 20:29:34 EST 2015
>>>[INFO] Final Memory: 52M/482M
>>>[INFO] ------------------------------------------------------------------------
>>>[ERROR] Failed to execute goal
>>>org.codehaus.groovy.maven:gmaven-plugin:1.0:execute (testSetup) on
>>>project tika-parsers: java.net.ConnectException: Connection refused:
>>>connect -> [Help 1]
>>>org.apache.maven.lifecycle.LifecycleExecutionException: Failed to
>>>execute goal org.codehaus.groovy.maven:gmaven-plugin:1.0:execute
>>>(testSetup) on project tika-parsers: java.net.ConnectException:
>>>Connection refused: connect
>>>    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:217)
>>>    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
>>>    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
>>>    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:84)
>>>    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:59)
>>>    at org.apache.maven.lifecycle.internal.LifecycleStarter.singleThreadedBuild(LifecycleStarter.java:183)
>>>    at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:161)
>>>    at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:320)
>>>    at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:156)
>>>    at org.apache.maven.cli.MavenCli.execute(MavenCli.java:537)
>>>    at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:196)
>>>    at org.apache.maven.cli.MavenCli.main(MavenCli.java:141)
>>>    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>    at java.lang.reflect.Method.invoke(Method.java:497)
>>>    at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:290)
>>>    at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:230)
>>>    at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:409)
>>>    at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:352)
>>>    at org.codehaus.classworlds.Launcher.main(Launcher.java:47)
>>>    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>    at java.lang.reflect.Method.invoke(Method.java:497)
>>>    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
>>>Caused by: org.apache.maven.plugin.MojoExecutionException: java.net.ConnectException: Connection refused: connect
>>>    at org.codehaus.groovy.maven.plugin.MojoSupport.execute(MojoSupport.java:85)
>>>    at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:101)
>>>    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:209)
>>>    ... 25 more
>>>Caused by: org.codehaus.groovy.maven.feature.ComponentException: java.net.ConnectException: Connection refused: connect
>>>    at org.codehaus.groovy.maven.runtime.support.ScriptExecutorSupport.invokeMethod(ScriptExecutorSupport.java:162)
>>>    at org.codehaus.groovy.maven.runtime.support.ScriptExecutorSupport.execute(ScriptExecutorSupport.java:126)
>>>    at org.codehaus.groovy.maven.runtime.support.ScriptExecutorSupport.execute(ScriptExecutorSupport.java:73)
>>>    at org.codehaus.groovy.maven.plugin.execute.ExecuteMojo.process(ExecuteMojo.java:249)
>>>    at org.codehaus.groovy.maven.plugin.ComponentMojoSupport.doExecute(ComponentMojoSupport.java:60)
>>>    at org.codehaus.groovy.maven.plugin.MojoSupport.execute(MojoSupport.java:69)
>>>    ... 27 more
>>>Caused by: java.net.ConnectException: Connection refused: connect
>>>    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>>    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>>>    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>>    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>>>    at sun.net.www.protocol.http.HttpURLConnection$10.run(HttpURLConnection.java:1890)
>>>    at sun.net.www.protocol.http.HttpURLConnection$10.run(HttpURLConnection.java:1885)
>>>    at java.security.AccessController.doPrivileged(Native Method)
>>>    at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1884)
>>>    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1457)
>>>    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1441)
>>>    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>    at java.lang.reflect.Method.invoke(Method.java:497)
>>>    at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite$PojoCachedMethodSiteNoUnwrapNoCoerce.invoke(PojoMetaMethodSite.java:229)
>>>    at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite.call(PojoMetaMethodSite.java:52)
>>>    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:43)
>>>    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
>>>    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:120)
>>>    at ModelGetter.downloadFile(ModelGetter.groovy:64)
>>>    at ModelGetter$downloadFile.callCurrent(Unknown Source)
>>>    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallCurrent(CallSiteArray.java:47)
>>>    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:142)
>>>    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:154)
>>>    at ModelGetter.run(ModelGetter.groovy:91)
>>>    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>    at java.lang.reflect.Method.invoke(Method.java:497)
>>>    at org.codehaus.groovy.maven.runtime.support.ScriptExecutorSupport.invokeMethod(ScriptExecutorSupport.java:158)
>>>    ... 32 more
>>>Caused by: java.net.ConnectException: Connection refused: connect
>>>    at java.net.DualStackPlainSocketImpl.connect0(Native Method)
>>>    at java.net.DualStackPlainSocketImpl.socketConnect(DualStackPlainSocketImpl.java:79)
>>>    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
>>>    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
>>>    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
>>>    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:172)
>>>    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>>>    at java.net.Socket.connect(Socket.java:589)
>>>    at java.net.Socket.connect(Socket.java:538)
>>>    at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
>>>    at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
>>>    at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
>>>    at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
>>>    at sun.net.www.http.HttpClient.New(HttpClient.java:308)
>>>    at sun.net.www.http.HttpClient.New(HttpClient.java:326)
>>>    at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1169)
>>>    at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1105)
>>>    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:999)
>>>    at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:933)
>>>    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1513)
>>>    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1441)
>>>    at sun.net.www.protocol.http.HttpURLConnection.getHeaderField(HttpURLConnection.java:2943)
>>>    at java.net.URLConnection.getHeaderFieldLong(URLConnection.java:629)
>>>    at java.net.URLConnection.getContentLengthLong(URLConnection.java:501)
>>>    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>    at java.lang.reflect.Method.invoke(Method.java:497)
>>>    at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite$PojoCachedMethodSiteNoUnwrapNoCoerce.invoke(PojoMetaMethodSite.java:229)
>>>    at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite.call(PojoMetaMethodSite.java:52)
>>>    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:43)
>>>    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
>>>    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:120)
>>>    at ModelGetter.downloadFile(ModelGetter.groovy:61)
>>>    ... 42 more
>>>
>>>-----Original Message-----
>>>From: Nick Burch [mailto:apa...@gagravarr.org]
>>>Sent: Thursday, November 19, 2015 7:41 PM
>>>To: dev@tika.apache.org
>>>Subject: Re: [DISCUSS] Moving to Git
>>>
>>>On Thu, 19 Nov 2015, Mattmann, Chris A (3980) wrote:
>>>> I’ll be happy to update our docs and to write a wiki page on using
>>>> Tika & Git that we can refer folks to. I think I’ve demonstrated
>>>> documenting things on the Tika wiki :)
>>>
>>>Great stuff! Scribble something sensible down, and I can vote +1 to
>>>the move, plus learn more about Git at the same time :)
>>>
>>>Nick
>>
>