Re: tika-core, tika-parser
On Wednesday 08 February 2012 18:27:32 Ken Krugler wrote:
> On Feb 8, 2012, at 5:28am, Markus Jelsma wrote:
>> My problem is that I am working on some code for tika-parsers 1.1-SNAPSHOT that I need to use in Nutch. However, when I build tika-parsers and put it in Nutch's lib directory, I still seem to be missing dependencies.
>
> I don't know anything about how Nutch handles jars in its lib directory, but this sounds like you have a raw jar (tika-parsers) without its pom.xml. So Ivy (or Maven) doesn't know about the transitive dependencies on other jars, which are needed to implement the actual parsing support.
>
> -- Ken

You're right, that's exactly what happened. However, I wasn't completely aware of it.

Thanks

-- Markus Jelsma - CTO - Openindex
tika-core, tika-parser
Hi,

Can anyone shed light on this? We don't have any parsers in our lib dir and we don't have the tika-parsers jar, only the tika-core jar. Where are the parsers, and how does this all work? I've posted a question (same subject) on the Tika list and Nick tells me there must be parsers somewhere. Well, I have no idea how we do it in Nutch; do you?

Thanks
Re: tika-core, tika-parser
Hi Markus,

For starters: http://svn.apache.org/viewvc/nutch/trunk/src/plugin/parse-tika/ivy.xml?view=markup

Can we pick our way through this?

Thanks

On Wed, Feb 8, 2012 at 12:50 PM, Markus Jelsma markus.jel...@openindex.io wrote:
> Hi,
>
> Can anyone shed light on this? We don't have any parsers in our lib dir and we don't have the tika-parsers jar, only the tika-core jar. Where are the parsers, and how does this all work? I've posted a question (same subject) on the Tika list and Nick tells me there must be parsers somewhere. Well, I have no idea how we do it in Nutch; do you?
>
> Thanks

-- *Lewis*
Re: tika-core, tika-parser
The dependencies for the plugins are defined locally, as shown in the URL below, where you can see the reference to tika-parsers for parse-tika. Is that clearer for you, Markus?

On 8 February 2012 12:58, Lewis John Mcgibbney lewis.mcgibb...@gmail.com wrote:
> Hi Markus,
>
> For starters: http://svn.apache.org/viewvc/nutch/trunk/src/plugin/parse-tika/ivy.xml?view=markup
>
> Can we pick our way through this?
>
> Thanks

-- Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
Re: tika-core, tika-parser
Yes, it's listed there indeed! But where are the parser impls then? I'll check this out. I must be going crazy or something!

On Wednesday 08 February 2012 13:58:46 Lewis John Mcgibbney wrote:
> Hi Markus,
>
> For starters: http://svn.apache.org/viewvc/nutch/trunk/src/plugin/parse-tika/ivy.xml?view=markup
>
> Can we pick our way through this?
>
> Thanks

-- Markus Jelsma - CTO - Openindex
Re: tika-core, tika-parser
Yes, it looks like it! It should also be upgraded to Tika 1.0, but that's something else. Dependencies, dependencies, dependencies :(

On Wednesday 08 February 2012 14:04:26 Julien Nioche wrote:
> The dependencies for the plugins are defined locally, as shown in the URL below, where you can see the reference to tika-parsers for parse-tika. Is that clearer for you, Markus?
>
> On 8 February 2012 12:58, Lewis John Mcgibbney lewis.mcgibb...@gmail.com wrote:
>> Hi Markus,
>>
>> For starters: http://svn.apache.org/viewvc/nutch/trunk/src/plugin/parse-tika/ivy.xml?view=markup
>>
>> Can we pick our way through this?

-- Markus Jelsma - CTO - Openindex
Re: tika-core, tika-parser
Sorry, I don't understand what your issue is. We have a dependency on tika-parsers, and the actual parser implementations (listed in the tika-parsers POM) are pulled in transitively, just like any other dependency managed by Ivy. They end up being copied into runtime/local/plugins/parse-tika/ or put in the job file in runtime/deploy/.

On 8 February 2012 13:03, Markus Jelsma markus.jel...@openindex.io wrote:
> Yes, it looks like it! It should also be upgraded to Tika 1.0, but that's something else. Dependencies, dependencies, dependencies :(

-- Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
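Julien's reply says the transitively resolved parser jars end up in runtime/local/plugins/parse-tika/. A quick way to check that after a local build is to list the jars in that directory. The Java sketch below does only that; the class name PluginJars and the default path are assumptions for illustration and are not part of Nutch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: lists the jar files directly inside a plugin directory,
// e.g. runtime/local/plugins/parse-tika/ after a local Nutch build.
class PluginJars {

    static List<String> listJars(Path pluginDir) throws IOException {
        // Files.list is not recursive: only direct children of pluginDir are seen.
        try (Stream<Path> entries = Files.list(pluginDir)) {
            return entries
                    .filter(Files::isRegularFile)            // skip directories
                    .map(p -> p.getFileName().toString())
                    .filter(name -> name.endsWith(".jar"))   // skip plugin.xml etc.
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Default path is an assumption based on the directory named in the thread.
        Path dir = Paths.get(args.length > 0 ? args[0] : "runtime/local/plugins/parse-tika");
        for (String jar : listJars(dir)) {
            System.out.println(jar);
        }
    }
}
```

If only the tika-parsers jar shows up, without the libraries it depends on (for example POI or PDFBox), the transitive resolution described above did not happen for that build.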
Re: tika-core, tika-parser
On Wednesday 08 February 2012 14:22:36 Julien Nioche wrote:
> Sorry, I don't understand what your issue is. We have a dependency on tika-parsers, and the actual parser implementations (listed in the tika-parsers POM) are pulled in transitively, just like any other dependency managed by Ivy. They end up being copied into runtime/local/plugins/parse-tika/ or put in the job file in runtime/deploy/.

My problem is that I am working on some code for tika-parsers 1.1-SNAPSHOT that I need to use in Nutch. However, when I build tika-parsers and put it in Nutch's lib directory, I still seem to be missing dependencies. Then the trouble begins:

Exception in thread "main" java.lang.NoClassDefFoundError: Could not initialize class org.apache.tika.parser.dwg.DWGParser
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at sun.misc.Service$LazyIterator.next(Service.java:271)
    at org.apache.nutch.parse.tika.TikaConfig.init(TikaConfig.java:149)
    at org.apache.nutch.parse.tika.TikaConfig.getDefaultConfig(TikaConfig.java:211)
    at org.apache.nutch.parse.tika.TikaParser.setConf(TikaParser.java:254)
    at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:162)
    at org.apache.nutch.parse.ParserFactory.getParsers(ParserFactory.java:132)
    at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:71)
    at org.apache.nutch.parse.ParserChecker.run(ParserChecker.java:101)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.parse.ParserChecker.main(ParserChecker.java:138)

Nick told me to remove DWG from the org.apache.tika.parser.Parser config file, which I did. But then other dependency issues come and go. The more parsers I remove from the config file, the better it goes, but then Tika won't build anymore because of failing tests.

I asked this on the Nutch list because I wasn't sure anymore how Nutch deals with its own deps, which you explained well.

I'll give up for now :)

-- Markus Jelsma - CTO - Openindex
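Instead of removing entries from the parser list one at a time, a diagnostic along these lines could report every listed parser class that fails to load in a single pass. It is a sketch, assuming the list follows the standard java.util.ServiceLoader provider-configuration format (one fully qualified class name per line, # starting a comment), which the sun.misc.Service frame in the stack trace suggests; the class ProviderCheck and its methods are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical diagnostic: parse a provider-configuration file and try to load
// and initialize every class it lists, collecting all failures at once.
class ProviderCheck {

    /** Extracts class names: strips '#' comments, trims whitespace, skips blank lines. */
    static List<String> parseProviderNames(String fileContent) throws IOException {
        List<String> names = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(fileContent))) {
            String line;
            while ((line = reader.readLine()) != null) {
                int hash = line.indexOf('#');
                if (hash >= 0) {
                    line = line.substring(0, hash);
                }
                line = line.trim();
                if (!line.isEmpty()) {
                    names.add(line);
                }
            }
        }
        return names;
    }

    /** Maps each class that could not be loaded or initialized to the error it raised. */
    static Map<String, String> findUnloadable(List<String> classNames, ClassLoader loader) {
        Map<String, String> failures = new LinkedHashMap<>();
        for (String name : classNames) {
            try {
                // initialize=true also runs static initializers, so failures there
                // (the cause of a later "Could not initialize class" error) are caught.
                Class.forName(name, true, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                failures.put(name, e.toString());
            }
        }
        return failures;
    }

    public static void main(String[] args) throws IOException {
        // Illustrative content; in practice it would be read from the tika-parsers jar.
        String services = "# example entries\n"
                + "org.apache.tika.parser.dwg.DWGParser\n"
                + "java.lang.String\n";
        Map<String, String> failures = findUnloadable(
                parseProviderNames(services), ProviderCheck.class.getClassLoader());
        failures.forEach((cls, err) -> System.out.println(cls + " -> " + err));
    }
}
```

Run with the same classpath Nutch uses, each entry in the result would be a parser that cannot be initialized with the jars currently available, which points at the missing dependencies without editing or rebuilding Tika.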
Re: tika-core, tika-parser
On Feb 8, 2012, at 5:28am, Markus Jelsma wrote:
> On Wednesday 08 February 2012 14:22:36 Julien Nioche wrote:
>> Sorry, I don't understand what your issue is. We have a dependency on tika-parsers, and the actual parser implementations (listed in the tika-parsers POM) are pulled in transitively, just like any other dependency managed by Ivy. They end up being copied into runtime/local/plugins/parse-tika/ or put in the job file in runtime/deploy/.
>
> My problem is that I am working on some code for tika-parsers 1.1-SNAPSHOT that I need to use in Nutch. However, when I build tika-parsers and put it in Nutch's lib directory, I still seem to be missing dependencies. Then the trouble begins:

I don't know anything about how Nutch handles jars in its lib directory, but this sounds like you have a raw jar (tika-parsers) without its pom.xml. So Ivy (or Maven) doesn't know about the transitive dependencies on other jars, which are needed to implement the actual parsing support.

-- Ken Krugler
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Mahout & Solr
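Ken's explanation can be illustrated with a toy model of transitive resolution: a resolver only follows the dependencies declared in a descriptor (a pom.xml or ivy.xml), so a jar copied into lib/ without one contributes only itself. The artifact names and the class TransitiveResolver are hypothetical, and real Ivy or Maven resolution also handles versions, conflicts, configurations and exclusions, which this sketch ignores.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of transitive dependency resolution (breadth-first closure).
class TransitiveResolver {

    /**
     * Returns root plus everything reachable from it through the descriptors.
     * An artifact with no entry in descriptors (e.g. a raw jar without its
     * pom.xml) is treated as having no known dependencies.
     */
    static Set<String> resolve(Map<String, List<String>> descriptors, String root) {
        Set<String> resolved = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            String artifact = queue.poll();
            if (!resolved.add(artifact)) {
                continue; // already visited; also guards against cycles
            }
            for (String dep : descriptors.getOrDefault(artifact, List.of())) {
                queue.add(dep);
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        // Hypothetical, simplified descriptors: what a POM for tika-parsers might declare.
        Map<String, List<String>> withPom = Map.of(
                "tika-parsers", List.of("tika-core", "poi", "pdfbox"),
                "pdfbox", List.of("fontbox"));
        // The same jar dropped into lib/ without its pom.xml: no descriptor at all.
        Map<String, List<String>> rawJar = Map.of();

        System.out.println(resolve(withPom, "tika-parsers"));
        System.out.println(resolve(rawJar, "tika-parsers"));
    }
}
```

With the hypothetical descriptors, resolving tika-parsers yields tika-parsers, tika-core, poi, pdfbox and fontbox; with no descriptor it yields only tika-parsers, which mirrors the missing-class errors Markus ran into.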