Re: tika-core, tika-parser
On Wednesday 08 February 2012 18:27:32 Ken Krugler wrote:
> I don't know anything about how Nutch handles jars in its lib directory,
> but this sounds like you have a "raw" jar (tika-parsers) without its
> pom.xml.
>
> So then Ivy (or Maven) doesn't know about the transitive dependencies on
> other jars, which are needed to implement the actual parsing support.

You're right, that's exactly what happened. However, I wasn't completely aware of it.

Thanks

--
Markus Jelsma - CTO - Openindex
Re: tika-core, tika-parser
On Feb 8, 2012, at 5:28am, Markus Jelsma wrote:
> My problem is that I am working on some code for Tika-parsers 1.1-SNAPSHOT
> that I need to use in Nutch. However, when I build tika-parsers and put it in
> Nutch's lib directory I still seem to be missing dependencies. Then trouble
> begins:

I don't know anything about how Nutch handles jars in its lib directory, but this sounds like you have a "raw" jar (tika-parsers) without its pom.xml.

So then Ivy (or Maven) doesn't know about the transitive dependencies on other jars, which are needed to implement the actual parsing support.

-- Ken

--
Ken Krugler
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Mahout & Solr
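Ken's point about a "raw" jar can be sketched as a toy model. This is not how Ivy or Maven is implemented (real resolvers also handle versions, scopes and conflicts); it only shows that transitive dependencies are found by following metadata, so a jar without its pom.xml resolves to itself alone. The module names `example-format-lib` and `example-util-lib` are made up for illustration; the real list lives in tika-parsers' POM.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TransitiveDeps {

    /** Returns root plus every module reachable through the declared-dependency metadata. */
    public static Set<String> resolve(String root, Map<String, List<String>> metadata) {
        Set<String> resolved = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            String module = pending.pop();
            if (resolved.add(module)) {
                // A module with no metadata entry contributes nothing further.
                for (String dep : metadata.getOrDefault(module, List.of())) {
                    pending.push(dep);
                }
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        // Hypothetical metadata standing in for what a POM would declare.
        Map<String, List<String>> withPom = Map.of(
                "tika-parsers", List.of("tika-core", "example-format-lib"),
                "example-format-lib", List.of("example-util-lib"));

        // With metadata: tika-parsers plus everything it pulls in.
        System.out.println(resolve("tika-parsers", withPom));

        // Raw jar, no metadata: only tika-parsers itself is resolved.
        System.out.println(resolve("tika-parsers", Map.of()));
    }
}
```

In the second call the resolver has nothing to follow, which matches the situation of a hand-built jar dropped into a lib directory.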
Re: tika-core, tika-parser
On Wednesday 08 February 2012 14:22:36 Julien Nioche wrote:
> sorry don't understand what your issue is. We have a dependency on
> tika-parsers and the actual parser implementations (listed in tika parsers'
> POM) are pulled transitively just like any other dependency managed by Ivy.
> They end up being copied in runtime/local/plugins/parse-tika/ or put in
> the job in runtime/deploy/

My problem is that I am working on some code for Tika-parsers 1.1-SNAPSHOT that I need to use in Nutch. However, when I build tika-parsers and put it in Nutch's lib directory I still seem to be missing dependencies. Then trouble begins:

Exception in thread "main" java.lang.NoClassDefFoundError: Could not initialize class org.apache.tika.parser.dwg.DWGParser
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:247)
        at sun.misc.Service$LazyIterator.next(Service.java:271)
        at org.apache.nutch.parse.tika.TikaConfig.<init>(TikaConfig.java:149)
        at org.apache.nutch.parse.tika.TikaConfig.getDefaultConfig(TikaConfig.java:211)
        at org.apache.nutch.parse.tika.TikaParser.setConf(TikaParser.java:254)
        at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:162)
        at org.apache.nutch.parse.ParserFactory.getParsers(ParserFactory.java:132)
        at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:71)
        at org.apache.nutch.parse.ParserChecker.run(ParserChecker.java:101)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.parse.ParserChecker.main(ParserChecker.java:138)

Nick told me to remove DWG from the org.apache.tika.parsers.Parsers config file, which I did. But then other dependency issues come and go. The more parsers I remove from the config file the better it goes, but then Tika won't build anymore because of failing tests.

I asked this on the Nutch list because I wasn't sure anymore how Nutch deals with its own deps, which you explained well.

I'll give up for now :)

--
Markus Jelsma - CTO - Openindex
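Rather than removing parsers from the config file one at a time, the failing classes could be found in one pass. The following is a minimal sketch, not part of Nutch or Tika: `ParserLoadCheck` and `findUnloadable` are hypothetical names, and the class names to probe have to be supplied by hand (for example, copied from the parser list). It relies only on the standard `Class.forName(String, boolean, ClassLoader)` call, the same mechanism that appears in the trace above.

```java
import java.util.ArrayList;
import java.util.List;

public class ParserLoadCheck {

    /** Returns one entry per class name that cannot be loaded and initialized. */
    public static List<String> findUnloadable(List<String> classNames, ClassLoader loader) {
        List<String> failures = new ArrayList<>();
        for (String name : classNames) {
            try {
                // initialize=true runs static initializers, which is where a
                // missing third-party library typically surfaces.
                Class.forName(name, true, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                // NoClassDefFoundError and ExceptionInInitializerError are both LinkageErrors.
                failures.add(name + " -> " + e);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // Class names come from the command line; the default is the one from the trace.
        List<String> names = args.length > 0
                ? List.of(args)
                : List.of("org.apache.tika.parser.dwg.DWGParser");
        for (String failure : findUnloadable(names, ParserLoadCheck.class.getClassLoader())) {
            System.out.println(failure);
        }
    }
}
```

Run with the same classpath as the failing job, each reported line names a class and the underlying error, which usually points at the jar that is missing.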
Re: tika-core, tika-parser
Sorry, I don't understand what your issue is. We have a dependency on tika-parsers, and the actual parser implementations (listed in tika-parsers' POM) are pulled in transitively, just like any other dependency managed by Ivy. They end up being copied into runtime/local/plugins/parse-tika/ or put in the job in runtime/deploy/

On 8 February 2012 13:03, Markus Jelsma wrote:
> Yes, it looks like it! It should also be upgraded to Tika 1.0. But that's
> something else.
>
> dependencies, dependencies, dependencies :(

--
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
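A quick way to see which jars actually ended up in the plugin directory mentioned above is to list them. This is only a sketch: `ListPluginJars` is a hypothetical helper, and the default path assumes the program is run from the root of a built Nutch checkout.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ListPluginJars {

    /** Returns the sorted file names of all *.jar files directly inside dir. */
    public static List<String> jarNames(Path dir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(dir, "*.jar")) {
            for (Path jar : jars) {
                names.add(jar.getFileName().toString());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default location, taken from the message above; pass another path to override.
        Path dir = Paths.get(args.length > 0 ? args[0] : "runtime/local/plugins/parse-tika");
        jarNames(dir).forEach(System.out::println);
    }
}
```

If a parser's third-party library does not show up in this listing, it was not resolved for the plugin, whatever happens to be in the top-level lib directory.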
Re: tika-core, tika-parser
Yes, it looks like it! It should also be upgraded to Tika 1.0. But that's something else.

dependencies, dependencies, dependencies :(

On Wednesday 08 February 2012 14:04:26 Julien Nioche wrote:
> The dependencies for the plugins are defined locally as shown in the URL
> below, where you can see the ref to tika-parsers for parse-tika. Is that
> more clear for you Markus?

--
Markus Jelsma - CTO - Openindex
Re: tika-core, tika-parser
Yes, it's listed there indeed! But where are the parser impls then? I'll check this out. I must be getting crazy or something!

On Wednesday 08 February 2012 13:58:46 Lewis John Mcgibbney wrote:
> Hi Markus,
>
> For starters
>
> http://svn.apache.org/viewvc/nutch/trunk/src/plugin/parse-tika/ivy.xml?view=markup
>
> Can we pick our way through this?
>
> Thanks

--
Markus Jelsma - CTO - Openindex
Re: tika-core, tika-parser
The dependencies for the plugins are defined locally, as shown in the URL below, where you can see the ref to tika-parsers for parse-tika. Is that clearer for you, Markus?

On 8 February 2012 12:58, Lewis John Mcgibbney wrote:
> Hi Markus,
>
> For starters
>
> http://svn.apache.org/viewvc/nutch/trunk/src/plugin/parse-tika/ivy.xml?view=markup
>
> Can we pick our way through this?
>
> Thanks

--
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
Re: tika-core, tika-parser
Hi Markus,

For starters

http://svn.apache.org/viewvc/nutch/trunk/src/plugin/parse-tika/ivy.xml?view=markup

Can we pick our way through this?

Thanks

On Wed, Feb 8, 2012 at 12:50 PM, Markus Jelsma wrote:
> Hi,
>
> Can anyone shed light on this? We don't have any parsers in our libs dir
> and we don't have tika-parsers jar, only the tika-core jar. Where are the
> parsers and how does this all work?
>
> I've posted a question (same subject) on the Tika list and Nick tells me
> there must be parsers somewhere. Well, I have no idea how we do it in
> Nutch, do you?
>
> Thanks

--
*Lewis*