[ https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Talat UYARER updated NUTCH-1253:
--------------------------------

    Attachment: NUTCH-1253-2.x-eclipse.patch

[~icebergx5] is right. At present, the 2.x branch does not work with Eclipse. Eclipse reports "Missing required library" for neko 0.9.5. I think [~lewismc] forgot to add the nekohtml dependency to the eclipse target in build.xml. I have created a bugfix patch for 2.x.

> Incompatible neko and xerces versions
> -------------------------------------
>
>                 Key: NUTCH-1253
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1253
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 1.4
>         Environment: Ubuntu 10.04
>            Reporter: Dennis Spathis
>            Assignee: Lewis John McGibbney
>             Fix For: 2.3, 1.8
>
>         Attachments: NUTCH-1253-2.x-eclipse.patch, NUTCH-1253-2.x-v2.patch, NUTCH-1253-nutchgora.patch, NUTCH-1253-trunk.patch, NUTCH-1253-trunk.v2.patch, NUTCH-1253.patch, TEST-org.apache.nutch.parse.html.TestDOMContentUtils.txt, TEST-org.apache.nutch.parse.html.TestDOMContentUtils.txt, nutch1253parsed.html, nutch1253test.html
>
>
> The Nutch 1.4 distribution includes
>      - nekohtml-0.9.5.jar (under .../runtime/local/plugins/lib-nekohtml)
>      - xercesImpl-2.9.1.jar (under .../runtime/local/lib)
> These two JARs appear to be incompatible versions. When the HtmlParser (configured to use neko) is invoked during a local-mode crawl, the parse fails due to an AbstractMethodError. (Note: To see the AbstractMethodError, rebuild the HtmlParser plugin and add a catch(Throwable) clause in the getParse method to log the stacktrace.)
> I found that substituting a later, compatible version of nekohtml (1.9.11) fixes the problem.
> Curiously, and in support of the above, the nekohtml plugin.xml file in Nutch 1.4 contains the following:
> <plugin
>    id="lib-nekohtml"
>    name="CyberNeko HTML Parser"
>    version="1.9.11"
>    provider-name="org.cyberneko">
>    <runtime>
>       <library name="nekohtml-0.9.5.jar">
>          <export name="*"/>
>       </library>
>    </runtime>
> </plugin>
> Note the conflicting version numbers (version tag is "1.9.11" but the specified library is "nekohtml-0.9.5.jar").
> Was the 0.9.5 version included by mistake? Was the intention rather to include 1.9.11?



--
This message was sent by Atlassian JIRA
(v6.2#6252)
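
The mismatch the reporter points out in plugin.xml (version attribute "1.9.11" versus library "nekohtml-0.9.5.jar") could be caught mechanically. The following is only a rough sketch, not part of Nutch: the function name find_version_mismatches and the assumption that a plugin's version attribute should equal the version embedded in its JAR file name are both hypothetical, chosen for illustration. The XML is the snippet quoted in the issue.

```python
import re
import xml.etree.ElementTree as ET

# plugin.xml content as quoted in the issue description.
PLUGIN_XML = """<plugin
   id="lib-nekohtml"
   name="CyberNeko HTML Parser"
   version="1.9.11"
   provider-name="org.cyberneko">
   <runtime>
      <library name="nekohtml-0.9.5.jar">
         <export name="*"/>
      </library>
   </runtime>
</plugin>"""


def find_version_mismatches(plugin_xml):
    """Return (jar_name, jar_version, declared_version) for each library whose
    file-name version differs from the plugin's version attribute.

    Hypothetical helper: assumes JAR names end in "-<dotted version>.jar".
    """
    root = ET.fromstring(plugin_xml)
    declared = root.get("version")
    mismatches = []
    for library in root.iter("library"):
        name = library.get("name", "")
        match = re.search(r"-(\d+(?:\.\d+)*)\.jar$", name)
        if match and match.group(1) != declared:
            mismatches.append((name, match.group(1), declared))
    return mismatches


if __name__ == "__main__":
    for jar, jar_version, declared in find_version_mismatches(PLUGIN_XML):
        print(f"{jar}: jar version {jar_version} != declared plugin version {declared}")
```

A check like this only flags the inconsistency; it does not say which of the two versions was intended, which is the question the issue raises.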