[ https://issues.apache.org/jira/browse/NUTCH-1893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lewis John McGibbney updated NUTCH-1893:
----------------------------------------
    Fix Version/s: (was: 2.4) 2.3.1

> Parse-tika fails to parse feed files
> ------------------------------------
>
>                 Key: NUTCH-1893
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1893
>             Project: Nutch
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 2.3, 1.9
>         Environment: Windows 7 + Cygwin + JDK 7
>            Reporter: Mengying Wang
>            Assignee: Sebastian Nagel
>            Priority: Minor
>             Fix For: 1.10, 2.3.1
>
>         Attachments: NUTCH-1893-v1.patch, NUTCH-1893.mywang.141209.txt
>
>
> In the Nutch parse step, I received the following error. It seems the
> parse-tika plugin has broken.
> $ /cygdrive/d/nutch_trunk/runtime/local/bin/nutch parse -D
> mapred.reduce.tasks=2 -D mapred.child.java.opts=-Xmx1000m -D
> mapred.reduce.tasks.speculative.execution=false -D
> mapred.map.tasks.speculative.execution=false -D
> mapred.compress.map.output=true -D mapred.skip.attempts.to.start.skipping=2
> -D mapred.skip.map.max.skip.records=1 crawlId/segments/20141118235323
> java.lang.ExceptionInInitializerError
>     at com.sun.syndication.io.SyndFeedInput.build(SyndFeedInput.java:136)
>     at org.apache.tika.parser.feed.FeedParser.parse(FeedParser.java:70)
>     at org.apache.nutch.parse.tika.TikaParser.getParse(TikaParser.java:103)
>     at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:95)
>     at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:101)
>     at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:44)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> Caused by: java.lang.NullPointerException
>     at java.util.Properties$LineReader.readLine(Properties.java:434)
>     at java.util.Properties.load0(Properties.java:353)
>     at java.util.Properties.load(Properties.java:341)
>     at com.sun.syndication.io.impl.PropertiesLoader.<init>(PropertiesLoader.java:74)
>     at com.sun.syndication.io.impl.PropertiesLoader.getPropertiesLoader(PropertiesLoader.java:46)
>     at com.sun.syndication.io.impl.PluginManager.<init>(PluginManager.java:54)
>     at com.sun.syndication.io.impl.PluginManager.<init>(PluginManager.java:46)
>     at com.sun.syndication.feed.synd.impl.Converters.<init>(Converters.java:40)
>     at com.sun.syndication.feed.synd.SyndFeedImpl.<clinit>(SyndFeedImpl.java:59)
>     ... 10 more

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
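
The "Caused by" section of the trace shows a NullPointerException thrown inside java.util.Properties.load, called from a properties loader. One plausible reading (an assumption, not confirmed anywhere in this message) is that a properties resource could not be found on the class loader in use, so a null stream was handed to Properties.load. The minimal Java sketch below reproduces that general failure mode outside of Nutch, Tika, or ROME. The class name MissingResourceDemo and the resource name are hypothetical and chosen only for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

public class MissingResourceDemo {

    // Hypothetical resource name, picked so that it does not exist on the classpath.
    public static final String MISSING = "no/such/file-for-demo.properties";

    /** Unguarded load: hands the stream straight to Properties.load. */
    public static Properties loadUnguarded(ClassLoader cl, String name) throws IOException {
        // getResourceAsStream returns null when the resource is not found.
        InputStream in = cl.getResourceAsStream(name);
        Properties props = new Properties();
        props.load(in); // throws NullPointerException when in is null
        return props;
    }

    /** Guarded load: reports the missing resource by name instead of failing with an NPE. */
    public static Properties loadGuarded(ClassLoader cl, String name) throws IOException {
        try (InputStream in = cl.getResourceAsStream(name)) {
            if (in == null) {
                throw new IOException("Resource not found on classpath: " + name);
            }
            Properties props = new Properties();
            props.load(in);
            return props;
        }
    }

    public static void main(String[] args) {
        ClassLoader cl = ClassLoader.getSystemClassLoader();

        try {
            loadUnguarded(cl, MISSING);
        } catch (NullPointerException e) {
            System.out.println("unguarded load failed with NullPointerException");
        } catch (IOException e) {
            System.out.println("unguarded load failed with IOException");
        }

        try {
            loadGuarded(cl, MISSING);
        } catch (IOException e) {
            System.out.println("guarded load failed: " + e.getMessage());
        }
    }
}
```

When the load happens inside a static initializer, as the `<clinit>` frame in the trace suggests, the JVM wraps the exception in an ExceptionInInitializerError, which matches the top of the reported trace. Whether a missing resource is the actual root cause in NUTCH-1893 would need to be checked against the attached patch.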