[ https://issues.apache.org/jira/browse/NUTCH-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14694210#comment-14694210 ]
Michael Joyce commented on NUTCH-2049:
--------------------------------------

Hey [~lewismc],

Tried your patch here. It seems I have to add the following to the ivy.xml file to get this to work at all:

{code}
<dependency org="org.apache.hadoop" name="hadoop-mapreduce-client-jobclient" rev="2.4.0" conf="*->default"/>
{code}

Otherwise, I end up getting the following when I try to run a test crawl:

{code}
Injector: starting at 2015-08-12 15:04:42
Injector: crawlDb: crawl/crawldb
Injector: urlDir: ../../urls_test
Injector: Converting injected urls to crawl db entries.
Injector: java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
	at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120)
	at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
	at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
	at org.apache.hadoop.mapred.JobClient.init(JobClient.java:470)
	at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:449)
	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:832)
	at org.apache.nutch.crawl.Injector.inject(Injector.java:323)
	at org.apache.nutch.crawl.Injector.run(Injector.java:379)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.nutch.crawl.Injector.main(Injector.java:369)
{code}

However, after addressing that concern I end up running into the following on the test crawl:

{code}
java.lang.Exception: java.lang.ClassCastException: org.apache.hadoop.io.SequenceFile$Writer$KeyClassOption cannot be cast to org.apache.hadoop.io.MapFile$Writer$Option
	at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.SequenceFile$Writer$KeyClassOption cannot be cast to org.apache.hadoop.io.MapFile$Writer$Option
	at org.apache.nutch.fetcher.FetcherOutputFormat.getRecordWriter(FetcherOutputFormat.java:70)
	at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.<init>(ReduceTask.java:484)
	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:414)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
2015-08-12 14:24:39,906 ERROR fetcher.Fetcher - Fetcher: java.io.IOException: Job failed!
	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:836)
	at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:496)
	at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:532)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:505)
{code}

> Upgrade Trunk to Hadoop > 2.4 stable
> ------------------------------------
>
>                 Key: NUTCH-2049
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2049
>             Project: Nutch
>          Issue Type: Improvement
>          Components: build
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>             Fix For: 1.11
>
>         Attachments: NUTCH-2049.patch
>
>
> Convo here - http://www.mail-archive.com/dev%40nutch.apache.org/msg18225.html
> I am +1 for taking trunk (or a branch of trunk) to explicit dependency on
> Hadoop 2.6.
> We can run our tests, we can validate, we can fix.
> I will be doing validation on 2.X in parallel as this is what I use on my
> own projects.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
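A note on the ClassCastException in FetcherOutputFormat above: in the Hadoop 2.x API, MapFile.Writer takes SequenceFile.Writer.Option varargs but, as I read the javadoc, requires its key-class option to be the one returned by MapFile.Writer.keyClass(...) (which implements MapFile.Writer.Option), not the one from SequenceFile.Writer.keyClass(...). The sketch below is plain Java with made-up marker interfaces standing in for the Hadoop option types — no Hadoop on the classpath, all names illustrative — just to show why the internal cast blows up:

{code}
// Illustrative stand-in for Hadoop's writer-option pattern.
// SeqOption/MapOption mirror SequenceFile.Writer.Option and
// MapFile.Writer.Option; none of this is actual Hadoop code.
public class OptionCastDemo {

    // Mirrors SequenceFile.Writer.Option (a marker interface).
    interface SeqOption {}

    // Mirrors MapFile.Writer.Option, which extends the SequenceFile one.
    interface MapOption extends SeqOption {}

    // Stand-in for SequenceFile.Writer.keyClass(...): returns a plain SeqOption.
    static SeqOption seqKeyClass() {
        return new SeqOption() {};
    }

    // Stand-in for MapFile.Writer.keyClass(...): the returned object
    // implements MapOption (and therefore also SeqOption).
    static SeqOption mapKeyClass() {
        return new MapOption() {};
    }

    // Mirrors the downcast MapFile.Writer performs on its key-class option.
    static boolean accepts(SeqOption opt) {
        try {
            MapOption ignored = (MapOption) opt; // throws for seqKeyClass()
            return true;
        } catch (ClassCastException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts(mapKeyClass())); // accepted
        System.out.println(accepts(seqKeyClass())); // rejected: the cast in the log
    }
}
{code}

So, assuming the javadoc reading above is right, the fix in FetcherOutputFormat would be to build the MapFile.Writer's key option via MapFile.Writer.keyClass(...) instead of SequenceFile.Writer.keyClass(...); value-class and compression options can stay as SequenceFile.Writer options.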