[ 
https://issues.apache.org/jira/browse/HADOOP-12406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-12406:
---------------------------------------------
    Assignee: Nadeem Douba
      Status: Open  (was: Patch Available)

Hi [~ndouba],

I'm about to do a 2.7.3 Apache Hadoop release and finally got around to this 
again.

h4. Analysis
To make progress, I had to read up a bit on Nutch and on how to run it, so that 
I could reproduce the bug and validate your patch. I finally succeeded in doing 
so! I tested this with the 2.7.2 release and Nutch 1.11, using the URL feed 
[given at 
NUTCH-1084|https://issues.apache.org/jira/browse/NUTCH-1084?focusedCommentId=13882771&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13882771]:
{code}
~/tmp/common/hadoop-common-2.7.2/bin/hadoop jar apache-nutch-1.11.job 
org.apache.nutch.crawl.CrawlDbReader 
file:///tmp/nutch/apache-nutch-1.11/runtime/local/crawl/crawldb/ -url 
http://bappenas.go.id/
{code}

I can reproduce all the problems listed at NUTCH-1084: with readdb, with the 
crawl job run through the MR local-job-runner, and so on.

The real issue is that Nutch's readdb is client-only and does *not* run a 
MapReduce job, which answers the question I asked before. For regular MR jobs, 
the job-jar *is* visible to the system class-loader. For client-only 
invocations using "hadoop jar", and for the local-job-runner, the job-jar is 
*not* on the system classpath. That is why you are running into the issue.

h4. Summary
Your patch looks good to me. The thread context class-loader defaults to the 
system class-loader wherever it is not overridden, so we are fine for all the 
ways of loading classes in readFields.
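
For anyone following along, here is a minimal sketch of the lookup pattern being discussed. It is not the attached patch: the class name ContextLoaderSketch, the helper canResolve, the empty URLClassLoader standing in for a loader that can see the job-jar, and the class name org.example.NoSuchWritable are all assumptions made for illustration.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical illustration only; this is not the HADOOP-12406 patch.
final class ContextLoaderSketch {

    // Resolve a class by name, preferring the thread context class loader.
    static Class<?> resolve(String className) throws ClassNotFoundException {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            // The context loader may be null; fall back to this class's own loader.
            loader = ContextLoaderSketch.class.getClassLoader();
        }
        return Class.forName(className, true, loader);
    }

    // Convenience wrapper that reports success instead of throwing.
    static boolean canResolve(String className) {
        try {
            resolve(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        Thread current = Thread.currentThread();
        ClassLoader original = current.getContextClassLoader();
        // Stand-in for a loader that can see the job-jar (assumption: no extra
        // URLs here, so it only delegates to its parent).
        try (URLClassLoader jobLoader = new URLClassLoader(new URL[0], original)) {
            current.setContextClassLoader(jobLoader);
            System.out.println(canResolve("java.util.HashMap"));
            System.out.println(canResolve("org.example.NoSuchWritable"));
        } finally {
            // Always restore the previous context loader.
            current.setContextClassLoader(original);
        }
    }
}
```

When nothing overrides the context loader, the lookup goes through the same loader the JVM started with, which is why the change should be safe for regular MR jobs as well.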

I'll resubmit your patch to Jenkins with minor comment-related changes and 
commit it if Mr. Jenkins is also fine with it.

> AbstractMapWritable.readFields throws ClassNotFoundException with custom 
> writables
> ----------------------------------------------------------------------------------
>
>                 Key: HADOOP-12406
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12406
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 2.7.1
>         Environment: Ubuntu Linux 14.04 LTS amd64
>            Reporter: Nadeem Douba
>            Assignee: Nadeem Douba
>            Priority: Blocker
>              Labels: bug, hadoop, io, newbie, patch-available
>         Attachments: HADOOP-12406.patch
>
>
> Note: I am not an expert at Java, class loaders, or Hadoop. I am just a 
> hacker. My solution might be entirely wrong.
> AbstractMapWritable.readFields throws a ClassNotFoundException when reading 
> custom writables. Debugging the job using remote debugging in IntelliJ 
> revealed that the class loader being used in Class.forName() is different 
> than that used by the Thread's current context 
> (Thread.currentThread().getContextClassLoader()). The class path for the 
> system class loader does not include the libraries of the job jar. However, 
> the class path for the context class loader does. The proposed patch changes 
> the class loading mechanism in readFields to use the Thread's context class 
> loader instead of the system's default class loader.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
