[ https://issues.apache.org/jira/browse/NUTCH-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093705#comment-13093705 ]
Ferdy commented on NUTCH-937:
-----------------------------

I finally found out what the problem is with the above suggestion. It was a terrible problem to debug because of the random elements involved.

Setting "plugin.folders" to "${job.local.dir}/../jars/plugins" works only in certain cases. If you have a single folder specified in "mapred.local.dir" there will be no trouble at all. However, when you have multiple folders specified (which is a legitimate thing to do in Hadoop in order to spread task working folders over multiple disks), loading the plugins sometimes fails with an NPE because the plugins folder does not exist. This is caused by the fact that the jars directory (as unpacked by the TaskTracker) IS NOT ALWAYS ON THE SAME DISK AS THE WORKING FOLDER. For example, if you have two folders in "mapred.local.dir" (say "/mnt/disk1/mapred,/mnt/disk2/mapred"), the jars may be unpacked in "/mnt/disk1/mapred/taskTracker/ferdy/jobcache/job_201108301201_0001/work/../jars/plugins" while the working directory (which is what the "job.local.dir" property is set to) could be "/mnt/disk2/mapred/taskTracker/ferdy/jobcache/job_201108301201_0001/work/". See the configuration sketch below for a concrete illustration.

I'm not sure whether this behaviour is a good thing; perhaps it is, because most of the time you will want to unpack a jar only once per job while still spreading task attempts over multiple disks on a TaskTracker. It is, however, very troublesome in cases such as this issue, and therefore I strongly recommend against setting "plugin.folders" to "${job.local.dir}/../jars/plugins", unless of course you only have one folder specified in "mapred.local.dir".

The workaround I am currently using is to put the plugins folder not in the root of the job jar but in classes/plugins, so that Hadoop unjars it and puts it on the classpath automatically. This way there is no need to change the "mapreduce.job.jar.unpack.pattern" property, and "plugin.folders" can be left at its default of "plugins". This suggestion requires a slight modification of Nutch's build.xml file; a sketch of the kind of change follows below.
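For illustration, this is the kind of configuration combination that bites (the property names are the real ones for this Hadoop/Nutch generation; the paths are just examples):

{code:xml}
<!-- mapred-site.xml: task working directories spread over two disks -->
<property>
  <name>mapred.local.dir</name>
  <value>/mnt/disk1/mapred,/mnt/disk2/mapred</value>
</property>

<!-- nutch-site.xml: the earlier suggestion, which is only reliable when
     mapred.local.dir has a single entry -->
<property>
  <name>plugin.folders</name>
  <value>${job.local.dir}/../jars/plugins</value>
</property>
{code}

With two local dirs, ${job.local.dir} may resolve to a working directory on one disk while the TaskTracker unpacked the job jar (and thus the jars/plugins directory) on the other, so the relative path simply does not exist for that attempt.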
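And here is a rough sketch of the kind of build.xml change I mean (not an exact patch; property names such as ${build.plugins}, ${build.classes} and ${final.name} are assumed to follow the usual Nutch build conventions): in the job target, pack the built plugins under classes/plugins instead of plugins/ at the jar root, so that the default unpack pattern (classes/ and lib/) extracts them:

{code:xml}
<!-- Sketch only: include the built plugins under classes/plugins inside the
     job jar so the TaskTracker extracts them with its default unpack pattern. -->
<target name="job" depends="compile">
  <jar destfile="${build.dir}/${final.name}.job">
    <zipfileset dir="${build.classes}"/>
    <zipfileset dir="${build.dir}" includes="lib/**"/>
    <!-- was: prefix="plugins" (jar root) -->
    <zipfileset dir="${build.plugins}" prefix="classes/plugins"/>
  </jar>
</target>
{code}

Because classes/ ends up on the task classpath, the default relative value "plugins" for plugin.folders is then resolved via the classpath, as described above.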
> When nutch is run on hadoop > 0.20.2 (or cdh) it will not find plugins because MapReduce will not unpack plugin/ directory from the job's pack (due to MAPREDUCE-967)
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-937
>                 URL: https://issues.apache.org/jira/browse/NUTCH-937
>             Project: Nutch
>          Issue Type: Bug
>          Components: build
>    Affects Versions: 1.2
>         Environment: hadoop 0.21 or cloudera hadoop 0.20.2+737
>            Reporter: Claudio Martella
>            Assignee: Markus Jelsma
>             Fix For: 1.4, 2.0
>
>
> Jobs running on hadoop 0.21 or cloudera cdh 0.20.2+737 will fail because of missing plugins (i.e.):
>
> 10/10/28 12:22:21 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 10/10/28 12:22:22 INFO mapred.FileInputFormat: Total input paths to process : 1
> 10/10/28 12:22:23 INFO mapred.JobClient: Running job: job_201010271826_0002
> 10/10/28 12:22:24 INFO mapred.JobClient: map 0% reduce 0%
> 10/10/28 12:22:39 INFO mapred.JobClient: Task Id : attempt_201010271826_0002_m_000000_0, Status : FAILED
> java.lang.RuntimeException: Error in configuring object
>         at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
>         at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>         at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:379)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:317)
>         at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1063)
>         at org.apache.hadoop.mapred.Child.main(Child.java:211)
> Caused by: java.lang.reflect.InvocationTargetException
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>         ... 9 more
> Caused by: java.lang.RuntimeException: Error in configuring object
>         at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
>         at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>         at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>         at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
>         ... 14 more
> Caused by: java.lang.reflect.InvocationTargetException
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>         ... 17 more
> Caused by: java.lang.RuntimeException: x point org.apache.nutch.net.URLNormalizer not found.
>         at org.apache.nutch.net.URLNormalizers.<init>(URLNormalizers.java:122)
>         at org.apache.nutch.crawl.Injector$InjectMapper.configure(Injector.java:70)
>         ... 22 more
> 10/10/28 12:22:40 INFO mapred.JobClient: Task Id : attempt_201010271826_0002_m_000001_0, Status : FAILED
> java.lang.RuntimeException: Error in configuring object
>         at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
>         at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>         at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:379)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:317)
>         at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1063)
>         at org.apache.hadoop.mapred.Child.main(Child.java:211)
> Caused by: java.lang.reflect.InvocationTargetException
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>         ... 9 more
> Caused by: java.lang.RuntimeException: Error in configuring object
>         at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
>         at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>         at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>         at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
>         ... 14 more
> Caused by: java.lang.reflect.InvocationTargetException
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>         ... 17 more
> Caused by: java.lang.RuntimeException: x point org.apache.nutch.net.URLNormalizer not found.
>         at org.apache.nutch.net.URLNormalizers.<init>(URLNormalizers.java:122)
>         at org.apache.nutch.crawl.Injector$InjectMapper.configure(Injector.java:70)
>         ... 22 more
>
> The bug is due to MAPREDUCE-967 (part of hadoop 0.21 and cdh 0.20.2+737) which modifies the way MapReduce unpacks the job's jar. The old way was to unpack the whole of it, now only classes/ and lib/ are unpacked. This way nutch is missing the plugins/ directory.
>
> A workaround is to force unpacking of the plugin/ directory by setting 'mapreduce.job.jar.unpack.pattern' configuration to "(?:classes/|lib/|plugins/).*"
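For completeness, the workaround mentioned in the description above amounts to a configuration override along these lines (a sketch; whether it belongs in mapred-site.xml or in the per-job configuration depends on the setup):

{code:xml}
<!-- Widen the unpack pattern so plugins/ is also extracted from the job jar
     (Hadoop versions that include MAPREDUCE-967). -->
<property>
  <name>mapreduce.job.jar.unpack.pattern</name>
  <value>(?:classes/|lib/|plugins/).*</value>
</property>
{code}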