Hi all,
I tried to fix the Hadoop streaming bug in version 0.21.0 (streaming
overrides the user-given output key and value types). I found some useful
information about this issue at
https://issues.apache.org/jira/browse/MAPREDUCE-1888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
and modified the code following the patch file, then compiled it. It seems
only about thirteen .java files need to be modified. But when I tried to
replace the old .class files with the new ones, I could only find
StreamJob.class, in
/root/hadoop-0.21.0/mapred/contrib/streaming/hadoop-0.21.0-streaming.jar
(${hadoop_home} is /root/hadoop-0.21.0). The other twelve modified files
couldn't be found in any jar file under the ${hadoop_home} directory.
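A search of this kind can be sketched as follows. This is only a minimal
sketch: FindClassInJars is a hypothetical helper (not part of Hadoop) that
walks a directory, opens every .jar file with the standard java.util.jar API,
and prints each jar that contains a class file with the given name.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Hypothetical helper: finds jars under a directory that contain a given class file. */
class FindClassInJars {

    /** Recursively collects "jarPath!entryName" for entries whose file name matches. */
    static List<String> find(File dir, String classFileName) throws IOException {
        List<String> hits = new ArrayList<String>();
        File[] children = dir.listFiles();
        if (children == null) {
            return hits; // not a directory, or not readable
        }
        for (File child : children) {
            if (child.isDirectory()) {
                hits.addAll(find(child, classFileName));
            } else if (child.getName().endsWith(".jar")) {
                JarFile jar = new JarFile(child);
                try {
                    Enumeration<JarEntry> entries = jar.entries();
                    while (entries.hasMoreElements()) {
                        String name = entries.nextElement().getName();
                        // Compare only the last path segment, e.g. "StreamJob.class".
                        String simple = name.substring(name.lastIndexOf('/') + 1);
                        if (simple.equals(classFileName)) {
                            hits.add(child.getPath() + "!" + name);
                        }
                    }
                } finally {
                    jar.close();
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java FindClassInJars <directory> <ClassFile.class>
        for (String hit : find(new File(args[0]), args[1])) {
            System.out.println(hit);
        }
    }
}
```

For example, "java FindClassInJars /root/hadoop-0.21.0 StreamJob.class" would
list every jar under that directory holding a StreamJob.class entry; no output
means no jar there contains the class.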
Then I ran the command "bin/hadoop jar
mapred/contrib/streaming/hadoop-0.21.0-streaming.jar -mapper
org.apache.hadoop.mapred.lib.IdentityMapper -reducer NONE -input input -output
output" with the modified streaming jar, and got the following error:
Exception in thread "main" java.lang.ClassNotFoundException: -mapper
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:185)
I think this error has something to do with my modification of
StreamJob.java. But I have seen other people say they fixed the streaming
override issue using the patch.
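Another possibility, and this is only an assumption on my part: the trace
shows RunJar calling Class.forName on "-mapper", so it treated the first
argument as the main class name. As far as I understand RunJar, that happens
when the jar's manifest has no Main-Class entry, which could be the case if
the jar was repackaged without its original META-INF/MANIFEST.MF. A minimal
sketch to check this, where ShowMainClass is a hypothetical helper and not
part of Hadoop:

```java
import java.io.File;
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

/** Hypothetical helper: prints the Main-Class attribute of a jar's manifest. */
class ShowMainClass {

    /** Returns the Main-Class value, or null if the manifest or the attribute is missing. */
    static String mainClassOf(File jarPath) throws IOException {
        JarFile jar = new JarFile(jarPath);
        try {
            Manifest manifest = jar.getManifest(); // null when the jar has no manifest
            if (manifest == null) {
                return null;
            }
            return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        } finally {
            jar.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ShowMainClass <path-to-jar>
        String mainClass = mainClassOf(new File(args[0]));
        System.out.println(mainClass == null ? "no Main-Class in manifest" : mainClass);
    }
}
```

Running it against both the original and the modified
hadoop-0.21.0-streaming.jar would show whether the Main-Class entry was lost
when the jar was rebuilt.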
So, could anyone give me some suggestions about this issue, or suggest
another way to fix the bug?
Thanks in advance! : )
Thanks & best regards,
Wenjing