[ https://issues.apache.org/jira/browse/HADOOP-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12523895 ]
Milind Bhandarkar commented on HADOOP-1815:
-------------------------------------------
Actually, we faced this problem while using distcp as well. (Sorry for not
mentioning it in the original mail.) distcp (i.e. o.a.h.util.CopyFiles) is a
mapred and dfs client application. However, since it is part of THE hadoop
jar, if we modify it and ship the modified version in a client jar, the change
is not reflected on the server side: the tasktrackers pick up the original
version of CopyFiles rather than the one from the client jar. The same is true
for streaming, now that the contrib jars are on the classpath and are searched
before the user-supplied jar. The result is that JobClient-side changes in the
user-supplied jar take effect, but Task-side classes are still picked up from
the jars deployed on the tasktrackers.
I do not claim to have a solution for this. But I am sure someone out there has
come across this before and solved this problem.
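
To make the search-order problem concrete, here is a minimal sketch of
"first match on the path wins". It is a toy model and not Hadoop's actual class
loading code; the jar names and the ClasspathOrderSketch class are hypothetical,
and each "jar" is just a map from class name to a version label.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class ClasspathOrderSketch {
    /** Toy stand-in for a jar: a name plus the classes it holds (class name -> version label). */
    static final class Jar {
        final String name;
        final Map<String, String> classes;

        Jar(String name, Map<String, String> classes) {
            this.name = name;
            this.classes = classes;
        }
    }

    /** Returns "jarName:version" for the first jar in search order that holds the class, or null. */
    static String resolve(String className, List<Jar> searchPath) {
        for (Jar jar : searchPath) {
            String version = jar.classes.get(className);
            if (version != null) {
                return jar.name + ":" + version;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String copyFiles = "org.apache.hadoop.util.CopyFiles";
        // Hypothetical jar names, for illustration only.
        Jar deployed = new Jar("deployed-hadoop.jar", Collections.singletonMap(copyFiles, "original"));
        Jar user = new Jar("user-job.jar", Collections.singletonMap(copyFiles, "modified"));

        // Deployed jars searched first: the modified class is shadowed.
        System.out.println(resolve(copyFiles, Arrays.asList(deployed, user)));
        // User jar searched first: the modified class is found.
        System.out.println(resolve(copyFiles, Arrays.asList(user, deployed)));
    }
}
```

In this model the modified CopyFiles only takes effect when the user-supplied
jar is searched before the deployed jars, which is the opposite of the order
described above for the tasktrackers.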
> Separate client and server jars
> -------------------------------
>
> Key: HADOOP-1815
> URL: https://issues.apache.org/jira/browse/HADOOP-1815
> Project: Hadoop
> Issue Type: Bug
> Components: build
> Affects Versions: 0.14.0
> Environment: All
> Reporter: Milind Bhandarkar
> Fix For: 0.15.0
>
>
> For ease of deployment, one should not have to change the server jars
> and restart clusters when minor features on the client side are changed.
> This requires separating the client and server jars for hadoop. Version
> numbers appended to the hadoop jars can reflect compatibility. For example,
> the server jar could be at 0.13.1 and the client jar at 0.13.2. In short, we
> can treat the part following "0." as the "major" version number for now.
> This would keep major client frameworks such as streaming and Pig happy. To
> my knowledge, Pig uses hadoop's default jobclient, whereas streaming uses its
> own jobclient. I would love to change streaming to use the default hadoop
> jobclient, provided I can make modifications to it (e.g. to print more stats
> that are available from TaskReport) without having to deploy the new version
> of the whole jar to the backend and restart the mapreduce cluster.
> (I thought there was already a bug filed for separating the client and server
> jar, but I could not find it. Hence the new Jira. Sorry about duplication, if
> any.)
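
The version convention proposed in the description (treat the number after
"0." as the "major" version) could be checked along these lines. This is only a
sketch of the idea, not existing Hadoop code; the VersionCompat class and its
methods are hypothetical, and requiring the leading component to match as well
is an added assumption.

```java
final class VersionCompat {
    /** Parses a dotted version such as "0.13.1" into its numeric components. */
    static int[] parse(String version) {
        String[] parts = version.trim().split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Expected at least two components: " + version);
        }
        int[] numbers = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            numbers[i] = Integer.parseInt(parts[i]);
        }
        return numbers;
    }

    /**
     * Client and server jars are treated as compatible when the first two
     * components match; the component after "0." plays the role of the major
     * version. Later components (e.g. the ".1" vs ".2") are ignored.
     */
    static boolean compatible(String clientVersion, String serverVersion) {
        int[] client = parse(clientVersion);
        int[] server = parse(serverVersion);
        return client[0] == server[0] && client[1] == server[1];
    }

    public static void main(String[] args) {
        System.out.println(compatible("0.13.2", "0.13.1")); // true: same "major" 13
        System.out.println(compatible("0.14.0", "0.13.1")); // false: 14 vs 13
    }
}
```

Under this scheme a client jar at 0.13.2 could be used against servers running
0.13.1 without a cluster restart, while a 0.14.x client would be rejected.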