[ 
https://issues.apache.org/jira/browse/HADOOP-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538245
 ] 

Dennis Kubes commented on HADOOP-1622:
--------------------------------------

1. Could you please remove the mention of 'final' and 'default' config 
resources from the javadoc for JobConf.{get|set}JobResources? They are no 
longer relevant vis-a-vis hadoop Configuration.

I have removed the mention of final and default resources.

2. Should we also have a JobConf.setJobResource along with 
JobConf.addJobResource, a la the {{DistributedCache}} APIs?

I had debated about set vs. add resources.  The current behavior is that when you 
add a resource you append it to the list of resources, as opposed to setting a 
resource, which would clear anything previously added and keep only that 
resource.  Since jar resources are often added by including the jar file that 
contains a given class, I thought it better to NOT allow clearing and resetting 
of job resources.
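For illustration, a minimal sketch of how the append-only behavior would look 
from a job driver.  The addJobResource name comes from this patch; the Path 
argument and the lib/*.jar paths are assumptions made for the example, not a 
committed API:

{code:java}
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class ResourceExample {
  public static void main(String[] args) {
    // Passing the class implicitly adds the jar that contains it (existing behavior).
    JobConf conf = new JobConf(ResourceExample.class);

    // Hypothetical calls from this patch: each add appends to the list of job
    // resources instead of replacing it, so the implicitly added jar above
    // cannot be accidentally cleared by a later "set".
    conf.addJobResource(new Path("lib/commons-lang.jar"));
    conf.addJobResource(new Path("lib/lucene-core.jar"));
  }
}
{code}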

3. Should we move the private JobClient.createJobJar method to JarUtils to make 
it available as a useful utility?

I debated about this too.  JarUtils holds generic jar and unjar utilities, but I 
don't see any harm in putting createJobJar there, and I think you are right that 
we may need it somewhere else in the future.  I have removed it from JobClient 
and added it to JarUtils.
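As a rough illustration of the kind of helper JarUtils could host, here is a 
sketch of a generic jar-building method using java.util.jar; the class name, 
method name, and signature are assumptions for the example, not the patch's 
actual createJobJar code:

{code:java}
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class JarUtilsSketch {

  /** Packages the given files into a single jar, each entry at the top level. */
  public static void createJar(File[] inputs, File outputJar) throws IOException {
    Manifest manifest = new Manifest();
    manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
    JarOutputStream out = new JarOutputStream(new FileOutputStream(outputJar), manifest);
    try {
      byte[] buffer = new byte[4096];
      for (File input : inputs) {
        out.putNextEntry(new JarEntry(input.getName()));
        FileInputStream in = new FileInputStream(input);
        try {
          int read;
          while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
          }
        } finally {
          in.close();
        }
        out.closeEntry();
      }
    } finally {
      out.close();
    }
  }
}
{code}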

Unrelated: Does it make sense to rename Configuration.addResource to 
Configuration.addConfigResource? I wonder how confusing these unrelated API 
names are, given JobConf is a Configuration too.

Yeah, I debated about this one too.  In the end we weren't just adding jars but 
multiple things such as classes, executables, and files, and I couldn't find a 
better name for that than resource.  I went with jobResource to be a little less 
confusing.  Changing Configuration over to configResource would be good, I think, 
although we should probably deprecate the old method first because a lot of 
things rely on it.
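If we go that route, the rename could keep the old name around as a deprecated 
shim.  A sketch of what that might look like; the addConfigResource name is the 
proposal from this comment, not an existing Hadoop API:

{code:java}
public class Configuration {

  /** New, less ambiguous name for adding a configuration resource. */
  public void addConfigResource(String name) {
    // ... the existing addResource(String) logic would move here ...
  }

  /** @deprecated Use {@link #addConfigResource(String)} instead. */
  @Deprecated
  public void addResource(String name) {
    addConfigResource(name);
  }
}
{code}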

I am currently testing patch 9, will have it posted shortly.

> Hadoop should provide a way to allow the user to specify jar file(s) the user 
> job depends on
> --------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1622
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1622
>             Project: Hadoop
>          Issue Type: Improvement
>            Reporter: Runping Qi
>            Assignee: Dennis Kubes
>             Fix For: 0.16.0
>
>         Attachments: hadoop-1622-4-20071008.patch, HADOOP-1622-5.patch, 
> HADOOP-1622-6.patch, HADOOP-1622-7.patch, HADOOP-1622-8.patch, 
> multipleJobJars.patch, multipleJobResources.patch, multipleJobResources2.patch
>
>
> More likely than not, a user's job may depend on multiple jars.
> Right now, when submitting a job through bin/hadoop, there is no way for the 
> user to specify that. 
> A workaround is to re-package all the dependent jars into a new jar 
> or put the dependent jar files in the lib dir of the new jar.
> This workaround causes unnecessary inconvenience to the user. Furthermore, 
> if the user does not own the main function 
> (as is the case when the user uses Aggregate, datajoin, or streaming), the 
> user has to re-package those system jar files too.
> It is much desired that Hadoop provide a clean and simple way for the user 
> to specify a list of dependent jar files at the time 
> of job submission. Something like:
> bin/hadoop .... --depending_jars j1.jar:j2.jar 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
