As a side note -- Hadoop uses the simplest trick possible to figure out the JAR
location of the originating class: it attempts to load a resource named after
the class's bytecode file...
private static String findContainingJar(Class my_class) {
  ClassLoader loader = my_class.getClassLoader();
  // Translate the class name into a resource path, e.g.
  // org.apache.Foo -> org/apache/Foo.class
  String class_file = my_class.getName().replaceAll("\\.", "/") + ".class";
  try {
    for (Enumeration itr = loader.getResources(class_file);
         itr.hasMoreElements();) {
      URL url = (URL) itr.nextElement();
      if ("jar".equals(url.getProtocol())) {
        String toReturn = url.getPath();
        if (toReturn.startsWith("file:")) {
          toReturn = toReturn.substring("file:".length());
        }
        toReturn = URLDecoder.decode(toReturn, "UTF-8");
        // Strip the in-JAR part of the path, leaving only
        // the JAR file's location on disk.
        return toReturn.replaceAll("!.*$", "");
      }
    }
  } catch (IOException e) {
    throw new RuntimeException(e);
  }
  return null;
}
Note the "replaceAll" line -- it strips the in-JAR path from the JAR location.
I also looked at the submitter and the isolation runner; they seem to work
according to the intuition I presented earlier (the thread context class loader
has pointers to the invoked JAR plus all JARs under lib/), so there should be
no need to specify JARs explicitly. I even tend to think this is a headache
for the future...
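To see the string handling in isolation, here is a small standalone sketch (not Hadoop code; the class name, helper method, and sample path are made up for illustration) that mimics the path-cleaning steps applied to a typical in-JAR resource URL:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class JarPathDemo {
    // Mimics the path-cleaning steps from findContainingJar.
    static String cleanJarPath(String path) {
        if (path.startsWith("file:")) {
            path = path.substring("file:".length());
        }
        try {
            path = URLDecoder.decode(path, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException(e); // UTF-8 is always available
        }
        // Strip everything from the "!" separator onwards, leaving
        // just the JAR file's location on disk.
        return path.replaceAll("!.*$", "");
    }

    public static void main(String[] args) {
        // A class loaded from a JAR reports a URL path such as:
        // file:/home/user/My%20Jobs/job.jar!/org/apache/Foo.class
        String path = "file:/home/user/My%20Jobs/job.jar!/org/apache/Foo.class";
        System.out.println(cleanJarPath(path));
        // -> /home/user/My Jobs/job.jar
    }
}
```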
Dawid
Dawid Weiss wrote:
I changed the mains to pass in the location of the jar, since the Ant
task puts the jar in basedir/dist. I made a comment about it on
Mahout-3. The Canopy driver should do the right thing? I also
did the same thing with the k-means.
I honestly don't think the JAR file must be specified as part of the
JobConf. It's a hint, but one used only in very special cases
(which I can't think of, to be honest). To my understanding, the
situation is like this:
- When you assemble a job JAR, you should package all required
dependencies under its {jarfile.jar}/lib folder.
- All these classes are visible through the context class loader set by
Hadoop, so no special JAR tricks are required. When you submit a Hadoop
job (remotely), you point to the JAR file with all its dependencies and
Hadoop takes it from there.
- When you run the in-memory task tracker (for debugging, locally), all
the classes should be available on the normal classpath, and the context
class loader (again) should resolve them successfully.
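The resolution path described above can be sketched in a few lines (a standalone illustration, not Hadoop code; the class name is made up, and a JDK class stands in for a job class):

```java
public class ContextLoaderDemo {
    public static void main(String[] args) throws Exception {
        // The thread context class loader is the loader Hadoop sets up
        // for a task; when running locally it is usually the same
        // loader that started the JVM.
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        // Any class visible on the classpath -- here a JDK class stands
        // in for a job class -- resolves through it without naming a
        // JAR explicitly.
        Class<?> c = Class.forName("java.util.ArrayList", true, loader);
        System.out.println(c.getName());
        // -> java.util.ArrayList
    }
}
```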
Can you enlighten me as to when pointing JobConf at an explicit JAR file
is required?
Dawid