javaxtreme wrote:
Hello all,
I am having a bit of trouble with a seemingly simple problem. I would like
to have some global variable which is a byte array that all of my map tasks
have access to. The best way that I currently know of to do this is to have
a file sitting on the DFS and load that into each map task (note: the global
variable is very small, ~20kB). My problem is that I can't seem to load any
file from the Hadoop DFS into my program via the API. I know that the
DistributedFileSystem class has to come into play, but for the life of me I
can't get it to work.
I noticed there is an initialize() method within the DistributedFileSystem
class, and I thought I would need to call that, but I'm unsure what the URI
parameter ought to be. I tried "localhost:50070", which stalled the system
and threw a connection timeout error. I went on to attempt to call
DistributedFileSystem.open(), but again my program failed, this time with
a NullPointerException. I'm assuming that stems from the fact that my
DFS object is not "initialized".
Does anyone have any information on how exactly one programmatically goes
about loading a file from the DFS? I would greatly appreciate any help.
If the data changes, this sounds more like the kind of data that a
distributed hash table or tuple space should be looking after: sharing
facts between nodes.
1. what is the rate of change of the data?
2. what are your requirements for consistency?
If the data is static, then yes, a shared file works. Here is a code
fragment I use to work with one. You grab the filesystem URI from the
configuration, then initialise the DFS with both the URI and the
configuration.
public static DistributedFileSystem createFileSystem(ManagedConfiguration conf)
        throws SmartFrogRuntimeException {
    // the filesystem URI is whatever fs.default.name is set to
    String filesystemURL = conf.get(HadoopConfiguration.FS_DEFAULT_NAME);
    URI uri = null;
    try {
        uri = new URI(filesystemURL);
    } catch (URISyntaxException e) {
        throw (SmartFrogRuntimeException) SmartFrogRuntimeException
                .forward(ERROR_INVALID_FILESYSTEM_URI + filesystemURL, e);
    }
    // bind a fresh DistributedFileSystem instance to that URI
    DistributedFileSystem dfs = new DistributedFileSystem();
    try {
        dfs.initialize(uri, conf);
    } catch (IOException e) {
        throw (SmartFrogRuntimeException) SmartFrogRuntimeException
                .forward(ERROR_FAILED_TO_INITIALISE_FILESYSTEM, e);
    }
    return dfs;
}
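Once you have the filesystem handle, pulling your ~20kB file into a byte
array is just a matter of opening a Path on it and reading it fully. A
minimal sketch; the method name and path argument here are made up, and I'm
assuming the file is small enough to buffer whole in memory:

import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;

public static byte[] readSmallFile(DistributedFileSystem dfs, String filename)
        throws IOException {
    Path path = new Path(filename);
    // size the buffer from the file's metadata; fine for a ~20kB file
    FileStatus status = dfs.getFileStatus(path);
    byte[] buffer = new byte[(int) status.getLen()];
    FSDataInputStream in = dfs.open(path);
    try {
        // positioned readFully() loops until the whole buffer is filled
        in.readFully(0, buffer);
    } finally {
        in.close();
    }
    return buffer;
}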
As to what URLs work, try "localhost:9000"; this works on machines
where I've brought a DFS up on that port. Use netstat to verify your
chosen port is live.
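For what it's worth, if you don't need the DistributedFileSystem type
specifically, the generic factory will do the lookup for you. A minimal
sketch, assuming fs.default.name is set in the configuration you pass in:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public static FileSystem getDefaultFileSystem() throws IOException {
    Configuration conf = new Configuration();
    // FileSystem.get() reads fs.default.name from the configuration and
    // returns a filesystem client already initialised against that URI
    return FileSystem.get(conf);
}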