> We have some very large files that we access via memory mapping in
> Java. Someone's asked us about how to make this conveniently
> deployable in Hadoop. If we tell them to put the files into hdfs, can
> we obtain a File for the underlying file on any given node?
We sometimes find it convenient to have a small nfs share across the datanodes for this type of thing. Other times we just package up the data and submit it with the job so it can be addressed as a resource on the classpath. Which of those is most convenient depends on how large "very large" actually is.
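Both of those approaches boil down to getting a real local path on each node, because `FileChannel.map` needs an actual file rather than an HDFS stream. A minimal sketch of the mapping itself (the temp file here just stands in for one of the data files, which would live on the nfs share or be unpacked from the job resources):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapDemo {
    public static void main(String[] args) throws IOException {
        // Stand-in for the real data file on local disk.
        Path p = Files.createTempFile("mmap-demo", ".bin");
        try {
            Files.write(p, new byte[]{1, 2, 3, 4});

            // Map the file read-only. For genuinely large files you would
            // typically map windows of the file (offset/length) rather than
            // the whole thing, since a single mapping is limited to 2 GB.
            try (FileChannel ch = FileChannel.open(p, StandardOpenOption.READ)) {
                MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
                System.out.println(buf.get(2)); // byte at offset 2
            }
        } finally {
            Files.deleteIfExists(p);
        }
    }
}
```

With the classpath-packaging route you would first copy the resource out to local disk (e.g. a temp file as above) before mapping it, since resources inside a jar cannot be memory-mapped directly.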
