Hi all,
I’d like to create a new URI scheme for a distributed, POSIX-compliant filesystem that is shared between all nodes. A number of such filesystems already exist (think HDFS without the POSIX non-compliance). We could, of course, run HDFS on top of such a filesystem, but that adds an unnecessary and inefficient extra layer. Why have a master retrieve data from one filesystem cluster, only to distribute it back out to the same cluster on a different distributed filesystem (HDFS)?

Under the new scheme, each MapReduce slave would read its input from what looks like a local file:/// path and write its output there as well. Assume the distributed filesystem handles concurrent reads and writes itself. Given POSIX compliance, LocalFileSystem seems to be the best foundation to build on.

Please let me know of any problems or pitfalls you see with this approach. Any advice would be greatly appreciated, as the Hadoop source tree is new to me and rather intimidating.

Best,
--Chris
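
P.S. To make the path mapping concrete, here is a rough sketch in plain Java (deliberately no Hadoop classes, since I don't know the source tree well yet). The `sharedfs` scheme name, the `/mnt/shared` mount point, and the `SharedFsPathMapper` class are all placeholders I made up for illustration. The only idea it shows is that a URI in the new scheme would resolve to an ordinary path under the mount point that every node sees, so no data has to be copied anywhere.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: maps URIs of a made-up "sharedfs" scheme onto
// paths under the local mount point of the shared POSIX filesystem.
final class SharedFsPathMapper {
    // Placeholder scheme name, not an existing Hadoop scheme.
    static final String SCHEME = "sharedfs";

    private final Path mountPoint;

    SharedFsPathMapper(String mountPoint) {
        this.mountPoint = Paths.get(mountPoint).normalize();
    }

    // Translate e.g. sharedfs:///jobs/in.txt into <mountPoint>/jobs/in.txt.
    Path toLocal(URI uri) {
        if (!SCHEME.equalsIgnoreCase(uri.getScheme())) {
            throw new IllegalArgumentException("Not a " + SCHEME + " URI: " + uri);
        }
        String path = uri.getPath();
        if (path == null || !path.startsWith("/")) {
            throw new IllegalArgumentException("Expected an absolute path: " + uri);
        }
        // Drop the leading slash so the path resolves relative to the mount point.
        Path resolved = mountPoint.resolve(path.substring(1)).normalize();
        // Reject paths that use ".." to escape the mount point.
        if (!resolved.startsWith(mountPoint)) {
            throw new IllegalArgumentException("Path escapes the mount point: " + uri);
        }
        return resolved;
    }

    public static void main(String[] args) {
        SharedFsPathMapper mapper = new SharedFsPathMapper("/mnt/shared");
        System.out.println(mapper.toLocal(URI.create("sharedfs:///jobs/wordcount/input/part-00000")));
    }
}
```

Every slave would run the same translation and then use normal local file I/O on the result, relying on the shared filesystem itself for consistency under concurrent access.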