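As a hack, you could tunnel NameNode (NN) traffic from the GridFTP
clients through a different machine (by changing fs.default.name to
point at that machine). Alternatively, these clients could use a SOCKS
proxy. Either way, the NN would see the connection coming from the other
machine's address, so it would no longer treat the GridFTP host as the
local datanode for new blocks.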
The traffic to the NN is light (block data goes directly to the
datanodes), so tunneling it should not affect performance.
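A minimal sketch of the tunneling hack, assuming it runs on a separate machine that is not a datanode. It is a plain TCP relay that forwards every incoming connection to the NN's RPC port. The host name namenode.example.org, the ports 9000/8020, and all function names are hypothetical placeholders. The GridFTP hosts would then set fs.default.name to hdfs://<relay-host>:9000. An SSH port forward would do the same job. This only illustrates the mechanism with the Python standard library (3.8+ for socket.create_server) and is not a hardened proxy.

```python
import socket
import threading

# Hypothetical addresses: substitute your relay port and your NN's RPC host/port.
LISTEN_ADDR = ("0.0.0.0", 9000)
NAMENODE_ADDR = ("namenode.example.org", 8020)


def pipe(src: socket.socket, dst: socket.socket) -> None:
    """Copy bytes from src to dst until src reaches EOF, then half-close dst."""
    try:
        while True:
            data = src.recv(65536)
            if not data:
                break
            dst.sendall(data)
    except OSError:
        pass  # peer reset or socket closed; stop relaying this direction
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass


def handle(client: socket.socket, target: tuple) -> None:
    """Connect to target and relay bytes in both directions until both finish."""
    try:
        upstream = socket.create_connection(target)
    except OSError:
        client.close()
        return
    threads = [
        threading.Thread(target=pipe, args=(client, upstream), daemon=True),
        threading.Thread(target=pipe, args=(upstream, client), daemon=True),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    client.close()
    upstream.close()


def start_relay(listen_addr: tuple, target: tuple) -> socket.socket:
    """Accept connections on listen_addr in a background thread and relay each
    one to target. Returns the listening socket (e.g. to read the bound port)."""
    server = socket.create_server(listen_addr)

    def accept_loop() -> None:
        while True:
            try:
                client, _ = server.accept()
            except OSError:
                return  # listening socket unusable; stop accepting
            threading.Thread(target=handle, args=(client, target), daemon=True).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return server


def run_forever() -> None:
    """Hypothetical entry point for the relay host: relay until interrupted."""
    start_relay(LISTEN_ADDR, NAMENODE_ADDR)
    threading.Event().wait()
```

Calling run_forever() on the relay host keeps the tunnel up. Because the NN only sees the relay's address, it cannot match the writer to a local datanode, while the block data itself still flows between the GridFTP host and the datanodes without passing through the relay.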
Raghu.
Brian Bockelman wrote:
Hey all,
Had a problem I wanted to ask advice on. The Caltech site I work with
currently has a few GridFTP servers which are on the same physical
machines as the Hadoop datanodes, and a few that aren't. The GridFTP
server has a libhdfs backend which writes incoming network data into HDFS.
They've found that the GridFTP servers which are co-located with HDFS
datanodes have poor performance because data comes in at a much faster
rate than the local HDD can handle. The standalone GridFTP servers,
however, push data out to multiple nodes at once, and can handle the
incoming data just fine (>200MB/s).
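A toy Python model of this behavior makes the bottleneck concrete. It is a simplification and not HDFS's actual placement code. It models only the first replica of each block and matches the writer by a plain address string. It ignores rack awareness for later replicas and any load or free-space checks the real policy may apply. The addresses and function names are hypothetical.

```python
import random
from typing import Dict, Sequence


def choose_first_replica(client_addr: str, datanodes: Sequence[str],
                         rng: random.Random) -> str:
    """Toy model of where the first replica of a new block goes.

    client_addr is the writer's address as the NameNode sees it. If it
    matches a datanode, that node is chosen (the local-node preference);
    otherwise a datanode is picked uniformly at random.
    """
    if client_addr in datanodes:
        return client_addr
    return rng.choice(list(datanodes))


def blocks_per_node(client_addr: str, datanodes: Sequence[str],
                    n_blocks: int, seed: int = 0) -> Dict[str, int]:
    """Count how many of n_blocks first replicas land on each datanode."""
    rng = random.Random(seed)
    counts = {dn: 0 for dn in datanodes}
    for _ in range(n_blocks):
        counts[choose_first_replica(client_addr, datanodes, rng)] += 1
    return counts
```

With a co-located writer (its address is in the datanode list), every block's first replica lands on the local disk. With a standalone server, or any address the NN does not recognize as a datanode, the writes spread across the cluster.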
Is there any way to turn off the preference for the local node? Can
anyone think of a good workaround to trick HDFS into thinking the client
isn't on the same node?
Brian