As a hack, you could tunnel NN traffic from the GridFTP clients through a different machine (by changing fs.default.name). Alternatively, these clients could use a SOCKS proxy.

The amount of traffic to the NN is small, so tunneling should not affect performance.
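
For the SOCKS route, a minimal sketch of the client-side configuration might look like the fragment below. It assumes the GridFTP hosts read their own copy of the Hadoop site configuration, and that a SOCKS server is reachable at proxy.example.org:1080 (a placeholder host and port, not a real endpoint). The idea is that the NN then sees the proxy's address rather than the datanode's, so it should no longer treat the writer as local. Whether block transfers to the datanodes also go through the proxy depends on the Hadoop version, so this is worth testing before relying on it.

```xml
<!-- Client-side only: place in the Hadoop site config used by the GridFTP servers. -->
<property>
  <name>hadoop.rpc.socket.factory.class.default</name>
  <value>org.apache.hadoop.net.SocksSocketFactory</value>
</property>
<property>
  <!-- Placeholder address: replace with the actual SOCKS server host:port. -->
  <name>hadoop.socks.server</name>
  <value>proxy.example.org:1080</value>
</property>
```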

Raghu.

Brian Bockelman wrote:
Hey all,

Had a problem I wanted to ask advice on. The Caltech site I work with currently has a few GridFTP servers which are on the same physical machines as the Hadoop datanodes, and a few that aren't. The GridFTP server has a libhdfs backend which writes incoming network data into HDFS.

They've found that the GridFTP servers which are co-located with HDFS datanodes have poor performance because data is incoming at a much faster rate than the HDD can handle. The standalone GridFTP servers, however, push data out to multiple nodes at once, and can handle the incoming data just fine (>200MB/s).

Is there any way to turn off the preference for the local node? Can anyone think of a good workaround to trick HDFS into thinking the client isn't on the same node?

Brian
