Even assuming everything is up, this solution still will not scale, given the latency, TCP/IP buffers, sliding window, etc. See BDP (bandwidth-delay product).
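
To make the BDP point concrete, here is a rough sketch of the arithmetic. The link speed, round-trip time, and window size below are hypothetical figures chosen for illustration; none of them come from this thread.

```python
def bdp_bytes(bandwidth_bps: float, rtt_seconds: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to keep the link full."""
    return bandwidth_bps * rtt_seconds / 8


def max_throughput_bps(window_bytes: float, rtt_seconds: float) -> float:
    """Upper bound on one TCP connection's throughput for a given window size."""
    return window_bytes * 8 / rtt_seconds


# Hypothetical figures, for illustration only (assumptions, not from the thread):
LINK_BPS = 1e9        # 1 Gbit/s link between the two data centers
RTT_S = 0.05          # 50 ms round-trip time
WINDOW_BYTES = 65535  # largest TCP window without window scaling

if __name__ == "__main__":
    print(f"BDP: {bdp_bytes(LINK_BPS, RTT_S) / 1e6:.2f} MB in flight to fill the link")
    print(f"Per-connection ceiling: {max_throughput_bps(WINDOW_BYTES, RTT_S) / 1e6:.1f} Mbit/s")
```

Under those assumed numbers the link needs roughly 6 MB in flight to stay full, while a single connection limited to a 64 KB window tops out at about 10 Mbit/s, around 1% of the link. Larger buffers and window scaling raise that ceiling, but the latency itself does not go away.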
Sent from my iPad

On Aug 1, 2011, at 4:57 PM, Michael Segel <michael_se...@hotmail.com> wrote:

> Yeah, what he said.
> It's never a good idea.
> Forget about losing a NN or a rack; think about just losing connectivity
> between data centers. (It happens more than you think.)
> Your entire cluster in both data centers goes down. Boom!
>
> It's a bad design.
>
> You're better off doing two different clusters.
>
> Is anyone really trying to sell this as a design? That's even more scary.
>
>> Subject: Re: Hadoop cluster network requirement
>> From: a...@apache.org
>> Date: Sun, 31 Jul 2011 20:28:53 -0700
>> To: common-user@hadoop.apache.org; saq...@margallacomm.com
>>
>> On Jul 31, 2011, at 7:30 PM, Saqib Jang -- Margalla Communications wrote:
>>
>>> Thanks, I'm independently doing some digging into Hadoop networking
>>> requirements and had a couple of quick follow-ups. Could I have some
>>> specific info on why different data centers cannot be supported for
>>> master node and data node comms?
>>> Also, what may be the benefits/use cases for such a scenario?
>>
>> Most people who try to put the NN and DNs in different data centers are
>> trying to achieve disaster recovery: one file system in multiple locations.
>> That isn't the way HDFS is designed and it will end in tears. There are
>> multiple problems:
>>
>> 1) no guarantee that one block replica will be in each data center (thereby
>>    defeating the whole purpose!)
>> 2) assuming one can work out problem 1, during a network break, the NN will
>>    lose contact with one half of the DNs, causing a massive network
>>    replication storm
>> 3) if one is using MR on top of this HDFS, the shuffle will likely kill the
>>    network in between (making MR performance pretty dreadful) and is going
>>    to cause delays for the DN heartbeats
>> 4) I don't even want to think about rebalancing.
>>
>> ... and I'm sure a lot of other problems I'm forgetting at the moment.
>> So don't do it.
>>
>> If you want disaster recovery, set up two completely separate HDFSes and
>> run everything in parallel.