Even if the writes happen in parallel from a single machine, wouldn't network congestion cause a slowdown due to packet collisions?
- Prasad.

On Thursday 18 September 2008 10:47:48 pm Raghu Angadi wrote:
> Steve Loughran wrote:
> > [EMAIL PROTECTED] wrote:
> >> thanks for the replies. So looks like replication might be the real
> >> overhead when compared to scp.
> >
> > Makes sense, but there's no reason why you couldn't have the first node
> > you copy the data up to continue and pass that data to the other nodes.
>
> Replication cannot account for a 50% slowdown. When the data is written,
> the writes on replicas are pipelined. So essentially data is written to
> replicas in parallel.
>
> Raghu.
>
> > If
> > it's in the same rack, you save on backbone bandwidth, and if it is in a
> > different rack, well, the client operation still finishes faster. A
> > feature for someone to implement, perhaps?
> >
> >>> Also dfs put copies multiple replicas unlike scp.
> >>>
> >>> Lohit
> >>>
> >>> On Sep 17, 2008, at 6:03 AM, "明" <[EMAIL PROTECTED]>
> >>> wrote:
> >>>
> >>> Actually, no.
> >>> As you said, I understand that "dfs -put" breaks the data into
> >>> blocks and then copies them to the datanodes,
> >>> but scp does not break the data into blocks, and just copies the
> >>> data to
> >>> the namenode!
> >>>
> >>>
> >>> 2008/9/17, Prasad Pingali <[EMAIL PROTECTED]>:
> >>>
> >>> Hello,
> >>> I observe that scp of data to the namenode is faster than actually
> >>> putting
> >>> it into dfs (all nodes are on the same switch and have the same ethernet
> >>> cards,
> >>> homogeneous nodes). I understand that "dfs -put" breaks the data into
> >>> blocks
> >>> and then copies them to the datanodes, but shouldn't that be at least as
> >>> fast as
> >>> copying data to the namenode from a single machine, if not faster?
> >>>
> >>> thanks and regards,
> >>> Prasad Pingali,
> >>> IIIT Hyderabad.
> >>>
> >>> --
> >>> Sorry for my English!! 明
> >>> Please help me to correct my English expression and errors in syntax