Hmmm. I could have sworn there was a background balancing bandwidth limiter.
Haven't tested random reads... The last test we did ended up hitting the cache, but we didn't push it hard enough to hit network bandwidth limitations. Not to say they don't exist. Like I said in the other post, if we had more disks we would hit it. We'll have to do more random testing.

Sent from a remote device. Please excuse any typos...

Mike Segel

On Jun 27, 2011, at 9:34 PM, "Ryan Rawson" <[email protected]> wrote:

> There are no bandwidth limitations in 0.20.x. None that I saw, at
> least. It was basically bandwidth-management-by-PWM: you could
> adjust the frequency of file copies per node.
>
> In my case the load was HBase real-time serving, so it was servicing
> many smaller random reads, not a map-reduce. Everyone has their own
> use case :-)
>
> -ryan
>
> On Mon, Jun 27, 2011 at 6:54 PM, Segel, Mike <[email protected]> wrote:
>> That doesn't seem right.
>> In one of our test clusters (19 data nodes) we found that under heavy
>> loads we were disk-I/O bound, not network bound. Of course YMMV depending
>> on your ToR switch. If we had more than 4 disks per node, we would
>> probably see the network become the bottleneck. What did you set your
>> bandwidth settings to in hdfs-site.xml? (Going from memory, not sure of
>> the exact setting...)
>>
>> But the good news... newer hardware will start to have 10GbE on the
>> motherboard.
>>
>> Sent from a remote device. Please excuse any typos...
>>
>> Mike Segel
>>
>> On Jun 27, 2011, at 7:11 PM, "Ryan Rawson" <[email protected]> wrote:
>>
>>> On the subject of GigE vs 10GigE, I think that we will very shortly
>>> be seeing interest in 10gig, since GigE is only about 120 MB/sec,
>>> roughly one hard drive of streaming data. Nodes with 4+ disks are
>>> throttled by the network. On a small cluster (20 nodes), the
>>> replication traffic after a node failure can choke the cluster to
>>> death. The only way to fix it quickly is to bring that node back up.
>>> Perhaps the Hortonworks guys can work on that.
>>>
>>> -ryan
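A quick sketch of the arithmetic behind Ryan's point, using assumed figures (roughly 100 MB/s of sequential throughput per spindle and illustrative disk sizes; none of the numbers below are measurements from this thread):

# Rough sketch of the GigE-vs-disks argument above. The per-disk
# throughput, NIC payload rates, and disk sizes are illustrative
# assumptions, not figures taken from the thread.

DISK_MB_S = 100        # assumed sequential throughput per spindle
GIGE_MB_S = 120        # ~1 Gbit/s of usable payload
TEN_GIGE_MB_S = 1200   # ~10 Gbit/s of usable payload


def bottleneck(disks, nic_mb_s):
    """Return whether the NIC or the disks cap streaming throughput."""
    return "network" if nic_mb_s < disks * DISK_MB_S else "disks"


if __name__ == "__main__":
    for disks in (1, 4, 12):
        print(disks, "disks, GigE   ->", bottleneck(disks, GIGE_MB_S))
        print(disks, "disks, 10GigE ->", bottleneck(disks, TEN_GIGE_MB_S))

    # Re-replication after a node failure: assume 4 x 2 TB of block data
    # has to be re-copied across the surviving nodes' NICs.
    failed_node_mb = 4 * 2 * 1_000_000
    print("~%.0f NIC-hours of GigE traffic to re-replicate"
          % (failed_node_mb / GIGE_MB_S / 3600))

With four or more spindles a single GigE link is already the cap, and draining a failed node's blocks over GigE takes on the order of a day of NIC time, which is why the replication traffic hits a small cluster so hard.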
>>> On Mon, Jun 27, 2011 at 4:38 AM, Steve Loughran <[email protected]> wrote:
>>>> On 26/06/11 20:23, Scott Carey wrote:
>>>>>
>>>>> On 6/23/11 5:49 AM, "Steve Loughran" <[email protected]> wrote:
>>>>>
>>>>>> what's your HW setup? #cores/server, #servers, underlying OS?
>>>>>
>>>>> CentOS 5.6.
>>>>> 4 cores / 8 threads per server (Nehalem generation Intel processor).
>>>>
>>>> That should be enough to find problems. I've just moved up to a 6-core,
>>>> 12-thread desktop and that found problems in some non-Hadoop code, which
>>>> shows that the more threads you have, and the faster the machines are,
>>>> the more your race conditions show up. With Hadoop, the fact that you
>>>> can have 10-1000 servers means that in a large cluster the probability
>>>> of that race condition showing up scales accordingly.
>>>>
>>>>> Also run a smaller cluster with 2x quad-core Core 2 generation Xeons.
>>>>>
>>>>> Off topic:
>>>>> The single-proc Nehalem is faster than the dual Core 2s for most use
>>>>> cases, and much lower power. Looking forward to single-proc 4- or
>>>>> 6-core Sandy Bridge based systems for the next expansion: testing
>>>>> 4 core vs 4 core has these 30% faster than the Nehalem generation
>>>>> systems in CPU-bound tasks, and lower power. Intel prices single-socket
>>>>> Xeons so much lower than the dual-socket ones that the best value for
>>>>> us is to get more single-socket servers rather than fewer dual-socket
>>>>> ones (with a similar processor-to-hard-drive ratio).
>>>>
>>>> Yes, in a large cluster the price of filling the second socket can
>>>> compare to a lot of storage, and TB of storage is more tangible. I guess
>>>> it depends on your application.
>>>>
>>>> Regarding Sandy Bridge, I've no experience of those, but I worry that
>>>> 10 Gbps is still bleeding edge, and shouldn't be needed for code with
>>>> good locality anyway; it is probably more cost-effective to stay at
>>>> 1 Gbps/server, though the issue there is that the number of HDDs per
>>>> server generates lots of replication traffic when a single server
>>>> fails...
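On the "background balancing bandwidth limiter" mentioned at the top of the thread, and the hdfs-site.xml setting Mike was trying to recall: this is most likely dfs.balance.bandwidthPerSec, a per-datanode cap in bytes per second on balancer traffic (renamed dfs.datanode.balance.bandwidthPerSec in later releases). As the thread notes for 0.20.x, ordinary re-replication is paced by the number of concurrent transfers per node rather than by a bandwidth cap. A minimal hdfs-site.xml sketch with an illustrative value:

<!-- hdfs-site.xml sketch; the value below is an example, not a
     recommendation. In 0.20.x the property is dfs.balance.bandwidthPerSec
     (bytes per second, default 1048576 = 1 MB/s); later releases rename it
     dfs.datanode.balance.bandwidthPerSec. It throttles balancer traffic
     only, not the re-replication triggered by a failed node. -->
<property>
  <name>dfs.balance.bandwidthPerSec</name>
  <value>10485760</value> <!-- 10 MB/s of balancer traffic per datanode -->
</property>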
