Using that formula, I find that for any reasonable combination of region count, regionserver count, and estimated number of store files (assuming 10 GB regions), the recommended number of xceivers always comes out in the low thousands. So 4096 is not an unreasonable general recommendation for a production instance.
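For reference, here is a minimal sketch of the kind of estimate involved. The premise (roughly one DataXceiver thread pinned per concurrently open block) is the point of Lars's post; the parameter names, the even-spread assumption, and the flat safety factor are my own simplifications, so treat the result as order-of-magnitude only:

    # Rough estimate of DataXceiver demand per DataNode.
    # Parameter names and the 2x headroom factor are illustrative
    # assumptions, not values taken from the blog post.
    def estimated_xceivers_per_datanode(
            regionservers,           # region servers (co-located with DataNodes)
            regions_per_rs,          # regions served by each region server
            storefiles_per_region,   # avg store files per region, all families
            datanodes,               # DataNodes the blocks are spread across
            headroom=2.0):           # slack for WALs, flushes, compactions
        # Each open store file pins roughly one DataXceiver thread on the
        # DataNode serving its current block.
        open_files = regionservers * regions_per_rs * storefiles_per_region
        # Assume blocks (and hence threads) spread evenly across DataNodes.
        return int(open_files / datanodes * headroom)

    # e.g. 10 RS/DN pairs, 100 regions each (10 GB regions), 10 files/region
    print(estimated_xceivers_per_datanode(10, 100, 10, 10))  # -> 2000

which lands in the low thousands, as above. The setting itself goes into hdfs-site.xml on every DataNode (and needs a DataNode restart to take effect):

    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>4096</value>
    </property>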
________________________________
From: Lars George <[email protected]>
To: [email protected]; [email protected]
Sent: Wednesday, March 21, 2012 11:59 PM
Subject: Re: Max xceiver config

Hi Laxman,

Did you see (sorry for the plug) http://www.larsgeorge.com/2012/03/hadoop-hbase-and-xceivers.html - it might help in determining the number.

Lars

On Mar 22, 2012, at 6:43 AM, Laxman wrote:

> The HBase book recommends setting the xceiver count [dfs.datanode.max.xcievers]
> to 4096:
> http://hbase.apache.org/book.html#hadoop
>
> Why do we need an xceiver count as high as 4096?
>
> This means each DataNode in the cluster allows a maximum of:
> - 4096 threads, each occupying some memory
> - 4096 threads reading/writing to the disk(s) simultaneously
>
> This actually makes the system more vulnerable (to a kind of DoS attack)
> through over-utilization of system resources.
>
> Also, this recommendation was based on an issue reported on Hadoop 0.18.
> IMO, we should not have such a high value as the recommended/default value;
> it should be tuned according to capacity requirements.
>
> Related issues
> ==============
> HDFS-162
> - Reported on 0.18
> - Raising the xceiver count to a high value caused other problems.
> - Resolution: "Cannot Reproduce"
>
> HDFS-1861
> - Modified the default value to 4096
> - Source:
>   http://ccgtech.blogspot.in/2010/02/hadoop-hdfs-deceived-by-xciever.html
>   which again refers to HDFS-162 (reported on 0.18).
>
> Case study
> ==========
> http://lucene.472066.n3.nabble.com/Blocks-are-getting-corrupted-under-very-high-load-tc3527403.html
> In one of our production environments, this value was set to 4096, and
> disk waits became so high that some processes stopped responding. The OS
> is also configured to reboot (kernel-panic reboot) when a process is
> unresponsive for a specified amount of time.
>
> Together, these two configurations resulted in corrupted data.
>
> --
> Regards,
> Laxman
