>> I'm not sure I understand why you distinguish small HFiles and a single
>> behemoth HFile? Are you trying to understand more about disk space or
>> I/O patterns?

Was talking wrt an application I had in mind.. Right now, I am considering just
disk space..
Ryan's comment:
>> Yes compactions happen on HDFS. HBase will only compact one region at a
>> time per regionserver, so in theory you will need k × max(all region sizes).

So the U and M from my mail are sizes per region. Am I right?

So what is a good cutoff region size for hundreds of TB of data to be stored
in HBase? I am wondering if this has ever been attempted..

Vidhya

> -----Original Message-----
> From: Vidhyashankar Venkataraman [mailto:vidhy...@yahoo-inc.com]
> Sent: Monday, May 17, 2010 11:56 AM
> To: hbase-user@hadoop.apache.org
> Cc: Joel Koshy
> Subject: Additional disk space required for Hbase compactions..
>
> Hi guys,
>   I am quite new to HBase.. I am trying to figure out the max
> additional disk space required for compactions..
>
> If the set of small HFiles amounts to a total size of U before a major
> compaction happens, and the 'behemoth' HFile has size M, then assuming
> the resultant HFile after compaction has size U+M (the worst case, with
> only insertions) and a replication factor of r, the disk space taken by
> the HFiles is 2r(U+M).. Is this estimate reasonable? (This is also based
> on my understanding that compactions happen on HDFS and not on the local
> file system: am I correct?)...
>
> Thank you
> Vidhya
>
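The arithmetic behind the two estimates in this thread (the per-region worst case 2r(U+M), and the cluster-wide headroom of k × max region size when each of k regionservers compacts one region at a time) can be sketched as below. This is just the back-of-the-envelope math from the mails; the function names are made up for illustration and are not part of any HBase API.

```python
# Illustrative sketch only -- function names are invented, not HBase APIs.

def worst_case_compaction_space(u_bytes, m_bytes, replication):
    """Peak HDFS usage while a major compaction rewrites one region:
    the old HFiles (total U + M) and the new merged HFile (up to U + M
    in the insert-only worst case) coexist until the old files are
    deleted, and each copy is replicated r times -> 2r(U+M)."""
    return 2 * replication * (u_bytes + m_bytes)

def cluster_compaction_headroom(max_region_bytes, replication, num_regionservers):
    """If each of k regionservers compacts at most one region at a time,
    the transient extra space for the newly written files is roughly
    k * r * max(region size)."""
    return num_regionservers * replication * max_region_bytes

GB = 1 << 30
# Example: U = 2 GB of small HFiles, M = 8 GB behemoth, replication r = 3.
print(worst_case_compaction_space(2 * GB, 8 * GB, 3) / GB)   # 60.0 (GB)
# 20 regionservers, 10 GB max region size, r = 3.
print(cluster_compaction_headroom(10 * GB, 3, 20) / GB)      # 600.0 (GB)
```

Note the two numbers answer different questions: 2r(U+M) bounds the total footprint of one region's HFiles during its compaction, while k × r × max(region size) bounds the extra transient space the whole cluster needs at once.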