>> I'm not sure I understand why you distinguish small HFiles and a single 
>> behemoth HFile?  Are you trying to understand more
>> about disk space or I/O patterns?
I was talking with respect to an application I had in mind. Right now, I am considering just disk space.

Ryan's comment:
>> Yes, compactions happen on HDFS. HBase will only compact one region at a time
>> per regionserver, so in theory you will need k * max(all region sizes).
So the U and M from my mail are sizes per region. Am I right? And what is a good
cutoff region size for hundreds of TB of data to be stored in HBase? I am
wondering if this has ever been attempted.
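To make the estimate from the original mail concrete, here is a minimal sketch of the worst-case per-region disk calculation. It assumes the formula as stated in the thread (2 * r * (U + M): old files plus the newly merged file coexist briefly, each replicated r times); the function name and example sizes are illustrative, not from any HBase API.

```python
def peak_compaction_space(u_bytes, m_bytes, replication=3):
    """Worst-case disk usage during a major compaction of one region.

    u_bytes: total size of the small HFiles (U in the thread)
    m_bytes: size of the existing large 'behemoth' HFile (M)
    replication: HDFS replication factor (r)

    Worst case (insertions only): the old files (U + M) and the newly
    written merged file (also U + M) exist simultaneously until the old
    ones are deleted, and every block is replicated r times.
    """
    return 2 * replication * (u_bytes + m_bytes)

GB = 1024 ** 3
# Example: 2 GB of small HFiles, an 8 GB behemoth, replication factor 3
print(peak_compaction_space(2 * GB, 8 * GB) / GB)  # 60.0
```

Since compactions run one region at a time per regionserver, the cluster-wide transient overhead is bounded by this quantity for the largest region on each regionserver, not by the total table size.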

Vidhya


> -----Original Message-----
> From: Vidhyashankar Venkataraman [mailto:vidhy...@yahoo-inc.com]
> Sent: Monday, May 17, 2010 11:56 AM
> To: hbase-user@hadoop.apache.org
> Cc: Joel Koshy
> Subject: Additional disk space required for Hbase compactions..
>
> Hi guys,
>   I am quite new to HBase. I am trying to figure out the maximum
> additional disk space required for compactions.
>
>   Suppose the set of small HFiles amounts to a total size of U before a
> major compaction happens, and the 'behemoth' HFile has size M. Assuming
> the resultant HFile after compaction has size U+M (the worst case, with
> only insertions) and a replication factor of r, the disk space taken by
> the HFiles is 2r(U+M). Is this estimate reasonable? (This is also based
> on my understanding that compactions happen on HDFS and not on the
> local file system: am I correct?)
>
> Thank you
> Vidhya
>

