We should do better at scheduling major compactions over a longer period of 
time if we keep it as a background process.
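
One way to spread them out, as a minimal sketch (driving compactions from an
external job, the table names, and the 6-hour window are all assumptions, not
an existing HBase feature): disable the time-based trigger by setting
hbase.hregion.majorcompaction to 0, then issue staggered majorCompact
requests from a cron-style job.

    import java.util.Arrays;
    import java.util.List;
    import java.util.Random;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    // Sketch: request major compactions from an external scheduler,
    // staggered across a window, instead of letting every region fire on
    // the built-in periodic trigger. Table names and the window size are
    // hypothetical.
    public class StaggeredMajorCompactor {
      public static void main(String[] args) throws Exception {
        List<String> tables = Arrays.asList("usertable", "logs");
        long windowMs = 6L * 60 * 60 * 1000; // spread over ~6 hours
        Random rand = new Random();
        HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
        for (String table : tables) {
          // sleep a random slice of the window so requests don't pile up
          Thread.sleep(rand.nextInt((int) (windowMs / tables.size())));
          admin.majorCompact(table); // asynchronous request to the cluster
        }
      }
    }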

Also, there's been some discussion of adding heuristics that avoid major
compacting very old and/or very large HFiles, to prevent old, rarely read
data from being rewritten constantly.
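
A minimal sketch of what such a heuristic could look like (the method and
both thresholds are hypothetical, not existing HBase code):

    // Hypothetical candidate filter: leave HFiles that are both very old
    // and very large out of major compactions, so cold, rarely read data
    // is not rewritten. Both thresholds are illustrative.
    static boolean eligibleForMajorCompaction(long fileSizeBytes, long fileAgeMs) {
      final long AGE_THRESHOLD_MS = 30L * 24 * 60 * 60 * 1000; // ~30 days
      final long SIZE_THRESHOLD   = 2L * 1024 * 1024 * 1024;   // 2 GB
      boolean oldAndLarge = fileAgeMs > AGE_THRESHOLD_MS
          && fileSizeBytes > SIZE_THRESHOLD;
      return !oldAndLarge; // only rewrite files that are young or small
    }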

> -----Original Message-----
> From: Ryan Rawson [mailto:ryano...@gmail.com]
> Sent: Monday, May 17, 2010 12:02 PM
> To: hbase-user@hadoop.apache.org
> Cc: Joel Koshy
> Subject: Re: Additional disk space required for Hbase compactions..
> 
> Yes, compactions happen on HDFS. HBase will only compact one region at a
> time per regionserver, so with k regionservers you will in theory need
> k × max(all region sizes) of headroom.
> 
> But HDFS does a delayed delete, so deleted files are not instantly freed
> up. You could end up requiring much more disk space.
> 
> Considering HDFS disks should be the cheapest disks you own (data drives
> in a low-density configuration), hopefully it won't be hard to
> over-provision.
> 
> On May 17, 2010 11:57 AM, "Vidhyashankar Venkataraman"
> <vidhy...@yahoo-inc.com> wrote:
> 
> Hi guys,
>  I am quite new to HBase. I am trying to figure out the maximum
> additional disk space required for compactions.
> 
>  If the set of small HFiles amounts to a total size of U before a major
> compaction happens, the 'behemoth' HFile has size M, the resultant HFile
> after compaction has size U+M (the worst case, with only insertions), and
> the replication factor is r, then the disk space taken by the HFiles is
> 2r(U+M). Is this estimate reasonable? (This is also based on my
> understanding that compactions happen on HDFS and not on the local file
> system: am I correct?)
> 
> Thank you
> Vidhya
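
To make the worst-case arithmetic in the quoted thread concrete (the numbers
below are made up): with U = 2 GB of small HFiles, a behemoth of M = 18 GB,
and replication r = 3, the existing HFiles occupy r(U+M) = 60 GB, and the
newly written compacted file occupies another r(U+M) = 60 GB until HDFS's
delayed delete reclaims the old files, for a transient peak of
2r(U+M) = 120 GB.

    // Worst-case transient disk use during a major compaction: the old
    // HFiles and the newly written file coexist until HDFS's delayed
    // delete runs. U, M, and r are hypothetical example values.
    public class CompactionSpaceEstimate {
      public static void main(String[] args) {
        double smallFilesGb = 2.0;  // U
        double behemothGb = 18.0;   // M
        int replication = 3;        // r
        double peakGb = 2.0 * replication * (smallFilesGb + behemothGb);
        System.out.printf("peak transient usage: %.0f GB%n", peakGb); // 120
      }
    }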
