If we keep major compaction as a background process, we should do a better job of spreading it out over a longer period of time.
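One simple way to spread them out (just a sketch of the idea, not anything HBase does today; the period, jitter fraction, and region count are made up for illustration) is to give each region a randomized offset around the nominal compaction period, so they don't all fire at once:

    import java.util.Random;

    public class CompactionJitter {
        public static void main(String[] args) {
            long periodMs = 24L * 60 * 60 * 1000;   // nominal period: once a day
            double jitter = 0.5;                    // spread: +/- 50% of the period
            Random rand = new Random();

            // Each region gets its own random offset, staggering major
            // compactions across the period instead of clustering them.
            for (int region = 0; region < 5; region++) {
                long offset = (long) ((rand.nextDouble() - 0.5) * 2 * jitter * periodMs);
                long nextCompaction = periodMs + offset;
                System.out.printf("region %d: next major compaction in %.1f hours%n",
                                  region, nextCompaction / 3600000.0);
            }
        }
    }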
Also, there's been some discussion about adding heuristics so that very old and/or very large HFiles are never major compacted, to keep old, rarely read data from being rewritten constantly.

> -----Original Message-----
> From: Ryan Rawson [mailto:ryano...@gmail.com]
> Sent: Monday, May 17, 2010 12:02 PM
> To: hbase-user@hadoop.apache.org
> Cc: Joel Koshy
> Subject: Re: Additional disk space required for Hbase compactions..
>
> Yes, compactions happen on HDFS. HBase will only compact one region at a
> time per regionserver, so in theory you will need k×max(all region sizes).
>
> But HDFS does a delayed delete, so deleted files are not instantly freed
> up. You could end up requiring much more disk space.
>
> Considering HDFS disks should be the cheapest disks you own (data drives
> in a low-density configuration), hopefully it won't be hard to
> over-provision.
>
> On May 17, 2010 11:57 AM, "Vidhyashankar Venkataraman" <
> vidhy...@yahoo-inc.com> wrote:
>
> Hi guys,
> I am quite new to HBase. I am trying to figure out the maximum additional
> disk space required for compactions.
>
> Suppose the set of small HFiles has total size U before a major
> compaction and the 'behemoth' HFile has size M. Assuming the resulting
> HFile after compaction has size U+M (the worst case, with only
> insertions) and a replication factor of r, the disk space taken by the
> HFiles is 2r(U+M). Is this estimate reasonable? (This is also based on
> my understanding that compactions happen on HDFS and not on the local
> file system: am I correct?)
>
> Thank you
> Vidhya
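For what it's worth, the estimate in the question above checks out: the old files (U+M in total, replicated r times) linger until HDFS's delayed delete kicks in, while the freshly written file (also U+M, with r copies) lands, which is where the 2r(U+M) peak comes from. A quick numeric check (the sizes here are made up for illustration):

    public class CompactionSpace {
        public static void main(String[] args) {
            double uGb = 2.0;   // U: total size of the small HFiles
            double mGb = 8.0;   // M: size of the existing 'behemoth' HFile
            int r = 3;          // HDFS replication factor

            // Old files (r copies) + new file (r copies), both U+M in size.
            double peakGb = 2.0 * r * (uGb + mGb);
            System.out.printf("peak disk usage: %.0f GB (U=%.0f GB, M=%.0f GB, r=%d)%n",
                              peakGb, uGb, mGb, r);
            // prints: peak disk usage: 60 GB (U=2 GB, M=8 GB, r=3)
        }
    }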