Author: dmeil
Date: Thu Feb 9 18:09:45 2012
New Revision: 1242427
URL: http://svn.apache.org/viewvc?rev=1242427&view=rev
Log:
hbase-5365. book - Arch/Region/Store adding description of compaction file
selection
Modified:
hbase/trunk/src/docbkx/book.xml
hbase/trunk/src/docbkx/configuration.xml
Modified: hbase/trunk/src/docbkx/book.xml
URL:
http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/book.xml?rev=1242427&r1=1242426&r2=1242427&view=diff
==============================================================================
--- hbase/trunk/src/docbkx/book.xml (original)
+++ hbase/trunk/src/docbkx/book.xml Thu Feb 9 18:09:45 2012
@@ -283,7 +283,8 @@ try {
<para>HBase does not modify data in place, and so deletes are handled
by creating new markers called <emphasis>tombstones</emphasis>.
These tombstones, along with the dead values, are cleaned up on major
compactions.
</para>
- <para>See <xref linkend="version.delete"/> for more information on
deleting versions of columns.
+ <para>See <xref linkend="version.delete"/> for more information on
deleting versions of columns, and see
+ <xref linkend="compaction"/> for more information on compactions.
</para>
</section>
@@ -588,10 +589,10 @@ admin.enableTable(table);
HBase currently does not do well with anything above two or three column
families so keep the number
of column families in your schema low. Currently, flushing and
compactions are done on a per Region basis so
if one column family is carrying the bulk of the data bringing on
flushes, the adjacent families
- will also be flushed though the amount of data they carry is small.
Compaction is currently triggered
- by the total number of files under a column family. Its not size based.
When many column families the
+ will also be flushed even though the amount of data they carry is small.
When there are many column families, the
flushing and compaction interaction can make for a bunch of needless i/o
loading (To be addressed by
- changing flushing and compaction to work on a per column family basis).
+ changing flushing and compaction to work on a per column family basis).
For more information
+ on compactions, see <xref linkend="compaction"/>.
</para>
<para>Try to make do with one column family if you can in your schemas.
Only introduce a
second and third column family in the case where data access is
usually column scoped;
@@ -2136,16 +2137,133 @@ myHtd.setValue(HTableDescriptor.SPLIT_PO
<section xml:id="compaction">
<title>Compaction</title>
<para>There are two types of compactions: minor and major. Minor
compactions will usually pick up a couple of the smaller adjacent
- files and rewrite them as one. Minors do not drop deletes or expired
cells, only major compactions do this. Sometimes a minor compaction
- will pick up all the files in the store and in this case it actually
promotes itself to being a major compaction.
- For a description of how a minor compaction picks files to compact,
see the <link
xlink:href="http://hbase.apache.org/xref/org/apache/hadoop/hbase/regionserver/Store.html#836">ascii
diagram in the Store source code.</link>
+ StoreFiles and rewrite them as one. Minor compactions do not drop deletes
or expired cells; only major compactions do this. Sometimes a minor compaction
+ will pick up all the StoreFiles in the Store, and in this case it
actually promotes itself to a major compaction.
</para>
- <para>After a major compaction runs there will be a single storefile
per store, and this will help performance usually. Caution: major compactions
rewrite all of the stores data and on a loaded system, this may not be tenable;
+ <para>After a major compaction runs there will be a single StoreFile
per Store, and this usually improves performance. Caution: major compactions
rewrite all of the Store's data and on a loaded system, this may not be tenable;
major compactions will usually have to be done manually on large
systems. See <xref linkend="managed.compactions" />.
</para>
<para>Compactions will <emphasis>not</emphasis> perform region merges.
See <xref linkend="ops.regionmgt.merge"/> for more information on region
merging.
</para>
- </section>
+ <section xml:id="compaction.file.selection">
+ <title>Compaction File Selection</title>
+ <para>The core algorithm for StoreFile selection is illustrated by
some ASCII art in the <link
xlink:href="http://hbase.apache.org/xref/org/apache/hadoop/hbase/regionserver/Store.html#836">Store
source code</link>, which
+ will serve as a useful reference. It has been copied below:
+<programlisting>
+/* normal skew:
+ *
+ * older ----> newer
+ * _
+ * | | _
+ * | | | | _
+ * --|-|- |-|- |-|---_-------_------- minCompactSize
+ * | | | | | | | | _ | |
+ * | | | | | | | | | | | |
+ * | | | | | | | | | | | |
+ */
+</programlisting>
+ Important knobs:
+ <itemizedlist>
+ <listitem><code>hbase.hstore.compaction.ratio</code> Ratio used in the
compaction
+ file selection algorithm (default 1.2F). </listitem>
+ <listitem><code>hbase.hstore.compaction.min</code> (called
<code>hbase.hstore.compactionThreshold</code> in HBase 0.90) (files) Minimum number
+ of StoreFiles per Store to be selected for a compaction to
occur.</listitem>
+ <listitem><code>hbase.hstore.compaction.max</code> (files) Maximum
number of StoreFiles to compact per minor compaction.</listitem>
+ <listitem><code>hbase.hstore.compaction.min.size</code> (bytes)
+ Any StoreFile smaller than this setting will automatically be a
candidate for compaction. Defaults to the
+ region's memstore flush size (128 MB). </listitem>
+ <listitem><code>hbase.hstore.compaction.max.size</code> (introduced in
HBase 0.92) (bytes)
+ Any StoreFile larger than this setting will automatically be
excluded from compaction. </listitem>
+ </itemizedlist>
+ </para>
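As an illustrative sketch, these knobs would be set in hbase-site.xml. The values below are examples only, not recommendations, and defaults may differ by HBase version:

```xml
<!-- Illustrative hbase-site.xml fragment; values are examples, not recommendations. -->
<property>
  <name>hbase.hstore.compaction.ratio</name>
  <value>1.2</value>
</property>
<property>
  <name>hbase.hstore.compaction.min</name>
  <value>3</value>
</property>
<property>
  <name>hbase.hstore.compaction.max</name>
  <value>10</value>
</property>
<property>
  <name>hbase.hstore.compaction.min.size</name>
  <value>16777216</value> <!-- 16 MB -->
</property>
<property>
  <name>hbase.hstore.compaction.max.size</name>
  <value>1073741824</value> <!-- 1 GB -->
</property>
```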
+ <para>The minor compaction StoreFile selection logic is size based,
and selects a StoreFile for compaction when its size
+ &lt;= sum(sizes of the newer StoreFiles) *
<code>hbase.hstore.compaction.ratio</code>.
+ </para>
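As a rough sketch of this selection rule, the logic can be expressed as follows. This is a hypothetical re-implementation for illustration, not the actual Store code; the method name and parameters are invented:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the minor-compaction file-selection rule described above.
// Hypothetical re-implementation for illustration, NOT the actual
// org.apache.hadoop.hbase.regionserver.Store code.
public class CompactSelectionSketch {

    /** Sizes are given oldest to newest; returns the sizes selected for compaction. */
    static List<Long> select(long[] sizes, double ratio, int minFiles, int maxFiles,
                             long minSize, long maxSize) {
        // Files larger than hbase.hstore.compaction.max.size are excluded outright.
        List<Long> files = new ArrayList<>();
        for (long s : sizes) {
            if (s <= maxSize) files.add(s);
        }
        int n = files.size();
        // sumNewer[i] = total size of all files newer than files[i]
        long[] sumNewer = new long[n];
        long acc = 0;
        for (int i = n - 1; i >= 0; i--) {
            sumNewer[i] = acc;
            acc += files.get(i);
        }
        // Walk from oldest to newest, skipping files that are too large relative
        // to the newer files (unless under the automatic-include min size).
        int start = 0;
        while (start < n
                && files.get(start) > Math.max(minSize, (long) (sumNewer[start] * ratio))) {
            start++;
        }
        if (n - start < minFiles) {
            return new ArrayList<>();   // not enough candidates: no compaction
        }
        // Take at most maxFiles candidates, oldest first.
        return new ArrayList<>(files.subList(start, Math.min(n, start + maxFiles)));
    }

    public static void main(String[] args) {
        System.out.println(select(new long[]{100, 50, 23, 12, 12}, 1.0, 3, 5, 10, 1000));
        // → [23, 12, 12]
        System.out.println(select(new long[]{100, 25, 12, 12}, 1.0, 3, 5, 10, 1000));
        // → [] (not enough files)
        System.out.println(select(new long[]{7, 6, 5, 4, 3, 2, 1}, 1.0, 3, 5, 10, 1000));
        // → [7, 6, 5, 4, 3]
    }
}
```

Running this against the three worked examples in the following sections reproduces their outcomes.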
+ </section>
+ <section xml:id="compaction.file.selection.example1">
+ <title>Minor Compaction File Selection - Example #1 (Basic
Example)</title>
+ <para>This example mirrors an example from the unit test
<code>TestCompactSelection</code>.
+ <itemizedlist>
+ <listitem><code>hbase.hstore.compaction.ratio</code> = 1.0F
</listitem>
+ <listitem><code>hbase.hstore.compaction.min</code> = 3 (files)
</listitem>
+ <listitem><code>hbase.hstore.compaction.max</code> = 5 (files)
</listitem>
+ <listitem><code>hbase.hstore.compaction.min.size</code> = 10
(bytes) </listitem>
+ <listitem><code>hbase.hstore.compaction.max.size</code> = 1000
(bytes) </listitem>
+ </itemizedlist>
+ The following StoreFiles exist: 100, 50, 23, 12, and 12 bytes apiece
(oldest to newest).
+ With the above parameters, the files that would be selected for
minor compaction are 23, 12, and 12.
+ </para>
+ <para>Why?
+ <itemizedlist>
+ <listitem>100 --> No, because sum(50, 23, 12, 12) * 1.0 = 97,
which is less than 100. </listitem>
+ <listitem>50 --> No, because sum(23, 12, 12) * 1.0 = 47,
which is less than 50. </listitem>
+ <listitem>23 --> Yes, because sum(12, 12) * 1.0 = 24,
which is greater than 23. </listitem>
+ <listitem>12 --> Yes, because sum(12) * 1.0 = 12,
which equals the file's size. </listitem>
+ <listitem>12 --> Yes, because the previous file was
included, and this file does not exceed
+ the max-file limit of 5.</listitem>
+ </itemizedlist>
+ </para>
+ </section>
+ <section xml:id="compaction.file.selection.example2">
+ <title>Minor Compaction File Selection - Example #2 (Not Enough
Files To Compact)</title>
+ <para>This example mirrors an example from the unit test
<code>TestCompactSelection</code>.
+ <itemizedlist>
+ <listitem><code>hbase.hstore.compaction.ratio</code> = 1.0F
</listitem>
+ <listitem><code>hbase.hstore.compaction.min</code> = 3 (files)
</listitem>
+ <listitem><code>hbase.hstore.compaction.max</code> = 5 (files)
</listitem>
+ <listitem><code>hbase.hstore.compaction.min.size</code> = 10
(bytes) </listitem>
+ <listitem><code>hbase.hstore.compaction.max.size</code> = 1000
(bytes) </listitem>
+ </itemizedlist>
+ </para>
+ <para>The following StoreFiles exist: 100, 25, 12, and 12 bytes
apiece (oldest to newest).
+ With the above parameters, no minor compaction would be started
because there are not enough candidate files.
+ </para>
+ <para>Why?
+ <itemizedlist>
+ <listitem>100 --> No, because sum(25, 12, 12) * 1.0 =
49</listitem>
+ <listitem>25 --> No, because sum(12, 12) * 1.0 = 24</listitem>
+ <listitem>12 --> No. It is a candidate because sum(12) * 1.0 = 12,
but there are only 2 files to compact and that is less than the threshold of
3</listitem>
+ <listitem>12 --> No. It is a candidate because the previous StoreFile
was, but there are not enough files to compact</listitem>
+ </itemizedlist>
+ </para>
+ </section>
+ <section xml:id="compaction.file.selection.example3">
+ <title>Minor Compaction File Selection - Example #3 (Limiting Files
To Compact)</title>
+ <para>This example mirrors an example from the unit test
<code>TestCompactSelection</code>.
+ <itemizedlist>
+ <listitem><code>hbase.hstore.compaction.ratio</code> = 1.0F
</listitem>
+ <listitem><code>hbase.hstore.compaction.min</code> = 3 (files)
</listitem>
+ <listitem><code>hbase.hstore.compaction.max</code> = 5 (files)
</listitem>
+ <listitem><code>hbase.hstore.compaction.min.size</code> = 10
(bytes) </listitem>
+ <listitem><code>hbase.hstore.compaction.max.size</code> = 1000
(bytes) </listitem>
+ </itemizedlist>
+ The following StoreFiles exist: 7, 6, 5, 4, 3, 2, and 1 bytes apiece
(oldest to newest).
+ With the above parameters, the files that would be selected for
minor compaction are 7, 6, 5, 4, 3.
+ </para>
+ <para>Why?
+ <itemizedlist>
+ <listitem>7 --> Yes, because sum(6, 5, 4, 3, 2, 1) * 1.0 = 21.
Also, 7 is less than the min-size</listitem>
+ <listitem>6 --> Yes, because sum(5, 4, 3, 2, 1) * 1.0 = 15.
Also, 6 is less than the min-size. </listitem>
+ <listitem>5 --> Yes, because sum(4, 3, 2, 1) * 1.0 = 10.
Also, 5 is less than the min-size. </listitem>
+ <listitem>4 --> Yes, because sum(3, 2, 1) * 1.0 = 6. Also, 4
is less than the min-size. </listitem>
+ <listitem>3 --> Yes, because sum(2, 1) * 1.0 = 3. Also, 3 is
less than the min-size. </listitem>
+ <listitem>2 --> No. Although 2 is less than the min-size, the
max-number of files to compact (5) has already been reached. </listitem>
+ <listitem>1 --> No. Although 1 is less than the min-size, the
max-number of files to compact (5) has already been reached. </listitem>
+ </itemizedlist>
+ </para>
+ </section>
+ <section xml:id="compaction.config.impact">
+ <title>Impact of Key Configuration Options</title>
+ <para><code>hbase.hstore.compaction.ratio</code>. A large ratio
(e.g., 10F) will tend to produce a single giant StoreFile. Conversely, a value of .25F will
+ produce behavior similar to the BigTable compaction algorithm,
resulting in 4 StoreFiles.
+ </para>
+ <para><code>hbase.hstore.compaction.min.size</code>. This defaults
to <code>hbase.hregion.memstore.flush.size</code> (128 MB). Because
+ this limit represents the "automatic include" limit for all
StoreFiles smaller than this value, it may need to
+ be adjusted downward in write-heavy environments where many 1 or 2
MB StoreFiles are being flushed, because every such file
+ will be targeted for compaction, and the resulting files may still
be under the min-size and require further compaction, and so on.
+ </para>
+ </section>
+ </section> <!-- compaction -->
</section> <!-- store -->
Modified: hbase/trunk/src/docbkx/configuration.xml
URL:
http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/configuration.xml?rev=1242427&r1=1242426&r2=1242427&view=diff
==============================================================================
--- hbase/trunk/src/docbkx/configuration.xml (original)
+++ hbase/trunk/src/docbkx/configuration.xml Thu Feb 9 18:09:45 2012
@@ -1569,6 +1569,7 @@ of all regions.
they occur. They can be administered through the HBase shell, or via
<link
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#majorCompact%28java.lang.String%29">HBaseAdmin</link>.
</para>
+ <para>For more information about compactions and the compaction file
selection process, see <xref linkend="compaction"/>.</para>
</section>
</section>