Author: jdcryans
Date: Thu Jan  5 01:35:37 2012
New Revision: 1227425

URL: http://svn.apache.org/viewvc?rev=1227425&view=rev
Log:
HBASE-5127  [ref manual] Better block cache documentation

Modified:
    hbase/trunk/src/docbkx/book.xml

Modified: hbase/trunk/src/docbkx/book.xml
URL: 
http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/book.xml?rev=1227425&r1=1227424&r2=1227425&view=diff
==============================================================================
--- hbase/trunk/src/docbkx/book.xml (original)
+++ hbase/trunk/src/docbkx/book.xml Thu Jan  5 01:35:37 2012
@@ -1607,15 +1607,83 @@ scan.setFilter(filter);
 
      <section xml:id="block.cache">
        <title>Block Cache</title>
-       <para>The Block Cache contains three levels of block priority to allow 
for scan-resistance and in-memory ColumnFamilies.  A block is added with an 
in-memory
-       flag if the containing ColumnFamily is defined in-memory, otherwise a 
block becomes a single access priority.  Once a block is accessed again, it 
changes to multiple access. 
-       This is used to prevent scans from thrashing the cache, adding a 
least-frequently-used element to the eviction algorithm.  Blocks from in-memory 
ColumnFamilies
-       are the last to be evicted.
-       </para>
-       <para>
+       <section xml:id="block.cache.design">
+        <title>Design</title>
+        <para>The Block Cache is an LRU cache that contains three levels of 
block priority to allow for scan-resistance and in-memory ColumnFamilies:
+        </para>
+        <itemizedlist>
+            <listitem>Single access priority: The first time a block is loaded 
from HDFS it normally has this priority and it will be part of the first group 
to be considered
+            during evictions. The advantage is that scanned blocks are more 
likely to get evicted than blocks that are getting more usage.
+            </listitem>
+            <listitem>Multi access priority: If a block in the previous 
priority group is accessed again, it upgrades to this priority. It is thus part 
of the second group
+            considered during evictions.
+            </listitem>
+            <listitem>In-memory access priority: If the block's family was 
configured to be "in-memory", it will be part of this priority regardless of the 
number of times it
+            was accessed. Catalog tables are configured like this. This group 
is the last one considered during evictions.
+            </listitem>
+        </itemizedlist>
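+        <para>The eviction order described above can be illustrated with a small, 
simplified Java model. This is only a sketch under stated assumptions, not the 
LruBlockCache implementation: the class name PriorityEvictionSketch and its 
access and evict methods are hypothetical, the groups are emptied strictly in 
order, and blocks are identified by plain strings.
+        </para>
+        <programlisting><![CDATA[
```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of the three priority groups; not HBase code.
public class PriorityEvictionSketch {
    enum Priority { SINGLE, MULTI, IN_MEMORY }

    // Access-ordered map: iteration goes from least to most recently used.
    private final Map<String, Priority> blocks =
        new LinkedHashMap<>(16, 0.75f, true);

    /** Records a read of a block, loading it on first access. */
    public void access(String block, boolean inMemoryFamily) {
        Priority current = blocks.get(block);
        if (current == null) {
            // First load: in-memory families skip straight to the last group.
            blocks.put(block, inMemoryFamily ? Priority.IN_MEMORY : Priority.SINGLE);
        } else if (current == Priority.SINGLE) {
            // Second access: upgrade from single to multi access priority.
            blocks.put(block, Priority.MULTI);
        }
    }

    /** Evicts up to count blocks: SINGLE first, then MULTI, then IN_MEMORY. */
    public List<String> evict(int count) {
        List<String> evicted = new ArrayList<>();
        for (Priority group : Priority.values()) {
            for (Map.Entry<String, Priority> entry : blocks.entrySet()) {
                if (evicted.size() == count) {
                    break;
                }
                if (entry.getValue() == group) {
                    evicted.add(entry.getKey());
                }
            }
        }
        blocks.keySet().removeAll(evicted);
        return evicted;
    }

    public static void main(String[] args) {
        PriorityEvictionSketch cache = new PriorityEvictionSketch();
        cache.access("scanA", false); // read once by a scan
        cache.access("scanB", false); // read once by a scan
        cache.access("hot", false);
        cache.access("hot", false);   // read twice: multi access priority
        cache.access("meta", true);   // block of an in-memory family
        System.out.println(cache.evict(2)); // prints [scanA, scanB]
    }
}
```
+        ]]></programlisting>
+        <para>In this model the blocks that were read only once are evicted 
before the block that was read twice, which is the scan-resistance property 
described above.
+        </para>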
+        <para>
         For more information, see the <link 
xlink:href="http://hbase.apache.org/xref/org/apache/hadoop/hbase/io/hfile/LruBlockCache.html">LruBlockCache
 source</link>
         </para>
-     </section>
+       </section>
+     <section xml:id="block.cache.usage">
+        <title>Usage</title>
+        <para>Block caching is enabled by default for all user tables, 
which means that any read operation will load the LRU cache. This works well 
for a large number of use cases,
+        but further tuning is usually required in order to achieve better 
performance. An important concept is the
+        <link 
xlink:href="http://en.wikipedia.org/wiki/Working_set_size">working set 
size</link>, or WSS, which is: "the amount of memory needed to compute the 
answer to a problem".
+        For a website, this would be the data that's needed to answer the 
queries over a short amount of time.
+        </para>
+        <para>The way to calculate how much memory is available in HBase for 
caching is:
+        </para>
+        <programlisting>
+            number of region servers * heap size * hfile.block.cache.size * 
0.85
+        </programlisting>
+        <para>The default value of hfile.block.cache.size is 0.25, which represents 
25% of the available heap. The last value (0.85, or 85%) is the default acceptable 
loading factor in the LRU cache, after
+        which eviction is started. It is included in this equation 
because it would be unrealistic to assume that 100% of the 
available memory can be used: the process
+        would then block whenever it has to load new blocks. 
Here are some examples:
+        </para>
+        <itemizedlist>
+            <listitem>One region server with the default heap size (1GB) and 
the default block cache size will have 217MB of block cache available.
+            </listitem>
+            <listitem>20 region servers with the heap size set to 8GB and a 
default block cache size will have 34GB of block cache.
+            </listitem>
+            <listitem>100 region servers with the heap size set to 24GB and a 
block cache size of 0.5 will have about 1TB of block cache.
+            </listitem>
+        </itemizedlist>
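+        <para>The equation and the three examples above can be checked with a 
short Java sketch. The class name BlockCacheCapacity and the method 
capacityBytes are hypothetical names chosen for illustration, and the sketch 
assumes binary units (1GB = 1024MB).
+        </para>
+        <programlisting><![CDATA[
```java
import java.util.Locale;

// Hypothetical helper; it only evaluates the equation given in the text.
public class BlockCacheCapacity {
    // Default acceptable loading factor of the LRU cache, per the text.
    public static final double ACCEPTABLE_FACTOR = 0.85;

    /**
     * Estimates the cluster-wide block cache capacity in bytes:
     * region servers * heap size * hfile.block.cache.size * 0.85
     */
    public static double capacityBytes(int regionServers, double heapBytes,
                                       double blockCacheSize) {
        return regionServers * heapBytes * blockCacheSize * ACCEPTABLE_FACTOR;
    }

    public static void main(String[] args) {
        double mb = 1024.0 * 1024.0;
        double gb = 1024.0 * mb;

        // One region server, 1GB heap, default block cache size (0.25).
        System.out.println(String.format(Locale.ROOT, "%.1f MB",
            capacityBytes(1, 1 * gb, 0.25) / mb));

        // 20 region servers, 8GB heap, default block cache size (0.25).
        System.out.println(String.format(Locale.ROOT, "%.1f GB",
            capacityBytes(20, 8 * gb, 0.25) / gb));

        // 100 region servers, 24GB heap, block cache size of 0.5.
        System.out.println(String.format(Locale.ROOT, "%.1f GB",
            capacityBytes(100, 24 * gb, 0.5) / gb));
    }
}
```
+        ]]></programlisting>
+        <para>Comparing the result with an estimate of the working set size gives 
a first indication of whether the data that is read frequently can stay cached.
+        </para>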
+        <para>Your data isn't the only resident of the block cache. Here are 
others that you may have to take into account:
+        </para>
+        <itemizedlist>
+            <listitem>Catalog tables: The -ROOT- and .META. tables are forced 
into the block cache and have the in-memory priority, which means that they are 
harder to evict. The former never uses
+            more than a few hundred bytes while the latter can occupy a 
few MBs (depending on the number of regions).
+            </listitem>
+            <listitem>HFile indexes: HFile is the file format that HBase uses 
to store data in HDFS and it contains a multi-layered index in order to seek to 
the data without having to read the whole file.
+            The size of those indexes depends on the block size (64KB by 
default), the size of your keys and the amount of data you are storing. For big 
data sets it's not unusual to see numbers around
+            1GB per region server, although not all of it will be in cache 
because the LRU will evict indexes that aren't used.
+            </listitem>
+            <listitem>Keys: Taking into account only the values that are being 
stored is missing half the picture since every value is stored along with its 
keys
+            (row key, family, qualifier, and timestamp). See <xref 
linkend="keysize"/>.
+            </listitem>
+            <listitem>Bloom filters: Just like the HFile indexes, those data 
structures (when enabled) are stored in the LRU.
+            </listitem>
+            </itemizedlist>
+        <para>Currently the recommended way to measure HFile index and bloom 
filter sizes is to look at the region server web UI and check out the relevant 
metrics. For keys,
+        sampling can be done by using the HFile command line tool and looking for 
the average key size metric.
+        </para>
+        <para>It's generally bad to use block caching when the WSS doesn't fit 
in memory. This is the case when you have for example 40GB available across all 
your region servers' block caches
+        but you need to process 1TB of data. One of the reasons is that the 
churn generated by the evictions will trigger more garbage collections 
unnecessarily. Here are two use cases:
+        </para>
+        <itemizedlist>
+            <listitem>Fully random reading pattern: This is a case where you 
almost never access the same row twice within a short amount of time, such that 
the chance of hitting a cached block is close
+            to 0. Setting block caching on such a table is a waste of memory 
and CPU cycles, all the more so because it generates more garbage for the 
JVM to collect. For more information on monitoring GC,
+            see <xref linkend="trouble.log.gc"/>.
+            </listitem>
+            <listitem>Mapping a table: In a typical MapReduce job that takes a 
table as input, every row will be read only once so there's no need to put them 
into the block cache. The Scan object has
+            the option of turning this off via the setCacheBlocks method (set it 
to false). You can still keep block caching turned on for this table if you need 
fast random read access. An example would be
+            counting the number of rows in a table that serves live traffic: 
caching every block of that table would create massive churn and would surely 
evict data that's currently in use.
+            </listitem>
+        </itemizedlist>
+      </section>
+      </section>
 
       <section xml:id="wal">
        <title >Write Ahead Log (WAL)</title>

