busbey commented on a change in pull request #1232: HBASE-23198 Update ref guide for distributed MOB compaction.
URL: https://github.com/apache/hbase/pull/1232#discussion_r387919494
 
 

 ##########
 File path: src/main/asciidoc/_chapters/hbase_mob.adoc
 ##########
 @@ -79,63 +72,212 @@ hcd.setMobThreshold(102400L);
 ----
 ====
 
-=== Configure MOB Compaction Policy
+=== Testing MOB
+
+The utility `org.apache.hadoop.hbase.IntegrationTestIngestWithMOB` is provided 
to assist with testing
+the MOB feature. The utility is run as follows:
+[source,bash]
+----
+$ sudo -u hbase hbase org.apache.hadoop.hbase.IntegrationTestIngestWithMOB \
+            -threshold 1024 \
+            -minMobDataSize 512 \
+            -maxMobDataSize 5120
+----
+
+* `*threshold*` is the threshold at which cells are considered to be MOBs.
+   The default is 1 kB, expressed in bytes.
+* `*minMobDataSize*` is the minimum value for the size of MOB data.
+   The default is 512 B, expressed in bytes.
+* `*maxMobDataSize*` is the maximum value for the size of MOB data.
+   The default is 5 kB, expressed in bytes.
+
+=== MOB architecture
+
+==== Overview
+This section is derived from information found in
+link:https://issues.apache.org/jira/browse/HBASE-11339[HBASE-11339]. For more 
information see
+the last version of the design doc created during that work:
+"link:https://github.com/apache/hbase/blob/master/dev-support/design-docs/HBASE-11339%20MOB%20GA%20design.pdf[HBASE-11339
 MOB GA design.pdf]".
+
+The MOB feature reduces the overall IO load for configured column families by 
storing values that
+are larger than the configured threshold outside of the normal regions to 
avoid splits, merges, and
+most importantly normal compactions.
+
+When a cell is first written to a region it is stored in the WAL and memstore 
regardless of value
+size. When memstores from a column family configured to use MOB are eventually 
flushed two hfiles
+are written simultaneously. Cells with a value smaller than the threshold size 
are written to a
+normal region hfile. Cells with a value larger than the threshold are written 
into a special MOB
+hfile and also have a MOB reference cell written into the normal region HFile. 
As the Region Server
+flushes a MOB enabled memstore and closes a given normal region HFile it 
appends metadata that lists
+each of the special MOB hfiles referenced by the cells within.
+
+MOB reference cells have the same key as the cell they are based on. The value 
of the reference cell
+is made up of two pieces of metadata: the size of the actual value and the MOB 
hfile that contains
+the original cell. In addition to any tags originally written to HBase, the 
reference cell prepends
+two additional tags. The first is a marker tag that says the cell is a MOB 
reference. This can be
+used later to scan specifically just for reference cells. The second stores 
the namespace and table
+at the time the MOB hfile is written out. This tag is used to optimize how the 
MOB system finds
+the underlying value in MOB hfiles after a series of HBase snapshot operations 
(ref HBASE-12332).
+Note that tags are only available within HBase servers and by default are not 
sent over RPCs.
+
+All MOB hfiles for a given table are managed within a logical region that does 
not directly serve
+requests. When these MOB hfiles are created from a flush or MOB compaction 
they are placed in a
+dedicated mob data area under the hbase root directory specific to the 
namespace, table, mob
+logical region, and column family. In general that means a path structured 
like:
 
-By default, MOB files for one specific day are compacted into one large MOB 
file.
-To reduce MOB file count more, there are other MOB Compaction policies 
supported.
+----
+%HBase Root Dir%/mobdir/data/%namespace%/%table%/%logical region%/%column 
family%/
+----
 
-daily policy  - compact MOB Files for one day into one large MOB file (default 
policy)
-weekly policy - compact MOB Files for one week into one large MOB file
-montly policy - compact MOB Files for one  month into one large MOB File
+With default configs, for an example table named 'some_table' in the
+default namespace with a MOB enabled column family named 'foo', this HDFS 
directory would be:
 
-.Configure MOB compaction policy Using HBase Shell
 ----
-hbase> create 't1', {NAME => 'f1', IS_MOB => true, MOB_THRESHOLD => 102400, 
MOB_COMPACT_PARTITION_POLICY => 'daily'}
-hbase> create 't1', {NAME => 'f1', IS_MOB => true, MOB_THRESHOLD => 102400, 
MOB_COMPACT_PARTITION_POLICY => 'weekly'}
-hbase> create 't1', {NAME => 'f1', IS_MOB => true, MOB_THRESHOLD => 102400, 
MOB_COMPACT_PARTITION_POLICY => 'monthly'}
+/hbase/mobdir/data/default/some_table/372c1b27e3dc0b56c3a031926e5efbe9/foo/
+----
+
+These MOB hfiles are maintained by special chores in the HBase Master and 
across the individual
+Region Servers. Specifically those chores take care of enforcing TTLs and 
compacting them. Note that
+this compaction is primarily a matter of controlling the total number of files 
in HDFS because our
+operational assumption for MOB data is that it will seldom be updated or deleted.
+
+When a given MOB hfile is no longer needed as a result of our compaction 
process then a chore in
+the Master will take care of moving it to the archive just
+like any normal hfile. Because the table's mob region is independent of all 
the normal regions it
+can coexist with them in the regular archive storage area:
 
-hbase> alter 't1', {NAME => 'f1', IS_MOB => true, MOB_THRESHOLD => 102400, 
MOB_COMPACT_PARTITION_POLICY => 'daily'}
-hbase> alter 't1', {NAME => 'f1', IS_MOB => true, MOB_THRESHOLD => 102400, 
MOB_COMPACT_PARTITION_POLICY => 'weekly'}
-hbase> alter 't1', {NAME => 'f1', IS_MOB => true, MOB_THRESHOLD => 102400, 
MOB_COMPACT_PARTITION_POLICY => 'monthly'}
+----
+/hbase/archive/data/default/some_table/372c1b27e3dc0b56c3a031926e5efbe9/foo/
 ----
 
-=== Configure MOB Compaction mergeable threshold
+The same hfile cleaning chores that take care of eventually deleting unneeded 
archived files from
+normal regions thus also will take care of these MOB hfiles.
 
-If the size of a mob file is less than this value, it's regarded as a small 
file and needs to
-be merged in mob compaction. The default value is 1280MB.
+==== MOB compaction
+
+Each time the memstore for a MOB enabled column family performs a flush HBase 
will write values over
+the MOB threshold into MOB specific hfiles. When normal region compaction 
occurs the Region Server
+rewrites the normal data files while maintaining references to these MOB files 
without rewriting
+them. Normal client lookups for MOB values transparently will receive the 
original values because
+the Region Server internals take care of using the reference data to then pull 
the value out of a
+specific MOB file. This indirection means that building up a large number of 
MOB hfiles doesn't
+impact the overall time to retrieve any specific MOB cell. Thus, we need not 
perform compactions of
+the MOB hfiles nearly as often as normal hfiles. As a result, HBase saves IO 
by not rewriting MOB
+hfiles as a part of the periodic compactions a Region Server does on its own.
+
+However, if deletes and updates of MOB cells are frequent then this 
indirection will begin to waste
+space. The only way to stop using the space of a particular MOB hfile is to 
ensure no cells still
+hold references to it. To do that we need to ensure we have written the 
current values into a new
+MOB hfile. If our backing filesystem has a limitation on the number of files 
that can be present, as
+HDFS does, then even if we do not have deletes or updates of MOB cells 
eventually there will be a
+sufficient number of MOB hfiles that we will need to coalesce them.
+
+Periodically a chore in the master coordinates having the region servers
+perform a special major compaction that also handles rewriting new MOB files. 
Because this
+rewriting has the advantage of looking across all active cells for the region 
our several small MOB
+files should end up as a single MOB file per region. The chore defaults to 
running weekly and can be
+configured by setting `hbase.mob.compaction.chore.period` to the desired 
period in seconds.
 
 ====
 [source,xml]
 ----
 <property>
-    <name>hbase.mob.compaction.mergeable.threshold</name>
-    <value>10000000000</value>
+    <name>hbase.mob.compaction.chore.period</name>
+    <value>2592000</value>
+    <description>Example of changing the chore period from a week to a 
month.</description>
 </property>
 ----
 ====
 
-=== Testing MOB
+By default, the periodic MOB compaction coordination chore will attempt to 
keep every region
+busy doing compactions in parallel in order to maximize the amount of work 
done on the cluster.
+If you need to tune the amount of IO this compaction generates on the 
underlying filesystem, you
+can control how many concurrent region-level compaction requests are allowed 
by setting
+`hbase.mob.major.compaction.region.batch.size` to an integer number greater 
than zero. If you set
+the configuration to 0 then you will get the default behavior of attempting to 
do all regions in
+parallel.
 
-The utility `org.apache.hadoop.hbase.IntegrationTestIngestWithMOB` is provided 
to assist with testing
-the MOB feature. The utility is run as follows:
-[source,bash]
+====
+[source,xml]
 ----
-$ sudo -u hbase hbase org.apache.hadoop.hbase.IntegrationTestIngestWithMOB \
-            -threshold 1024 \
-            -minMobDataSize 512 \
-            -maxMobDataSize 5120
+<property>
+    <name>hbase.mob.major.compaction.region.batch.size</name>
+    <value>1</value>
+    <description>Example of switching from "as parallel as possible" to 
"serially"</description>
+</property>
 ----
+====
 
-* `*threshold*` is the threshold at which cells are considered to be MOBs.
-   The default is 1 kB, expressed in bytes.
-* `*minMobDataSize*` is the minimum value for the size of MOB data.
-   The default is 512 B, expressed in bytes.
-* `*maxMobDataSize*` is the maximum value for the size of MOB data.
-   The default is 5 kB, expressed in bytes.
+==== MOB file archiving
+
+Eventually we will have MOB hfiles that are no longer needed. Either clients 
will overwrite the
+value or a MOB-rewriting compaction will store a reference to a newer larger 
MOB hfile. Because any
+given MOB cell could have originally been written either in the current region 
or in a parent region
+that existed at some prior point in time, individual Region Servers do not 
decide when it is time
+to archive MOB hfiles. Instead a periodic chore in the Master evaluates MOB 
hfiles for archiving.
+
+A MOB HFile will be subject to archiving under any of the following conditions:
+
+* Any MOB HFile older than the column family's TTL
+* Any MOB HFile older than a "too recent" threshold with no references to it 
from the regular hfiles
+  for all regions in a column family
+
+To determine if a MOB HFile meets the second criterion the chore extracts 
metadata from the regular
+HFiles for each MOB enabled column family for a given table. That metadata 
enumerates the complete
+set of MOB HFiles needed to satisfy the references stored in the normal HFile 
area.
+
+The period of the cleaner chore can be configured by setting 
`hbase.master.mob.cleaner.period` to a
+positive integer number of seconds. It defaults to running daily. You should 
not need to tune it
+unless you have a very aggressive TTL or a very high rate of MOB updates with 
a correspondingly
+high rate of non-MOB compactions.
+
+=== MOB Optimization Tasks
+
+==== Further limiting write amplification
+
+If your MOB workload has few to no updates or deletes then you can opt in to 
MOB compactions that
+optimize for limiting the amount of write amplification. It achieves this by 
setting a
+size threshold to ignore MOB files during the compaction process. When a given 
region goes
+through MOB compaction it will evaluate the size of the MOB file that 
currently holds the actual
+value and skip rewriting the value if that file is over threshold.
+
+The bound of write amplification in this mode can be approximated as
+stem:["Write Amplification" = log_K(M/S)] where *K* is the number of files in 
compaction
+selection, *M* is the configurable threshold for MOB file size, and *S* is 
the minimum size of
+memstore flushes that create MOB files in the first place. For example given 5 
files picked up per
+compaction, a threshold of 1 GB, and a flush size of 10MB the write 
amplification will be
+stem:[log_5((1GB)/(10MB)) = log_5(100) = 2.86].
+
+If we are using an underlying filesystem with a limitation on the number of 
files, such as HDFS,
+and we know our expected data set size we can choose our maximum file size in 
order to approach
+this limit but stay within it in order to minimize write amplification. For 
example, if we expect to
+store a petabyte and we have a conservative limitation of a million files in 
our HDFS instance, then
+stem:[(1PB)/(1M) = 1GB] gives us a target limitation of a gigabyte per MOB 
file.
+
 
Review comment:
   The compactor for MOB enabled column families still uses the throughput 
controllers, so yes, the PressureAwareCompactionThroughputController could be 
used. However, for cells over the MOB size threshold, the compactor for MOB 
enabled column families only tells the throughput controller about the size of 
the reference cell. Is that worth calling out in the section on MOB compaction?
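
For reference while reviewing, the write amplification estimate quoted in the hunk above (the two `stem:` expressions) can be sanity-checked with a few lines of Java. This is only a sketch: the class and method names are made up for illustration and are not part of HBase, and it assumes decimal units (1 GB = 10^9 bytes), which is what the ref guide example appears to use.

```java
// Illustrative only: MobWriteAmplification and its methods are hypothetical
// helpers for checking the arithmetic in the ref guide text, not HBase APIs.
final class MobWriteAmplification {

    /**
     * Approximate bound log_K(M/S), where K is the number of files picked per
     * compaction, M is the MOB file size threshold, and S is the minimum
     * flush size that creates MOB files.
     */
    static double bound(int filesPerCompaction, double thresholdBytes, double flushBytes) {
        // Change of base: log_K(x) = ln(x) / ln(K)
        return Math.log(thresholdBytes / flushBytes) / Math.log(filesPerCompaction);
    }

    /** Target size per MOB file: expected data set size divided by the file budget. */
    static double targetFileBytes(double expectedDataBytes, double maxFileCount) {
        return expectedDataBytes / maxFileCount;
    }

    public static void main(String[] args) {
        // Assumption: decimal units, so 1 GB / 10 MB = 100.
        double mb = 1e6, gb = 1e9, pb = 1e15;
        System.out.println("write amplification bound: " + bound(5, 1 * gb, 10 * mb));
        System.out.println("target MOB file size (GB): " + targetFileBytes(1 * pb, 1e6) / gb);
    }
}
```

With K = 5, M = 1 GB, and S = 10 MB this evaluates log_5(100), which agrees with the 2.86 figure in the text to two decimal places. The second call reproduces the petabyte over a million files estimate of one gigabyte per MOB file.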
