http://git-wip-us.apache.org/repos/asf/hbase/blob/cb77a925/src/main/docbkx/rpc.xml ---------------------------------------------------------------------- diff --git a/src/main/docbkx/rpc.xml b/src/main/docbkx/rpc.xml deleted file mode 100644 index 2e5dd5f..0000000 --- a/src/main/docbkx/rpc.xml +++ /dev/null @@ -1,301 +0,0 @@ -<?xml version="1.0" encoding="UTF-8"?> -<appendix - xml:id="hbase.rpc" - version="5.0" - xmlns="http://docbook.org/ns/docbook" - xmlns:xlink="http://www.w3.org/1999/xlink" - xmlns:xi="http://www.w3.org/2001/XInclude" - xmlns:svg="http://www.w3.org/2000/svg" - xmlns:m="http://www.w3.org/1998/Math/MathML" - xmlns:html="http://www.w3.org/1999/xhtml" - xmlns:db="http://docbook.org/ns/docbook"> - <!--/** - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ ---> - - <title>0.95 RPC Specification</title> - <para>In 0.95, all client/server communication is done with <link - xlink:href="https://code.google.com/p/protobuf/">protobufâed</link> Messages rather than - with <link - xlink:href="http://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/Writable.html">Hadoop - Writables</link>. Our RPC wire format therefore changes. This document describes the - client/server request/response protocol and our new RPC wire-format.</para> - <para /> - <para>For what RPC is like in 0.94 and previous, see Benoît/Tsunaâs <link - xlink:href="https://github.com/OpenTSDB/asynchbase/blob/master/src/HBaseRpc.java#L164">Unofficial - Hadoop / HBase RPC protocol documentation</link>. For more background on how we arrived - at this spec., see <link - xlink:href="https://docs.google.com/document/d/1WCKwgaLDqBw2vpux0jPsAu2WPTRISob7HGCO8YhfDTA/edit#">HBase - RPC: WIP</link></para> - <para /> - <section> - <title>Goals</title> - <para> - <orderedlist> - <listitem> - <para>A wire-format we can evolve</para> - </listitem> - <listitem> - <para>A format that does not require our rewriting server core or radically - changing its current architecture (for later).</para> - </listitem> - </orderedlist> - </para> - </section> - <section> - <title>TODO</title> - <para> - <orderedlist> - <listitem> - <para>List of problems with currently specified format and where we would like - to go in a version2, etc. For example, what would we have to change if - anything to move server async or to support streaming/chunking?</para> - </listitem> - <listitem> - <para>Diagram on how it works</para> - </listitem> - <listitem> - <para>A grammar that succinctly describes the wire-format. Currently we have - these words and the content of the rpc protobuf idl but a grammar for the - back and forth would help with groking rpc. 
Also, a little state machine on - client/server interactions would help with understanding (and ensuring - correct implementation).</para> - </listitem> - </orderedlist> - </para> - </section> - <section> - <title>RPC</title> - <para>The client will send setup information on connection establish. Thereafter, the client - invokes methods against the remote server sending a protobuf Message and receiving a - protobuf Message in response. Communication is synchronous. All back and forth is - preceded by an int that has the total length of the request/response. Optionally, - Cells(KeyValues) can be passed outside of protobufs in follow-behind Cell blocks - (because <link - xlink:href="https://docs.google.com/document/d/1WEtrq-JTIUhlnlnvA0oYRLp0F8MKpEBeBSCFcQiacdw/edit#">we - canât protobuf megabytes of KeyValues</link> or Cells). These CellBlocks are encoded - and optionally compressed.</para> - <para /> - <para>For more detail on the protobufs involved, see the <link - xlink:href="http://svn.apache.org/viewvc/hbase/trunk/hbase-protocol/src/main/protobuf/RPC.proto?view=markup">RPC.proto</link> - file in trunk.</para> - - <section> - <title>Connection Setup</title> - <para>Client initiates connection.</para> - <section> - <title>Client</title> - <para>On connection setup, client sends a preamble followed by a connection header. </para> - - <section> - <title><preamble></title> - <programlisting><MAGIC 4 byte integer> <1 byte RPC Format Version> <1 byte auth type></programlisting> - <para> We need the auth method spec. here so the connection header is encoded if auth enabled.</para> - <para>E.g.: HBas0x000x50 -- 4 bytes of MAGIC -- âHBasâ -- plus one-byte of - version, 0 in this case, and one byte, 0x50 (SIMPLE). of an auth - type.</para> - </section> - - <section> - <title><Protobuf ConnectionHeader Message></title> - <para>Has user info, and âprotocolâ, as well as the encoders and compression the - client will use sending CellBlocks. CellBlock encoders and compressors are - for the life of the connection. CellBlock encoders implement - org.apache.hadoop.hbase.codec.Codec. CellBlocks may then also be compressed. - Compressors implement org.apache.hadoop.io.compress.CompressionCodec. This - protobuf is written using writeDelimited so is prefaced by a pb varint with - its serialized length</para> - </section> - </section> - <!--Client--> - - <section> - <title>Server</title> - <para>After client sends preamble and connection header, server does NOT respond if - successful connection setup. No response means server is READY to accept - requests and to give out response. If the version or authentication in the - preamble is not agreeable or the server has trouble parsing the preamble, it - will throw a org.apache.hadoop.hbase.ipc.FatalConnectionException explaining the - error and will then disconnect. If the client in the connection header -- i.e. - the protobufâd Message that comes after the connection preamble -- asks for for - a Service the server does not support or a codec the server does not have, again - we throw a FatalConnectionException with explanation.</para> - </section> - </section> - - <section> - <title>Request</title> - <para>After a Connection has been set up, client makes requests. Server responds.</para> - <para>A request is made up of a protobuf RequestHeader followed by a protobuf Message - parameter. The header includes the method name and optionally, metadata on the - optional CellBlock that may be following. The parameter type suits the method being - invoked: i.e. 
if we are doing a getRegionInfo request, the protobuf Message param - will be an instance of GetRegionInfoRequest. The response will be a - GetRegionInfoResponse. The CellBlock is optionally used ferrying the bulk of the RPC - data: i.e Cells/KeyValues.</para> - <section> - <title>Request Parts</title> - <section> - <title><Total Length></title> - <para>The request is prefaced by an int that holds the total length of what - follows.</para> - </section> - <section> - <title><Protobuf RequestHeader Message></title> - <para>Will have call.id, trace.id, and method name, etc. including optional - Metadata on the Cell block IFF one is following. Data is protobufâd inline - in this pb Message or optionally comes in the following CellBlock</para> - </section> - <section> - <title><Protobuf Param Message></title> - <para>If the method being invoked is getRegionInfo, if you study the Service - descriptor for the client to regionserver protocol, you will find that the - request sends a GetRegionInfoRequest protobuf Message param in this - position.</para> - </section> - <section> - <title><CellBlock></title> - <para>An encoded and optionally compressed Cell block.</para> - </section> - </section> - <!--Request parts--> - </section> - <!--Request--> - - <section> - <title>Response</title> - <para>Same as Request, it is a protobuf ResponseHeader followed by a protobuf Message - response where the Message response type suits the method invoked. Bulk of the data - may come in a following CellBlock.</para> - <section> - <title>Response Parts</title> - <section> - <title><Total Length></title> - <para>The response is prefaced by an int that holds the total length of what - follows.</para> - </section> - <section> - <title><Protobuf ResponseHeader Message></title> - <para>Will have call.id, etc. Will include exception if failed processing. -  Optionally includes metadata on optional, IFF there is a CellBlock - following.</para> - </section> - - <section> - <title><Protobuf Response Message></title> - <para>Return or may be nothing if exception. If the method being invoked is - getRegionInfo, if you study the Service descriptor for the client to - regionserver protocol, you will find that the response sends a - GetRegionInfoResponse protobuf Message param in this position.</para> - </section> - <section> - <title><CellBlock></title> - <para>An encoded and optionally compressed Cell block.</para> - </section> - </section> - <!--Parts--> - </section> - <!--Response--> - - <section> - <title>Exceptions</title> - <para>There are two distinct types. There is the request failed which is encapsulated - inside the response header for the response. The connection stays open to receive - new requests. The second type, the FatalConnectionException, kills the - connection.</para> - <para>Exceptions can carry extra information. See the ExceptionResponse protobuf type. - It has a flag to indicate do-no-retry as well as other miscellaneous payload to help - improve client responsiveness.</para> - </section> - <section> - <title>CellBlocks</title> - <para>These are not versioned. Server can do the codec or it cannot. If new version of a - codec with say, tighter encoding, then give it a new class name. Codecs will live on - the server for all time so old clients can connect.</para> - </section> - </section> - - - <section> - <title>Notes</title> - <section> - <title>Constraints</title> - <para>In some part, current wire-format -- i.e. 
all requests and responses preceeded by - a length -- has been dictated by current server non-async architecture.</para> - </section> - <section> - <title>One fat pb request or header+param</title> - <para>We went with pb header followed by pb param making a request and a pb header - followed by pb response for now. Doing header+param rather than a single protobuf - Message with both header and param content:</para> - <para> - <orderedlist> - <listitem> - <para>Is closer to what we currently have</para> - </listitem> - <listitem> - <para>Having a single fat pb requires extra copying putting the already pbâd - param into the body of the fat request pb (and same making - result)</para> - </listitem> - <listitem> - <para>We can decide whether to accept the request or not before we read the - param; for example, the request might be low priority.  As is, we read - header+param in one go as server is currently implemented so this is a - TODO.</para> - </listitem> - </orderedlist> - </para> - <para>The advantages are minor.  If later, fat request has clear advantage, can roll out - a v2 later.</para> - </section> - <section - xml:id="rpc.configs"> - <title>RPC Configurations</title> - <section> - <title>CellBlock Codecs</title> - <para>To enable a codec other than the default <classname>KeyValueCodec</classname>, - set <varname>hbase.client.rpc.codec</varname> to the name of the Codec class to - use. Codec must implement hbase's <classname>Codec</classname> Interface. After - connection setup, all passed cellblocks will be sent with this codec. The server - will return cellblocks using this same codec as long as the codec is on the - servers' CLASSPATH (else you will get - <classname>UnsupportedCellCodecException</classname>).</para> - <para>To change the default codec, set - <varname>hbase.client.default.rpc.codec</varname>. </para> - <para>To disable cellblocks completely and to go pure protobuf, set the default to - the empty String and do not specify a codec in your Configuration. So, set - <varname>hbase.client.default.rpc.codec</varname> to the empty string and do - not set <varname>hbase.client.rpc.codec</varname>. This will cause the client to - connect to the server with no codec specified. If a server sees no codec, it - will return all responses in pure protobuf. Running pure protobuf all the time - will be slower than running with cellblocks. </para> - </section> - <section> - <title>Compression</title> - <para>Uses hadoops compression codecs. To enable compressing of passed CellBlocks, - set <varname>hbase.client.rpc.compressor</varname> to the name of the Compressor - to use. Compressor must implement Hadoops' CompressionCodec Interface. After - connection setup, all passed cellblocks will be sent compressed. The server will - return cellblocks compressed using this same compressor as long as the - compressor is on its CLASSPATH (else you will get - <classname>UnsupportedCompressionCodecException</classname>).</para> - </section> - </section> - </section> -</appendix>
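As a rough client-side illustration of the two settings just described (the codec and compressor class names are examples only; any Codec / CompressionCodec implementation present on both the client and server CLASSPATH can be named):
<programlisting language="java">
Configuration conf = HBaseConfiguration.create();
// CellBlock codec for this client's connections (KeyValueCodec is the default).
conf.set("hbase.client.rpc.codec", "org.apache.hadoop.hbase.codec.KeyValueCodecWithTags");
// Optionally compress CellBlocks with one of Hadoop's compression codecs.
conf.set("hbase.client.rpc.compressor", "org.apache.hadoop.io.compress.GzipCodec");
// Codec and compressor are fixed for the life of each connection at setup time.
HTable table = new HTable(conf, "myTable");
</programlisting>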
http://git-wip-us.apache.org/repos/asf/hbase/blob/cb77a925/src/main/docbkx/schema_design.xml ---------------------------------------------------------------------- diff --git a/src/main/docbkx/schema_design.xml b/src/main/docbkx/schema_design.xml deleted file mode 100644 index e4632ec..0000000 --- a/src/main/docbkx/schema_design.xml +++ /dev/null @@ -1,1262 +0,0 @@ -<?xml version="1.0" encoding="UTF-8"?> -<chapter - version="5.0" - xml:id="schema" - xmlns="http://docbook.org/ns/docbook" - xmlns:xlink="http://www.w3.org/1999/xlink" - xmlns:xi="http://www.w3.org/2001/XInclude" - xmlns:svg="http://www.w3.org/2000/svg" - xmlns:m="http://www.w3.org/1998/Math/MathML" - xmlns:html="http://www.w3.org/1999/xhtml" - xmlns:db="http://docbook.org/ns/docbook"> - <!--
-/**
- *
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
--->
- <title>HBase and Schema Design</title>
- <para>A good general introduction to the strengths and weaknesses of data modelling in the various
- non-RDBMS datastores is Ian Varley's Master's thesis, <link
- xlink:href="http://ianvarley.com/UT/MR/Varley_MastersReport_Full_2009-08-07.pdf">No Relation:
- The Mixed Blessings of Non-Relational Databases</link>. Recommended. Also, read <xref
- linkend="keyvalue" /> for how HBase stores data internally, and the section on <xref
- linkend="schema.casestudies" />. </para>
- <section
- xml:id="schema.creation">
- <title> Schema Creation </title>
- <para>HBase schemas can be created or updated with <xref
- linkend="shell" /> or by using <link
- xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html">HBaseAdmin</link>
- in the Java API. </para>
- <para>Tables must be disabled when making ColumnFamily modifications, for example:</para>
- <programlisting language="java">
-Configuration config = HBaseConfiguration.create();
-HBaseAdmin admin = new HBaseAdmin(config);
-String table = "myTable";
-
-admin.disableTable(table);
-
-HColumnDescriptor cf1 = ...;
-admin.addColumn(table, cf1); // adding new ColumnFamily
-HColumnDescriptor cf2 = ...;
-admin.modifyColumn(table, cf2); // modifying existing ColumnFamily
-
-admin.enableTable(table);
- </programlisting>
- <para>See <xref
- linkend="client_dependencies" /> for more information about configuring client
- connections.</para>
- <para>Note: online schema changes are supported in the 0.92.x codebase, but the 0.90.x codebase
- requires the table to be disabled. </para>
- <section
- xml:id="schema.updates">
- <title>Schema Updates</title>
- <para>When changes are made to either Tables or ColumnFamilies (e.g., region size, block
- size), these changes take effect the next time there is a major compaction and the
- StoreFiles get re-written. </para>
- <para>See <xref
- linkend="store" /> for more information on StoreFiles.
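A follow-up sketch: if you want the new settings applied to existing data right away rather than at the next natural major compaction, you can request one explicitly (reusing the <code>admin</code> and <code>table</code> from the example above; a major compaction can be an expensive operation):
<programlisting language="java">
// Force a re-write of the existing StoreFiles so that the modified
// ColumnFamily settings take effect immediately.
admin.majorCompact(table);
</programlisting>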
</para> - </section> - </section> - <section - xml:id="number.of.cfs"> - <title> On the number of column families </title> - <para> HBase currently does not do well with anything above two or three column families so keep - the number of column families in your schema low. Currently, flushing and compactions are done - on a per Region basis so if one column family is carrying the bulk of the data bringing on - flushes, the adjacent families will also be flushed though the amount of data they carry is - small. When many column families the flushing and compaction interaction can make for a bunch - of needless i/o loading (To be addressed by changing flushing and compaction to work on a per - column family basis). For more information on compactions, see <xref - linkend="compaction" />. </para> - <para>Try to make do with one column family if you can in your schemas. Only introduce a second - and third column family in the case where data access is usually column scoped; i.e. you query - one column family or the other but usually not both at the one time. </para> - <section - xml:id="number.of.cfs.card"> - <title>Cardinality of ColumnFamilies</title> - <para>Where multiple ColumnFamilies exist in a single table, be aware of the cardinality - (i.e., number of rows). If ColumnFamilyA has 1 million rows and ColumnFamilyB has 1 billion - rows, ColumnFamilyA's data will likely be spread across many, many regions (and - RegionServers). This makes mass scans for ColumnFamilyA less efficient. </para> - </section> - </section> - <section - xml:id="rowkey.design"> - <title>Rowkey Design</title> - <section> - <title>Hotspotting</title> - <para>Rows in HBase are sorted lexicographically by row key. This design optimizes for scans, - allowing you to store related rows, or rows that will be read together, near each other. - However, poorly designed row keys are a common source of <firstterm>hotspotting</firstterm>. - Hotspotting occurs when a large amount of client traffic is directed at one node, or only a - few nodes, of a cluster. This traffic may represent reads, writes, or other operations. The - traffic overwhelms the single machine responsible for hosting that region, causing - performance degradation and potentially leading to region unavailability. This can also have - adverse effects on other regions hosted by the same region server as that host is unable to - service the requested load. It is important to design data access patterns such that the - cluster is fully and evenly utilized.</para> - <para>To prevent hotspotting on writes, design your row keys such that rows that truly do need - to be in the same region are, but in the bigger picture, data is being written to multiple - regions across the cluster, rather than one at a time. Some common techniques for avoiding - hotspotting are described below, along with some of their advantages and drawbacks.</para> - <formalpara> - <title>Salting</title> - <para>Salting in this sense has nothing to do with cryptography, but refers to adding random - data to the start of a row key. In this case, salting refers to adding a randomly-assigned - prefix to the row key to cause it to sort differently than it otherwise would. The number - of possible prefixes correspond to the number of regions you want to spread the data - across. Salting can be helpful if you have a few "hot" row key patterns which come up over - and over amongst other more evenly-distributed rows. 
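A minimal sketch of applying such a prefix on write (the four salt values and the row key here are illustrative only):
<programlisting language="java">
// Illustrative only: prepend one of four salt prefixes, chosen at random,
// so that writes spread across roughly four regions.
String[] salts = { "a", "b", "c", "d" };
String rowKey = "foo0003";
String saltedKey = salts[new java.util.Random().nextInt(salts.length)] + "-" + rowKey;
Put put = new Put(Bytes.toBytes(saltedKey));
</programlisting>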
Consider the following example, which - shows that salting can spread write load across multiple regionservers, and illustrates - some of the negative implications for reads.</para> - </formalpara> - <example> - <title>Salting Example</title> - <para>Suppose you have the following list of row keys, and your table is split such that - there is one region for each letter of the alphabet. Prefix 'a' is one region, prefix 'b' - is another. In this table, all rows starting with 'f' are in the same region. This example - focuses on rows with keys like the following:</para> - <screen> -foo0001 -foo0002 -foo0003 -foo0004 - </screen> - <para>Now, imagine that you would like to spread these across four different regions. You - decide to use four different salts: <literal>a</literal>, <literal>b</literal>, - <literal>c</literal>, and <literal>d</literal>. In this scenario, each of these letter - prefixes will be on a different region. After applying the salts, you have the following - rowkeys instead. Since you can now write to four separate regions, you theoretically have - four times the throughput when writing that you would have if all the writes were going to - the same region.</para> - <screen> -a-foo0003 -b-foo0001 -c-foo0004 -d-foo0002 - </screen> - <para>Then, if you add another row, it will randomly be assigned one of the four possible - salt values and end up near one of the existing rows.</para> - <screen> -a-foo0003 -b-foo0001 -<emphasis>c-foo0003</emphasis> -c-foo0004 -d-foo0002 - </screen> - <para>Since this assignment will be random, you will need to do more work if you want to - retrieve the rows in lexicographic order. In this way, salting attempts to increase - throughput on writes, but has a cost during reads.</para> - </example> - <para></para> - <formalpara> - <title>Hashing</title> - <para>Instead of a random assignment, you could use a one-way <firstterm>hash</firstterm> - that would cause a given row to always be "salted" with the same prefix, in a way that - would spread the load across the regionservers, but allow for predictability during reads. - Using a deterministic hash allows the client to reconstruct the complete rowkey and use a - Get operation to retrieve that row as normal.</para> - </formalpara> - <example> - <title>Hashing Example</title> - <para>Given the same situation in the salting example above, you could instead apply a - one-way hash that would cause the row with key <literal>foo0003</literal> to always, and - predictably, receive the <literal>a</literal> prefix. Then, to retrieve that row, you - would already know the key. You could also optimize things so that certain pairs of keys - were always in the same region, for instance.</para> - </example> - <formalpara> - <title>Reversing the Key</title> - <para>A third common trick for preventing hotspotting is to reverse a fixed-width or numeric - row key so that the part that changes the most often (the least significant digit) is first. 
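For example (a sketch, assuming a fixed-width numeric key stored as a string):
<programlisting language="java">
// Illustrative key reversal: "20150407123000" becomes "00032170405102",
// so the fastest-changing digits lead the key.
String rowKey = "20150407123000";
String reversedKey = new StringBuilder(rowKey).reverse().toString();
</programlisting>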
- This effectively randomizes row keys, but sacrifices row ordering properties.</para> - </formalpara> - <para>See <link - xlink:href="https://communities.intel.com/community/itpeernetwork/datastack/blog/2013/11/10/discussion-on-designing-hbase-tables" - >https://communities.intel.com/community/itpeernetwork/datastack/blog/2013/11/10/discussion-on-designing-hbase-tables</link>, - and <link xlink:href="http://phoenix.apache.org/salted.html">article on Salted Tables</link> - from the Phoenix project, and the discussion in the comments of <link - xlink:href="https://issues.apache.org/jira/browse/HBASE-11682">HBASE-11682</link> for more - information about avoiding hotspotting.</para> - </section> - <section - xml:id="timeseries"> - <title> Monotonically Increasing Row Keys/Timeseries Data </title> - <para> In the HBase chapter of Tom White's book <link - xlink:href="http://oreilly.com/catalog/9780596521981">Hadoop: The Definitive Guide</link> - (O'Reilly) there is a an optimization note on watching out for a phenomenon where an import - process walks in lock-step with all clients in concert pounding one of the table's regions - (and thus, a single node), then moving onto the next region, etc. With monotonically - increasing row-keys (i.e., using a timestamp), this will happen. See this comic by IKai Lan - on why monotonically increasing row keys are problematic in BigTable-like datastores: <link - xlink:href="http://ikaisays.com/2011/01/25/app-engine-datastore-tip-monotonically-increasing-values-are-bad/">monotonically - increasing values are bad</link>. The pile-up on a single region brought on by - monotonically increasing keys can be mitigated by randomizing the input records to not be in - sorted order, but in general it's best to avoid using a timestamp or a sequence (e.g. 1, 2, - 3) as the row-key. </para> - <para>If you do need to upload time series data into HBase, you should study <link - xlink:href="http://opentsdb.net/">OpenTSDB</link> as a successful example. It has a page - describing the <link - xlink:href=" http://opentsdb.net/schema.html">schema</link> it uses in HBase. The key - format in OpenTSDB is effectively [metric_type][event_timestamp], which would appear at - first glance to contradict the previous advice about not using a timestamp as the key. - However, the difference is that the timestamp is not in the <emphasis>lead</emphasis> - position of the key, and the design assumption is that there are dozens or hundreds (or - more) of different metric types. Thus, even with a continual stream of input data with a mix - of metric types, the Puts are distributed across various points of regions in the table. </para> - <para>See <xref - linkend="schema.casestudies" /> for some rowkey design examples. </para> - </section> - <section - xml:id="keysize"> - <title>Try to minimize row and column sizes</title> - <subtitle>Or why are my StoreFile indices large?</subtitle> - <para>In HBase, values are always freighted with their coordinates; as a cell value passes - through the system, it'll be accompanied by its row, column name, and timestamp - always. If - your rows and column names are large, especially compared to the size of the cell value, - then you may run up against some interesting scenarios. One such is the case described by - Marc Limotte at the tail of <link - xlink:href="https://issues.apache.org/jira/browse/HBASE-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005272#comment-13005272">HBASE-3551</link> - (recommended!). 
Therein, the indices that are kept on HBase storefiles (<xref - linkend="hfile" />) to facilitate random access may end up occupyng large chunks of the - HBase allotted RAM because the cell value coordinates are large. Mark in the above cited - comment suggests upping the block size so entries in the store file index happen at a larger - interval or modify the table schema so it makes for smaller rows and column names. - Compression will also make for larger indices. See the thread <link - xlink:href="http://search-hadoop.com/m/hemBv1LiN4Q1/a+question+storefileIndexSize&subj=a+question+storefileIndexSize">a - question storefileIndexSize</link> up on the user mailing list. </para> - <para>Most of the time small inefficiencies don't matter all that much. Unfortunately, this is - a case where they do. Whatever patterns are selected for ColumnFamilies, attributes, and - rowkeys they could be repeated several billion times in your data. </para> - <para>See <xref - linkend="keyvalue" /> for more information on HBase stores data internally to see why this - is important.</para> - <section - xml:id="keysize.cf"> - <title>Column Families</title> - <para>Try to keep the ColumnFamily names as small as possible, preferably one character - (e.g. "d" for data/default). </para> - <para>See <xref - linkend="keyvalue" /> for more information on HBase stores data internally to see why - this is important.</para> - </section> - <section - xml:id="keysize.attributes"> - <title>Attributes</title> - <para>Although verbose attribute names (e.g., "myVeryImportantAttribute") are easier to - read, prefer shorter attribute names (e.g., "via") to store in HBase. </para> - <para>See <xref - linkend="keyvalue" /> for more information on HBase stores data internally to see why - this is important.</para> - </section> - <section - xml:id="keysize.row"> - <title>Rowkey Length</title> - <para>Keep them as short as is reasonable such that they can still be useful for required - data access (e.g., Get vs. Scan). A short key that is useless for data access is not - better than a longer key with better get/scan properties. Expect tradeoffs when designing - rowkeys. </para> - </section> - <section - xml:id="keysize.patterns"> - <title>Byte Patterns</title> - <para>A long is 8 bytes. You can store an unsigned number up to 18,446,744,073,709,551,615 - in those eight bytes. If you stored this number as a String -- presuming a byte per - character -- you need nearly 3x the bytes. </para> - <para>Not convinced? Below is some sample code that you can run on your own.</para> - <programlisting language="java"> -// long -// -long l = 1234567890L; -byte[] lb = Bytes.toBytes(l); -System.out.println("long bytes length: " + lb.length); // returns 8 - -String s = "" + l; -byte[] sb = Bytes.toBytes(s); -System.out.println("long as string length: " + sb.length); // returns 10 - -// hash -// -MessageDigest md = MessageDigest.getInstance("MD5"); -byte[] digest = md.digest(Bytes.toBytes(s)); -System.out.println("md5 digest bytes length: " + digest.length); // returns 16 - -String sDigest = new String(digest); -byte[] sbDigest = Bytes.toBytes(sDigest); -System.out.println("md5 digest as string length: " + sbDigest.length); // returns 26 - </programlisting> - <para>Unfortunately, using a binary representation of a type will make your data harder to - read outside of your code. 
For example, this is what you will see in the shell when you - increment a value:</para> - <programlisting> -hbase(main):001:0> incr 't', 'r', 'f:q', 1 -COUNTER VALUE = 1 - -hbase(main):002:0> get 't', 'r' -COLUMN CELL - f:q timestamp=1369163040570, value=\x00\x00\x00\x00\x00\x00\x00\x01 -1 row(s) in 0.0310 seconds - </programlisting> - <para>The shell makes a best effort to print a string, and it this case it decided to just - print the hex. The same will happen to your row keys inside the region names. It can be - okay if you know what's being stored, but it might also be unreadable if arbitrary data - can be put in the same cells. This is the main trade-off. </para> - </section> - - </section> - <section - xml:id="reverse.timestamp"> - <title>Reverse Timestamps</title> - <note> - <title>Reverse Scan API</title> - <para> - <link - xlink:href="https://issues.apache.org/jira/browse/HBASE-4811">HBASE-4811</link> - implements an API to scan a table or a range within a table in reverse, reducing the need - to optimize your schema for forward or reverse scanning. This feature is available in - HBase 0.98 and later. See <link - xlink:href="https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html#setReversed%28boolean" /> - for more information. </para> - </note> - - <para>A common problem in database processing is quickly finding the most recent version of a - value. A technique using reverse timestamps as a part of the key can help greatly with a - special case of this problem. Also found in the HBase chapter of Tom White's book Hadoop: - The Definitive Guide (O'Reilly), the technique involves appending (<code>Long.MAX_VALUE - - timestamp</code>) to the end of any key, e.g., [key][reverse_timestamp]. </para> - <para>The most recent value for [key] in a table can be found by performing a Scan for [key] - and obtaining the first record. Since HBase keys are in sorted order, this key sorts before - any older row-keys for [key] and thus is first. </para> - <para>This technique would be used instead of using <xref - linkend="schema.versions" /> where the intent is to hold onto all versions "forever" (or a - very long time) and at the same time quickly obtain access to any other version by using the - same Scan technique. </para> - </section> - <section - xml:id="rowkey.scope"> - <title>Rowkeys and ColumnFamilies</title> - <para>Rowkeys are scoped to ColumnFamilies. Thus, the same rowkey could exist in each - ColumnFamily that exists in a table without collision. </para> - </section> - <section - xml:id="changing.rowkeys"> - <title>Immutability of Rowkeys</title> - <para>Rowkeys cannot be changed. The only way they can be "changed" in a table is if the row - is deleted and then re-inserted. This is a fairly common question on the HBase dist-list so - it pays to get the rowkeys right the first time (and/or before you've inserted a lot of - data). </para> - </section> - <section - xml:id="rowkey.regionsplits"> - <title>Relationship Between RowKeys and Region Splits</title> - <para>If you pre-split your table, it is <emphasis>critical</emphasis> to understand how your - rowkey will be distributed across the region boundaries. As an example of why this is - important, consider the example of using displayable hex characters as the lead position of - the key (e.g., "0000000000000000" to "ffffffffffffffff"). 
Running those key ranges through - <code>Bytes.split</code> (which is the split strategy used when creating regions in - <code>HBaseAdmin.createTable(byte[] startKey, byte[] endKey, numRegions)</code> for 10 - regions will generate the following splits...</para> - <screen> -48 48 48 48 48 48 48 48 48 48 48 48 48 48 48 48 // 0 -54 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 // 6 -61 -67 -67 -67 -67 -67 -67 -67 -67 -67 -67 -67 -67 -67 -67 -68 // = -68 -124 -124 -124 -124 -124 -124 -124 -124 -124 -124 -124 -124 -124 -124 -126 // D -75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 72 // K -82 18 18 18 18 18 18 18 18 18 18 18 18 18 18 14 // R -88 -40 -40 -40 -40 -40 -40 -40 -40 -40 -40 -40 -40 -40 -40 -44 // X -95 -97 -97 -97 -97 -97 -97 -97 -97 -97 -97 -97 -97 -97 -97 -102 // _ -102 102 102 102 102 102 102 102 102 102 102 102 102 102 102 102 // f - </screen> - <para>... (note: the lead byte is listed to the right as a comment.) Given that the first - split is a '0' and the last split is an 'f', everything is great, right? Not so fast. </para> - <para>The problem is that all the data is going to pile up in the first 2 regions and the last - region thus creating a "lumpy" (and possibly "hot") region problem. To understand why, refer - to an <link - xlink:href="http://www.asciitable.com">ASCII Table</link>. '0' is byte 48, and 'f' is byte - 102, but there is a huge gap in byte values (bytes 58 to 96) that will <emphasis>never - appear in this keyspace</emphasis> because the only values are [0-9] and [a-f]. Thus, the - middle regions regions will never be used. To make pre-spliting work with this example - keyspace, a custom definition of splits (i.e., and not relying on the built-in split method) - is required. </para> - <para>Lesson #1: Pre-splitting tables is generally a best practice, but you need to pre-split - them in such a way that all the regions are accessible in the keyspace. While this example - demonstrated the problem with a hex-key keyspace, the same problem can happen with - <emphasis>any</emphasis> keyspace. Know your data. </para> - <para>Lesson #2: While generally not advisable, using hex-keys (and more generally, - displayable data) can still work with pre-split tables as long as all the created regions - are accessible in the keyspace. </para> - <para>To conclude this example, the following is an example of how appropriate splits can be - pre-created for hex-keys:. </para> - <programlisting language="java"><![CDATA[public static boolean createTable(HBaseAdmin admin, HTableDescriptor table, byte[][] splits) -throws IOException { - try { - admin.createTable( table, splits ); - return true; - } catch (TableExistsException e) { - logger.info("table " + table.getNameAsString() + " already exists"); - // the table already exists... 
- return false; - } -} - -public static byte[][] getHexSplits(String startKey, String endKey, int numRegions) { - byte[][] splits = new byte[numRegions-1][]; - BigInteger lowestKey = new BigInteger(startKey, 16); - BigInteger highestKey = new BigInteger(endKey, 16); - BigInteger range = highestKey.subtract(lowestKey); - BigInteger regionIncrement = range.divide(BigInteger.valueOf(numRegions)); - lowestKey = lowestKey.add(regionIncrement); - for(int i=0; i < numRegions-1;i++) { - BigInteger key = lowestKey.add(regionIncrement.multiply(BigInteger.valueOf(i))); - byte[] b = String.format("%016x", key).getBytes(); - splits[i] = b; - } - return splits; -}]]></programlisting> - </section> - </section> - <!-- rowkey design --> - <section - xml:id="schema.versions"> - <title> Number of Versions </title> - <section - xml:id="schema.versions.max"> - <title>Maximum Number of Versions</title> - <para>The maximum number of row versions to store is configured per column family via <link - xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</link>. - The default for max versions is 1. This is an important parameter because as described in <xref - linkend="datamodel" /> section HBase does <emphasis>not</emphasis> overwrite row values, - but rather stores different values per row by time (and qualifier). Excess versions are - removed during major compactions. The number of max versions may need to be increased or - decreased depending on application needs. </para> - <para>It is not recommended setting the number of max versions to an exceedingly high level - (e.g., hundreds or more) unless those old values are very dear to you because this will - greatly increase StoreFile size. </para> - </section> - <section - xml:id="schema.minversions"> - <title> Minimum Number of Versions </title> - <para>Like maximum number of row versions, the minimum number of row versions to keep is - configured per column family via <link - xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</link>. - The default for min versions is 0, which means the feature is disabled. The minimum number - of row versions parameter is used together with the time-to-live parameter and can be - combined with the number of row versions parameter to allow configurations such as "keep the - last T minutes worth of data, at most N versions, <emphasis>but keep at least M versions - around</emphasis>" (where M is the value for minimum number of row versions, M<N). This - parameter should only be set when time-to-live is enabled for a column family and must be - less than the number of row versions. </para> - </section> - </section> - <section - xml:id="supported.datatypes"> - <title> Supported Datatypes </title> - <para>HBase supports a "bytes-in/bytes-out" interface via <link - xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html">Put</link> - and <link - xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Result.html">Result</link>, - so anything that can be converted to an array of bytes can be stored as a value. Input could - be strings, numbers, complex objects, or even images as long as they can rendered as bytes. </para> - <para>There are practical limits to the size of values (e.g., storing 10-50MB objects in HBase - would probably be too much to ask); search the mailling list for conversations on this topic. 
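As a small illustration of the bytes-in/bytes-out interface (the table, family, and qualifier names here are placeholders):
<programlisting language="java">
// Anything that can be rendered as a byte[] can be stored.
HTable htable = new HTable(HBaseConfiguration.create(), "myTable");
Put put = new Put(Bytes.toBytes("rowkey-1"));
put.add(Bytes.toBytes("d"), Bytes.toBytes("count"), Bytes.toBytes(42L));  // a long
put.add(Bytes.toBytes("d"), Bytes.toBytes("name"), Bytes.toBytes("foo")); // a String
htable.put(put);

Result result = htable.get(new Get(Bytes.toBytes("rowkey-1")));
long count = Bytes.toLong(result.getValue(Bytes.toBytes("d"), Bytes.toBytes("count")));
</programlisting>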
- All rows in HBase conform to the <xref - linkend="datamodel" />, and that includes versioning. Take that into consideration when - making your design, as well as block size for the ColumnFamily. </para> - - <section - xml:id="counters"> - <title>Counters</title> - <para> One supported datatype that deserves special mention are "counters" (i.e., the ability - to do atomic increments of numbers). See <link - xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#increment%28org.apache.hadoop.hbase.client.Increment%29">Increment</link> - in HTable. </para> - <para>Synchronization on counters are done on the RegionServer, not in the client. </para> - </section> - </section> - <section - xml:id="schema.joins"> - <title>Joins</title> - <para>If you have multiple tables, don't forget to factor in the potential for <xref - linkend="joins" /> into the schema design. </para> - </section> - <section - xml:id="ttl"> - <title>Time To Live (TTL)</title> - <para> ColumnFamilies can set a TTL length in seconds, and HBase will automatically delete rows - once the expiration time is reached. This applies to <emphasis>all</emphasis> versions of a - row - even the current one. The TTL time encoded in the HBase for the row is specified in UTC. - </para> - <para> Store files which contains only expired rows are deleted on minor compaction. Setting - <varname>hbase.store.delete.expired.storefile</varname> to <code>false</code> disables this - feature. Setting <link linkend="schema.minversions">minimum number of versions</link> to other - than 0 also disables this.</para> - <para> See <link - xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html" - >HColumnDescriptor</link> for more information. </para> - <para>Recent versions of HBase also support setting time to live on a per cell basis. See <link - xlink:href="https://issues.apache.org/jira/browse/HBASE-10560">HBASE-10560</link> for more - information. Cell TTLs are submitted as an attribute on mutation requests (Appends, - Increments, Puts, etc.) using Mutation#setTTL. If the TTL attribute is set, it will be applied - to all cells updated on the server by the operation. There are two notable differences - between cell TTL handling and ColumnFamily TTLs:</para> - <itemizedlist> - <listitem> - <para>Cell TTLs are expressed in units of milliseconds instead of seconds.</para> - </listitem> - <listitem> - <para>A cell TTLs cannot extend the effective lifetime of a cell beyond a ColumnFamily level - TTL setting.</para> - </listitem> - </itemizedlist> - </section> - <section - xml:id="cf.keep.deleted"> - <title> Keeping Deleted Cells </title> - <para>By default, delete markers extend back to the beginning of time. Therefore, <link - xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html">Get</link> - or <link - xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html">Scan</link> - operations will not see a deleted cell (row or column), even when the Get or Scan operation - indicates a time range - before the delete marker was placed.</para> - <para>ColumnFamilies can optionally keep deleted cells. In this case, deleted cells can still be - retrieved, as long as these operations specify a time range that ends before the timestamp of - any delete that would affect the cells. This allows for point-in-time queries even in the - presence of deletes. 
</para>
- <para> Deleted cells are still subject to TTL and there will never be more than "maximum number
- of versions" deleted cells. A new "raw" scan option returns all deleted rows and the delete
- markers. </para>
- <example>
- <title>Change the Value of <code>KEEP_DELETED_CELLS</code> Using HBase Shell</title>
- <screen>hbase> alter 't1', NAME => 'f1', KEEP_DELETED_CELLS => true</screen>
- </example>
- <example>
- <title>Change the Value of <code>KEEP_DELETED_CELLS</code> Using the API</title>
- <programlisting language="java">...
-HColumnDescriptor.setKeepDeletedCells(true);
-...
- </programlisting>
- </example>
- <para>See the API documentation for <link
- xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html#KEEP_DELETED_CELLS"
- >KEEP_DELETED_CELLS</link> for more information. </para>
- </section>
- <section
- xml:id="secondary.indexes">
- <title> Secondary Indexes and Alternate Query Paths </title>
- <para>This section could also be titled "what if my table rowkey looks like
- <emphasis>this</emphasis> but I also want to query my table like <emphasis>that</emphasis>."
- A common example on the dist-list is where a row-key is of the format "user-timestamp" but
- there are reporting requirements on activity across users for certain time ranges. Thus,
- selecting by user is easy because it is in the lead position of the key, but time is not. </para>
- <para>There is no single answer on the best way to handle this because it depends on... </para>
- <itemizedlist>
- <listitem>
- <para>Number of users</para>
- </listitem>
- <listitem>
- <para>Data size and data arrival rate</para>
- </listitem>
- <listitem>
- <para>Flexibility of reporting requirements (e.g., completely ad-hoc date selection vs.
- pre-configured ranges) </para>
- </listitem>
- <listitem>
- <para>Desired execution speed of query (e.g., 90 seconds may be reasonable to some for an
- ad-hoc report, whereas it may be too long for others) </para>
- </listitem>
- </itemizedlist>
- <para>... and solutions are also influenced by the size of the cluster and how much processing
- power you have to throw at the solution. Common techniques are in the sub-sections below. This is
- a representative, but not exhaustive, list of approaches. </para>
- <para>It should not be a surprise that secondary indexes require additional cluster space and
- processing. This is precisely what happens in an RDBMS because the act of creating an
- alternate index requires both space and processing cycles to update. RDBMS products are more
- advanced in this regard and handle alternative index management out of the box. However, HBase
- scales better at larger data volumes, so this is a feature trade-off. </para>
- <para>Pay attention to <xref
- linkend="performance" /> when implementing any of these approaches.</para>
- <para>Additionally, see the David Butler response in this dist-list thread <link
- xlink:href="http://search-hadoop.com/m/nvbiBp2TDP/Stargate%252Bhbase&subj=Stargate+hbase">HBase,
- mail # user - Stargate+hbase</link>.
- </para>
- <section
- xml:id="secondary.indexes.filter">
- <title> Filter Query </title>
- <para>Depending on the case, it may be appropriate to use <xref
- linkend="client.filter" />. In this case, no secondary index is created. However, don't
- try a full-scan on a large table like this from an application (i.e., single-threaded
- client).
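A sketch of such a filtered scan (the family, qualifier, and value are placeholders, and <code>table</code> is an existing HTable):
<programlisting language="java">
// The filter is evaluated server-side, but every region is still read,
// so this remains a full table scan with no secondary index.
Scan scan = new Scan();
scan.setFilter(new SingleColumnValueFilter(
    Bytes.toBytes("d"), Bytes.toBytes("user"),
    CompareFilter.CompareOp.EQUAL, Bytes.toBytes("user1234")));
ResultScanner scanner = table.getScanner(scan);
try {
  for (Result r : scanner) {
    // process matching rows
  }
} finally {
  scanner.close();
}
</programlisting>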
</para> - </section> - <section - xml:id="secondary.indexes.periodic"> - <title> Periodic-Update Secondary Index </title> - <para>A secondary index could be created in an other table which is periodically updated via a - MapReduce job. The job could be executed intra-day, but depending on load-strategy it could - still potentially be out of sync with the main data table.</para> - <para>See <xref - linkend="mapreduce.example.readwrite" /> for more information.</para> - </section> - <section - xml:id="secondary.indexes.dualwrite"> - <title> Dual-Write Secondary Index </title> - <para>Another strategy is to build the secondary index while publishing data to the cluster - (e.g., write to data table, write to index table). If this is approach is taken after a data - table already exists, then bootstrapping will be needed for the secondary index with a - MapReduce job (see <xref - linkend="secondary.indexes.periodic" />).</para> - </section> - <section - xml:id="secondary.indexes.summary"> - <title> Summary Tables </title> - <para>Where time-ranges are very wide (e.g., year-long report) and where the data is - voluminous, summary tables are a common approach. These would be generated with MapReduce - jobs into another table.</para> - <para>See <xref - linkend="mapreduce.example.summary" /> for more information.</para> - </section> - <section - xml:id="secondary.indexes.coproc"> - <title> Coprocessor Secondary Index </title> - <para>Coprocessors act like RDBMS triggers. These were added in 0.92. For more information, - see <xref - linkend="coprocessors" /> - </para> - </section> - </section> - <section - xml:id="constraints"> - <title>Constraints</title> - <para>HBase currently supports 'constraints' in traditional (SQL) database parlance. The advised - usage for Constraints is in enforcing business rules for attributes in the table (eg. make - sure values are in the range 1-10). Constraints could also be used to enforce referential - integrity, but this is strongly discouraged as it will dramatically decrease the write - throughput of the tables where integrity checking is enabled. Extensive documentation on using - Constraints can be found at: <link - xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/constraint">Constraint</link> - since version 0.94. </para> - </section> - <section - xml:id="schema.casestudies"> - <title>Schema Design Case Studies</title> - <para>The following will describe some typical data ingestion use-cases with HBase, and how the - rowkey design and construction can be approached. Note: this is just an illustration of - potential approaches, not an exhaustive list. Know your data, and know your processing - requirements. </para> - <para>It is highly recommended that you read the rest of the <xref - linkend="schema" /> first, before reading these case studies. </para> - <para>The following case studies are described: </para> - <itemizedlist> - <listitem> - <para>Log Data / Timeseries Data</para> - </listitem> - <listitem> - <para>Log Data / Timeseries on Steroids</para> - </listitem> - <listitem> - <para>Customer/Order</para> - </listitem> - <listitem> - <para>Tall/Wide/Middle Schema Design</para> - </listitem> - <listitem> - <para>List Data</para> - </listitem> - </itemizedlist> - <section - xml:id="schema.casestudies.log-timeseries"> - <title>Case Study - Log Data and Timeseries Data</title> - <para>Assume that the following data elements are being collected. 
</para>
- <itemizedlist>
- <listitem>
- <para>Hostname</para>
- </listitem>
- <listitem>
- <para>Timestamp</para>
- </listitem>
- <listitem>
- <para>Log event</para>
- </listitem>
- <listitem>
- <para>Value/message</para>
- </listitem>
- </itemizedlist>
- <para>We can store them in an HBase table called LOG_DATA, but what will the rowkey be? From
- these attributes the rowkey will be some combination of hostname, timestamp, and log-event -
- but what specifically? </para>
- <section
- xml:id="schema.casestudies.log-timeseries.tslead">
- <title>Timestamp In The Rowkey Lead Position</title>
- <para>The rowkey <code>[timestamp][hostname][log-event]</code> suffers from the
- monotonically increasing rowkey problem described in <xref
- linkend="timeseries" />. </para>
- <para>There is another pattern frequently mentioned in the dist-lists about "bucketing"
- timestamps, by performing a mod operation on the timestamp. If time-oriented scans are
- important, this could be a useful approach. Attention must be paid to the number of
- buckets, because this will require the same number of scans to return results.</para>
- <programlisting language="java">
-long bucket = timestamp % numBuckets;
- </programlisting>
- <para>... to construct:</para>
- <programlisting>
-[bucket][timestamp][hostname][log-event]
- </programlisting>
- <para>As stated above, to select data for a particular timerange, a Scan will need to be
- performed for each bucket. 100 buckets, for example, will provide a wide distribution in
- the keyspace but it will require 100 Scans to obtain data for a single timestamp, so there
- are trade-offs. </para>
- </section>
- <!-- ts lead -->
- <section
- xml:id="schema.casestudies.log-timeseries.hostlead">
- <title>Host In The Rowkey Lead Position</title>
- <para>The rowkey <code>[hostname][log-event][timestamp]</code> is a candidate if there is a
- large-ish number of hosts to spread the writes and reads across the keyspace. This
- approach would be useful if scanning by hostname was a priority. </para>
- </section>
- <!-- host lead -->
- <section
- xml:id="schema.casestudies.log-timeseries.revts">
- <title>Timestamp, or Reverse Timestamp?</title>
- <para>If the most important access path is to pull the most recent events, then storing the
- timestamps as reverse-timestamps (e.g., <code>timestamp = Long.MAX_VALUE - timestamp</code>)
- will create the property of being able to do a Scan on
- <code>[hostname][log-event]</code> to quickly obtain the most recently
- captured events. </para>
- <para>Neither approach is wrong; it just depends on what is most appropriate for the
- situation. </para>
- <note>
- <title>Reverse Scan API</title>
- <para>
- <link
- xlink:href="https://issues.apache.org/jira/browse/HBASE-4811">HBASE-4811</link>
- implements an API to scan a table or a range within a table in reverse, reducing the
- need to optimize your schema for forward or reverse scanning. This feature is available
- in HBase 0.98 and later. See <link
- xlink:href="https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html#setReversed%28boolean" />
- for more information. </para>
- </note>
- </section>
- <!-- revts -->
- <section
- xml:id="schema.casestudies.log-timeseries.varkeys">
- <title>Variable Length or Fixed Length Rowkeys?</title>
- <para>It is critical to remember that rowkeys are stamped on every column in HBase. If the
- hostname is "a" and the event type is "e1" then the resulting rowkey would be quite small.
- However, what if the ingested hostname is âmyserver1.mycompany.comâ and the event type is - âcom.package1.subpackage2.subsubpackage3.ImportantServiceâ? </para> - <para>It might make sense to use some substitution in the rowkey. There are at least two - approaches: hashed and numeric. In the Hostname In The Rowkey Lead Position example, it - might look like this: </para> - <para>Composite Rowkey With Hashes:</para> - <itemizedlist> - <listitem> - <para>[MD5 hash of hostname] = 16 bytes</para> - </listitem> - <listitem> - <para>[MD5 hash of event-type] = 16 bytes</para> - </listitem> - <listitem> - <para>[timestamp] = 8 bytes</para> - </listitem> - </itemizedlist> - <para>Composite Rowkey With Numeric Substitution: </para> - <para>For this approach another lookup table would be needed in addition to LOG_DATA, called - LOG_TYPES. The rowkey of LOG_TYPES would be: </para> - <itemizedlist> - <listitem> - <para>[type] (e.g., byte indicating hostname vs. event-type)</para> - </listitem> - <listitem> - <para>[bytes] variable length bytes for raw hostname or event-type.</para> - </listitem> - </itemizedlist> - <para>A column for this rowkey could be a long with an assigned number, which could be - obtained by using an <link - xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#incrementColumnValue%28byte[],%20byte[],%20byte[],%20long%29">HBase - counter</link>. </para> - <para>So the resulting composite rowkey would be: </para> - <itemizedlist> - <listitem> - <para>[substituted long for hostname] = 8 bytes</para> - </listitem> - <listitem> - <para>[substituted long for event type] = 8 bytes</para> - </listitem> - <listitem> - <para>[timestamp] = 8 bytes</para> - </listitem> - </itemizedlist> - <para>In either the Hash or Numeric substitution approach, the raw values for hostname and - event-type can be stored as columns. </para> - </section> - <!-- varkeys --> - </section> - <!-- log data and timeseries --> - <section - xml:id="schema.casestudies.log-steroids"> - <title>Case Study - Log Data and Timeseries Data on Steroids</title> - <para>This effectively is the OpenTSDB approach. What OpenTSDB does is re-write data and pack - rows into columns for certain time-periods. For a detailed explanation, see: <link - xlink:href="http://opentsdb.net/schema.html">http://opentsdb.net/schema.html</link>, and <link - xlink:href="http://www.cloudera.com/content/cloudera/en/resources/library/hbasecon/video-hbasecon-2012-lessons-learned-from-opentsdb.html">Lessons - Learned from OpenTSDB</link> from HBaseCon2012. </para> - <para>But this is how the general concept works: data is ingested, for example, in this - mannerâ¦</para> - <screen> -[hostname][log-event][timestamp1] -[hostname][log-event][timestamp2] -[hostname][log-event][timestamp3] - </screen> - <para>⦠with separate rowkeys for each detailed event, but is re-written like this⦠</para> - <screen>[hostname][log-event][timerange]</screen> - <para>⦠and each of the above events are converted into columns stored with a time-offset - relative to the beginning timerange (e.g., every 5 minutes). This is obviously a very - advanced processing technique, but HBase makes this possible. </para> - </section> - <!-- log data timeseries steroids --> - - <section - xml:id="schema.casestudies.custorder"> - <title>Case Study - Customer/Order</title> - <para>Assume that HBase is used to store customer and order information. There are two core - record-types being ingested: a Customer record type, and Order record type. 
- </section> <!-- varkeys -->
- </section> <!-- log data and timeseries -->
- <section xml:id="schema.casestudies.log-steroids">
- <title>Case Study - Log Data and Timeseries Data on Steroids</title>
- <para>This effectively is the OpenTSDB approach. What OpenTSDB does is re-write data and pack rows into columns for certain time-periods. For a detailed explanation, see: <link xlink:href="http://opentsdb.net/schema.html">http://opentsdb.net/schema.html</link>, and <link xlink:href="http://www.cloudera.com/content/cloudera/en/resources/library/hbasecon/video-hbasecon-2012-lessons-learned-from-opentsdb.html">Lessons Learned from OpenTSDB</link> from HBaseCon2012. </para>
- <para>But this is how the general concept works: data is ingested, for example, in this manner...</para>
- <screen>
-[hostname][log-event][timestamp1]
-[hostname][log-event][timestamp2]
-[hostname][log-event][timestamp3]
- </screen>
- <para>... with separate rowkeys for each detailed event, but is re-written like this... </para>
- <screen>[hostname][log-event][timerange]</screen>
- <para>... and each of the above events is converted into a column stored with a time-offset relative to the beginning timerange (e.g., every 5 minutes). This is obviously a very advanced processing technique, but HBase makes this possible. </para>
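- <para>The following is a simplified, illustrative sketch of such a re-write; it is not OpenTSDB's actual implementation, and the table name, column family, and 5-minute range size are assumptions:</para>
- <programlisting language="java"><![CDATA[
-import java.io.IOException;
-import org.apache.hadoop.conf.Configuration;
-import org.apache.hadoop.hbase.client.HTable;
-import org.apache.hadoop.hbase.client.Put;
-import org.apache.hadoop.hbase.util.Bytes;
-
-public class TimerangeRollup {
-  static final long RANGE_MS = 5 * 60 * 1000L;       // e.g., one row per 5-minute range
-  static final byte[] FAMILY = Bytes.toBytes("d");   // assumed column family
-
-  public static void writeEvent(Configuration conf, byte[] hostname, byte[] logEvent,
-      long timestamp, byte[] message) throws IOException {
-    long rangeStart = timestamp - (timestamp % RANGE_MS);
-    int offset = (int) (timestamp - rangeStart);
-    // Rowkey: [hostname][log-event][timerange]; column qualifier: offset within the range.
-    byte[] rowkey = Bytes.add(hostname, logEvent, Bytes.toBytes(rangeStart));
-    Put put = new Put(rowkey);
-    put.add(FAMILY, Bytes.toBytes(offset), message);
-    HTable table = new HTable(conf, "LOG_DATA_ROLLUP");   // assumed table name
-    try {
-      table.put(put);
-    } finally {
-      table.close();
-    }
-  }
-}
-]]></programlisting>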
- </section> <!-- log data timeseries steroids -->
-
- <section xml:id="schema.casestudies.custorder">
- <title>Case Study - Customer/Order</title>
- <para>Assume that HBase is used to store customer and order information. There are two core record-types being ingested: a Customer record type, and an Order record type. </para>
- <para>The Customer record type would include all the things that you'd typically expect: </para>
- <itemizedlist>
- <listitem><para>Customer number</para></listitem>
- <listitem><para>Customer name</para></listitem>
- <listitem><para>Address (e.g., city, state, zip)</para></listitem>
- <listitem><para>Phone numbers, etc.</para></listitem>
- </itemizedlist>
- <para>The Order record type would include things like: </para>
- <itemizedlist>
- <listitem><para>Customer number</para></listitem>
- <listitem><para>Order number</para></listitem>
- <listitem><para>Sales date</para></listitem>
- <listitem><para>A series of nested objects for shipping locations and line-items (see <xref linkend="schema.casestudies.custorder.obj" /> for details)</para></listitem>
- </itemizedlist>
- <para>Assuming that the combination of customer number and sales order uniquely identifies an order, these two attributes will compose the rowkey, and specifically a composite key such as: </para>
- <screen>[customer number][order number]</screen>
- <para>... for an ORDER table. However, there are more design decisions to make: are the <emphasis>raw</emphasis> values the best choices for rowkeys? </para>
- <para>The same design questions in the Log Data use-case confront us here. What is the keyspace of the customer number, and what is the format (e.g., numeric? alphanumeric?). As it is advantageous to use fixed-length keys in HBase, as well as keys that can support a reasonable spread in the keyspace, similar options appear: </para>
- <para>Composite Rowkey With Hashes: </para>
- <itemizedlist>
- <listitem><para>[MD5 of customer number] = 16 bytes</para></listitem>
- <listitem><para>[MD5 of order number] = 16 bytes</para></listitem>
- </itemizedlist>
- <para>Composite Numeric/Hash Combo Rowkey: </para>
- <itemizedlist>
- <listitem><para>[substituted long for customer number] = 8 bytes</para></listitem>
- <listitem><para>[MD5 of order number] = 16 bytes</para></listitem>
- </itemizedlist>
- <section xml:id="schema.casestudies.custorder.tables">
- <title>Single Table? Multiple Tables?</title>
- <para>A traditional design approach would have separate tables for CUSTOMER and SALES. Another option is to pack multiple record types into a single table (e.g., CUSTOMER++). </para>
- <para>Customer Record Type Rowkey: </para>
- <itemizedlist>
- <listitem><para>[customer-id]</para></listitem>
- <listitem><para>[type] = type indicating "1" for customer record type</para></listitem>
- </itemizedlist>
- <para>Order Record Type Rowkey: </para>
- <itemizedlist>
- <listitem><para>[customer-id]</para></listitem>
- <listitem><para>[type] = type indicating "2" for order record type</para></listitem>
- <listitem><para>[order]</para></listitem>
- </itemizedlist>
- <para>The advantage of this particular CUSTOMER++ approach is that it organizes many different record-types by customer-id (e.g., a single scan could get you everything about that customer). The disadvantage is that it's not as easy to scan for a particular record-type. </para>
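- <para>As an illustrative sketch of that single-scan advantage (the table name and the fact that the customer-id is a rowkey prefix are assumptions for the example), a prefix Scan on the customer-id returns the customer record and all of that customer's order records in one pass:</para>
- <programlisting language="java"><![CDATA[
-import java.io.IOException;
-import org.apache.hadoop.conf.Configuration;
-import org.apache.hadoop.hbase.client.HTable;
-import org.apache.hadoop.hbase.client.Result;
-import org.apache.hadoop.hbase.client.ResultScanner;
-import org.apache.hadoop.hbase.client.Scan;
-import org.apache.hadoop.hbase.filter.PrefixFilter;
-
-public class CustomerPlusPlusScan {
-  public static void scanCustomer(Configuration conf, byte[] customerId) throws IOException {
-    Scan scan = new Scan(customerId);               // start at the customer-id prefix
-    scan.setFilter(new PrefixFilter(customerId));   // stop once the prefix no longer matches
-    HTable table = new HTable(conf, "CUSTOMER");    // assumed table name
-    try {
-      ResultScanner scanner = table.getScanner(scan);
-      try {
-        for (Result r : scanner) {
-          // r covers the customer record (type "1") and each order record (type "2")
-        }
-      } finally {
-        scanner.close();
-      }
-    } finally {
-      table.close();
-    }
-  }
-}
-]]></programlisting>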
- </section>
- <section xml:id="schema.casestudies.custorder.obj">
- <title>Order Object Design</title>
- <para>Now we need to address how to model the Order object. Assume that the class structure is as follows:</para>
- <variablelist>
- <varlistentry>
- <term>Order</term>
- <listitem><para>(an Order can have multiple ShippingLocations)</para></listitem>
- </varlistentry>
- <varlistentry>
- <term>LineItem</term>
- <listitem><para>(a ShippingLocation can have multiple LineItems)</para></listitem>
- </varlistentry>
- </variablelist>
- <para>... there are multiple options for storing this data. </para>
- <section xml:id="schema.casestudies.custorder.obj.norm">
- <title>Completely Normalized</title>
- <para>With this approach, there would be separate tables for ORDER, SHIPPING_LOCATION, and LINE_ITEM. </para>
- <para>The ORDER table's rowkey was described above: <xref linkend="schema.casestudies.custorder" /> </para>
- <para>The SHIPPING_LOCATION's composite rowkey would be something like this: </para>
- <itemizedlist>
- <listitem><para>[order-rowkey]</para></listitem>
- <listitem><para>[shipping location number] (e.g., 1st location, 2nd, etc.)</para></listitem>
- </itemizedlist>
- <para>The LINE_ITEM table's composite rowkey would be something like this: </para>
- <itemizedlist>
- <listitem><para>[order-rowkey]</para></listitem>
- <listitem><para>[shipping location number] (e.g., 1st location, 2nd, etc.)</para></listitem>
- <listitem><para>[line item number] (e.g., 1st lineitem, 2nd, etc.)</para></listitem>
- </itemizedlist>
- <para>Such a normalized model is likely to be the approach with an RDBMS, but that's not your only option with HBase. The cons of such an approach are that to retrieve information about any Order, you will need: </para>
- <itemizedlist>
- <listitem><para>Get on the ORDER table for the Order</para></listitem>
- <listitem><para>Scan on the SHIPPING_LOCATION table for that order to get the ShippingLocation instances</para></listitem>
- <listitem><para>Scan on the LINE_ITEM for each ShippingLocation</para></listitem>
- </itemizedlist>
- <para>... granted, this is what an RDBMS would do under the covers anyway, but since there are no joins in HBase you're just more aware of this fact. </para>
- </section>
- <section xml:id="schema.casestudies.custorder.obj.rectype">
- <title>Single Table With Record Types</title>
- <para>With this approach, there would exist a single table ORDER that would contain all of the record types. </para>
- <para>The Order rowkey was described above: <xref linkend="schema.casestudies.custorder" /></para>
- <itemizedlist>
- <listitem><para>[order-rowkey]</para></listitem>
- <listitem><para>[ORDER record type]</para></listitem>
- </itemizedlist>
- <para>The ShippingLocation composite rowkey would be something like this: </para>
- <itemizedlist>
- <listitem><para>[order-rowkey]</para></listitem>
- <listitem><para>[SHIPPING record type]</para></listitem>
- <listitem><para>[shipping location number] (e.g., 1st location, 2nd, etc.)</para></listitem>
- </itemizedlist>
- <para>The LineItem composite rowkey would be something like this: </para>
- <itemizedlist>
- <listitem><para>[order-rowkey]</para></listitem>
- <listitem><para>[LINE record type]</para></listitem>
- <listitem><para>[shipping location number] (e.g., 1st location, 2nd, etc.)</para></listitem>
- <listitem><para>[line item number] (e.g., 1st lineitem, 2nd, etc.)</para></listitem>
- </itemizedlist>
- </section>
- <section xml:id="schema.casestudies.custorder.obj.denorm">
- <title>Denormalized</title>
- <para>A variant of the Single Table With Record Types approach is to denormalize and flatten some of the object hierarchy, such as collapsing the ShippingLocation attributes onto each LineItem instance. </para>
- <para>The LineItem composite rowkey would be something like this: </para>
- <itemizedlist>
- <listitem><para>[order-rowkey]</para></listitem>
- <listitem><para>[LINE record type]</para></listitem>
- <listitem><para>[line item number] (e.g., 1st lineitem, 2nd, etc. - care must be taken that they are unique across the entire order)</para></listitem>
- </itemizedlist>
- <para>... and the LineItem columns would be something like this: </para>
- <itemizedlist>
- <listitem><para>itemNumber</para></listitem>
- <listitem><para>quantity</para></listitem>
- <listitem><para>price</para></listitem>
- <listitem><para>shipToLine1 (denormalized from ShippingLocation)</para></listitem>
- <listitem><para>shipToLine2 (denormalized from ShippingLocation)</para></listitem>
- <listitem><para>shipToCity (denormalized from ShippingLocation)</para></listitem>
- <listitem><para>shipToState (denormalized from ShippingLocation)</para></listitem>
- <listitem><para>shipToZip (denormalized from ShippingLocation)</para></listitem>
- </itemizedlist>
- <para>The pros of this approach include a less complex object hierarchy, but one of the cons is that updating gets more complicated in case any of this information changes. </para>
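- <para>As an illustrative sketch (the table name, column family, type marker, and value encodings are assumptions, not part of the original design), writing one denormalized LineItem row with the ShippingLocation attributes collapsed onto it might look like this:</para>
- <programlisting language="java"><![CDATA[
-import java.io.IOException;
-import org.apache.hadoop.conf.Configuration;
-import org.apache.hadoop.hbase.client.HTable;
-import org.apache.hadoop.hbase.client.Put;
-import org.apache.hadoop.hbase.util.Bytes;
-
-public class DenormalizedLineItemWrite {
-  static final byte[] CF = Bytes.toBytes("d");        // assumed column family
-  static final byte[] LINE_TYPE = Bytes.toBytes("3"); // assumed LINE record type marker
-
-  public static void writeLineItem(Configuration conf, byte[] orderRowkey, int lineItemNumber,
-      String itemNumber, long quantity, long price, String shipToLine1, String shipToCity,
-      String shipToState, String shipToZip) throws IOException {
-    // Rowkey: [order-rowkey][LINE record type][line item number]
-    byte[] rowkey = Bytes.add(orderRowkey, LINE_TYPE, Bytes.toBytes(lineItemNumber));
-    Put put = new Put(rowkey);
-    put.add(CF, Bytes.toBytes("itemNumber"), Bytes.toBytes(itemNumber));
-    put.add(CF, Bytes.toBytes("quantity"), Bytes.toBytes(quantity));
-    put.add(CF, Bytes.toBytes("price"), Bytes.toBytes(price));
-    // ShippingLocation attributes denormalized onto the LineItem row:
-    put.add(CF, Bytes.toBytes("shipToLine1"), Bytes.toBytes(shipToLine1));
-    put.add(CF, Bytes.toBytes("shipToCity"), Bytes.toBytes(shipToCity));
-    put.add(CF, Bytes.toBytes("shipToState"), Bytes.toBytes(shipToState));
-    put.add(CF, Bytes.toBytes("shipToZip"), Bytes.toBytes(shipToZip));
-    HTable table = new HTable(conf, "ORDER");         // single-table design from above
-    try {
-      table.put(put);
-    } finally {
-      table.close();
-    }
-  }
-}
-]]></programlisting>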
- </section>
- <section xml:id="schema.casestudies.custorder.obj.singleobj">
- <title>Object BLOB</title>
- <para>With this approach, the entire Order object graph is treated, in one way or another, as a BLOB. For example, the ORDER table's rowkey was described above: <xref linkend="schema.casestudies.custorder" />, and a single column called "order" would contain an object that could be deserialized into a containing Order, its ShippingLocations, and its LineItems. </para>
- <para>There are many options here: JSON, XML, Java Serialization, Avro, Hadoop Writables, etc. All of them are variants of the same approach: encode the object graph to a byte-array. Care should be taken with this approach to ensure backward compatibility in case the object model changes, such that older persisted structures can still be read back out of HBase. </para>
- <para>Pros are being able to manage complex object graphs with minimal I/O (e.g., a single HBase Get per Order in this example), but the cons include the aforementioned warning about backward compatibility of serialization, language dependencies of serialization (e.g., Java Serialization only works with Java clients), the fact that you have to deserialize the entire object to get any piece of information inside the BLOB, and the difficulty in getting frameworks like Hive to work with custom objects like this. </para>
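- <para>A minimal sketch of the BLOB approach using plain Java Serialization, one of the options listed above (the Order class is assumed to be Serializable, and the column family and column name are assumptions; note the backward-compatibility and language caveats just discussed):</para>
- <programlisting language="java"><![CDATA[
-import java.io.ByteArrayInputStream;
-import java.io.ByteArrayOutputStream;
-import java.io.IOException;
-import java.io.ObjectInputStream;
-import java.io.ObjectOutputStream;
-import java.io.Serializable;
-import org.apache.hadoop.hbase.client.Get;
-import org.apache.hadoop.hbase.client.HTable;
-import org.apache.hadoop.hbase.client.Put;
-import org.apache.hadoop.hbase.client.Result;
-import org.apache.hadoop.hbase.util.Bytes;
-
-public class OrderBlobStore {
-  static final byte[] CF = Bytes.toBytes("d");        // assumed column family
-  static final byte[] COL = Bytes.toBytes("order");   // the single "order" column
-
-  // Write: serialize the whole object graph and store it in one cell.
-  public static void putOrder(HTable table, byte[] orderRowkey, Serializable order)
-      throws IOException {
-    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
-    ObjectOutputStream out = new ObjectOutputStream(bytes);
-    out.writeObject(order);
-    out.close();
-    Put put = new Put(orderRowkey);
-    put.add(CF, COL, bytes.toByteArray());
-    table.put(put);
-  }
-
-  // Read: a single Get retrieves the entire Order graph, which must then be deserialized whole.
-  public static Object getOrder(HTable table, byte[] orderRowkey)
-      throws IOException, ClassNotFoundException {
-    Result result = table.get(new Get(orderRowkey));
-    byte[] blob = result.getValue(CF, COL);
-    ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(blob));
-    return in.readObject();
-  }
-}
-]]></programlisting>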
- </section>
- </section> <!-- cust/order order object -->
- </section> <!-- cust/order -->
-
- <section xml:id="schema.smackdown">
- <title>Case Study - "Tall/Wide/Middle" Schema Design Smackdown</title>
- <para>This section will describe additional schema design questions that appear on the dist-list, specifically about tall and wide tables. These are general guidelines and not laws - each application must consider its own needs. </para>
- <section xml:id="schema.smackdown.rowsversions">
- <title>Rows vs. Versions</title>
- <para>A common question is whether one should prefer rows or HBase's built-in versioning. The context is typically where there are "a lot" of versions of a row to be retained (e.g., where it is significantly above the HBase default of 1 max versions). The rows-approach would require storing a timestamp in some portion of the rowkey so that they would not overwrite with each successive update. </para>
- <para>Preference: Rows (generally speaking). </para>
- </section>
- <section xml:id="schema.smackdown.rowscols">
- <title>Rows vs. Columns</title>
- <para>Another common question is whether one should prefer rows or columns. The context is typically in extreme cases of wide tables, such as having 1 row with 1 million attributes, or 1 million rows with 1 column apiece. </para>
- <para>Preference: Rows (generally speaking). To be clear, this guideline is in the context of extremely wide cases, not in the standard use-case where one needs to store a few dozen or hundred columns. But there is also a middle path between these two options, and that is "Rows as Columns." </para>
- </section>
- <section xml:id="schema.smackdown.rowsascols">
- <title>Rows as Columns</title>
- <para>The middle path between Rows vs. Columns is packing data that would be a separate row into columns, for certain rows. OpenTSDB is the best example of this case, where a single row represents a defined time-range, and then discrete events are treated as columns. This approach is often more complex, and may require the additional complexity of re-writing your data, but has the advantage of being I/O efficient. For an overview of this approach, see <xref linkend="schema.casestudies.log-steroids" />. </para>
- </section>
- </section>
- <!-- note: the following id is not consistent with the others because it was formerly in the Case Studies chapter, but I didn't want to break backward compatibility of the link. But future entries should look like the above case-study links (schema.casestudies. ...) -->
- <section xml:id="casestudies.schema.listdata">
- <title>Case Study - List Data</title>
- <para>The following is an exchange from the user dist-list regarding a fairly common question: how to handle per-user list data in Apache HBase. </para>
- <para>*** QUESTION ***</para>
- <para> We're looking at how to store a large amount of (per-user) list data in HBase, and we were trying to figure out what kind of access pattern made the most sense. One option is to store the majority of the data in a key, so we could have something like: </para>
-
- <programlisting><![CDATA[
-<FixedWidthUserName><FixedWidthValueId1>:"" (no value)
-<FixedWidthUserName><FixedWidthValueId2>:"" (no value)
-<FixedWidthUserName><FixedWidthValueId3>:"" (no value)
-]]></programlisting>
-
- <para>The other option we had was to do this entirely using:</para>
- <programlisting language="xml"><![CDATA[
-<FixedWidthUserName><FixedWidthPageNum0>:<FixedWidthLength><FixedIdNextPageNum><ValueId1><ValueId2><ValueId3>...
-<FixedWidthUserName><FixedWidthPageNum1>:<FixedWidthLength><FixedIdNextPageNum><ValueId1><ValueId2><ValueId3>...
- ]]></programlisting>
- <para> where each row would contain multiple values. So in one case reading the first thirty values would be: </para>
- <programlisting language="java"><![CDATA[
-scan { STARTROW => 'FixedWidthUsername' LIMIT => 30}
- ]]></programlisting>
- <para>And in the second case it would be </para>
- <programlisting>
-get 'FixedWidthUserName\x00\x00\x00\x00'
- </programlisting>
- <para> The general usage pattern would be to read only the first 30 values of these lists, with infrequent access reading deeper into the lists. Some users would have <= 30 total values in these lists, and some users would have millions (i.e., a power-law distribution). </para>
- <para> The single-value format seems like it would take up more space on HBase, but would offer some improved retrieval / pagination flexibility. Would there be any significant performance advantages to being able to paginate via gets vs. paginating with scans? </para>
- <para> My initial understanding was that doing a scan should be faster if our paging size is unknown (and caching is set appropriately), but that gets should be faster if we'll always need the same page size. I've ended up hearing different people tell me opposite things about performance. I assume the page sizes would be relatively consistent, so for most use cases we could guarantee that we only wanted one page of data in the fixed-page-length case. I would also assume that we would have infrequent updates, but may have inserts into the middle of these lists (meaning we'd need to update all subsequent rows). </para>
- <para> Thanks for help / suggestions / follow-up questions. </para>
- <para>*** ANSWER ***</para>
- <para> If I understand you correctly, you're ultimately trying to store triples in the form "user, valueid, value", right? E.g., something like: </para>
- <programlisting>
-"user123, firstname, Paul",
-"user234, lastname, Smith"
- </programlisting>
- <para> (But the usernames are fixed width, and the valueids are fixed width). </para>
- <para> And, your access pattern is along the lines of: "for user X, list the next 30 values, starting with valueid Y". Is that right? And these values should be returned sorted by valueid? </para>
- <para> The tl;dr version is that you should probably go with one row per user+value, and not build a complicated intra-row pagination scheme on your own unless you're really sure it is needed. </para>
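- <para>(Illustrative aside, not part of the original exchange: a sketch of reading the first 30 values for one user under the recommended one-row-per-user+value schema. The column family/qualifier names and the fixed-width key handling are assumptions.)</para>
- <programlisting language="java"><![CDATA[
-import java.io.IOException;
-import org.apache.hadoop.hbase.client.HTable;
-import org.apache.hadoop.hbase.client.Result;
-import org.apache.hadoop.hbase.client.ResultScanner;
-import org.apache.hadoop.hbase.client.Scan;
-import org.apache.hadoop.hbase.filter.PrefixFilter;
-import org.apache.hadoop.hbase.util.Bytes;
-
-public class UserListScan {
-  static final int PAGE_SIZE = 30;
-
-  public static void readFirstPage(HTable table, byte[] fixedWidthUser) throws IOException {
-    Scan scan = new Scan(fixedWidthUser);              // rows sort by [user][valueid]
-    scan.setFilter(new PrefixFilter(fixedWidthUser));  // stay within this user's rows
-    scan.setCaching(PAGE_SIZE);                        // fetch the page in one round trip if possible
-    ResultScanner scanner = table.getScanner(scan);
-    try {
-      int count = 0;
-      for (Result r : scanner) {
-        byte[] value = r.getValue(Bytes.toBytes("d"), Bytes.toBytes("v")); // assumed names
-        // process value...
-        if (++count >= PAGE_SIZE) {
-          break;                                       // only the first 30 values are needed
-        }
-      }
-    } finally {
-      scanner.close();
-    }
-  }
-}
-]]></programlisting>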
- <para> Your two options mirror a common question people have when designing HBase schemas: should I go "tall" or "wide"? Your first schema is "tall": each row represents one value for one user, and so there are many rows in the table for each user; the row key is user + valueid, and there would be (presumably) a single column qualifier that means "the value". This is great if you want to scan over rows in sorted order by row key (thus my question above, about whether these ids are sorted correctly). You can start a scan at any user+valueid, read the next 30, and be done. What you're giving up is the ability to have transactional guarantees around all the rows for one user, but it doesn't sound like you need that. Doing it this way is generally recommended (see here <link xlink:href="http://hbase.apache.org/book.html#schema.smackdown">http://hbase.apache.org/book.html#schema.smackdown</link>). </para>
- <para> Your second option is "wide": you store a bunch of values in one row, using different qualifiers (where the qualifier is the valueid). The simple way to do that would be to just store ALL values for one user in a single row. I'm guessing you jumped to the "paginated" version because you're assuming that storing millions of columns in a single row would be bad for performance, which may or may not be true; as long as you're not trying to do too much in a single request, or do things like scanning over and returning all of the cells in the row, it shouldn't be fundamentally worse. The client has methods that allow you to get specific slices of columns. </para>
- <para> Note that neither case fundamentally uses more disk space than the other; you're just "shifting" part of the identifying information for a value either to the left (into the row key, in option one) or to the right (into the column qualifiers in option 2). Under the covers, every key/value still stores the whole row key, and column family name. (If this is a bit confusing, take an hour and watch Lars George's excellent video about understanding HBase schema design: <link xlink:href="http://www.youtube.com/watch?v=_HLoH_PgrLk">http://www.youtube.com/watch?v=_HLoH_PgrLk</link>.) </para>
- <para> A manually paginated version has lots more complexities, as you note, like having to keep track of how many things are in each page, re-shuffling if new values are inserted, etc. That seems significantly more complex. It might have some slight speed advantages (or disadvantages!) at extremely high throughput, and the only way to really know that would be to try it out. If you don't have time to build it both ways and compare, my advice would be to start with the simplest option (one row per user+value). Start simple and iterate! :) </para>
- </section> <!-- listdata -->
-
- </section> <!-- schema design cases -->
- <section xml:id="schema.ops">
- <title>Operational and Performance Configuration Options</title>
- <para>See the Performance section <xref linkend="perf.schema" /> for more information on operational and performance schema design options, such as Bloom Filters, Table-configured regionsizes, compression, and blocksizes. </para>
- </section>
-
-</chapter>
-<!-- schema design -->