http://git-wip-us.apache.org/repos/asf/hbase/blob/7139c90e/src/main/asciidoc/_chapters/cp.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/cp.adoc b/src/main/asciidoc/_chapters/cp.adoc index 96f1c2f..a99e903 100644 --- a/src/main/asciidoc/_chapters/cp.adoc +++ b/src/main/asciidoc/_chapters/cp.adoc @@ -27,30 +27,32 @@ :icons: font :experimental: -HBase coprocessors are modeled after the coprocessors which are part of Google's BigTable (link:http://www.scribd.com/doc/21631448/Dean-Keynote-Ladis2009, pages 66-67.). Coprocessors function in a similar way to Linux kernel modules. +HBase coprocessors are modeled after the coprocessors which are part of Google's BigTable (http://www.scribd.com/doc/21631448/Dean-Keynote-Ladis2009, pages 66-67). Coprocessors function in a similar way to Linux kernel modules. They provide a way to run server-level code against locally-stored data. The functionality they provide is very powerful, but also carries great risk and can have adverse effects on the system, at the level of the operating system. -The information in this chapter is primarily sourced and heavily reused from Mingjie Lai's blog post at link:https://blogs.apache.org/hbase/entry/coprocessor_introduction. +The information in this chapter is primarily sourced and heavily reused from Mingjie Lai's blog post at https://blogs.apache.org/hbase/entry/coprocessor_introduction. Coprocessors are not designed to be used by end users of HBase, but by HBase developers who need to add specialized functionality to HBase. -One example of the use of coprocessors is pluggable compaction and scan policies, which are provided as coprocessors in link:HBASE-6427. +One example of the use of coprocessors is pluggable compaction and scan policies, which are provided as coprocessors in link:https://issues.apache.org/jira/browse/HBASE-6427[HBASE-6427]. == Coprocessor Framework The implementation of HBase coprocessors diverges from the BigTable implementation. -The HBase framework provides a library and runtime environment for executing user code within the HBase region server and master processes. +The HBase framework provides a library and runtime environment for executing user code within the HBase region server and master processes. -The framework API is provided in the link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/coprocessor/package-summary.html[coprocessor] package. +The framework API is provided in the link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/coprocessor/package-summary.html[coprocessor] package. Two different types of coprocessors are provided by the framework, based on their scope. -.Types of CoprocessorsSystem Coprocessors:: +.Types of Coprocessors + +System Coprocessors:: System coprocessors are loaded globally on all tables and regions hosted by a region server. Table Coprocessors:: You can specify which coprocessors should be loaded on all regions for a table on a per-table basis. -The framework provides two different aspects of extensions as well: [firstterm]_observers_ and [firstterm]_endpoints_. +The framework provides two different aspects of extensions as well: _observers_ and _endpoints_. Observers:: Observers are analogous to triggers in conventional databases.
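To make the observer concept concrete, here is a minimal sketch of a RegionObserver, assuming the coprocessor API of this era (`BaseRegionObserver`). The class name `ExampleRegionObserver` is an illustrative placeholder, and only the `preGetOp` hook is shown:

[source,java]
----
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;

// A trigger-like observer: the framework calls this hook before every Get
// on each region where the coprocessor is loaded.
public class ExampleRegionObserver extends BaseRegionObserver {
  @Override
  public void preGetOp(ObserverContext<RegionCoprocessorEnvironment> ctx,
      Get get, List<Cell> results) throws IOException {
    // Custom server-side logic runs here, before the default Get processing.
    // Calling ctx.bypass() would skip the default processing entirely.
  }
}
----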
@@ -80,7 +82,7 @@ You can load the coprocessor from your HBase configuration, so that the coproces === Load from Configuration -To configure a coprocessor to be loaded when HBase starts, modify the RegionServer's _hbase-site.xml_ and configure one of the following properties, based on the type of observer you are configuring: +To configure a coprocessor to be loaded when HBase starts, modify the RegionServer's _hbase-site.xml_ and configure one of the following properties, based on the type of observer you are configuring: * `hbase.coprocessor.region.classes` for RegionObservers and Endpoints * `hbase.coprocessor.wal.classes` for WALObservers @@ -90,12 +92,12 @@ To configure a coprocessor to be loaded when HBase starts, modify the RegionServ ==== In this example, one RegionObserver is configured for all the HBase tables. +[source,xml] ---- - <property> - <name>hbase.coprocessor.region.classes</name> - <value>org.apache.hadoop.hbase.coprocessor.AggregateImplementation</value> - </property> +<property> + <name>hbase.coprocessor.region.classes</name> + <value>org.apache.hadoop.hbase.coprocessor.AggregateImplementation</value> +</property> ---- ==== @@ -106,7 +108,7 @@ Therefore, the jar file must reside on the server-side HBase classpath. Coprocessors which are loaded in this way will be active on all regions of all tables. These are the system coprocessors introduced earlier. The first listed coprocessors will be assigned the priority `Coprocessor.Priority.SYSTEM`. -Each subsequent coprocessor in the list will have its priority value incremented by one (which reduces its priority, because priorities have the natural sort order of Integers). +Each subsequent coprocessor in the list will have its priority value incremented by one (which reduces its priority, because priorities have the natural sort order of Integers). When calling out to registered observers, the framework executes their callback methods in the sorted order of their priority. Ties are broken arbitrarily. @@ -114,13 +116,12 @@ Ties are broken arbitrarily. === Load from the HBase Shell You can load a coprocessor on a specific table via a table attribute. -The following example will load the [systemitem]+FooRegionObserver+ observer when table [systemitem]+t1+ is read or re-read. +The following example will load the `FooRegionObserver` observer when table `t1` is read or re-read. .Load a Coprocessor On a Table Using HBase Shell ==== ---- - -hbase(main):005:0> alter 't1', METHOD => 'table_att', +hbase(main):005:0> alter 't1', METHOD => 'table_att', 'coprocessor'=>'hdfs:///foo.jar|com.foo.FooRegionObserver|1001|arg1=1,arg2=2' Updating all regions with the new schema... 1/1 regions updated. @@ -128,18 +129,18 @@ Done.
0 row(s) in 1.0730 seconds hbase(main):006:0> describe 't1' -DESCRIPTION ENABLED - {NAME => 't1', coprocessor$1 => 'hdfs:///foo.jar|com.foo.FooRegio false - nObserver|1001|arg1=1,arg2=2', FAMILIES => [{NAME => 'c1', DATA_B - LOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE - => '0', VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS => - '0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZ - E => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK => 'true', BLO - CKCACHE => 'true'}, {NAME => 'f1', DATA_BLOCK_ENCODING => 'NONE', - BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3' - , COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647' - , KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY - => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}]} +DESCRIPTION ENABLED + {NAME => 't1', coprocessor$1 => 'hdfs:///foo.jar|com.foo.FooRegio false + nObserver|1001|arg1=1,arg2=2', FAMILIES => [{NAME => 'c1', DATA_B + LOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE + => '0', VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS => + '0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZ + E => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK => 'true', BLO + CKCACHE => 'true'}, {NAME => 'f1', DATA_BLOCK_ENCODING => 'NONE', + BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3' + , COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647' + , KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY + => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}]} 1 row(s) in 0.0190 seconds ---- ==== @@ -160,7 +161,7 @@ The value contains four pieces of information which are separated by the `|` cha ==== ---- -hbase(main):007:0> alter 't1', METHOD => 'table_att_unset', +hbase(main):007:0> alter 't1', METHOD => 'table_att_unset', hbase(main):008:0* NAME => 'coprocessor$1' Updating all regions with the new schema... 1/1 regions updated. @@ -168,27 +169,27 @@ Done. 
0 row(s) in 1.1130 seconds hbase(main):009:0> describe 't1' -DESCRIPTION ENABLED - {NAME => 't1', FAMILIES => [{NAME => 'c1', DATA_BLOCK_ENCODING => false - 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSION - S => '3', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '214 - 7483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN - _MEMORY => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true - '}, {NAME => 'f1', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => - 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => - 'NONE', MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_C - ELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCO - DE_ON_DISK => 'true', BLOCKCACHE => 'true'}]} +DESCRIPTION ENABLED + {NAME => 't1', FAMILIES => [{NAME => 'c1', DATA_BLOCK_ENCODING => false + 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSION + S => '3', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '214 + 7483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN + _MEMORY => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true + '}, {NAME => 'f1', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => + 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => + 'NONE', MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_C + ELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCO + DE_ON_DISK => 'true', BLOCKCACHE => 'true'}]} 1 row(s) in 0.0180 seconds ---- ==== WARNING: There is no guarantee that the framework will load a given coprocessor successfully. -For example, the shell command neither guarantees a jar file exists at a particular location nor verifies whether the given class is actually contained in the jar file. +For example, the shell command neither guarantees a jar file exists at a particular location nor verifies whether the given class is actually contained in the jar file. == Check the Status of a Coprocessor -To check the status of a coprocessor after it has been configured, use the +status+ HBase Shell command. +To check the status of a coprocessor after it has been configured, use the `status` HBase Shell command. ---- @@ -200,17 +201,17 @@ master coprocessors: [] localhost:52761 1328082515520 requestsPerSecond=3, numberOfOnlineRegions=3, usedHeapMB=32, maxHeapMB=995 -ROOT-,,0 - numberOfStores=1, numberOfStorefiles=1, storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0, -storefileIndexSizeMB=0, readRequestsCount=54, writeRequestsCount=1, rootIndexSizeKB=0, totalStaticIndexSizeKB=0, + numberOfStores=1, numberOfStorefiles=1, storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0, +storefileIndexSizeMB=0, readRequestsCount=54, writeRequestsCount=1, rootIndexSizeKB=0, totalStaticIndexSizeKB=0, totalStaticBloomSizeKB=0, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN, coprocessors=[] .META.,,1 - numberOfStores=1, numberOfStorefiles=0, storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0, -storefileIndexSizeMB=0, readRequestsCount=97, writeRequestsCount=4, rootIndexSizeKB=0, totalStaticIndexSizeKB=0, + numberOfStores=1, numberOfStorefiles=0, storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0, +storefileIndexSizeMB=0, readRequestsCount=97, writeRequestsCount=4, rootIndexSizeKB=0, totalStaticIndexSizeKB=0, totalStaticBloomSizeKB=0, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN, coprocessors=[] t1,,1328082575190.c0491168a27620ffe653ec6c04c9b4d1. 
- numberOfStores=2, numberOfStorefiles=1, storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0, -storefileIndexSizeMB=0, readRequestsCount=0, writeRequestsCount=0, rootIndexSizeKB=0, totalStaticIndexSizeKB=0, -totalStaticBloomSizeKB=0, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN, + numberOfStores=2, numberOfStorefiles=1, storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0, +storefileIndexSizeMB=0, readRequestsCount=0, writeRequestsCount=0, rootIndexSizeKB=0, totalStaticIndexSizeKB=0, +totalStaticBloomSizeKB=0, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN, coprocessors=[AggregateImplementation] 0 dead servers ---- @@ -218,16 +219,12 @@ coprocessors=[AggregateImplementation] == Monitor Time Spent in Coprocessors HBase 0.98.5 introduced the ability to monitor some statistics relating to the amount of time spent executing a given coprocessor. -You can see these statistics via the HBase Metrics framework (see <<hbase_metrics,hbase metrics>> or the Web UI for a given Region Server, via the [label]#Coprocessor Metrics# tab. +You can see these statistics via the HBase Metrics framework (see <<hbase_metrics>>) or the Web UI for a given Region Server, via the _Coprocessor Metrics_ tab. These statistics are valuable for debugging and benchmarking the performance impact of a given coprocessor on your cluster. Tracked statistics include min, max, average, and 90th, 95th, and 99th percentile. All times are shown in milliseconds. The statistics are calculated over coprocessor execution samples recorded during the reporting interval, which is 10 seconds by default. -The metrics sampling rate as described in <<hbase_metrics,hbase metrics>>. +The metrics sampling rate is described in <<hbase_metrics>>. .Coprocessor Metrics UI image::coprocessor_stats.png[] - -== Status of Coprocessors in HBase - -Coprocessors and the coprocessor framework are evolving rapidly and work is ongoing on several different JIRAs.
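Besides _hbase-site.xml_ and the shell, a table coprocessor can also be attached from client code. A minimal sketch, assuming an HBase 1.0-era client API; `com.foo.FooRegionObserver` is the same placeholder class used in the shell example above:

[source,java]
----
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class LoadCoprocessorExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection connection = ConnectionFactory.createConnection(conf);
         Admin admin = connection.getAdmin()) {
      HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("t1"));
      desc.addFamily(new HColumnDescriptor("c1"));
      // Attach the observer as a table attribute. The class must be resolvable
      // on the server-side classpath (an overload also accepts a jar path,
      // priority, and key/value arguments).
      desc.addCoprocessor("com.foo.FooRegionObserver");
      admin.createTable(desc);
    }
  }
}
----

As with shell-based loading, this only sets a table attribute, so the same caveat applies: nothing verifies that the named class can actually be loaded on the RegionServers.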
http://git-wip-us.apache.org/repos/asf/hbase/blob/7139c90e/src/main/asciidoc/_chapters/datamodel.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/datamodel.adoc b/src/main/asciidoc/_chapters/datamodel.adoc index 854d784..74238ca 100644 --- a/src/main/asciidoc/_chapters/datamodel.adoc +++ b/src/main/asciidoc/_chapters/datamodel.adoc @@ -32,6 +32,7 @@ This is a terminology overlap with relational databases (RDBMSs), but this is no Instead, it can be helpful to think of an HBase table as a multi-dimensional map. .HBase Data Model Terminology + Table:: An HBase table consists of multiple rows. @@ -67,26 +68,24 @@ Timestamp:: == Conceptual View You can read a very understandable explanation of the HBase data model in the blog post link:http://jimbojw.com/wiki/index.php?title=Understanding_Hbase_and_BigTable[Understanding HBase and BigTable] by Jim R. Wilson. - -Another good explanation is available in the PDF link:http://0b4af6cdc2f0c5998459-c0245c5c937c5dedcca3f1764ecc9b2f.r43.cf2.rackcdn.com/9353-login1210_khurana.pdf[Introduction -to Basic Schema Design] by Amandeep Khurana. +Another good explanation is available in the PDF link:http://0b4af6cdc2f0c5998459-c0245c5c937c5dedcca3f1764ecc9b2f.r43.cf2.rackcdn.com/9353-login1210_khurana.pdf[Introduction to Basic Schema Design] by Amandeep Khurana. It may help to read different perspectives to get a solid understanding of HBase schema design. The linked articles cover the same ground as the information in this section. The following example is a slightly modified form of the one on page 2 of the link:http://research.google.com/archive/bigtable.html[BigTable] paper. -There is a table called `webtable` that contains two rows (`com.cnn.www` and `com.example.www`), three column families named `contents`, `anchor`, and `people`. +There is a table called `webtable` that contains two rows (`com.cnn.www` and `com.example.www`) and three column families named `contents`, `anchor`, and `people`. In this example, for the first row (`com.cnn.www`), `anchor` contains two columns (`anchor:cssnsi.com`, `anchor:my.look.ca`) and `contents` contains one column (`contents:html`). This example contains 5 versions of the row with the row key `com.cnn.www`, and one version of the row with the row key `com.example.www`. The `contents:html` column qualifier contains the entire HTML of a given website. Qualifiers of the `anchor` column family each contain the external site which links to the site represented by the row, along with the text it used in the anchor of its link. -The `people` column family represents people associated with the site. +The `people` column family represents people associated with the site. .Column Names [NOTE] ==== By convention, a column name is made of its column family prefix and a _qualifier_. For example, the column _contents:html_ is made up of the column family `contents` and the `html` qualifier. -The colon character (`:`) delimits the column family from the column family _qualifier_. +The colon character (`:`) delimits the column family from the column family _qualifier_. ==== .Table `webtable` @@ -109,27 +108,27 @@ This is only a mock-up for illustrative purposes and may not be strictly accurat [source,json] ---- { - "com.cnn.www": { - contents: { - t6: contents:html: "<html>..." - t5: contents:html: "<html>..." - t3: contents:html: "<html>..." 
- } - anchor: { - t9: anchor:cnnsi.com = "CNN" - t8: anchor:my.look.ca = "CNN.com" - } - people: {} - } - "com.example.www": { - contents: { - t5: contents:html: "<html>..." - } - anchor: {} - people: { - t5: people:author: "John Doe" - } - } + "com.cnn.www": { + contents: { + t6: contents:html: "<html>..." + t5: contents:html: "<html>..." + t3: contents:html: "<html>..." + } + anchor: { + t9: anchor:cnnsi.com = "CNN" + t8: anchor:my.look.ca = "CNN.com" + } + people: {} + } + "com.example.www": { + contents: { + t5: contents:html: "<html>..." + } + anchor: {} + people: { + t5: people:author: "John Doe" + } + } } ---- @@ -163,18 +162,18 @@ Thus a request for the value of the `contents:html` column at time stamp `t8` wo Similarly, a request for an `anchor:my.look.ca` value at time stamp `t9` would return no value. However, if no timestamp is supplied, the most recent value for a particular column would be returned. Given multiple versions, the most recent is also the first one found, since timestamps are stored in descending order. -Thus a request for the values of all columns in the row `com.cnn.www` if no timestamp is specified would be: the value of `contents:html` from timestamp `t6`, the value of `anchor:cnnsi.com` from timestamp `t9`, the value of `anchor:my.look.ca` from timestamp `t8`. +Thus a request for the values of all columns in the row `com.cnn.www` if no timestamp is specified would be: the value of `contents:html` from timestamp `t6`, the value of `anchor:cnnsi.com` from timestamp `t9`, the value of `anchor:my.look.ca` from timestamp `t8`. -For more information about the internals of how Apache HBase stores data, see <<regions.arch,regions.arch>>. +For more information about the internals of how Apache HBase stores data, see <<regions.arch,regions.arch>>. == Namespace A namespace is a logical grouping of tables analogous to a database in relational database systems. -This abstraction lays the groundwork for upcoming multi-tenancy related features: +This abstraction lays the groundwork for upcoming multi-tenancy related features: -* Quota Management (HBASE-8410) - Restrict the amount of resources (ie regions, tables) a namespace can consume. -* Namespace Security Administration (HBASE-9206) - provide another level of security administration for tenants. -* Region server groups (HBASE-6721) - A namespace/table can be pinned onto a subset of regionservers thus guaranteeing a course level of isolation. +* Quota Management (link:https://issues.apache.org/jira/browse/HBASE-8410[HBASE-8410]) - Restrict the amount of resources (i.e. regions, tables) a namespace can consume. +* Namespace Security Administration (link:https://issues.apache.org/jira/browse/HBASE-9206[HBASE-9206]) - Provide another level of security administration for tenants. +* Region server groups (link:https://issues.apache.org/jira/browse/HBASE-6721[HBASE-6721]) - A namespace/table can be pinned onto a subset of RegionServers thus guaranteeing a coarse level of isolation. [[namespace_creation]] === Namespace management @@ -221,10 +220,10 @@ alter_namespace 'my_ns', {METHOD => 'set', 'PROPERTY_NAME' => 'PROPERTY_VALUE'} [[namespace_special]] === Predefined namespaces -There are two predefined special namespaces: +There are two predefined special namespaces: -* hbase - system namespace, used to contain hbase internal tables -* default - tables with no explicit specified namespace will automatically fall into this namespace.
+* hbase - system namespace, used to contain HBase internal tables +* default - tables with no explicitly specified namespace will automatically fall into this namespace .Examples ==== ---- #namespace=foo create_namespace 'foo' create 'foo:bar', 'fam' #namespace=default create 'bar', 'fam' ---- ==== == Table -Tables are declared up front at schema definition time. +Tables are declared up front at schema definition time. == Row -Row keys are uninterrpreted bytes. +Row keys are uninterpreted bytes. Rows are lexicographically sorted with the lowest order appearing first in a table. The empty byte array is used to denote both the start and end of a table's namespace. @@ -255,8 +254,7 @@ The empty byte array is used to denote both the start and end of a tables' names Columns in Apache HBase are grouped into _column families_. All column members of a column family have the same prefix. For example, the columns _courses:history_ and _courses:math_ are both members of the _courses_ column family. -The colon character (`:`) delimits the column family from the -column family qualifier. +The colon character (`:`) delimits the column family from the column family qualifier. The column family prefix must be composed of _printable_ characters. The qualifying tail, the column family _qualifier_, can be made of any arbitrary bytes. Column families must be declared up front at schema definition time whereas columns do not need to be defined at schema time but can be conjured on the fly while the table is up and running. @@ -267,29 +265,26 @@ Because tunings and storage specifications are done at the column family level, == Cells A _{row, column, version}_ tuple exactly specifies a `cell` in HBase. -Cell content is uninterrpreted bytes +Cell content is uninterpreted bytes. == Data Model Operations The four primary data model operations are Get, Put, Scan, and Delete. -Operations are applied via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html[Table] instances. +Operations are applied via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html[Table] instances. === Get -link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html[Get] returns attributes for a specified row. -Gets are executed via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#get(org.apache.hadoop.hbase.client.Get)[ - Table.get]. +link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html[Get] returns attributes for a specified row. +Gets are executed via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#get(org.apache.hadoop.hbase.client.Get)[Table.get]. === Put -link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html[Put] either adds new rows to a table (if the key is new) or can update existing rows (if the key already exists). Puts are executed via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#put(org.apache.hadoop.hbase.client.Put)[ - Table.put] (writeBuffer) or link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#batch(java.util.List, java.lang.Object[])[ - Table.batch] (non-writeBuffer). +link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html[Put] either adds new rows to a table (if the key is new) or can update existing rows (if the key already exists).
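As an illustration, a minimal sketch of a Put followed by a Get against the `Table` API (an HBase 1.0-era client is assumed; the table name `t1`, family `c1`, and the qualifier and values are arbitrary placeholders):

[source,java]
----
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class GetPutExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection connection = ConnectionFactory.createConnection(conf);
         Table table = connection.getTable(TableName.valueOf("t1"))) {
      // Put: create (or update) the cell {row1, c1:a} with a new version.
      Put put = new Put(Bytes.toBytes("row1"));
      put.add(Bytes.toBytes("c1"), Bytes.toBytes("a"), Bytes.toBytes("value1"));
      table.put(put);

      // Get: read the most recent version of that cell back.
      Get get = new Get(Bytes.toBytes("row1"));
      Result result = table.get(get);
      byte[] value = result.getValue(Bytes.toBytes("c1"), Bytes.toBytes("a"));
      System.out.println(Bytes.toString(value)); // prints "value1"
    }
  }
}
----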
Puts are executed via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#put(org.apache.hadoop.hbase.client.Put)[Table.put] (writeBuffer) or link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#batch(java.util.List, java.lang.Object[])[Table.batch] (non-writeBuffer). [[scan]] === Scans -link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html[Scan] allow iteration over multiple rows for specified attributes. +link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html[Scan] allows iteration over multiple rows for specified attributes. The following is an example of a Scan on a Table instance. Assume that a table is populated with rows with keys "row1", "row2", "row3", and then another set of rows with the keys "abc1", "abc2", and "abc3". The following example shows how to set a Scan instance to return the rows beginning with "row". @@ -309,23 +304,24 @@ scan.setRowPrefixFilter(Bytes.toBytes("row")); ResultScanner rs = table.getScanner(scan); try { for (Result r = rs.next(); r != null; r = rs.next()) { - // process result... + // process result... + } } finally { rs.close(); // always close the ResultScanner! +} ---- -Note that generally the easiest way to specify a specific stop point for a scan is by using the link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/InclusiveStopFilter.html[InclusiveStopFilter] class. +Note that generally the easiest way to specify a specific stop point for a scan is by using the link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/InclusiveStopFilter.html[InclusiveStopFilter] class. === Delete -link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Delete.html[Delete] removes a row from a table. -Deletes are executed via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#delete(org.apache.hadoop.hbase.client.Delete)[ - HTable.delete]. +link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Delete.html[Delete] removes a row from a table. +Deletes are executed via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#delete(org.apache.hadoop.hbase.client.Delete)[Table.delete]. HBase does not modify data in place, and so deletes are handled by creating new markers called _tombstones_. -These tombstones, along with the dead values, are cleaned up on major compactions. +These tombstones, along with the dead values, are cleaned up on major compactions. -See <<version.delete,version.delete>> for more information on deleting versions of columns, and see <<compaction,compaction>> for more information on compactions. +See <<version.delete,version.delete>> for more information on deleting versions of columns, and see <<compaction,compaction>> for more information on compactions. [[versions]] == Versions @@ -345,20 +341,20 @@ In particular: * It is OK to write cells in a non-increasing version order. Below we describe how the version dimension in HBase currently works. -See link:https://issues.apache.org/jira/browse/HBASE-2406[HBASE-2406] for discussion of HBase versions. link:http://outerthought.org/blog/417-ot.html[Bending time in HBase] makes for a good read on the version, or time, dimension in HBase. +See link:https://issues.apache.org/jira/browse/HBASE-2406[HBASE-2406] for discussion of HBase versions. link:http://outerthought.org/blog/417-ot.html[Bending time in HBase] makes for a good read on the version, or time, dimension in HBase.
It has more detail on versioning than is provided here. -As of this writing, the limiitation _Overwriting values at existing timestamps_ mentioned in the article no longer holds in HBase. +As of this writing, the limitation _Overwriting values at existing timestamps_ mentioned in the article no longer holds in HBase. This section is basically a synopsis of this article by Bruno Dumon. [[specify.number.of.versions]] === Specifying the Number of Versions to Store -The maximum number of versions to store for a given column is part of the column schema and is specified at table creation, or via an +alter+ command, via `HColumnDescriptor.DEFAULT_VERSIONS`. +The maximum number of versions to store for a given column is part of the column schema and is specified at table creation, or via an `alter` command, via `HColumnDescriptor.DEFAULT_VERSIONS`. Prior to HBase 0.96, the default number of versions kept was `3`, but in 0.96 and newer it has been changed to `1`. -.Modify the Maximum Number of Versions for a Column +.Modify the Maximum Number of Versions for a Column Family ==== -This example uses HBase Shell to keep a maximum of 5 versions of column `f1`. +This example uses HBase Shell to keep a maximum of 5 versions of all columns in column family `f1`. You could also use link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor]. ---- @@ -366,11 +362,11 @@ hbase> alter 't1', NAME => 'f1', VERSIONS => 5 ---- ==== -.Modify the Minimum Number of Versions for a Column +.Modify the Minimum Number of Versions for a Column Family ==== -You can also specify the minimum number of versions to store. +You can also specify the minimum number of versions to store per column family. By default, this is set to 0, which means the feature is disabled. -The following example sets the minimum number of versions on field `f1` to `2`, via HBase Shell. +The following example sets the minimum number of versions on all columns in column family `f1` to `2`, via HBase Shell. You could also use link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor]. ---- @@ -378,7 +374,7 @@ hbase> alter 't1', NAME => 'f1', MIN_VERSIONS => 2 ---- ==== -Starting with HBase 0.98.2, you can specify a global default for the maximum number of versions kept for all newly-created columns, by setting +hbase.column.max.version+ in _hbase-site.xml_. +Starting with HBase 0.98.2, you can specify a global default for the maximum number of versions kept for all newly-created columns, by setting `hbase.column.max.version` in _hbase-site.xml_. See <<hbase.column.max.version,hbase.column.max.version>>. [[versions.ops]] === Versions and HBase Operations @@ -389,13 +385,12 @@ In this section we look at the behavior of the version dimension for each of the ==== Get/Scan Gets are implemented on top of Scans. -The below discussion of link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html[Get] applies equally to link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html[Scans]. +The below discussion of link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html[Get] applies equally to link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html[Scans]. -By default, i.e. -if you specify no explicit version, when doing a `get`, the cell whose version has the largest value is returned (which may or may not be the latest one written, see later). The default behavior can be modified in the following ways: +By default, i.e.
if you specify no explicit version, when doing a `get`, the cell whose version has the largest value is returned (which may or may not be the latest one written, see later). The default behavior can be modified in the following ways: * to return more than one version, see link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html#setMaxVersions()[Get.setMaxVersions()] -* to return versions other than the latest, see link:???[Get.setTimeRange()] +* to return versions other than the latest, see link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html#setTimeRange(long,%20long)[Get.setTimeRange()] + To retrieve the latest version that is less than or equal to a given value, thus giving the 'latest' state of the record at a certain point in time, just use a range from 0 to the desired version and set the max versions to 1. @@ -438,7 +433,7 @@ Doing a put always creates a new version of a `cell`, at a certain timestamp. By default the system uses the server's `currentTimeMillis`, but you can specify the version (= the long integer) yourself, on a per-column level. This means you could assign a time in the past or the future, or use the long value for non-time purposes. -To overwrite an existing value, do a put at exactly the same row, column, and version as that of the cell you would overshadow. +To overwrite an existing value, do a put at exactly the same row, column, and version as that of the cell you want to overwrite. ===== Implicit Version Example @@ -471,42 +466,39 @@ put.add(CF, ATTR, explicitTimeInMs, Bytes.toBytes(data)); table.put(put); ---- -Caution: the version timestamp is internally by HBase for things like time-to-live calculations. +Caution: the version timestamp is used internally by HBase for things like time-to-live calculations. It's usually best to avoid setting this timestamp yourself. -Prefer using a separate timestamp attribute of the row, or have the timestamp a part of the rowkey, or both. +Prefer using a separate timestamp attribute of the row, or have the timestamp as a part of the row key, or both. [[version.delete]] ==== Delete There are three different types of internal delete markers. -See Lars Hofhansl's blog for discussion of his attempt adding another, link:http://hadoop-hbase.blogspot.com/2012/01/scanning-in-hbase.html[Scanning - in HBase: Prefix Delete Marker]. +See Lars Hofhansl's blog for discussion of his attempt adding another, link:http://hadoop-hbase.blogspot.com/2012/01/scanning-in-hbase.html[Scanning in HBase: Prefix Delete Marker]. * Delete: for a specific version of a column. * Delete column: for all versions of a column. * Delete family: for all columns of a particular ColumnFamily -When deleting an entire row, HBase will internally create a tombstone for each ColumnFamily (i.e., not each individual column). +When deleting an entire row, HBase will internally create a tombstone for each ColumnFamily (i.e., not each individual column). Deletes work by creating _tombstone_ markers. For example, let's suppose we want to delete a row. For this you can specify a version, or else by default the `currentTimeMillis` is used. -What this means is [quote]_delete all - cells where the version is less than or equal to this version_. +What this means is _delete all cells where the version is less than or equal to this version_. HBase never modifies data in place, so for example a delete will not immediately delete (or mark as deleted) the entries in the storage file that correspond to the delete condition. 
Rather, a so-called _tombstone_ is written, which will mask the deleted values. When HBase does a major compaction, the tombstones are processed to actually remove the dead values, together with the tombstones themselves. If the version you specified when deleting a row is larger than the version of any value in the row, then you can consider the complete row to be deleted. -For an informative discussion on how deletes and versioning interact, see the thread link:http://comments.gmane.org/gmane.comp.java.hadoop.hbase.user/28421[Put w/ - timestamp -> Deleteall -> Put w/ timestamp fails] up on the user mailing list. +For an informative discussion on how deletes and versioning interact, see the thread link:http://comments.gmane.org/gmane.comp.java.hadoop.hbase.user/28421[Put w/timestamp -> Deleteall -> Put w/ timestamp fails] up on the user mailing list. -Also see <<keyvalue,keyvalue>> for more information on the internal KeyValue format. +Also see <<keyvalue,keyvalue>> for more information on the internal KeyValue format. -Delete markers are purged during the next major compaction of the store, unless the +KEEP_DELETED_CELLS+ option is set in the column family. +Delete markers are purged during the next major compaction of the store, unless the `KEEP_DELETED_CELLS` option is set in the column family. To keep the deletes for a configurable amount of time, you can set the delete TTL via the +hbase.hstore.time.to.purge.deletes+ property in _hbase-site.xml_. -If +hbase.hstore.time.to.purge.deletes+ is not set, or set to 0, all delete markers, including those with timestamps in the future, are purged during the next major compaction. -Otherwise, a delete marker with a timestamp in the future is kept until the major compaction which occurs after the time represented by the marker's timestamp plus the value of +hbase.hstore.time.to.purge.deletes+, in milliseconds. +If `hbase.hstore.time.to.purge.deletes` is not set, or set to 0, all delete markers, including those with timestamps in the future, are purged during the next major compaction. +Otherwise, a delete marker with a timestamp in the future is kept until the major compaction which occurs after the time represented by the marker's timestamp plus the value of `hbase.hstore.time.to.purge.deletes`, in milliseconds. NOTE: This behavior represents a fix for an unexpected change that was introduced in HBase 0.94, and was fixed in link:https://issues.apache.org/jira/browse/HBASE-10118[HBASE-10118]. The change has been backported to HBase 0.94 and newer branches. @@ -529,35 +521,34 @@ But they can occur even if you do not care about time: just do delete and put im [[major.compactions.change.query.results]] ==== Major compactions change query results -[quote]_...create three cell versions at t1, t2 and t3, with a maximum-versions - setting of 2. So when getting all versions, only the values at t2 and t3 will be - returned. But if you delete the version at t2 or t3, the one at t1 will appear again. - Obviously, once a major compaction has run, such behavior will not be the case - anymore..._ (See _Garbage Collection_ in link:http://outerthought.org/blog/417-ot.html[Bending time in - HBase].) +_...create three cell versions at t1, t2 and t3, with a maximum-versions + setting of 2. So when getting all versions, only the values at t2 and t3 will be + returned. But if you delete the version at t2 or t3, the one at t1 will appear again. 
+ Obviously, once a major compaction has run, such behavior will not be the case + anymore..._ (See _Garbage Collection_ in link:http://outerthought.org/blog/417-ot.html[Bending time in HBase].) [[dm.sort]] == Sort Order All data model operations in HBase return data in sorted order. -First by row, then by ColumnFamily, followed by column qualifier, and finally timestamp (sorted in reverse, so newest records are returned first). +First by row, then by ColumnFamily, followed by column qualifier, and finally timestamp (sorted in reverse, so newest records are returned first). [[dm.column.metadata]] == Column Metadata There is no store of column metadata outside of the internal KeyValue instances for a ColumnFamily. -Thus, while HBase can support not only a wide number of columns per row, but a heterogenous set of columns between rows as well, it is your responsibility to keep track of the column names. +Thus, while HBase can support not only a wide number of columns per row, but a heterogeneous set of columns between rows as well, it is your responsibility to keep track of the column names. The only way to get a complete set of columns that exist for a ColumnFamily is to process all the rows. -For more information about how HBase stores data internally, see <<keyvalue,keyvalue>>. +For more information about how HBase stores data internally, see <<keyvalue,keyvalue>>. == Joins -Whether HBase supports joins is a common question on the dist-list, and there is a simple answer: it doesn't, at not least in the way that RDBMS' support them (e.g., with equi-joins or outer-joins in SQL). As has been illustrated in this chapter, the read data model operations in HBase are Get and Scan. +Whether HBase supports joins is a common question on the dist-list, and there is a simple answer: it doesn't, at least not in the way that RDBMSs support them (e.g., with equi-joins or outer-joins in SQL). As has been illustrated in this chapter, the read data model operations in HBase are Get and Scan. However, that doesn't mean that equivalent join functionality can't be supported in your application, but you have to do it yourself. The two primary strategies are either denormalizing the data upon writing to HBase, or to have lookup tables and do the join between HBase tables in your application or MapReduce code (and as RDBMSs demonstrate, there are several strategies for this depending on the size of the tables, e.g., nested loops vs. -hash-joins). So which is the best approach? It depends on what you are trying to do, and as such there isn't a single answer that works for every use case. +hash-joins). So which is the best approach? It depends on what you are trying to do, and as such there isn't a single answer that works for every use case. == ACID http://git-wip-us.apache.org/repos/asf/hbase/blob/7139c90e/src/main/asciidoc/_chapters/external_apis.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/external_apis.adoc b/src/main/asciidoc/_chapters/external_apis.adoc index dfc64e3..37156ca 100644 --- a/src/main/asciidoc/_chapters/external_apis.adoc +++ b/src/main/asciidoc/_chapters/external_apis.adoc @@ -28,39 +28,39 @@ :experimental: This chapter will cover access to Apache HBase either through non-Java languages, or through custom protocols. -For information on using the native HBase APIs, refer to link:http://hbase.apache.org/apidocs/index.html[User API Reference] and the new <<hbase_apis,hbase apis>> chapter.
+For information on using the native HBase APIs, refer to link:http://hbase.apache.org/apidocs/index.html[User API Reference] and the new <<hbase_apis,HBase APIs>> chapter. [[nonjava.jvm]] == Non-Java Languages Talking to the JVM -Currently the documentation on this topic in the link:http://wiki.apache.org/hadoop/Hbase[Apache HBase Wiki]. -See also the link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/thrift/package-summary.html#package_description[Thrift API Javadoc]. +Currently the documentation on this topic is in the link:http://wiki.apache.org/hadoop/Hbase[Apache HBase Wiki]. +See also the link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/thrift/package-summary.html#package_description[Thrift API Javadoc]. == REST -Currently most of the documentation on REST exists in the link:http://wiki.apache.org/hadoop/Hbase/Stargate[Apache HBase Wiki on REST] (The REST gateway used to be called 'Stargate'). There are also a nice set of blogs on link:http://blog.cloudera.com/blog/2013/03/how-to-use-the-apache-hbase-rest-interface-part-1/[How-to: Use the Apache HBase REST Interface] by Jesse Anderson. +Currently most of the documentation on REST exists in the link:http://wiki.apache.org/hadoop/Hbase/Stargate[Apache HBase Wiki on REST] (The REST gateway used to be called 'Stargate'). There is also a nice set of blogs on link:http://blog.cloudera.com/blog/2013/03/how-to-use-the-apache-hbase-rest-interface-part-1/[How-to: Use the Apache HBase REST Interface] by Jesse Anderson. + +To run your REST server under SSL, set `hbase.rest.ssl.enabled` to `true` and also set the following configs when you launch the REST server: (See example commands in <<jmx_config,JMX config>>) -To run your REST server under SSL, set hbase.rest.ssl.enabled to true and also set the following configs when you launch the REST server:(See example commands in <<jmx_config,JMX config>>) [source] ---- - hbase.rest.ssl.keystore.store hbase.rest.ssl.keystore.password hbase.rest.ssl.keystore.keypassword ----- +---- HBase ships a simple REST client, see link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/rest/client/package-summary.html[REST client] package for details. -To enable SSL support for it, please also import your certificate into local java cacerts keystore: +To enable SSL support for it, please also import your certificate into the local Java cacerts keystore: ---- keytool -import -trustcacerts -file /home/user/restserver.cert -keystore $JAVA_HOME/jre/lib/security/cacerts ----- +---- == Thrift -Documentation about Thrift has moved to <<thrift,thrift>>. +Documentation about Thrift has moved to <<thrift>>. [[c]] == C/C++ Apache HBase Client FB's Chip Turner wrote a pure C/C++ client. - link:https://github.com/facebook/native-cpp-hbase-client[Check it out]. +link:https://github.com/facebook/native-cpp-hbase-client[Check it out]. http://git-wip-us.apache.org/repos/asf/hbase/blob/7139c90e/src/main/asciidoc/_chapters/getting_started.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/getting_started.adoc b/src/main/asciidoc/_chapters/getting_started.adoc index 9e0b5a1..76d793c 100644 --- a/src/main/asciidoc/_chapters/getting_started.adoc +++ b/src/main/asciidoc/_chapters/getting_started.adoc @@ -28,30 +28,31 @@ == Introduction -<<quickstart,quickstart>> will get you up and running on a single-node, standalone instance of HBase, followed by a pseudo-distributed single-machine instance, and finally a fully-distributed cluster.
+<<quickstart,Quickstart>> will get you up and running on a single-node, standalone instance of HBase, followed by a pseudo-distributed single-machine instance, and finally a fully-distributed cluster. [[quickstart]] -== Quick Start +== Quick Start - Standalone HBase -This guide describes setup of a standalone HBase instance running against the local filesystem. +This guide describes the setup of a standalone HBase instance running against the local filesystem. This is not an appropriate configuration for a production instance of HBase, but will allow you to experiment with HBase. -This section shows you how to create a table in HBase using the +hbase shell+ CLI, insert rows into the table, perform put and scan operations against the table, enable or disable the table, and start and stop HBase. +This section shows you how to create a table in HBase using the `hbase shell` CLI, insert rows into the table, perform put and scan operations against the table, enable or disable the table, and start and stop HBase. Apart from downloading HBase, this procedure should take less than 10 minutes. -WARNING: Local Filesystem and Durability This is fixed in HBase 0.98.3 and beyond. See link:https://issues.apache.org/jira/browse/HBASE-11272[HBASE-11272] and link:https://issues.apache.org/jira/browse/HBASE-11218[HBASE-11218]._ +.Local Filesystem and Durability +WARNING: _The following is fixed in HBase 0.98.3 and beyond. See link:https://issues.apache.org/jira/browse/HBASE-11272[HBASE-11272] and link:https://issues.apache.org/jira/browse/HBASE-11218[HBASE-11218]._ Using HBase with a local filesystem does not guarantee durability. The HDFS local filesystem implementation will lose edits if files are not properly closed. This is very likely to happen when you are experimenting with new software, starting and stopping the daemons often and not always cleanly. You need to run HBase on HDFS to ensure all writes are preserved. Running against the local filesystem is intended as a shortcut to get you familiar with how the general system works, as the very first phase of evaluation. -See link:https://issues.apache.org/jira/browse/HBASE-3696 and its associated issues for more details about the issues of running on the local filesystem. +See link:https://issues.apache.org/jira/browse/HBASE-3696[HBASE-3696] and its associated issues for more details about the issues of running on the local filesystem. +[[loopback.ip]] .Loopback IP - HBase 0.94.x and earlier - NOTE: _The below advice is for hbase-0.94.x and older versions only. This is fixed in hbase-0.96.0 and beyond._ -Prior to HBase 0.94.x, HBase expected the loopback IP address to be 127.0.0.1. Ubuntu and some other distributions default to 127.0.1.1 and this will cause problems for you . See link:http://blog.devving.com/why-does-hbase-care-about-etchosts/[Why does HBase care about /etc/hosts?] for detail. +Prior to HBase 0.94.x, HBase expected the loopback IP address to be 127.0.0.1. Ubuntu and some other distributions default to 127.0.1.1 and this will cause problems for you. See link:http://devving.com/?p=414[Why does HBase care about /etc/hosts?] for details. .Example /etc/hosts File for Ubuntu @@ -69,7 +70,7 @@ The following _/etc/hosts_ file works correctly for HBase 0.94.x and earlier, on === JDK Version Requirements HBase requires that a JDK be installed. -See <<java,java>> for information about supported JDK versions. +See <<java,Java>> for information about supported JDK versions.
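The procedure below drives HBase entirely through the shell. For later reference, the same create/put/scan/drop sequence can also be expressed with the Java client API; a minimal sketch, assuming an HBase 1.0-era client on the classpath and the `test` table and `cf` column family used in the steps that follow:

[source,java]
----
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class QuickstartExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection connection = ConnectionFactory.createConnection(conf);
         Admin admin = connection.getAdmin()) {
      // Equivalent of: create 'test', 'cf'
      HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("test"));
      desc.addFamily(new HColumnDescriptor("cf"));
      admin.createTable(desc);

      try (Table table = connection.getTable(TableName.valueOf("test"))) {
        // Equivalent of: put 'test', 'row1', 'cf:a', 'value1'
        Put put = new Put(Bytes.toBytes("row1"));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("a"), Bytes.toBytes("value1"));
        table.put(put);

        // Equivalent of: scan 'test'
        try (ResultScanner scanner = table.getScanner(new Scan())) {
          for (Result r : scanner) {
            System.out.println(r);
          }
        }
      }
      // Equivalent of: disable 'test' followed by drop 'test'
      admin.disableTable(TableName.valueOf("test"));
      admin.deleteTable(TableName.valueOf("test"));
    }
  }
}
----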
=== Get Started with HBase @@ -86,11 +87,11 @@ See <<java,java>> for information about supported JDK versions. + ---- -$ tar xzvf hbase-<?eval ${project.version}?>-hadoop2-bin.tar.gz +$ tar xzvf hbase-<?eval ${project.version}?>-hadoop2-bin.tar.gz $ cd hbase-<?eval ${project.version}?>-hadoop2/ ---- -. For HBase 0.98.5 and later, you are required to set the `JAVA_HOME` environment variable before starting HBase. +. For HBase 0.98.5 and later, you are required to set the `JAVA_HOME` environment variable before starting HBase. Prior to 0.98.5, HBase attempted to detect the location of Java if the variable was not set. You can set the variable via your operating system's usual mechanism, but HBase provides a central mechanism, _conf/hbase-env.sh_. Edit this file, uncomment the line starting with `JAVA_HOME`, and set it to the appropriate location for your operating system. @@ -103,14 +104,14 @@ JAVA_HOME=/usr ---- + NOTE: These instructions assume that each node of your cluster uses the same configuration. -If this is not the case, you may need to set `JAVA_HOME` separately for each node. +If this is not the case, you may need to set `JAVA_HOME` separately for each node. . Edit _conf/hbase-site.xml_, which is the main HBase configuration file. - At this time, you only need to specify the directory on the local filesystem where HBase and Zookeeper write data. + At this time, you only need to specify the directory on the local filesystem where HBase and ZooKeeper write data. By default, a new directory is created under /tmp. - Many servers are configured to delete the contents of /tmp upon reboot, so you should store the data elsewhere. - The following configuration will store HBase's data in the _hbase_ directory, in the home directory of the user called [systemitem]+testuser+. - Paste the [markup]+<property>+ tags beneath the [markup]+<configuration>+ tags, which should be empty in a new HBase install. + Many servers are configured to delete the contents of _/tmp_ upon reboot, so you should store the data elsewhere. + The following configuration will store HBase's data in the _hbase_ directory, in the home directory of the user called `testuser`. + Paste the `<property>` tags beneath the `<configuration>` tags, which should be empty in a new HBase install. + .Example _hbase-site.xml_ for Standalone HBase ==== @@ -136,7 +137,7 @@ If you create the directory, HBase will attempt to do a migration, which is not . The _bin/start-hbase.sh_ script is provided as a convenient way to start HBase. Issue the command, and if all goes well, a message is logged to standard output showing that HBase started successfully. - You can use the +jps+ command to verify that you have one running process called `HMaster`. + You can use the `jps` command to verify that you have one running process called `HMaster`. In standalone mode HBase runs all daemons within this single JVM, i.e. the HMaster, a single HRegionServer, and the ZooKeeper daemon. + @@ -144,10 +145,11 @@ NOTE: Java needs to be installed and available. If you get an error indicating that Java is not installed, but it is on your system, perhaps in a non-standard location, edit the _conf/hbase-env.sh_ file and modify the `JAVA_HOME` setting to point to the directory that contains _bin/java_ on your system. +[[shell_exercises]] .Procedure: Use HBase For the First Time . Connect to HBase. + -Connect to your running instance of HBase using the +hbase shell+ command, located in the _bin/_ directory of your HBase install.
+Connect to your running instance of HBase using the `hbase shell` command, located in the _bin/_ directory of your HBase install. In this example, some usage and version information that is printed when you start HBase Shell has been omitted. The HBase Shell prompt ends with a `>` character. + @@ -168,21 +170,21 @@ Use the `create` command to create a new table. You must specify the table name and the ColumnFamily name. + ---- +hbase(main):001:0> create 'test', 'cf' +0 row(s) in 0.4170 seconds -hbase> create 'test', 'cf' -0 row(s) in 1.2200 seconds +=> Hbase::Table - test ---- . List Information About your Table + -Use the `list` command to +Use the `list` command to list information about your table. + ---- - -hbase> list 'test' +hbase(main):002:0> list 'test' TABLE test -1 row(s) in 0.0350 seconds +1 row(s) in 0.0180 seconds => ["test"] ---- @@ -192,15 +194,14 @@ test To put data into your table, use the `put` command. + ---- +hbase(main):003:0> put 'test', 'row1', 'cf:a', 'value1' +0 row(s) in 0.0850 seconds -hbase> put 'test', 'row1', 'cf:a', 'value1' -0 row(s) in 0.1770 seconds +hbase(main):004:0> put 'test', 'row2', 'cf:b', 'value2' +0 row(s) in 0.0110 seconds -hbase> put 'test', 'row2', 'cf:b', 'value2' -0 row(s) in 0.0160 seconds - -hbase> put 'test', 'row3', 'cf:c', 'value3' -0 row(s) in 0.0260 seconds +hbase(main):005:0> put 'test', 'row3', 'cf:c', 'value3' +0 row(s) in 0.0100 seconds ---- + Here, we insert three values, one at a time. @@ -210,51 +211,47 @@ Columns in HBase are comprised of a column family prefix, `cf` in this example, . Scan the table for all data at once. + One of the ways to get data from HBase is to scan. -Use the +scan+ command to scan the table for data. +Use the `scan` command to scan the table for data. You can limit your scan, but for now, all data is fetched. + ---- - -hbase> scan 'test' -ROW COLUMN+CELL - row1 column=cf:a, timestamp=1403759475114, value=value1 - row2 column=cf:b, timestamp=1403759492807, value=value2 - row3 column=cf:c, timestamp=1403759503155, value=value3 -3 row(s) in 0.0440 seconds +hbase(main):006:0> scan 'test' +ROW COLUMN+CELL + row1 column=cf:a, timestamp=1421762485768, value=value1 + row2 column=cf:b, timestamp=1421762491785, value=value2 + row3 column=cf:c, timestamp=1421762496210, value=value3 +3 row(s) in 0.0230 seconds ---- . Get a single row of data. + -To get a single row of data at a time, use the +get+ command. +To get a single row of data at a time, use the `get` command. + ---- - -hbase> get 'test', 'row1' -COLUMN CELL - cf:a timestamp=1403759475114, value=value1 -1 row(s) in 0.0230 seconds +hbase(main):007:0> get 'test', 'row1' +COLUMN CELL + cf:a timestamp=1421762485768, value=value1 +1 row(s) in 0.0350 seconds ---- . Disable a table. + -If you want to delete a table or change its settings, as well as in some other situations, you need to disable the table first, using the `disable` command. +If you want to delete a table or change its settings, as well as in some other situations, you need to disable the table first, using the `disable` command. You can re-enable it using the `enable` command.
+ ---- +hbase(main):008:0> disable 'test' +0 row(s) in 1.1820 seconds -hbase> disable 'test' -0 row(s) in 1.6270 seconds - -hbase> enable 'test' -0 row(s) in 0.4500 seconds +hbase(main):009:0> enable 'test' +0 row(s) in 0.1770 seconds ---- + -Disable the table again if you tested the +enable+ command above: +Disable the table again if you tested the `enable` command above: + ---- - -hbase> disable 'test' -0 row(s) in 1.6270 seconds +hbase(main):010:0> disable 'test' +0 row(s) in 1.1820 seconds ---- . Drop the table. @@ -262,14 +259,13 @@ hbase> disable 'test' To drop (delete) a table, use the `drop` command. + ---- - -hbase> drop 'test' -0 row(s) in 0.2900 seconds +hbase(main):011:0> drop 'test' +0 row(s) in 0.1370 seconds ---- . Exit the HBase Shell. + -To exit the HBase Shell and disconnect from your cluster, use the +quit+ command. +To exit the HBase Shell and disconnect from your cluster, use the `quit` command. HBase is still running in the background. @@ -284,7 +280,7 @@ $ ---- . After issuing the command, it can take several minutes for the processes to shut down. - Use the +jps+ to be sure that the HMaster and HRegionServer processes are shut down. + Use the `jps` command to be sure that the HMaster and HRegionServer processes are shut down. [[quickstart_pseudo]] === Intermediate - Pseudo-Distributed Local Install @@ -313,7 +309,7 @@ This procedure will create a totally new directory where HBase will store its da + Edit the _hbase-site.xml_ configuration. First, add the following property. -which directs HBase to run in distributed mode, with one JVM instance per daemon. +This property directs HBase to run in distributed mode, with one JVM instance per daemon. + [source,xml] ---- @@ -343,13 +339,13 @@ If you create the directory, HBase will attempt to do a migration, which is not . Start HBase. + Use the _bin/start-hbase.sh_ command to start HBase. -If your system is configured correctly, the +jps+ command should show the HMaster and HRegionServer processes running. +If your system is configured correctly, the `jps` command should show the HMaster and HRegionServer processes running. . Check the HBase directory in HDFS. + If everything worked correctly, HBase created its directory in HDFS. In the configuration above, it is stored in _/hbase/_ on HDFS. -You can use the +hadoop fs+ command in Hadoop's _bin/_ directory to list this directory. +You can use the `hadoop fs` command in Hadoop's _bin/_ directory to list this directory. + ---- @@ -375,7 +371,7 @@ This step is offered for testing and learning purposes only. + The HMaster server controls the HBase cluster. You can start up to 9 backup HMaster servers, which makes 10 total HMasters, counting the primary. -To start a backup HMaster, use the +local-master-backup.sh+. +To start a backup HMaster, use the `local-master-backup.sh`. For each backup master you want to start, add a parameter representing the port offset for that master. Each HMaster uses three ports (16010, 16020, and 16030 by default). The port offset is added to these ports, so using an offset of 2, the backup HMaster would use ports 16012, 16022, and 16032. The following command starts 3 backup servers using ports 16012/16022/16032, 16013/16023/16033, and 16015/16025/16035. @@ -386,8 +382,8 @@ $ ./bin/local-master-backup.sh 2 3 5 ---- + To kill a backup master without killing the entire cluster, you need to find its process ID (PID). The PID is stored in a file with a name like _/tmp/hbase-USER-X-master.pid_. -The only contents of the file are the PID.
 [[quickstart_pseudo]]
 === Intermediate - Pseudo-Distributed Local Install
@@ -313,7 +309,7 @@ This procedure will create a totally new directory where HBase will store its da
 +
 Edit the _hbase-site.xml_ configuration.
 First, add the following property,
-which directs HBase to run in distributed mode, with one JVM instance per daemon. 
+which directs HBase to run in distributed mode, with one JVM instance per daemon.
 +
 [source,xml]
 ----
@@ -343,13 +339,13 @@ If you create the directory, HBase will attempt to do a migration, which is not
 . Start HBase.
 +
 Use the _bin/start-hbase.sh_ command to start HBase.
-If your system is configured correctly, the +jps+ command should show the HMaster and HRegionServer processes running.
+If your system is configured correctly, the `jps` command should show the HMaster and HRegionServer processes running.

 . Check the HBase directory in HDFS.
 +
 If everything worked correctly, HBase created its directory in HDFS.
 In the configuration above, it is stored in _/hbase/_ on HDFS.
-You can use the +hadoop fs+ command in Hadoop's _bin/_ directory to list this directory.
+You can use the `hadoop fs` command in Hadoop's _bin/_ directory to list this directory.
 +
 ----
@@ -375,7 +371,7 @@ This step is offered for testing and learning purposes only.
 +
 The HMaster server controls the HBase cluster.
 You can start up to 9 backup HMaster servers, which makes 10 total HMasters, counting the primary.
-To start a backup HMaster, use the +local-master-backup.sh+.
+To start a backup HMaster, use the `local-master-backup.sh`.
 For each backup master you want to start, add a parameter representing the port offset for that master.
 Each HMaster uses three ports (16010, 16020, and 16030 by default).
 The port offset is added to these ports, so using an offset of 2, the backup HMaster would use ports 16012, 16022, and 16032.
 The following command starts 3 backup servers using ports 16012/16022/16032, 16013/16023/16033, and 16015/16025/16035.
@@ -386,8 +382,8 @@ $ ./bin/local-master-backup.sh 2 3 5
 ----
 +
 To kill a backup master without killing the entire cluster, you need to find its process ID (PID).
 The PID is stored in a file with a name like _/tmp/hbase-USER-X-master.pid_.
-The only contents of the file are the PID.
-You can use the +kill -9+ command to kill that PID.
+The only content of the file is the PID.
+You can use the `kill -9` command to kill that PID.
 The following command will kill the master with port offset 1, but leave the cluster running:
 +
 ----
 $ cat /tmp/hbase-testuser-1-master.pid |xargs kill -9
@@ -400,8 +396,8 @@ The HRegionServer manages the data in its StoreFiles as directed by the HMaster.
 Generally, one HRegionServer runs per node in the cluster.
 Running multiple HRegionServers on the same system can be useful for testing in pseudo-distributed mode.
-The +local-regionservers.sh+ command allows you to run multiple RegionServers.
-It works in a similar way to the +local-master-backup.sh+ command, in that each parameter you provide represents the port offset for an instance.
+The `local-regionservers.sh` command allows you to run multiple RegionServers.
+It works in a similar way to the `local-master-backup.sh` command, in that each parameter you provide represents the port offset for an instance.
 Each RegionServer requires two ports, and the default ports are 16020 and 16030.
 However, the base ports for additional RegionServers are not the default ports since the default ports are used by the HMaster, which is also a RegionServer since HBase version 1.0.0.
 The base ports are 16200 and 16300 instead.
@@ -413,7 +409,7 @@ The following command starts four additional RegionServers, running on sequentia
 $ ./bin/local-regionservers.sh start 2 3 4 5
 ----
 +
-To stop a RegionServer manually, use the +local-regionservers.sh+ command with the `stop` parameter and the offset of the server to stop.
+To stop a RegionServer manually, use the `local-regionservers.sh` command with the `stop` parameter and the offset of the server to stop.
 +
 ----
 $ ./bin/local-regionservers.sh stop 3
@@ -444,20 +440,21 @@ The architecture will be as follows:
 |===

 This quickstart assumes that each node is a virtual machine and that they are all on the same network.
-It builds upon the previous quickstart, <<quickstart_pseudo,quickstart-pseudo>>, assuming that the system you configured in that procedure is now `node-a`.
-Stop HBase on `node-a` before continuing.
+It builds upon the previous quickstart, <<quickstart_pseudo>>, assuming that the system you configured in that procedure is now `node-a`.
+Stop HBase on `node-a` before continuing.

 NOTE: Be sure that all the nodes have full access to communicate, and that no firewall rules are in place which could prevent them from talking to each other.
 If you see any errors like `no route to host`, check your firewall.

-.Procedure: Configure Password-Less SSH Access
+[[passwordless.ssh.quickstart]]
+.Procedure: Configure Passwordless SSH Access

 `node-a` needs to be able to log into `node-b` and `node-c` (and to itself) in order to start the daemons.
-The easiest way to accomplish this is to use the same username on all hosts, and configure password-less SSH login from `node-a` to each of the others.
+The easiest way to accomplish this is to use the same username on all hosts, and configure password-less SSH login from `node-a` to each of the others.

 . On `node-a`, generate a key pair.
 +
-While logged in as the user who will run HBase, generate a SSH key pair, using the following command:
+While logged in as the user who will run HBase, generate an SSH key pair using the following command:
 +
 [source,bash]
 ----
@@ -474,9 +471,9 @@ If it already exists, be aware that it may already contain other keys.

 . Copy the public key to the other nodes.
 +
-Securely copy the public key from `node-a` to each of the nodes, by using the `scp` or some other secure means.
+Securely copy the public key from `node-a` to each of the nodes, using `scp` or some other secure means.
 On each of the other nodes, create a new file called _.ssh/authorized_keys_ _if it does
-   not already exist_, and append the contents of the _id_rsa.pub_ file to the end of it.
+not already exist_, and append the contents of the _id_rsa.pub_ file to the end of it.
 Note that you also need to do this for `node-a` itself.
 +
 ----
@@ -485,7 +482,7 @@ $ cat id_rsa.pub >> ~/.ssh/authorized_keys

 . Test password-less login.
 +
-If you performed the procedure correctly, if you SSH from `node-a` to either of the other nodes, using the same username, you should not be prompted for a password.
+If you performed the procedure correctly, you should not be prompted for a password when you SSH from `node-a` to either of the other nodes using the same username.

 . Since `node-b` will run a backup Master, repeat the procedure above, substituting `node-b` everywhere you see `node-a`.
   Be sure not to overwrite your existing _.ssh/authorized_keys_ files, but concatenate the new key onto the existing file using the `>>` operator rather than the `>` operator.

@@ -515,7 +512,7 @@ This configuration will direct HBase to start and manage a ZooKeeper instance on
 +
 On `node-a`, edit _conf/hbase-site.xml_ and add the following properties.
 +
-[source,bourne]
+[source,xml]
 ----
 <property>
   <name>hbase.zookeeper.quorum</name>
@@ -538,24 +535,23 @@ On `node-a`, edit _conf/hbase-site.xml_ and add the following properties.
 +
 Download and unpack HBase to `node-b`, just as you did for the standalone and pseudo-distributed quickstarts.

-. Copy the configuration files from `node-a` to `node-b`.and
-  `node-c`.
+. Copy the configuration files from `node-a` to `node-b` and `node-c`.
 +
 Each node of your cluster needs to have the same configuration information.
-Copy the contents of the _conf/_ directory to the _conf/_ directory on `node-b` and `node-c`.
+Copy the contents of the _conf/_ directory to the _conf/_ directory on `node-b` and `node-c`.
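The `hbase.zookeeper.quorum` value configured above is also what a client application needs in order to find the cluster. As a minimal sketch, assuming the example hostnames used in this quickstart and an HBase 1.0-style client API, a client could set the quorum programmatically instead of shipping an _hbase-site.xml_; the `QuorumClient` class name is an illustrative assumption.

[source,java]
----
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class QuorumClient {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Mirrors the hbase.zookeeper.quorum property added to hbase-site.xml above.
    conf.set("hbase.zookeeper.quorum", "node-a.example.com,node-b.example.com,node-c.example.com");
    // Default ZooKeeper client port; adjust if your quorum uses another port.
    conf.set("hbase.zookeeper.property.clientPort", "2181");
    try (Connection connection = ConnectionFactory.createConnection(conf)) {
      System.out.println("Connection open: " + !connection.isClosed());
    }
  }
}
----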
 .Procedure: Start and Test Your Cluster

 . Be sure HBase is not running on any node.
 +
 If you forgot to stop HBase from previous testing, you will have errors.
-Check to see whether HBase is running on any of your nodes by using the +jps+ command.
+Check to see whether HBase is running on any of your nodes by using the `jps` command.
 Look for the processes `HMaster`, `HRegionServer`, and `HQuorumPeer`.
 If they exist, kill them.

 . Start the cluster.
 +
-On `node-a`, issue the +start-hbase.sh+ command.
+On `node-a`, issue the `start-hbase.sh` command.
 Your output will be similar to that below.
 +
 ----
@@ -566,15 +562,15 @@ node-a.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-had
 node-b.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-b.example.com.out
 starting master, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-master-node-a.example.com.out
 node-c.example.com: starting regionserver, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-regionserver-node-c.example.com.out
-node-b.example.com: starting regionserver, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-regionserver-node-b.example.com.out
+node-b.example.com: starting regionserver, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-regionserver-node-b.example.com.out
 node-b.example.com: starting master, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-master-nodeb.example.com.out
 ----
 +
-ZooKeeper starts first, followed by the master, then the RegionServers, and finally the backup masters.
+ZooKeeper starts first, followed by the master, then the RegionServers, and finally the backup masters.

 . Verify that the processes are running.
 +
-On each node of the cluster, run the +jps+ command and verify that the correct processes are running on each server.
+On each node of the cluster, run the `jps` command and verify that the correct processes are running on each server.
 You may see additional Java processes running on your servers as well, if they are used for other purposes.
 +
 .`node-a` `jps` Output
 ====
 ----
 $ jps
@@ -602,7 +598,7 @@ $ jps
 .`node-c` `jps` Output
 ====
 ----
-$ jps 
+$ jps
 13901 Jps
 13639 HQuorumPeer
 13737 HRegionServer
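Besides checking each node with `jps`, the cluster can be verified programmatically. The following sketch is an illustration, assuming the HBase 1.0-era client API and the configuration above: `Admin.getClusterStatus()` should report the active master, one backup master, and three RegionServers. The `ClusterCheck` class name is an assumption for the example.

[source,java]
----
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.ClusterStatus;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class ClusterCheck {
  public static void main(String[] args) throws Exception {
    // Expects the cluster's hbase-site.xml (with the ZooKeeper quorum) on the classpath.
    Configuration conf = HBaseConfiguration.create();
    try (Connection connection = ConnectionFactory.createConnection(conf);
         Admin admin = connection.getAdmin()) {
      ClusterStatus status = admin.getClusterStatus();
      System.out.println("Active master:  " + status.getMaster());
      System.out.println("Backup masters: " + status.getBackupMastersSize());
      System.out.println("RegionServers:  " + status.getServersSize());
      for (ServerName server : status.getServers()) {
        System.out.println("  " + server);
      }
    }
  }
}
----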
http://git-wip-us.apache.org/repos/asf/hbase/blob/7139c90e/src/main/asciidoc/_chapters/hbase_apis.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/hbase_apis.adoc b/src/main/asciidoc/_chapters/hbase_apis.adoc
index 1fed946..85dbad1 100644
--- a/src/main/asciidoc/_chapters/hbase_apis.adoc
+++ b/src/main/asciidoc/_chapters/hbase_apis.adoc
@@ -28,22 +28,19 @@
 :experimental:

 This chapter provides information about performing operations using HBase native APIs.
-This information is not exhaustive, and provides a quick reference in addition to the link:http://hbase.apache.org/apidocs/index.html[User API
-Reference].
+This information is not exhaustive, and provides a quick reference in addition to the link:http://hbase.apache.org/apidocs/index.html[User API Reference].
 The examples here are not comprehensive or complete, and should be used for purposes of illustration only.

 Apache HBase also works with multiple external APIs.
-See <<external_apis,external apis>> for more information.
+See <<external_apis>> for more information.

 == Examples

 .Create a Table Using Java
 ====
-This example has been tested on HBase 0.96.1.1.
 [source,java]
 ----
-
 package com.example.hbase.admin;

 import java.io.IOException;
@@ -52,7 +49,7 @@ import org.apache.hadoop.hbase.HBaseConfiguration;
 import org.apache.hadoop.hbase.HColumnDescriptor;
 import org.apache.hadoop.hbase.HTableDescriptor;
 import org.apache.hadoop.hbase.TableName;
-import org.apache.hadoop.hbase.client.HBaseAdmin;
+import org.apache.hadoop.hbase.client.Admin;
+import org.apache.hadoop.hbase.client.ConnectionFactory;
 import org.apache.hadoop.hbase.io.compress.Compression.Algorithm;

 import org.apache.hadoop.conf.Configuration;
@@ -60,7 +57,7 @@ import static com.example.hbase.Constants.*;

 public class CreateSchema {

-  public static void createOrOverwrite(HBaseAdmin admin, HTableDescriptor table) throws IOException {
+  public static void createOrOverwrite(Admin admin, HTableDescriptor table) throws IOException {
     if (admin.tableExists(table.getTableName())) {
       admin.disableTable(table.getTableName());
       admin.deleteTable(table.getTableName());
@@ -70,7 +67,7 @@ public class CreateSchema {

   public static void createSchemaTables (Configuration config) {
     try {
-      final HBaseAdmin admin = new HBaseAdmin(config);
+      // Admin is an interface as of HBase 1.0; obtain an instance from a Connection.
+      final Admin admin = ConnectionFactory.createConnection(config).getAdmin();
       HTableDescriptor table = new HTableDescriptor(TableName.valueOf(TABLE_NAME));
       table.addFamily(new HColumnDescriptor(CF_DEFAULT).setCompressionType(Algorithm.SNAPPY));
@@ -85,53 +82,57 @@ public class CreateSchema {
     }
   }
-
 }
 ----
 ====
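For completeness, a caller of `createSchemaTables` might look like the following sketch; the `CreateSchemaDriver` class and its `main` method are illustrative additions, not part of the documented example.

[source,java]
----
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class CreateSchemaDriver {
  public static void main(String[] args) {
    // Load hbase-site.xml from the classpath and build the example schema.
    Configuration config = HBaseConfiguration.create();
    CreateSchema.createSchemaTables(config);
  }
}
----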
"); + admin.createTable(table_assetmeta); + System.out.println(" Done."); - // Update existing table - HColumnDescriptor newColumn = new HColumnDescriptor("NEWCF"); - newColumn.setCompactionCompressionType(Algorithm.GZ); - newColumn.setMaxVersions(HConstants.ALL_VERSIONS); - admin.addColumn(tableName, newColumn); + // Update existing table + HColumnDescriptor newColumn = new HColumnDescriptor("NEWCF"); + newColumn.setCompactionCompressionType(Algorithm.GZ); + newColumn.setMaxVersions(HConstants.ALL_VERSIONS); + admin.addColumn(tableName, newColumn); - // Disable an existing table - admin.disableTable(tableName); + // Update existing column family + HColumnDescriptor existingColumn = new HColumnDescriptor(CF_DEFAULT); + existingColumn.setCompactionCompressionType(Algorithm.GZ); + existingColumn.setMaxVersions(HConstants.ALL_VERSIONS); + table_assetmeta.modifyFamily(existingColumn) + admin.modifyTable(tableName, table_assetmeta); - // Delete an existing column family - admin.deleteColumn(tableName, CF_DEFAULT); + // Disable an existing table + admin.disableTable(tableName); - // Delete a table (Need to be disabled first) - admin.deleteTable(tableName); + // Delete an existing column family + admin.deleteColumn(tableName, CF_DEFAULT); + // Delete a table (Need to be disabled first) + admin.deleteTable(tableName); - admin.close(); - } catch (Exception e) { - e.printStackTrace(); - System.exit(-1); - } + + admin.close(); + } catch (Exception e) { + e.printStackTrace(); + System.exit(-1); } +} ---- ====