Updated documentation from master
Project: http://git-wip-us.apache.org/repos/asf/hbase/repo Commit: http://git-wip-us.apache.org/repos/asf/hbase/commit/15886909 Tree: http://git-wip-us.apache.org/repos/asf/hbase/tree/15886909 Diff: http://git-wip-us.apache.org/repos/asf/hbase/diff/15886909 Branch: refs/heads/branch-1.0 Commit: 15886909ca86d31b0754bcca0261bc1a1ae76d8b Parents: 9100b91 Author: Enis Soztutar <e...@apache.org> Authored: Tue Jul 7 11:49:01 2015 -0700 Committer: Enis Soztutar <e...@apache.org> Committed: Tue Jul 7 11:49:07 2015 -0700 ---------------------------------------------------------------------- src/main/asciidoc/_chapters/architecture.adoc | 166 ++++++++++++---- src/main/asciidoc/_chapters/configuration.adoc | 38 ++-- src/main/asciidoc/_chapters/developer.adoc | 14 +- .../asciidoc/_chapters/getting_started.adoc | 8 +- src/main/asciidoc/_chapters/hbase-default.adoc | 15 -- src/main/asciidoc/_chapters/ops_mgt.adoc | 197 ++++++++++++++++++- src/main/asciidoc/_chapters/schema_design.adoc | 16 +- src/main/asciidoc/_chapters/tracing.adoc | 43 ++-- src/main/asciidoc/_chapters/zookeeper.adoc | 27 +-- 9 files changed, 393 insertions(+), 131 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/hbase/blob/15886909/src/main/asciidoc/_chapters/architecture.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/architecture.adoc b/src/main/asciidoc/_chapters/architecture.adoc index 659c4ee..bee1c16 100644 --- a/src/main/asciidoc/_chapters/architecture.adoc +++ b/src/main/asciidoc/_chapters/architecture.adoc @@ -2222,18 +2222,6 @@ The region replica having replica_id==0 is called the primary region, and the ot Only the primary can accept writes from the client, and the primary will always contain the latest changes. 
Since all writes still have to go through the primary region, the writes are not highly-available (meaning they might block for some time if the region becomes unavailable). -The writes are asynchronously sent to the secondary region replicas using an _Async WAL replication_ feature. -This works similarly to HBase's multi-datacenter replication, but instead the data from a region is replicated to the secondary regions. -Each secondary replica always receives and observes the writes in the same order that the primary region committed them. -This ensures that the secondaries won't diverge from the primary regions data, but since the log replication is asnyc, the data might be stale in secondary regions. -In some sense, this design can be thought of as _in-cluster replication_, where instead of replicating to a different datacenter, the data goes to a secondary region to keep secondary region's in-memory state up to date. -The data files are shared between the primary region and the other replicas, so that there is no extra storage overhead. -However, the secondary regions will have recent non-flushed data in their MemStores, which increases the memory overhead. - - -Async WAL replication feature is being implemented in Phase 2 of issue HBASE-10070. -Before this, region replicas will only be updated with flushed data files from the primary (see `hbase.regionserver.storefile.refresh.period` below). It is also possible to use this without setting `storefile.refresh.period` for read only tables. - === Timeline Consistency @@ -2273,8 +2261,8 @@ In terms of semantics, TIMELINE consistency as implemented by HBase differs from There is no stickiness to region replicas or a transaction-id based guarantee. If required, this can be implemented later though. -.HFile Version 1 -image::timeline_consistency.png[HFile Version 1] +.Timeline Consistency +image::timeline_consistency.png[Timeline Consistency] To better understand the TIMELINE semantics, lets look at the above diagram. 
Lets say that there are two clients, and the first one writes x=1 at first, then x=2 and x=3 later. @@ -2309,11 +2297,52 @@ Following are advantages and disadvantages. To serve the region data from multiple replicas, HBase opens the regions in secondary mode in the region servers. The regions opened in secondary mode will share the same data files with the primary region replica, however each secondary region replica will have its own MemStore to keep the unflushed data (only primary region can do flushes). Also to serve reads from secondary regions, the blocks of data files may be also cached in the block caches for the secondary regions. +=== Where is the code +This feature is delivered in two phases, Phase 1 and 2. The first phase is done in time for HBase-1.0.0 release. Meaning that using HBase-1.0.x, you can use all the features that are marked for Phase 1. Phase 2 is committed in HBase-1.1.0, meaning all HBase versions after 1.1.0 should contain Phase 2 items. + +=== Propagating writes to region replicas +As discussed above writes only go to the primary region replica. For propagating the writes from the primary region replica to the secondaries, there are two different mechanisms. For read-only tables, you do not need to use any of the following methods. Disabling and enabling the table should make the data available in all region replicas. For mutable tables, you have to use *only* one of the following mechanisms: storefile refresher, or async wal replication. The latter is recommeded. + +==== StoreFile Refresher +The first mechanism is store file refresher which is introduced in HBase-1.0+. Store file refresher is a thread per region server, which runs periodically, and does a refresh operation for the store files of the primary region for the secondary region replicas. If enabled, the refresher will ensure that the secondary region replicas see the new flushed, compacted or bulk loaded files from the primary region in a timely manner. 
However, this means that only flushed data can be read back from the secondary region replicas, and after the refresher is run, making the secondaries lag behind the primary for an a longer time. + +For turning this feature on, you should configure `hbase.regionserver.storefile.refresh.period` to a non-zero value. See Configuration section below. + +==== Asnyc WAL replication +The second mechanism for propagation of writes to secondaries is done via “Async WAL Replication” feature and is only available in HBase-1.1+. This works similarly to HBase’s multi-datacenter replication, but instead the data from a region is replicated to the secondary regions. Each secondary replica always receives and observes the writes in the same order that the primary region committed them. In some sense, this design can be thought of as “in-cluster replication”, where instead of replicating to a different datacenter, the data goes to secondary regions to keep secondary region’s in-memory state up to date. The data files are shared between the primary region and the other replicas, so that there is no extra storage overhead. However, the secondary regions will have recent non-flushed data in their memstores, which increases the memory overhead. The primary region writes flush, compaction, and bulk load events to its WAL as well, which are also replicated through wal replication to secondaries. When they observe the flush/compaction or bulk load event, the secondary regions replay the event to pick up the new files and drop the old ones. + +Committing writes in the same order as in primary ensures that the secondaries won’t diverge from the primary regions data, but since the log replication is asynchronous, the data might still be stale in secondary regions. Since this feature works as a replication endpoint, the performance and latency characteristics is expected to be similar to inter-cluster replication. + +Async WAL Replication is *disabled* by default. 
You can enable this feature by setting `hbase.region.replica.replication.enabled` to `true`. +Asyn WAL Replication feature will add a new replication peer named `region_replica_replication` as a replication peer when you create a table with region replication > 1 for the first time. Once enabled, if you want to disable this feature, you need to do two actions: +* Set configuration property `hbase.region.replica.replication.enabled` to false in `hbase-site.xml` (see Configuration section below) +* Disable the replication peer named `region_replica_replication` in the cluster using hbase shell or `ReplicationAdmin` class: +[source,bourne] +---- + hbase> disable_peer 'region_replica_replication' +---- + +=== Store File TTL +In both of the write propagation approaches mentioned above, store files of the primary will be opened in secondaries independent of the primary region. So for files that the primary compacted away, the secondaries might still be referring to these files for reading. Both features are using HFileLinks to refer to files, but there is no protection (yet) for guaranteeing that the file will not be deleted prematurely. Thus, as a guard, you should set the configuration property `hbase.master.hfilecleaner.ttl` to a larger value, such as 1 hour to guarantee that you will not receive IOExceptions for requests going to replicas. + +=== Region replication for META table’s region +Currently, Async WAL Replication is not done for the META table’s WAL. The meta table’s secondary replicas still refreshes themselves from the persistent store files. Hence the `hbase.regionserver.meta.storefile.refresh.period` needs to be set to a certain non-zero value for refreshing the meta store files. Note that this configuration is configured differently than +`hbase.regionserver.storefile.refresh.period`. 
+ +=== Memory accounting +The secondary region replicas refer to the data files of the primary region replica, but they have their own memstores (in HBase-1.1+) and uses block cache as well. However, one distinction is that the secondary region replicas cannot flush the data when there is memory pressure for their memstores. They can only free up memstore memory when the primary region does a flush and this flush is replicated to the secondary. Since in a region server hosting primary replicas for some regions and secondaries for some others, the secondaries might cause extra flushes to the primary regions in the same host. In extreme situations, there can be no memory left for adding new writes coming from the primary via wal replication. For unblocking this situation (and since secondary cannot flush by itself), the secondary is allowed to do a “store file refresh” by doing a file system list operation to pick up new files from primary, and possibly dropping its memstore. This refresh will only be performed if the memstore size of the biggest secondary region replica is at least `hbase.region.replica.storefile.refresh.memstore.multiplier` (default 4) times bigger than the biggest memstore of a primary replica. One caveat is that if this is performed, the secondary can observe partial row updates across column families (since column families are flushed independently). The default should be good to not do this operation frequently. You can set this value to a large number to disable this feature if desired, but be warned that it might cause the replication to block forever. + +=== Secondary replica failover +When a secondary region replica first comes online, or fails over, it may have served some edits from it’s memstore. Since the recovery is handled differently for secondary replicas, the secondary has to ensure that it does not go back in time before it starts serving requests after assignment. 
For doing that, the secondary waits until it observes a full flush cycle (start flush, commit flush) or a “region open event” replicated from the primary. Until this happens, the secondary region replica will reject all read requests by throwing an IOException with message “The region's reads are disabled”. However, the other replicas will probably still be available to read, thus not causing any impact for the rpc with TIMELINE consistency. To facilitate faster recovery, the secondary region will trigger a flush request from the primary when it is opened. The configuration property `hbase.region.replica.wait.for.primary.flush` (enabled by default) can be used to disable this feature if needed. + + + + === Configuration properties -To use highly available reads, you should set the following properties in hbase-site.xml file. +To use highly available reads, you should set the following properties in `hbase-site.xml` file. There is no specific configuration to enable or disable region replicas. -Instead you can change the number of region replicas per table to increase or decrease at the table creation or with alter table. +Instead you can change the number of region replicas per table to increase or decrease at the table creation or with alter table. The following configuration is for using async wal replication and using meta replicas of 3. ==== Server side properties @@ -2321,11 +2350,27 @@ Instead you can change the number of region replicas per table to increase or de [source,xml] ---- <property> - <name>hbase.regionserver.storefile.refresh.period</name> - <value>0</value> - <description> - The period (in milliseconds) for refreshing the store files for the secondary regions. 0 means this feature is disabled. Secondary regions sees new files (from flushes and compactions) from primary once the secondary region refreshes the list of files in the region. But too frequent refreshes might cause extra Namenode pressure. 
If the files cannot be refreshed for longer than HFile TTL (hbase.master.hfilecleaner.ttl) the requests are rejected. Configuring HFile TTL to a larger value is also recommended with this setting. - </description> + <name>hbase.regionserver.storefile.refresh.period</name> + <value>0</value> + <description> + The period (in milliseconds) for refreshing the store files for the secondary regions. 0 means this feature is disabled. Secondary regions sees new files (from flushes and compactions) from primary once the secondary region refreshes the list of files in the region (there is no notification mechanism). But too frequent refreshes might cause extra Namenode pressure. If the files cannot be refreshed for longer than HFile TTL (hbase.master.hfilecleaner.ttl) the requests are rejected. Configuring HFile TTL to a larger value is also recommended with this setting. + </description> +</property> + +<property> + <name>hbase.regionserver.meta.storefile.refresh.period</name> + <value>300000</value> + <description> + The period (in milliseconds) for refreshing the store files for the hbase:meta tables secondary regions. 0 means this feature is disabled. Secondary regions sees new files (from flushes and compactions) from primary once the secondary region refreshes the list of files in the region (there is no notification mechanism). But too frequent refreshes might cause extra Namenode pressure. If the files cannot be refreshed for longer than HFile TTL (hbase.master.hfilecleaner.ttl) the requests are rejected. Configuring HFile TTL to a larger value is also recommended with this setting. This should be a non-zero number if meta replicas are enabled (via hbase.meta.replica.count set to greater than 1). + </description> +</property> + +<property> + <name>hbase.region.replica.replication.enabled</name> + <value>true</value> + <description> + Whether asynchronous WAL replication to the secondary region replicas is enabled or not. 
If this is enabled, a replication peer named "region_replica_replication" will be created which will tail the logs and replicate the mutatations to region replicas for tables that have region replication > 1. If this is enabled once, disabling this replication also requires disabling the replication peer using shell or ReplicationAdmin java class. Replication to secondary region replicas works over standard inter-cluster replication. So replication, if disabled explicitly, also has to be enabled by setting "hbase.replication" to true for this feature to work. + </description> </property> <property> <name>hbase.region.replica.replication.memstore.enabled</name> @@ -2341,6 +2386,38 @@ Instead you can change the number of region replicas per table to increase or de of row-level consistency, even when the read requests `Consistency.TIMELINE`. </description> </property> + +<property> + <name>hbase.master.hfilecleaner.ttl</name> + <value>3600000</value> + <description> + The period (in milliseconds) to keep store files in the archive folder before deleting them from the file system.</description> +</property> + +<property> + <name>hbase.meta.replica.count</name> + <value>3</value> + <description> + Region replication count for the meta regions. Defaults to 1. + </description> +</property> + + +<property> + <name>hbase.region.replica.storefile.refresh.memstore.multiplier</name> + <value>4</value> + <description> + The multiplier for a “store file refresh” operation for the secondary region replica. If a region server has memory pressure, the secondary region will refresh it’s store files if the memstore size of the biggest secondary replica is bigger this many times than the memstore size of the biggest primary replica. Set this to a very big value to disable this feature (not recommended). 
+ </description> +</property> + +<property> + <name>hbase.region.replica.wait.for.primary.flush</name> + <value>true</value> + <description> + Whether to wait for observing a full flush cycle from the primary before start serving data in a secondary. Disabling this might cause the secondary region replicas to go back in time for reads between region movements. + </description> +</property> ---- One thing to keep in mind also is that, region replica placement policy is only enforced by the `StochasticLoadBalancer` which is the default balancer. @@ -2353,11 +2430,11 @@ Ensure to set the following for all clients (and servers) that will use region r [source,xml] ---- <property> - <name>hbase.ipc.client.allowsInterrupt</name> - <value>true</value> - <description> - Whether to enable interruption of RPC threads at the client side. This is required for region replicas with fallback RPC’s to secondary regions. - </description> + <name>hbase.ipc.client.specificThreadForWriting</name> + <value>true</value> + <description> + Whether to enable interruption of RPC threads at the client side. This is required for region replicas with fallback RPC’s to secondary regions. + </description> </property> <property> <name>hbase.client.primaryCallTimeout.get</name> @@ -2380,13 +2457,29 @@ Ensure to set the following for all clients (and servers) that will use region r The timeout (in microseconds), before secondary fallback RPC’s are submitted for scan requests with Consistency.TIMELINE to the secondary replicas of the regions. Defaults to 1 sec. Setting this lower will increase the number of RPC’s, but will lower the p99 latencies. </description> </property> +<property> + <name>hbase.meta.replicas.use</name> + <value>true</value> + <description> + Whether to use meta table replicas or not. Default is false. + </description> +</property> ---- +Note HBase-1.0.x users should use `hbase.ipc.client.allowsInterrupt` rather than `hbase.ipc.client.specificThreadForWriting`. 
+ +=== User Interface + +In the masters user interface, the region replicas of a table are also shown together with the primary regions. +You can notice that the replicas of a region will share the same start and end keys and the same region name prefix. +The only difference would be the appended replica_id (which is encoded as hex), and the region encoded name will be different. +You can also see the replica ids shown explicitly in the UI. + === Creating a table with region replication Region replication is a per-table property. -All tables have REGION_REPLICATION = 1 by default, which means that there is only one replica per region. -You can set and change the number of replicas per region of a table by supplying the REGION_REPLICATION property in the table descriptor. +All tables have `REGION_REPLICATION = 1` by default, which means that there is only one replica per region. +You can set and change the number of replicas per region of a table by supplying the `REGION_REPLICATION` property in the table descriptor. ==== Shell @@ -2414,21 +2507,8 @@ admin.createTable(htd); You can also use `setRegionReplication()` and alter table to increase, decrease the region replication for a table. -=== Region splits and merges - -Region splits and merges are not compatible with regions with replicas yet. -So you have to pre-split the table, and disable the region splits. -Also you should not execute region merges on tables with region replicas. -To disable region splits you can use DisabledRegionSplitPolicy as the split policy. - -=== User Interface - -In the masters user interface, the region replicas of a table are also shown together with the primary regions. -You can notice that the replicas of a region will share the same start and end keys and the same region name prefix. -The only difference would be the appended replica_id (which is encoded as hex), and the region encoded name will be different. -You can also see the replica ids shown explicitly in the UI. 
-=== API and Usage +=== Read API and Usage ==== Shell @@ -2490,7 +2570,7 @@ scan.setConsistency(Consistency.TIMELINE); ResultScanner scanner = table.getScanner(scan); ---- -You can inspect whether the results are coming from primary region or not by calling the Result.isStale() method: +You can inspect whether the results are coming from primary region or not by calling the `Result.isStale()` method: [source,java] ---- http://git-wip-us.apache.org/repos/asf/hbase/blob/15886909/src/main/asciidoc/_chapters/configuration.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/configuration.adoc b/src/main/asciidoc/_chapters/configuration.adoc index 01f2eb7..35ce042 100644 --- a/src/main/asciidoc/_chapters/configuration.adoc +++ b/src/main/asciidoc/_chapters/configuration.adoc @@ -98,6 +98,11 @@ This section lists required services and some required system configuration. |JDK 7 |JDK 8 +|1.2 +|link:http://search-hadoop.com/m/DHED4Zlz0R1[Not Supported] +|yes +|yes + |1.1 |link:http://search-hadoop.com/m/DHED4Zlz0R1[Not Supported] |yes @@ -116,11 +121,6 @@ deprecated `remove()` method of the `PoolMap` class and is under consideration. link:https://issues.apache.org/jira/browse/HBASE-7608[HBASE-7608] for more information about JDK 8 support. -|0.96 -|yes -|yes -|N/A - |0.94 |yes |yes @@ -210,22 +210,20 @@ Use the following legend to interpret this table: * "X" = not supported * "NT" = Not tested -[cols="1,1,1,1,1,1,1", options="header"] +[cols="1,1,1,1,1,1", options="header"] |=== -| | HBase-0.92.x | HBase-0.94.x | HBase-0.96.x | HBase-0.98.x (Support for Hadoop 1.1+ is deprecated.) 
| HBase-1.0.x (Hadoop 1.x is NOT supported) | HBase-1.1.x -|Hadoop-0.20.205 | S | X | X | X | X | X -|Hadoop-0.22.x | S | X | X | X | X | X -|Hadoop-1.0.x |X | X | X | X | X | X -|Hadoop-1.1.x | NT | S | S | NT | X | X -|Hadoop-0.23.x | X | S | NT | X | X | X -|Hadoop-2.0.x-alpha | X | NT | X | X | X | X -|Hadoop-2.1.0-beta | X | NT | S | X | X | X -|Hadoop-2.2.0 | X | NT | S | S | NT | NT -|Hadoop-2.3.x | X | NT | S | S | NT | NT -|Hadoop-2.4.x | X | NT | S | S | S | S -|Hadoop-2.5.x | X | NT | S | S | S | S -|Hadoop-2.6.x | X | NT | NT | NT | S | S -|Hadoop-2.7.x | X | NT | NT | NT | NT | NT +| | HBase-0.94.x | HBase-0.98.x (Support for Hadoop 1.1+ is deprecated.) | HBase-1.0.x (Hadoop 1.x is NOT supported) | HBase-1.1.x | HBase-1.2.x +|Hadoop-1.0.x | X | X | X | X | X +|Hadoop-1.1.x | S | NT | X | X | X +|Hadoop-0.23.x | S | X | X | X | X +|Hadoop-2.0.x-alpha | NT | X | X | X | X +|Hadoop-2.1.0-beta | NT | X | X | X | X +|Hadoop-2.2.0 | NT | S | NT | NT | NT +|Hadoop-2.3.x | NT | S | NT | NT | NT +|Hadoop-2.4.x | NT | S | S | S | S +|Hadoop-2.5.x | NT | S | S | S | S +|Hadoop-2.6.x | NT | NT | S | S | S +|Hadoop-2.7.x | NT | NT | NT | NT | NT |=== .Replace the Hadoop Bundled With HBase! http://git-wip-us.apache.org/repos/asf/hbase/blob/15886909/src/main/asciidoc/_chapters/developer.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/developer.adoc b/src/main/asciidoc/_chapters/developer.adoc index ee03614..a4c1dd4 100644 --- a/src/main/asciidoc/_chapters/developer.adoc +++ b/src/main/asciidoc/_chapters/developer.adoc @@ -456,6 +456,7 @@ You then reference these generated poms when you build. For now, just be aware of the difference between HBase 1.x builds and those of HBase 0.96-0.98. This difference is important to the build instructions. +[[maven.settings.xml]] .Example _~/.m2/settings.xml_ File ==== Publishing to maven requires you sign the artifacts you want to upload. 
@@ -637,12 +638,12 @@ Release needs to be tagged for the next step. . Deploy to the Maven Repository. + Next, deploy HBase to the Apache Maven repository, using the `apache-release` profile instead of the `release` profile when running the `mvn deploy` command. -This profile invokes the Apache pom referenced by our pom files, and also signs your artifacts published to Maven, as long as the _settings.xml_ is configured correctly, as described in <<mvn.settings.file,mvn.settings.file>>. +This profile invokes the Apache pom referenced by our pom files, and also signs your artifacts published to Maven, as long as the _settings.xml_ is configured correctly, as described in <<maven.settings.xml>>. + [source,bourne] ---- -$ mvn deploy -DskipTests -Papache-release +$ mvn deploy -DskipTests -Papache-release -Prelease ---- + This command copies all artifacts up to a temporary staging Apache mvn repository in an 'open' state. @@ -720,7 +721,7 @@ Announce the release candidate on the mailing list and call a vote. [[maven.snapshot]] === Publishing a SNAPSHOT to maven -Make sure your _settings.xml_ is set up properly, as in <<mvn.settings.file,mvn.settings.file>>. +Make sure your _settings.xml_ is set up properly (see <<maven.settings.xml>>). Make sure the hbase version includes `-SNAPSHOT` as a suffix. Following is an example of publishing SNAPSHOTS of a release that had an hbase version of 0.96.0 in its poms. @@ -1335,6 +1336,13 @@ NOTE: End-of-life releases are not included in this list. 
| 1.0 | Enis Soztutar + +| 1.1 +| Nick Dimiduk + +| 1.2 +| Sean Busbey + |=== [[code.standards]] http://git-wip-us.apache.org/repos/asf/hbase/blob/15886909/src/main/asciidoc/_chapters/getting_started.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/getting_started.adoc b/src/main/asciidoc/_chapters/getting_started.adoc index 41674a0..7839bad 100644 --- a/src/main/asciidoc/_chapters/getting_started.adoc +++ b/src/main/asciidoc/_chapters/getting_started.adoc @@ -294,9 +294,11 @@ You can skip the HDFS configuration to continue storing your data in the local f .Hadoop Configuration [NOTE] ==== -This procedure assumes that you have configured Hadoop and HDFS on your local system and or a remote system, and that they are running and available. -It also assumes you are using Hadoop 2. -Currently, the documentation on the Hadoop website does not include a quick start for Hadoop 2, but the guide at link:http://www.alexjf.net/blog/distributed-systems/hadoop-yarn-installation-definitive-guide is a good starting point. +This procedure assumes that you have configured Hadoop and HDFS on your local system and/or a remote +system, and that they are running and available. It also assumes you are using Hadoop 2. +The guide on +link:http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html[Setting up a Single Node Cluster] +in the Hadoop documentation is a good starting point. 
==== http://git-wip-us.apache.org/repos/asf/hbase/blob/15886909/src/main/asciidoc/_chapters/hbase-default.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/hbase-default.adoc b/src/main/asciidoc/_chapters/hbase-default.adoc index bf56dd3..8df9b17 100644 --- a/src/main/asciidoc/_chapters/hbase-default.adoc +++ b/src/main/asciidoc/_chapters/hbase-default.adoc @@ -605,21 +605,7 @@ Instructs HBase to make use of ZooKeeper's multi-update functionality. .Default `true` - -[[hbase.config.read.zookeeper.config]] -*`hbase.config.read.zookeeper.config`*:: -+ -.Description - - Set to true to allow HBaseConfiguration to read the - zoo.cfg file for ZooKeeper properties. Switching this to true - is not recommended, since the functionality of reading ZK - properties from a zoo.cfg file has been deprecated. -+ -.Default -`false` - [[hbase.zookeeper.property.initLimit]] *`hbase.zookeeper.property.initLimit`*:: + @@ -2251,4 +2237,3 @@ The percent of region server RPC threads failed to abort RS. .Default `0.5` - \ No newline at end of file http://git-wip-us.apache.org/repos/asf/hbase/blob/15886909/src/main/asciidoc/_chapters/ops_mgt.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/ops_mgt.adoc b/src/main/asciidoc/_chapters/ops_mgt.adoc index b8018b6..514003d 100644 --- a/src/main/asciidoc/_chapters/ops_mgt.adoc +++ b/src/main/asciidoc/_chapters/ops_mgt.adoc @@ -1311,13 +1311,13 @@ list_peers:: list all replication relationships known by this cluster enable_peer <ID>:: Enable a previously-disabled replication relationship disable_peer <ID>:: - Disable a replication relationship. HBase will no longer send edits to that peer cluster, but it still keeps track of all the new WALs that it will need to replicate if and when it is re-enabled. + Disable a replication relationship. 
HBase will no longer send edits to that peer cluster, but it still keeps track of all the new WALs that it will need to replicate if and when it is re-enabled. remove_peer <ID>:: Disable and remove a replication relationship. HBase will no longer send edits to that peer cluster or keep track of WALs. enable_table_replication <TABLE_NAME>:: - Enable the table replication switch for all it's column families. If the table is not found in the destination cluster then it will create one with the same name and column families. + Enable the table replication switch for all it's column families. If the table is not found in the destination cluster then it will create one with the same name and column families. disable_table_replication <TABLE_NAME>:: - Disable the table replication switch for all it's column families. + Disable the table replication switch for all it's column families. === Verifying Replicated Data @@ -1609,6 +1609,197 @@ You can use the HBase Shell command `status 'replication'` to monitor the replic * `status 'replication', 'source'` -- prints the status for each replication source, sorted by hostname. * `status 'replication', 'sink'` -- prints the status for each replication sink, sorted by hostname. +== Running Multiple Workloads On a Single Cluster + +HBase provides the following mechanisms for managing the performance of a cluster +handling multiple workloads: +. <<quota>> +. <<request-queues>> +. <<multiple-typed-queues>> + +[[quota]] +=== Quotas +HBASE-11598 introduces quotas, which allow you to throttle requests based on +the following limits: + +. <<request-quotas,The number or size of requests(read, write, or read+write) in a given timeframe>> +. <<namespace-quotas,The number of tables allowed in a namespace>> + +These limits can be enforced for a specified user, table, or namespace. + +.Enabling Quotas + +Quotas are disabled by default. 
To enable the feature, set the `hbase.quota.enabled` +property to `true` in _hbase-site.xml_ file for all cluster nodes. + +.General Quota Syntax +. THROTTLE_TYPE can be expressed as READ, WRITE, or the default type(read + write). +. Timeframes can be expressed in the following units: `sec`, `min`, `hour`, `day` +. Request sizes can be expressed in the following units: `B` (bytes), `K` (kilobytes), +`M` (megabytes), `G` (gigabytes), `T` (terabytes), `P` (petabytes) +. Numbers of requests are expressed as an integer followed by the string `req` +. Limits relating to time are expressed as req/time or size/time. For instance `10req/day` +or `100P/hour`. +. Numbers of tables or regions are expressed as integers. + +[[request-quotas]] +.Setting Request Quotas +You can set quota rules ahead of time, or you can change the throttle at runtime. The change +will propagate after the quota refresh period has expired. This expiration period +defaults to 5 minutes. To change it, modify the `hbase.quota.refresh.period` property +in `hbase-site.xml`. This property is expressed in milliseconds and defaults to `300000`. 
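Editor's illustration, not part of the commit: the limit grammar listed under General Quota Syntax (`10req/day`, `100P/hour`) can be sketched as a small parser. `QuotaLimit` is a hypothetical class and is not an HBase API. Treating size units as binary multiples (1K = 1024 bytes) and matching units case-sensitively are both assumptions made for this sketch; HBase's own handling may differ.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not an HBase class): parses limits such as "10req/sec" or "5K/min". */
final class QuotaLimit {
    // An integer amount, then "req" or a size unit, then "/" and a timeframe.
    private static final Pattern LIMIT =
        Pattern.compile("(\\d+)(req|[BKMGTP])/(sec|min|hour|day)");

    final long amount;
    final String unit;
    final String timeframe;

    private QuotaLimit(long amount, String unit, String timeframe) {
        this.amount = amount;
        this.unit = unit;
        this.timeframe = timeframe;
    }

    /** Parses a limit string, rejecting anything outside the documented grammar. */
    static QuotaLimit parse(String text) {
        Matcher m = LIMIT.matcher(text);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a valid limit: " + text);
        }
        return new QuotaLimit(Long.parseLong(m.group(1)), m.group(2), m.group(3));
    }

    /** True for request-count limits ("req"), false for size limits. */
    boolean isRequestCount() {
        return unit.equals("req");
    }

    /** Size in bytes, ASSUMING binary multiples; very large amounts can overflow a long. */
    long sizeInBytes() {
        if (isRequestCount()) {
            throw new IllegalStateException("Request-count limits have no size");
        }
        int power = "BKMGTP".indexOf(unit); // B=0, K=1, ..., P=5
        return amount << (10 * power);
    }
}
```

For example, `QuotaLimit.parse("5K/min").sizeInBytes()` yields 5120 under the binary-multiple assumption, while `QuotaLimit.parse("10req/week")` is rejected because `week` is not one of the listed timeframes.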
+
+----
+# Limit user u1 to 10 requests per second
+hbase> set_quota TYPE => THROTTLE, USER => 'u1', LIMIT => '10req/sec'
+
+# Limit user u1 to 10 read requests per second
+hbase> set_quota TYPE => THROTTLE, THROTTLE_TYPE => READ, USER => 'u1', LIMIT => '10req/sec'
+
+# Limit user u1 to 10 M per day everywhere
+hbase> set_quota TYPE => THROTTLE, USER => 'u1', LIMIT => '10M/day'
+
+# Limit user u1 to 10 M write size per sec
+hbase> set_quota TYPE => THROTTLE, THROTTLE_TYPE => WRITE, USER => 'u1', LIMIT => '10M/sec'
+
+# Limit user u1 to 5k per minute on table t2
+hbase> set_quota TYPE => THROTTLE, USER => 'u1', TABLE => 't2', LIMIT => '5K/min'
+
+# Limit user u1 to 10 read requests per sec on table t2
+hbase> set_quota TYPE => THROTTLE, THROTTLE_TYPE => READ, USER => 'u1', TABLE => 't2', LIMIT => '10req/sec'
+
+# Remove an existing limit from user u1 on namespace ns2
+hbase> set_quota TYPE => THROTTLE, USER => 'u1', NAMESPACE => 'ns2', LIMIT => NONE
+
+# Limit all users to 10 requests per hour on namespace ns1
+hbase> set_quota TYPE => THROTTLE, NAMESPACE => 'ns1', LIMIT => '10req/hour'
+
+# Limit all users to 10 T per hour on table t1
+hbase> set_quota TYPE => THROTTLE, TABLE => 't1', LIMIT => '10T/hour'
+
+# Remove all existing limits from user u1
+hbase> set_quota TYPE => THROTTLE, USER => 'u1', LIMIT => NONE
+
+# List all quotas for user u1 in namespace ns2
+hbase> list_quotas USER => 'u1', NAMESPACE => 'ns2'
+
+# List all quotas for namespace ns2
+hbase> list_quotas NAMESPACE => 'ns2'
+
+# List all quotas for table t1
+hbase> list_quotas TABLE => 't1'
+
+# List all quotas
+hbase> list_quotas
+----
+
+You can also place a global limit and exclude a user or a table from the limit by applying the
+`GLOBAL_BYPASS` property.
+----
+hbase> set_quota NAMESPACE => 'ns1', LIMIT => '100req/min' # a per-namespace request limit
+hbase> set_quota USER => 'u1', GLOBAL_BYPASS => true # user u1 is not affected by the limit
+----
+
+[[namespace_quotas]]
+.Setting Namespace Quotas
+
+You can specify the maximum number of tables or regions allowed in a given namespace, either
+when you create the namespace or by altering an existing namespace, by setting the
+`hbase.namespace.quota.maxtables` or `hbase.namespace.quota.maxregions` property on the namespace.
+
+.Limiting Tables Per Namespace
+----
+# Create a namespace with a max of 5 tables
+hbase> create_namespace 'ns1', {'hbase.namespace.quota.maxtables'=>'5'}
+
+# Alter an existing namespace to have a max of 8 tables
+hbase> alter_namespace 'ns2', {METHOD => 'set', 'hbase.namespace.quota.maxtables'=>'8'}
+
+# Show quota information for a namespace
+hbase> describe_namespace 'ns2'
+
+# Alter an existing namespace to remove a quota
+hbase> alter_namespace 'ns2', {METHOD => 'unset', NAME=>'hbase.namespace.quota.maxtables'}
+----
+
+.Limiting Regions Per Namespace
+----
+# Create a namespace with a max of 10 regions
+hbase> create_namespace 'ns1', {'hbase.namespace.quota.maxregions'=>'10'}
+
+# Show quota information for a namespace
+hbase> describe_namespace 'ns1'
+
+# Alter an existing namespace to have a max of 20 regions
+hbase> alter_namespace 'ns2', {METHOD => 'set', 'hbase.namespace.quota.maxregions'=>'20'}
+
+# Alter an existing namespace to remove a quota
+hbase> alter_namespace 'ns2', {METHOD => 'unset', NAME=> 'hbase.namespace.quota.maxregions'}
+----
+
+[[request_queues]]
+=== Request Queues
+If no throttling policy is configured, when the RegionServer receives multiple requests,
+they are now placed into a queue waiting for a free execution slot (HBASE-6721).
+The simplest queue is a FIFO queue, where each request waits for all previous requests in the queue
+to finish before running. Fast or interactive queries can get stuck behind large requests.
+
+If you are able to guess how long a request will take, you can reorder requests by
+pushing the long requests to the end of the queue and allowing short requests to preempt
+them. Eventually, you must still execute the large requests and prioritize the new
+requests behind them. The short requests will be newer, so the result is not terrible,
+but still suboptimal compared to a mechanism which allows large requests to be split
+into multiple smaller ones.
+
+HBASE-10993 introduces such a system for deprioritizing long-running scanners. There
+are two types of queues, `fifo` and `deadline`. To select the type of queue used,
+set the `hbase.ipc.server.callqueue.type` property in `hbase-site.xml`. There
+is no way to estimate how long each request may take, so de-prioritization only affects
+scans, and is based on the number of “next” calls a scan request has made. An assumption
+is made that when you are doing a full table scan, your job is not likely to be interactive,
+so if there are concurrent requests, you can delay long-running scans up to a limit tunable by
+setting the `hbase.ipc.server.queue.max.call.delay` property. The slope of the delay is calculated
+by a simple square root of `(numNextCall * weight)` where the weight is
+configurable by setting the `hbase.ipc.server.scan.vtime.weight` property.
+
+[[multiple-typed-queues]]
+=== Multiple-Typed Queues
+
+You can also prioritize or deprioritize different kinds of requests by configuring
+a specified number of dedicated handlers and queues. You can segregate the scan requests
+in a single queue with a single handler, and all the other available queues can service
+short `Get` requests.
+
+You can adjust the IPC queues and handlers based on the type of workload, using static
+tuning options. This approach is an interim first step that will eventually allow
+you to change the settings at runtime, and to dynamically adjust values based on the load.
+
+.Multiple Queues
+
+To avoid contention and separate different kinds of requests, configure the
+`hbase.ipc.server.callqueue.handler.factor` property, which allows you to
+increase the number of queues and control how many handlers can share the
+same queue.
+
+Using more queues reduces contention when adding a task to a queue or selecting it
+from a queue. You can even configure one queue per handler. The trade-off is that
+if some queues contain long-running tasks, a handler may need to wait to execute from that queue
+rather than stealing from another queue which has waiting tasks.
+
+.Read and Write Queues
+With multiple queues, you can now divide read and write requests, giving more priority
+(more queues) to one or the other type. Use the `hbase.ipc.server.callqueue.read.ratio`
+property to choose to serve more reads or more writes.
+
+.Get and Scan Queues
+Similar to the read/write split, you can split gets and scans by tuning the `hbase.ipc.server.callqueue.scan.ratio`
+property to give more priority to gets or to scans. A scan ratio of `0.1` will give
+more queues/handlers to the incoming gets, which means that more gets can be processed
+at the same time and that fewer scans can be executed at the same time. A value of
+`0.9` will give more queues/handlers to scans, so the number of scans executed will
+increase and the number of gets will decrease.
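To make the three tuning properties concrete, here is a minimal _hbase-site.xml_ sketch. The property names come from the text; the values are assumptions chosen only for illustration (the `0.1` scan ratio mirrors the example in the prose), and suitable values depend on the workload.

[source,xml]
----
<!-- Illustrative fragment: place inside <configuration>; all values are assumed examples -->
<property>
  <!-- Assumed value: more queues than a single shared queue, fewer than one queue per handler -->
  <name>hbase.ipc.server.callqueue.handler.factor</name>
  <value>0.5</value>
</property>
<property>
  <!-- Assumed value: divides the queues between reads and writes -->
  <name>hbase.ipc.server.callqueue.read.ratio</name>
  <value>0.5</value>
</property>
<property>
  <!-- From the prose: 0.1 gives more queues/handlers to gets than to scans -->
  <name>hbase.ipc.server.callqueue.scan.ratio</name>
  <value>0.1</value>
</property>
----

Changing these static settings would normally require distributing the new configuration and restarting the RegionServers, since the text notes that runtime adjustment is not yet available.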
+ + [[ops.backup]] == HBase Backup http://git-wip-us.apache.org/repos/asf/hbase/blob/15886909/src/main/asciidoc/_chapters/schema_design.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/schema_design.adoc b/src/main/asciidoc/_chapters/schema_design.adoc index 9319c65..a212a5c 100644 --- a/src/main/asciidoc/_chapters/schema_design.adoc +++ b/src/main/asciidoc/_chapters/schema_design.adoc @@ -27,7 +27,19 @@ :icons: font :experimental: -A good general introduction on the strength and weaknesses modelling on the various non-rdbms datastores is Ian Varley's Master thesis, link:http://ianvarley.com/UT/MR/Varley_MastersReport_Full_2009-08-07.pdf[No Relation: The Mixed Blessings of Non-Relational Databases]. Also, read <<keyvalue,keyvalue>> for how HBase stores data internally, and the section on <<schema.casestudies,schema.casestudies>>. +A good introduction on the strength and weaknesses modelling on the various non-rdbms datastores is +to be found in Ian Varley's Master thesis, +link:http://ianvarley.com/UT/MR/Varley_MastersReport_Full_2009-08-07.pdf[No Relation: The Mixed Blessings of Non-Relational Databases]. +It is a little dated now but a good background read if you have a moment on how HBase schema modeling +differs from how it is done in an RDBMS. Also, +read <<keyvalue,keyvalue>> for how HBase stores data internally, and the section on <<schema.casestudies,schema.casestudies>>. + +The documentation on the Cloud Bigtable website, link:https://cloud.google.com/bigtable/docs/schema-design[Designing Your Schema], +is pertinent and nicely done and lessons learned there equally apply here in HBase land; just divide +any quoted values by ~10 to get what works for HBase: e.g. where it says individual values can be ~10MBs in size, HBase can do similar -- perhaps best +to go smaller if you can -- and where it says a maximum of 100 column families in Cloud Bigtable, think ~10 when +modeling on HBase. 
+ [[schema.creation]] == Schema Creation @@ -700,7 +712,7 @@ See https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html#se ==== [[schema.casestudies.log_timeseries.varkeys]] -==== Variangle Length or Fixed Length Rowkeys? +==== Variable Length or Fixed Length Rowkeys? It is critical to remember that rowkeys are stamped on every column in HBase. If the hostname is `a` and the event type is `e1` then the resulting rowkey would be quite small. http://git-wip-us.apache.org/repos/asf/hbase/blob/15886909/src/main/asciidoc/_chapters/tracing.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/tracing.adoc b/src/main/asciidoc/_chapters/tracing.adoc index 6bb8065..9b3711e 100644 --- a/src/main/asciidoc/_chapters/tracing.adoc +++ b/src/main/asciidoc/_chapters/tracing.adoc @@ -30,13 +30,13 @@ :icons: font :experimental: -link:https://issues.apache.org/jira/browse/HBASE-6449[HBASE-6449] added support for tracing requests through HBase, using the open source tracing library, link:http://github.com/cloudera/htrace[HTrace]. +link:https://issues.apache.org/jira/browse/HBASE-6449[HBASE-6449] added support for tracing requests through HBase, using the open source tracing library, link:http://htrace.incubator.apache.org/[HTrace]. Setting up tracing is quite simple, however it currently requires some very minor changes to your client code (it would not be very difficult to remove this requirement). [[tracing.spanreceivers]] === SpanReceivers -The tracing system works by collecting information in structs called 'Spans'. It is up to you to choose how you want to receive this information by implementing the `SpanReceiver` interface, which defines one method: +The tracing system works by collecting information in structures called 'Spans'. 
It is up to you to choose how you want to receive this information by implementing the `SpanReceiver` interface, which defines one method: [source] ---- @@ -57,51 +57,38 @@ The `LocalFileSpanReceiver` looks in _hbase-site.xml_ for a `hbase.local-fi <property> <name>hbase.trace.spanreceiver.classes</name> - <value>org.htrace.impl.LocalFileSpanReceiver</value> + <value>org.apache.htrace.impl.LocalFileSpanReceiver</value> </property> <property> - <name>hbase.local-file-span-receiver.path</name> + <name>hbase.htrace.local-file-span-receiver.path</name> <value>/var/log/hbase/htrace.out</value> </property> ---- -HTrace also provides `ZipkinSpanReceiver` which converts spans to link:http://github.com/twitter/zipkin[Zipkin] span format and send them to Zipkin server. -In order to use this span receiver, you need to install the jar of htrace-zipkin to your HBase's classpath on all of the nodes in your cluster. +HTrace also provides `ZipkinSpanReceiver` which converts spans to link:http://github.com/twitter/zipkin[Zipkin] span format and send them to Zipkin server. In order to use this span receiver, you need to install the jar of htrace-zipkin to your HBase's classpath on all of the nodes in your cluster. -_htrace-zipkin_ is published to the maven central repository. -You could get the latest version from there or just build it locally and then copy it out to all nodes, change your config to use zipkin receiver, distribute the new configuration and then (rolling) restart. +_htrace-zipkin_ is published to the link:http://search.maven.org/#search%7Cgav%7C1%7Cg%3A%22org.apache.htrace%22%20AND%20a%3A%22htrace-zipkin%22[Maven central repository]. You could get the latest version from there or just build it locally (see the link:http://htrace.incubator.apache.org/[HTrace] homepage for information on how to do this) and then copy it out to all nodes. -Here is the example of manual setup procedure. 
- ----- - -$ git clone https://github.com/cloudera/htrace -$ cd htrace/htrace-zipkin -$ mvn compile assembly:single -$ cp target/htrace-zipkin-*-jar-with-dependencies.jar $HBASE_HOME/lib/ - # copy jar to all nodes... ----- - -The `ZipkinSpanReceiver` looks in _hbase-site.xml_ for a `hbase.zipkin.collector-hostname` and `hbase.zipkin.collector-port` property with a value describing the Zipkin collector server to which span information are sent. +`ZipkinSpanReceiver` for properties called `hbase.htrace.zipkin.collector-hostname` and `hbase.htrace.zipkin.collector-port` in _hbase-site.xml_ with values describing the Zipkin collector server to which span information are sent. [source,xml] ---- <property> <name>hbase.trace.spanreceiver.classes</name> - <value>org.htrace.impl.ZipkinSpanReceiver</value> + <value>org.apache.htrace.impl.ZipkinSpanReceiver</value> </property> <property> - <name>hbase.zipkin.collector-hostname</name> + <name>hbase.htrace.zipkin.collector-hostname</name> <value>localhost</value> </property> <property> - <name>hbase.zipkin.collector-port</name> + <name>hbase.htrace.zipkin.collector-port</name> <value>9410</value> </property> ---- -If you do not want to use the included span receivers, you are encouraged to write your own receiver (take a look at `LocalFileSpanReceiver` for an example). If you think others would benefit from your receiver, file a JIRA or send a pull request to link:http://github.com/cloudera/htrace[HTrace]. +If you do not want to use the included span receivers, you are encouraged to write your own receiver (take a look at `LocalFileSpanReceiver` for an example). If you think others would benefit from your receiver, file a JIRA with the HTrace project. [[tracing.client.modifications]] == Client Modifications @@ -160,8 +147,7 @@ See the HTrace _README_ for more information on Samplers. [[tracing.client.shell]] == Tracing from HBase Shell -You can use +trace+ command for tracing requests from HBase Shell. 
+trace 'start'+ command turns on tracing and +trace - 'stop'+ command turns off tracing. +You can use `trace` command for tracing requests from HBase Shell. `trace 'start'` command turns on tracing and `trace 'stop'` command turns off tracing. [source] ---- @@ -171,9 +157,8 @@ hbase(main):002:0> put 'test', 'row1', 'f:', 'val1' # traced commands hbase(main):003:0> trace 'stop' ---- -+trace 'start'+ and +trace 'stop'+ always returns boolean value representing if or not there is ongoing tracing. -As a result, +trace - 'stop'+ returns false on suceess. +trace 'status'+ just returns if or not tracing is turned on. +`trace 'start'` and `trace 'stop'` always returns boolean value representing if or not there is ongoing tracing. +As a result, `trace 'stop'` returns false on success. `trace 'status'` just returns if or not tracing is turned on. [source] ---- http://git-wip-us.apache.org/repos/asf/hbase/blob/15886909/src/main/asciidoc/_chapters/zookeeper.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/zookeeper.adoc b/src/main/asciidoc/_chapters/zookeeper.adoc index f6134b7..3266964 100644 --- a/src/main/asciidoc/_chapters/zookeeper.adoc +++ b/src/main/asciidoc/_chapters/zookeeper.adoc @@ -35,7 +35,7 @@ You can also manage the ZooKeeper ensemble independent of HBase and just point H To toggle HBase management of ZooKeeper, use the `HBASE_MANAGES_ZK` variable in _conf/hbase-env.sh_. This variable, which defaults to `true`, tells HBase whether to start/stop the ZooKeeper ensemble servers as part of HBase start/stop. -When HBase manages the ZooKeeper ensemble, you can specify ZooKeeper configuration using its native _zoo.cfg_ file, or, the easier option is to just specify ZooKeeper options directly in _conf/hbase-site.xml_. +When HBase manages the ZooKeeper ensemble, you can specify ZooKeeper configuration directly in _conf/hbase-site.xml_. 
A ZooKeeper configuration option can be set as a property in the HBase _hbase-site.xml_ XML configuration file by prefacing the ZooKeeper option name with `hbase.zookeeper.property`. For example, the `clientPort` setting in ZooKeeper can be changed by setting the `hbase.zookeeper.property.clientPort` property. For all default values used by HBase, including ZooKeeper configuration, see <<hbase_default_configurations,hbase default configurations>>. @@ -124,8 +124,7 @@ To point HBase at an existing ZooKeeper cluster, one that is not managed by HBas export HBASE_MANAGES_ZK=false ---- -Next set ensemble locations and client port, if non-standard, in _hbase-site.xml_, or add a suitably configured _zoo.cfg_ to HBase's _CLASSPATH_. -HBase will prefer the configuration found in _zoo.cfg_ over any settings in _hbase-site.xml_. +Next set ensemble locations and client port, if non-standard, in _hbase-site.xml_. When HBase manages ZooKeeper, it will start/stop the ZooKeeper servers as a part of the regular start/stop scripts. If you would like to run ZooKeeper yourself, independent of HBase start/stop, you would do the following @@ -312,21 +311,23 @@ Modify your _hbase-site.xml_ on each node that will run a master or regionserver <name>hbase.cluster.distributed</name> <value>true</value> </property> + <property> + <name>hbase.zookeeper.property.authProvider.1</name> + <value>org.apache.zookeeper.server.auth.SASLAuthenticationProvider</value> + </property> + <property> + <name>hbase.zookeeper.property.kerberos.removeHostFromPrincipal</name> + <value>true</value> + </property> + <property> + <name>hbase.zookeeper.property.kerberos.removeRealmFromPrincipal</name> + <value>true</value> + </property> </configuration> ---- where `$ZK_NODES` is the comma-separated list of hostnames of the Zookeeper Quorum hosts. 
-Add a _zoo.cfg_ for each Zookeeper Quorum host containing: - -[source,java] ----- - -authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider -kerberos.removeHostFromPrincipal=true -kerberos.removeRealmFromPrincipal=true ----- - Also on each of these hosts, create a JAAS configuration file containing: [source,java]