http://git-wip-us.apache.org/repos/asf/hbase/blob/2e9a55be/src/main/asciidoc/_chapters/performance.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/performance.adoc b/src/main/asciidoc/_chapters/performance.adoc index 114754f..c917646 100644 --- a/src/main/asciidoc/_chapters/performance.adoc +++ b/src/main/asciidoc/_chapters/performance.adoc @@ -320,7 +320,7 @@ See also <<perf.compression.however>> for compression caveats. [[schema.regionsize]] === Table RegionSize -The regionsize can be set on a per-table basis via `setFileSize` on link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html[HTableDescriptor] in the event where certain tables require different regionsizes than the configured default regionsize. +The regionsize can be set on a per-table basis via `setFileSize` on link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html[HTableDescriptor] in the event where certain tables require different regionsizes than the configured default regionsize. See <<ops.capacity.regions>> for more information. @@ -372,7 +372,7 @@ Bloom filters are enabled on a Column Family. You can do this by using the setBloomFilterType method of HColumnDescriptor or using the HBase API. Valid values are `NONE`, `ROW` (default), or `ROWCOL`. See <<bloom.filters.when>> for more information on `ROW` versus `ROWCOL`. -See also the API documentation for link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor]. +See also the API documentation for link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor]. The following example creates a table and enables a ROWCOL Bloom filter on the `colfam1` column family. @@ -431,7 +431,7 @@ The blocksize can be configured for each ColumnFamily in a table, and defaults t Larger cell values require larger blocksizes. There is an inverse relationship between blocksize and the resulting StoreFile indexes (i.e., if the blocksize is doubled then the resulting indexes should be roughly halved). -See link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor] and <<store>>for more information. +See link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor] and <<store>>for more information. [[cf.in.memory]] === In-Memory ColumnFamilies @@ -440,7 +440,7 @@ ColumnFamilies can optionally be defined as in-memory. Data is still persisted to disk, just like any other ColumnFamily. In-memory blocks have the highest priority in the <<block.cache>>, but it is not a guarantee that the entire table will be in memory. -See link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor] for more information. +See link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor] for more information. [[perf.compression]] === Compression @@ -549,19 +549,9 @@ If deferred log flush is used, WAL edits are kept in memory until the flush peri The benefit is aggregated and asynchronous `WAL`- writes, but the potential downside is that if the RegionServer goes down the yet-to-be-flushed edits are lost. This is safer, however, than not using WAL at all with Puts. -Deferred log flush can be configured on tables via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html[HTableDescriptor]. 
+Deferred log flush can be configured on tables via link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html[HTableDescriptor]. The default value of `hbase.regionserver.optionallogflushinterval` is 1000ms. -[[perf.hbase.client.autoflush]] -=== HBase Client: AutoFlush - -When performing a lot of Puts, make sure that setAutoFlush is set to false on your link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html[Table] instance. -Otherwise, the Puts will be sent one at a time to the RegionServer. -Puts added via `table.add(Put)` and `table.add( <List> Put)` wind up in the same write buffer. -If `autoFlush = false`, these messages are not sent until the write-buffer is filled. -To explicitly flush the messages, call `flushCommits`. -Calling `close` on the `Table` instance will invoke `flushCommits`. - [[perf.hbase.client.putwal]] === HBase Client: Turn off WAL on Puts @@ -584,7 +574,7 @@ There is a utility `HTableUtil` currently on MASTER that does this, but you can [[perf.hbase.write.mr.reducer]] === MapReduce: Skip The Reducer -When writing a lot of data to an HBase table from a MR job (e.g., with link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.html[TableOutputFormat]), and specifically where Puts are being emitted from the Mapper, skip the Reducer step. +When writing a lot of data to an HBase table from a MR job (e.g., with link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.html[TableOutputFormat]), and specifically where Puts are being emitted from the Mapper, skip the Reducer step. When a Reducer step is used, all of the output (Puts) from the Mapper will get spooled to disk, then sorted/shuffled to other Reducers that will most likely be off-node. It's far more efficient to just write directly to HBase. @@ -597,7 +587,7 @@ If all your data is being written to one region at a time, then re-read the sect Also, if you are pre-splitting regions and all your data is _still_ winding up in a single region even though your keys aren't monotonically increasing, confirm that your keyspace actually works with the split strategy. There are a variety of reasons that regions may appear "well split" but won't work with your data. -As the HBase client communicates directly with the RegionServers, this can be obtained via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#getRegionLocation(byte%5B%5D)[Table.getRegionLocation]. +As the HBase client communicates directly with the RegionServers, this can be obtained via link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/RegionLocator.html#getRegionLocation-byte:A-[RegionLocator.getRegionLocation]. See <<precreate.regions>>, as well as <<perf.configurations>> @@ -610,7 +600,7 @@ For example, here is a good general thread on what to look at addressing read-ti [[perf.hbase.client.caching]] === Scan Caching -If HBase is used as an input source for a MapReduce job, for example, make sure that the input link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html[Scan] instance to the MapReduce job has `setCaching` set to something greater than the default (which is 1). Using the default value means that the map-task will make call back to the region-server for every record processed. 
+If HBase is used as an input source for a MapReduce job, for example, make sure that the input link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html[Scan] instance to the MapReduce job has `setCaching` set to something greater than the default (which is 1). Using the default value means that the map-task will make call back to the region-server for every record processed. Setting this value to 500, for example, will transfer 500 rows at a time to the client to be processed. There is a cost/benefit to have the cache value be large because it costs more in memory for both client and RegionServer, so bigger isn't always better. @@ -659,7 +649,7 @@ For MapReduce jobs that use HBase tables as a source, if there a pattern where t === Close ResultScanners This isn't so much about improving performance but rather _avoiding_ performance problems. -If you forget to close link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/ResultScanner.html[ResultScanners] you can cause problems on the RegionServers. +If you forget to close link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/ResultScanner.html[ResultScanners] you can cause problems on the RegionServers. Always have ResultScanner processing enclosed in try/catch blocks. [source,java] @@ -679,7 +669,7 @@ table.close(); [[perf.hbase.client.blockcache]] === Block Cache -link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html[Scan] instances can be set to use the block cache in the RegionServer via the `setCacheBlocks` method. +link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html[Scan] instances can be set to use the block cache in the RegionServer via the `setCacheBlocks` method. For input Scans to MapReduce jobs, this should be `false`. For frequently accessed rows, it is advisable to use the block cache. @@ -689,8 +679,8 @@ See <<offheap.blockcache>> [[perf.hbase.client.rowkeyonly]] === Optimal Loading of Row Keys -When performing a table link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html[scan] where only the row keys are needed (no families, qualifiers, values or timestamps), add a FilterList with a `MUST_PASS_ALL` operator to the scanner using `setFilter`. -The filter list should include both a link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FirstKeyOnlyFilter.html[FirstKeyOnlyFilter] and a link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/KeyOnlyFilter.html[KeyOnlyFilter]. +When performing a table link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html[scan] where only the row keys are needed (no families, qualifiers, values or timestamps), add a FilterList with a `MUST_PASS_ALL` operator to the scanner using `setFilter`. +The filter list should include both a link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FirstKeyOnlyFilter.html[FirstKeyOnlyFilter] and a link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/KeyOnlyFilter.html[KeyOnlyFilter]. Using this filter combination will result in a worst case scenario of a RegionServer reading a single value from disk and minimal network traffic to the client for a single row. [[perf.hbase.read.dist]] @@ -709,7 +699,7 @@ Enabling Bloom Filters can save your having to go to disk and can help improve r link:http://en.wikipedia.org/wiki/Bloom_filter[Bloom filters] were developed over in link:https://issues.apache.org/jira/browse/HBASE-1200[HBase-1200 Add bloomfilters]. 
For description of the development process -- why static blooms rather than dynamic -- and for an overview of the unique properties that pertain to blooms in HBase, as well as possible future directions, see the _Development Process_ section of the document link:https://issues.apache.org/jira/secure/attachment/12444007/Bloom_Filters_in_HBase.pdf[BloomFilters in HBase] attached to link:https://issues.apache.org/jira/browse/HBASE-1200[HBASE-1200]. The bloom filters described here are actually version two of blooms in HBase. -In versions up to 0.19.x, HBase had a dynamic bloom option based on work done by the link:http://www.one-lab.org/[European Commission One-Lab Project 034819]. +In versions up to 0.19.x, HBase had a dynamic bloom option based on work done by the link:http://www.onelab.org[European Commission One-Lab Project 034819]. The core of the HBase bloom work was later pulled up into Hadoop to implement org.apache.hadoop.io.BloomMapFile. Version 1 of HBase blooms never worked that well. Version 2 is a rewrite from scratch though again it starts with the one-lab work. @@ -826,7 +816,7 @@ In this case, special care must be taken to regularly perform major compactions As is documented in <<datamodel>>, marking rows as deleted creates additional StoreFiles which then need to be processed on reads. Tombstones only get cleaned up with major compactions. -See also <<compaction>> and link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Admin.html#majorCompact%28java.lang.String%29[Admin.majorCompact]. +See also <<compaction>> and link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Admin.html#majorCompact-org.apache.hadoop.hbase.TableName-[Admin.majorCompact]. [[perf.deleting.rpc]] === Delete RPC Behavior @@ -835,8 +825,7 @@ Be aware that `Table.delete(Delete)` doesn't use the writeBuffer. It will execute an RegionServer RPC with each invocation. For a large number of deletes, consider `Table.delete(List)`. -See -+++<a href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#delete%28org.apache.hadoop.hbase.client.Delete%29">hbase.client.Delete</a>+++. +See link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#delete-org.apache.hadoop.hbase.client.Delete-[hbase.client.Delete] [[perf.hdfs]] == HDFS
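The client-side read advice in the performance chapter hunks above (Scan Caching, Block Cache, Optimal Loading of Row Keys, and Close ResultScanners) is described in prose only. The sketch below shows one way those settings might be combined with the Java client API; it is an editorial illustration, not part of the patch: the table name `myTable`, the caching value of 500, and the connection boilerplate are assumptions.

[source,java]
----
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
import org.apache.hadoop.hbase.filter.KeyOnlyFilter;

public class RowKeyScanSketch {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    try (Connection connection = ConnectionFactory.createConnection(conf);
         Table table = connection.getTable(TableName.valueOf("myTable"))) {
      Scan scan = new Scan();
      scan.setCaching(500);        // transfer 500 rows per RPC instead of the default of 1
      scan.setCacheBlocks(false);  // a full scan should not churn the RegionServer block cache
      // Row keys only: FirstKeyOnlyFilter plus KeyOnlyFilter combined with MUST_PASS_ALL
      scan.setFilter(new FilterList(FilterList.Operator.MUST_PASS_ALL,
          new FirstKeyOnlyFilter(), new KeyOnlyFilter()));
      // try-with-resources guarantees the ResultScanner is closed even if processing fails
      try (ResultScanner scanner = table.getScanner(scan)) {
        for (Result result : scanner) {
          byte[] rowKey = result.getRow();  // process the row key
        }
      }
    }
  }
}
----

Closing the scanner promptly in this way also avoids the RegionServer lease problems discussed later in the troubleshooting chapter.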
http://git-wip-us.apache.org/repos/asf/hbase/blob/2e9a55be/src/main/asciidoc/_chapters/preface.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/preface.adoc b/src/main/asciidoc/_chapters/preface.adoc index ed2ca7a..280f2d8 100644 --- a/src/main/asciidoc/_chapters/preface.adoc +++ b/src/main/asciidoc/_chapters/preface.adoc @@ -27,11 +27,11 @@ :icons: font :experimental: -This is the official reference guide for the link:http://hbase.apache.org/[HBase] version it ships with. +This is the official reference guide for the link:https://hbase.apache.org/[HBase] version it ships with. Herein you will find either the definitive documentation on an HBase topic as of its standing when the referenced HBase version shipped, or it will point to the location -in link:http://hbase.apache.org/apidocs/index.html[Javadoc] or +in link:https://hbase.apache.org/apidocs/index.html[Javadoc] or link:https://issues.apache.org/jira/browse/HBASE[JIRA] where the pertinent information can be found. .About This Guide http://git-wip-us.apache.org/repos/asf/hbase/blob/2e9a55be/src/main/asciidoc/_chapters/protobuf.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/protobuf.adoc b/src/main/asciidoc/_chapters/protobuf.adoc index 8c73dd0..ad7e378 100644 --- a/src/main/asciidoc/_chapters/protobuf.adoc +++ b/src/main/asciidoc/_chapters/protobuf.adoc @@ -29,7 +29,7 @@ == Protobuf -HBase uses Google's link:http://protobuf.protobufs[protobufs] wherever +HBase uses Google's link:https://developers.google.com/protocol-buffers/[protobufs] wherever it persists metadata -- in the tail of hfiles or Cells written by HBase into the system hbase:meta table or when HBase writes znodes to zookeeper, etc. -- and when it passes objects over the wire making http://git-wip-us.apache.org/repos/asf/hbase/blob/2e9a55be/src/main/asciidoc/_chapters/rpc.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/rpc.adoc b/src/main/asciidoc/_chapters/rpc.adoc index 1d363eb..fbfba6c 100644 --- a/src/main/asciidoc/_chapters/rpc.adoc +++ b/src/main/asciidoc/_chapters/rpc.adoc @@ -28,7 +28,7 @@ :icons: font :experimental: -In 0.95, all client/server communication is done with link:https://developers.google.com/protocol-buffers/[protobuf'ed] Messages rather than with link:http://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/Writable.html[Hadoop +In 0.95, all client/server communication is done with link:https://developers.google.com/protocol-buffers/[protobuf'ed] Messages rather than with link:https://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/Writable.html[Hadoop Writables]. Our RPC wire format therefore changes. This document describes the client/server request/response protocol and our new RPC wire-format. 
http://git-wip-us.apache.org/repos/asf/hbase/blob/2e9a55be/src/main/asciidoc/_chapters/schema_design.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/schema_design.adoc b/src/main/asciidoc/_chapters/schema_design.adoc index cef05f2..4cd7656 100644 --- a/src/main/asciidoc/_chapters/schema_design.adoc +++ b/src/main/asciidoc/_chapters/schema_design.adoc @@ -47,7 +47,7 @@ See also Robert Yokota's link:https://blogs.apache.org/hbase/entry/hbase-applica [[schema.creation]] == Schema Creation -HBase schemas can be created or updated using the <<shell>> or by using link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Admin.html[Admin] in the Java API. +HBase schemas can be created or updated using the <<shell>> or by using link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Admin.html[Admin] in the Java API. Tables must be disabled when making ColumnFamily modifications, for example: @@ -223,7 +223,7 @@ You could also optimize things so that certain pairs of keys were always in the A third common trick for preventing hotspotting is to reverse a fixed-width or numeric row key so that the part that changes the most often (the least significant digit) is first. This effectively randomizes row keys, but sacrifices row ordering properties. -See https://communities.intel.com/community/itpeernetwork/datastack/blog/2013/11/10/discussion-on-designing-hbase-tables, and link:http://phoenix.apache.org/salted.html[article on Salted Tables] from the Phoenix project, and the discussion in the comments of link:https://issues.apache.org/jira/browse/HBASE-11682[HBASE-11682] for more information about avoiding hotspotting. +See https://communities.intel.com/community/itpeernetwork/datastack/blog/2013/11/10/discussion-on-designing-hbase-tables, and link:https://phoenix.apache.org/salted.html[article on Salted Tables] from the Phoenix project, and the discussion in the comments of link:https://issues.apache.org/jira/browse/HBASE-11682[HBASE-11682] for more information about avoiding hotspotting. [[timeseries]] === Monotonically Increasing Row Keys/Timeseries Data @@ -338,7 +338,7 @@ This is the main trade-off. ==== link:https://issues.apache.org/jira/browse/HBASE-4811[HBASE-4811] implements an API to scan a table or a range within a table in reverse, reducing the need to optimize your schema for forward or reverse scanning. This feature is available in HBase 0.98 and later. -See https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html#setReversed%28boolean for more information. +See link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html#setReversed-boolean-[Scan.setReversed()] for more information. ==== A common problem in database processing is quickly finding the most recent version of a value. @@ -433,7 +433,7 @@ public static byte[][] getHexSplits(String startKey, String endKey, int numRegio [[schema.versions.max]] === Maximum Number of Versions -The maximum number of row versions to store is configured per column family via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor]. +The maximum number of row versions to store is configured per column family via link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor]. The default for max versions is 1. 
This is an important parameter because as described in <<datamodel>> section HBase does _not_ overwrite row values, but rather stores different values per row by time (and qualifier). Excess versions are removed during major compactions. The number of max versions may need to be increased or decreased depending on application needs. @@ -443,14 +443,14 @@ It is not recommended setting the number of max versions to an exceedingly high [[schema.minversions]] === Minimum Number of Versions -Like maximum number of row versions, the minimum number of row versions to keep is configured per column family via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor]. +Like maximum number of row versions, the minimum number of row versions to keep is configured per column family via link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor]. The default for min versions is 0, which means the feature is disabled. The minimum number of row versions parameter is used together with the time-to-live parameter and can be combined with the number of row versions parameter to allow configurations such as "keep the last T minutes worth of data, at most N versions, _but keep at least M versions around_" (where M is the value for minimum number of row versions, M<N). This parameter should only be set when time-to-live is enabled for a column family and must be less than the number of row versions. [[supported.datatypes]] == Supported Datatypes -HBase supports a "bytes-in/bytes-out" interface via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html[Put] and link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Result.html[Result], so anything that can be converted to an array of bytes can be stored as a value. +HBase supports a "bytes-in/bytes-out" interface via link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html[Put] and link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Result.html[Result], so anything that can be converted to an array of bytes can be stored as a value. Input could be strings, numbers, complex objects, or even images as long as they can rendered as bytes. There are practical limits to the size of values (e.g., storing 10-50MB objects in HBase would probably be too much to ask); search the mailing list for conversations on this topic. @@ -459,7 +459,7 @@ Take that into consideration when making your design, as well as block size for === Counters -One supported datatype that deserves special mention are "counters" (i.e., the ability to do atomic increments of numbers). See link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#increment%28org.apache.hadoop.hbase.client.Increment%29[Increment] in `Table`. +One supported datatype that deserves special mention are "counters" (i.e., the ability to do atomic increments of numbers). See link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#increment%28org.apache.hadoop.hbase.client.Increment%29[Increment] in `Table`. Synchronization on counters are done on the RegionServer, not in the client. @@ -479,7 +479,7 @@ Store files which contains only expired rows are deleted on minor compaction. Setting `hbase.store.delete.expired.storefile` to `false` disables this feature. Setting minimum number of versions to other than 0 also disables this. 
-See link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor] for more information. +See link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor] for more information. Recent versions of HBase also support setting time to live on a per cell basis. See link:https://issues.apache.org/jira/browse/HBASE-10560[HBASE-10560] for more information. @@ -494,7 +494,7 @@ There are two notable differences between cell TTL handling and ColumnFamily TTL == Keeping Deleted Cells By default, delete markers extend back to the beginning of time. -Therefore, link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html[Get] or link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html[Scan] operations will not see a deleted cell (row or column), even when the Get or Scan operation indicates a time range before the delete marker was placed. +Therefore, link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html[Get] or link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html[Scan] operations will not see a deleted cell (row or column), even when the Get or Scan operation indicates a time range before the delete marker was placed. ColumnFamilies can optionally keep deleted cells. In this case, deleted cells can still be retrieved, as long as these operations specify a time range that ends before the timestamp of any delete that would affect the cells. @@ -684,7 +684,7 @@ in the table (e.g. make sure values are in the range 1-10). Constraints could also be used to enforce referential integrity, but this is strongly discouraged as it will dramatically decrease the write throughput of the tables where integrity checking is enabled. Extensive documentation on using Constraints can be found at -link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/constraint/Constraint.html[Constraint] +link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/constraint/Constraint.html[Constraint] since version 0.94. [[schema.casestudies]] @@ -760,7 +760,7 @@ Neither approach is wrong, it just depends on what is most appropriate for the s ==== link:https://issues.apache.org/jira/browse/HBASE-4811[HBASE-4811] implements an API to scan a table or a range within a table in reverse, reducing the need to optimize your schema for forward or reverse scanning. This feature is available in HBase 0.98 and later. -See https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html#setReversed%28boolean for more information. +See link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html#setReversed-boolean-[Scan.setReversed()] for more information. ==== [[schema.casestudies.log_timeseries.varkeys]] @@ -789,8 +789,7 @@ The rowkey of LOG_TYPES would be: * `[bytes]` variable length bytes for raw hostname or event-type. A column for this rowkey could be a long with an assigned number, which could be obtained -by using an -+++<a href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#incrementColumnValue%28byte[],%20byte[],%20byte[],%20long%29">HBase counter</a>+++. +by using an link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#incrementColumnValue-byte:A-byte:A-byte:A-long-[HBase counter] So the resulting composite rowkey would be: @@ -806,7 +805,7 @@ In either the Hash or Numeric substitution approach, the raw values for hostname This effectively is the OpenTSDB approach. 
What OpenTSDB does is re-write data and pack rows into columns for certain time-periods. For a detailed explanation, see: http://opentsdb.net/schema.html, and -+++<a href="http://www.cloudera.com/content/cloudera/en/resources/library/hbasecon/video-hbasecon-2012-lessons-learned-from-opentsdb.html">Lessons Learned from OpenTSDB</a>+++ +link:https://www.slideshare.net/cloudera/4-opentsdb-hbasecon[Lessons Learned from OpenTSDB] from HBaseCon2012. But this is how the general concept works: data is ingested, for example, in this manner... @@ -1096,7 +1095,7 @@ The tl;dr version is that you should probably go with one row per user+value, an Your two options mirror a common question people have when designing HBase schemas: should I go "tall" or "wide"? Your first schema is "tall": each row represents one value for one user, and so there are many rows in the table for each user; the row key is user + valueid, and there would be (presumably) a single column qualifier that means "the value". This is great if you want to scan over rows in sorted order by row key (thus my question above, about whether these ids are sorted correctly). You can start a scan at any user+valueid, read the next 30, and be done. What you're giving up is the ability to have transactional guarantees around all the rows for one user, but it doesn't sound like you need that. -Doing it this way is generally recommended (see here http://hbase.apache.org/book.html#schema.smackdown). +Doing it this way is generally recommended (see here https://hbase.apache.org/book.html#schema.smackdown). Your second option is "wide": you store a bunch of values in one row, using different qualifiers (where the qualifier is the valueid). The simple way to do that would be to just store ALL values for one user in a single row. I'm guessing you jumped to the "paginated" version because you're assuming that storing millions of columns in a single row would be bad for performance, which may or may not be true; as long as you're not trying to do too much in a single request, or do things like scanning over and returning all of the cells in the row, it shouldn't be fundamentally worse. @@ -1113,7 +1112,7 @@ If you don't have time to build it both ways and compare, my advice would be to [[schema.ops]] == Operational and Performance Configuration Options -==== Tune HBase Server RPC Handling +=== Tune HBase Server RPC Handling * Set `hbase.regionserver.handler.count` (in `hbase-site.xml`) to cores x spindles for concurrency. * Optionally, split the call queues into separate read and write queues for differentiated service. The parameter `hbase.ipc.server.callqueue.handler.factor` specifies the number of call queues: @@ -1129,7 +1128,7 @@ If you don't have time to build it both ways and compare, my advice would be to - `< 0.5` for more short-read - `> 0.5` for more long-read -==== Disable Nagle for RPC +=== Disable Nagle for RPC Disable Nagle's algorithm. Delayed ACKs can add up to ~200ms to RPC round trip time. Set the following parameters: @@ -1140,7 +1139,7 @@ Disable Nagle's algorithm. Delayed ACKs can add up to ~200ms to RPC round trip - `hbase.ipc.client.tcpnodelay = true` - `hbase.ipc.server.tcpnodelay = true` -==== Limit Server Failure Impact +=== Limit Server Failure Impact Detect regionserver failure as fast as reasonable. Set the following parameters: @@ -1149,7 +1148,7 @@ Detect regionserver failure as fast as reasonable. 
Set the following parameters: - `dfs.namenode.avoid.read.stale.datanode = true` - `dfs.namenode.avoid.write.stale.datanode = true` -==== Optimize on the Server Side for Low Latency +=== Optimize on the Server Side for Low Latency * Skip the network for local blocks. In `hbase-site.xml`, set the following parameters: - `dfs.client.read.shortcircuit = true` @@ -1187,7 +1186,7 @@ Detect regionserver failure as fast as reasonable. Set the following parameters: == Special Cases -==== For applications where failing quickly is better than waiting +=== For applications where failing quickly is better than waiting * In `hbase-site.xml` on the client side, set the following parameters: - Set `hbase.client.pause = 1000` @@ -1196,7 +1195,7 @@ Detect regionserver failure as fast as reasonable. Set the following parameters: - Set the RecoverableZookeeper retry count: `zookeeper.recovery.retry = 1` (no retry) * In `hbase-site.xml` on the server side, set the Zookeeper session timeout for detecting server failures: `zookeeper.session.timeout` <= 30 seconds (20-30 is good). -==== For applications that can tolerate slightly out of date information +=== For applications that can tolerate slightly out of date information **HBase timeline consistency (HBASE-10070) ** With read replicas enabled, read-only copies of regions (replicas) are distributed over the cluster. One RegionServer services the default or primary replica, which is the only replica that can service writes. Other RegionServers serve the secondary replicas, follow the primary RegionServer, and only see committed updates. The secondary replicas are read-only, but can serve reads immediately while the primary is failing over, cutting read availability blips from seconds to milliseconds. Phoenix supports timeline consistency as of 4.4.0 http://git-wip-us.apache.org/repos/asf/hbase/blob/2e9a55be/src/main/asciidoc/_chapters/security.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/security.adoc b/src/main/asciidoc/_chapters/security.adoc index ccb5adb..cca9364 100644 --- a/src/main/asciidoc/_chapters/security.adoc +++ b/src/main/asciidoc/_chapters/security.adoc @@ -354,7 +354,7 @@ grant 'rest_server', 'RWCA' For more information about ACLs, please see the <<hbase.accesscontrol.configuration>> section -HBase REST gateway supports link:http://hadoop.apache.org/docs/stable/hadoop-auth/index.html[SPNEGO HTTP authentication] for client access to the gateway. +HBase REST gateway supports link:https://hadoop.apache.org/docs/stable/hadoop-auth/index.html[SPNEGO HTTP authentication] for client access to the gateway. To enable REST gateway Kerberos authentication for client access, add the following to the `hbase-site.xml` file for every REST gateway. [source,xml] @@ -390,7 +390,7 @@ Substitute the keytab for HTTP for _$KEYTAB_. HBase REST gateway supports different 'hbase.rest.authentication.type': simple, kerberos. You can also implement a custom authentication by implementing Hadoop AuthenticationHandler, then specify the full class name as 'hbase.rest.authentication.type' value. -For more information, refer to link:http://hadoop.apache.org/docs/stable/hadoop-auth/index.html[SPNEGO HTTP authentication]. +For more information, refer to link:https://hadoop.apache.org/docs/stable/hadoop-auth/index.html[SPNEGO HTTP authentication]. 
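The hbase-site.xml property block referenced above is unchanged by this patch and therefore not reproduced in the hunk. For orientation only, a Kerberos (SPNEGO) configuration fragment for the REST gateway typically looks roughly like the following; the realm and keytab path are placeholders, not values from the patched document.

[source,xml]
----
<property>
  <name>hbase.rest.authentication.type</name>
  <value>kerberos</value>
</property>
<property>
  <name>hbase.rest.authentication.kerberos.principal</name>
  <value>HTTP/_HOST@EXAMPLE.COM</value>
</property>
<property>
  <name>hbase.rest.authentication.kerberos.keytab</name>
  <value>/etc/security/keytabs/spnego.service.keytab</value>
</property>
----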
[[security.rest.gateway]] === REST Gateway Impersonation Configuration @@ -989,7 +989,7 @@ hbase> help "scan" ---- + -This example grants read access to the 'testuser' user and read/write access to the 'developers' group, on cells in the 'pii' column which match the filter. +If you need to enable cell ACLs, the `hfile.format.version` option in hbase-site.xml should be greater than or equal to 3, and the `hbase.security.access.early_out` option should be set to false. This example grants read access to the 'testuser' user and read/write access to the 'developers' group, on cells in the 'pii' column which match the filter. + ---- hbase> grant 'user', \ @@ -1390,11 +1390,11 @@ When you issue a Scan or Get, HBase uses your default set of authorizations to filter out cells that you do not have access to. A superuser can set the default set of authorizations for a given user by using the `set_auths` HBase Shell command or the -link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/security/visibility/VisibilityClient.html#setAuths(org.apache.hadoop.hbase.client.Connection,%20java.lang.String\[\],%20java.lang.String)[VisibilityClient.setAuths()] method. +link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/security/visibility/VisibilityClient.html#setAuths-org.apache.hadoop.hbase.client.Connection-java.lang.String:A-java.lang.String-[VisibilityClient.setAuths()] method. You can specify a different authorization during the Scan or Get, by passing the AUTHORIZATIONS option in HBase Shell, or the -link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html#setAuthorizations%28org.apache.hadoop.hbase.security.visibility.Authorizations%29[setAuthorizations()] +link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html#setAuthorizations-org.apache.hadoop.hbase.security.visibility.Authorizations-[Scan.setAuthorizations()] method if you use the API. This authorization will be combined with your default set as an additional filter. It will further filter your results, rather than giving you additional authorization. @@ -1644,7 +1644,7 @@ Rotate the Master Key:: Bulk loading in secure mode is a bit more involved than normal setup, since the client has to transfer the ownership of the files generated from the MapReduce job to HBase. Secure bulk loading is implemented by a coprocessor, named -link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/security/access/SecureBulkLoadEndpoint.html[SecureBulkLoadEndpoint], +link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/security/access/SecureBulkLoadEndpoint.html[SecureBulkLoadEndpoint], which uses a staging directory configured by the configuration property `hbase.bulkload.staging.dir`, which defaults to _/tmp/hbase-staging/_. http://git-wip-us.apache.org/repos/asf/hbase/blob/2e9a55be/src/main/asciidoc/_chapters/spark.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/spark.adoc b/src/main/asciidoc/_chapters/spark.adoc index 774d137..416457b 100644 --- a/src/main/asciidoc/_chapters/spark.adoc +++ b/src/main/asciidoc/_chapters/spark.adoc @@ -27,7 +27,7 @@ :icons: font :experimental: -link:http://spark.apache.org/[Apache Spark] is a software framework that is used +link:https://spark.apache.org/[Apache Spark] is a software framework that is used to process data in memory in a distributed manner, and is replacing MapReduce in many use cases. 
@@ -151,7 +151,7 @@ access to HBase For examples of all these functionalities, see the HBase-Spark Module. == Spark Streaming -http://spark.apache.org/streaming/[Spark Streaming] is a micro batching stream +https://spark.apache.org/streaming/[Spark Streaming] is a micro batching stream processing framework built on top of Spark. HBase and Spark Streaming make great companions in that HBase can help serve the following benefits alongside Spark Streaming. http://git-wip-us.apache.org/repos/asf/hbase/blob/2e9a55be/src/main/asciidoc/_chapters/sql.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/sql.adoc b/src/main/asciidoc/_chapters/sql.adoc index b1ad063..f1c445d 100644 --- a/src/main/asciidoc/_chapters/sql.adoc +++ b/src/main/asciidoc/_chapters/sql.adoc @@ -33,10 +33,10 @@ The following projects offer some support for SQL over HBase. [[phoenix]] === Apache Phoenix -link:http://phoenix.apache.org[Apache Phoenix] +link:https://phoenix.apache.org[Apache Phoenix] === Trafodion -link:http://trafodion.incubator.apache.org/[Trafodion: Transactional SQL-on-HBase] +link:https://trafodion.incubator.apache.org/[Trafodion: Transactional SQL-on-HBase] :numbered: http://git-wip-us.apache.org/repos/asf/hbase/blob/2e9a55be/src/main/asciidoc/_chapters/thrift_filter_language.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/thrift_filter_language.adoc b/src/main/asciidoc/_chapters/thrift_filter_language.adoc index da36cea..1c1279d 100644 --- a/src/main/asciidoc/_chapters/thrift_filter_language.adoc +++ b/src/main/asciidoc/_chapters/thrift_filter_language.adoc @@ -28,7 +28,7 @@ :experimental: -Apache link:http://thrift.apache.org/[Thrift] is a cross-platform, cross-language development framework. +Apache link:https://thrift.apache.org/[Thrift] is a cross-platform, cross-language development framework. HBase includes a Thrift API and filter language. The Thrift API relies on client and server processes. http://git-wip-us.apache.org/repos/asf/hbase/blob/2e9a55be/src/main/asciidoc/_chapters/tracing.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/tracing.adoc b/src/main/asciidoc/_chapters/tracing.adoc index 0cddd8a..8bd1962 100644 --- a/src/main/asciidoc/_chapters/tracing.adoc +++ b/src/main/asciidoc/_chapters/tracing.adoc @@ -30,7 +30,7 @@ :icons: font :experimental: -link:https://issues.apache.org/jira/browse/HBASE-6449[HBASE-6449] added support for tracing requests through HBase, using the open source tracing library, link:http://htrace.incubator.apache.org/[HTrace]. +link:https://issues.apache.org/jira/browse/HBASE-6449[HBASE-6449] added support for tracing requests through HBase, using the open source tracing library, link:https://htrace.incubator.apache.org/[HTrace]. Setting up tracing is quite simple, however it currently requires some very minor changes to your client code (it would not be very difficult to remove this requirement). 
[[tracing.spanreceivers]] @@ -57,7 +57,7 @@ The `LocalFileSpanReceiver` looks in _hbase-site.xml_ for a `hbase.local-fi <property> <name>hbase.trace.spanreceiver.classes</name> - <value>org.apache.htrace.impl.LocalFileSpanReceiver</value> + <value>org.apache.htrace.core.LocalFileSpanReceiver</value> </property> <property> <name>hbase.htrace.local-file-span-receiver.path</name> @@ -67,7 +67,7 @@ The `LocalFileSpanReceiver` looks in _hbase-site.xml_ for a `hbase.local-fi HTrace also provides `ZipkinSpanReceiver` which converts spans to link:http://github.com/twitter/zipkin[Zipkin] span format and send them to Zipkin server. In order to use this span receiver, you need to install the jar of htrace-zipkin to your HBase's classpath on all of the nodes in your cluster. -_htrace-zipkin_ is published to the link:http://search.maven.org/#search%7Cgav%7C1%7Cg%3A%22org.apache.htrace%22%20AND%20a%3A%22htrace-zipkin%22[Maven central repository]. You could get the latest version from there or just build it locally (see the link:http://htrace.incubator.apache.org/[HTrace] homepage for information on how to do this) and then copy it out to all nodes. +_htrace-zipkin_ is published to the link:http://search.maven.org/#search%7Cgav%7C1%7Cg%3A%22org.apache.htrace%22%20AND%20a%3A%22htrace-zipkin%22[Maven central repository]. You could get the latest version from there or just build it locally (see the link:https://htrace.incubator.apache.org/[HTrace] homepage for information on how to do this) and then copy it out to all nodes. `ZipkinSpanReceiver` for properties called `hbase.htrace.zipkin.collector-hostname` and `hbase.htrace.zipkin.collector-port` in _hbase-site.xml_ with values describing the Zipkin collector server to which span information are sent. @@ -76,7 +76,7 @@ _htrace-zipkin_ is published to the link:http://search.maven.org/#search%7Cgav%7 <property> <name>hbase.trace.spanreceiver.classes</name> - <value>org.apache.htrace.impl.ZipkinSpanReceiver</value> + <value>org.apache.htrace.core.ZipkinSpanReceiver</value> </property> <property> <name>hbase.htrace.zipkin.collector-hostname</name> http://git-wip-us.apache.org/repos/asf/hbase/blob/2e9a55be/src/main/asciidoc/_chapters/troubleshooting.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/troubleshooting.adoc b/src/main/asciidoc/_chapters/troubleshooting.adoc index 1cf93d6..ec0a34d 100644 --- a/src/main/asciidoc/_chapters/troubleshooting.adoc +++ b/src/main/asciidoc/_chapters/troubleshooting.adoc @@ -225,7 +225,7 @@ Search here first when you have an issue as its more than likely someone has alr [[trouble.resources.lists]] === Mailing Lists -Ask a question on the link:http://hbase.apache.org/mail-lists.html[Apache HBase mailing lists]. +Ask a question on the link:https://hbase.apache.org/mail-lists.html[Apache HBase mailing lists]. The 'dev' mailing list is aimed at the community of developers actually building Apache HBase and for features currently under development, and 'user' is generally used for questions on released versions of Apache HBase. Before going to the mailing list, make sure your question has not already been answered by searching the mailing list archives first. Use <<trouble.resources.searchhadoop>>. @@ -596,7 +596,7 @@ See also Jesse Andersen's link:http://blog.cloudera.com/blog/2014/04/how-to-use- In some situations clients that fetch data from a RegionServer get a LeaseException instead of the usual <<trouble.client.scantimeout>>. 
Usually the source of the exception is `org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:230)` (line number may vary). It tends to happen in the context of a slow/freezing `RegionServer#next` call. It can be prevented by having `hbase.rpc.timeout` > `hbase.regionserver.lease.period`. -Harsh J investigated the issue as part of the mailing list thread link:http://mail-archives.apache.org/mod_mbox/hbase-user/201209.mbox/%3CCAOcnVr3R-LqtKhFsk8Bhrm-YW2i9O6J6Fhjz2h7q6_sxvwd2yw%40mail.gmail.com%3E[HBase, mail # user - Lease does not exist exceptions] +Harsh J investigated the issue as part of the mailing list thread link:https://mail-archives.apache.org/mod_mbox/hbase-user/201209.mbox/%3CCAOcnVr3R-LqtKhFsk8Bhrm-YW2i9O6J6Fhjz2h7q6_sxvwd2yw%40mail.gmail.com%3E[HBase, mail # user - Lease does not exist exceptions] [[trouble.client.scarylogs]] === Shell or client application throws lots of scary exceptions during normal operation @@ -706,7 +706,10 @@ Because of a change in the format in which MIT Kerberos writes its credentials c If you have this problematic combination of components in your environment, to work around this problem, first log in with `kinit` and then immediately refresh the credential cache with `kinit -R`. The refresh will rewrite the credential cache without the problematic formatting. -Finally, depending on your Kerberos configuration, you may need to install the link:http://docs.oracle.com/javase/1.4.2/docs/guide/security/jce/JCERefGuide.html[Java Cryptography Extension], or JCE. +Prior to JDK 1.4, the JCE was an unbundled product, and as such, the JCA and JCE were regularly referred to as separate, distinct components. +As the JCE is now bundled in JDK 7.0, the distinction is becoming less apparent. Since the JCE uses the same architecture as the JCA, the JCE should more properly be thought of as a part of the JCA. + +If you are running JDK 1.5 or an earlier version, you may need to install the link:https://docs.oracle.com/javase/1.5.0/docs/guide/security/jce/JCERefGuide.html[Java Cryptography Extension], or JCE. Ensure the JCE jars are on the classpath on both server and client systems. You may also need to download the link:http://www.oracle.com/technetwork/java/javase/downloads/jce-6-download-429243.html[unlimited strength JCE policy files]. @@ -758,7 +761,7 @@ For example (substitute VERSION with your HBase version): HADOOP_CLASSPATH=`hbase classpath` hadoop jar $HBASE_HOME/hbase-server-VERSION.jar rowcounter usertable ---- -See http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#classpathfor more information on HBase MapReduce jobs and classpaths. +See <<hbase.mapreduce.classpath,HBase, MapReduce, and the CLASSPATH>> for more information on HBase MapReduce jobs and classpaths. [[trouble.hbasezerocopybytestring]] === Launching a job, you get java.lang.IllegalAccessError: com/google/protobuf/HBaseZeroCopyByteString or class com.google.protobuf.ZeroCopyLiteralByteString cannot access its superclass com.google.protobuf.LiteralByteString @@ -799,7 +802,7 @@ hadoop fs -du /hbase/myTable ---- ...returns a list of the regions under the HBase table 'myTable' and their disk utilization. -For more information on HDFS shell commands, see the link:http://hadoop.apache.org/common/docs/current/file_system_shell.html[HDFS FileSystem Shell documentation]. +For more information on HDFS shell commands, see the link:https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/FileSystemShell.html[HDFS FileSystem Shell documentation]. 
[[trouble.namenode.hbase.objects]] === Browsing HDFS for HBase Objects @@ -830,7 +833,7 @@ The HDFS directory structure of HBase WAL is.. /<WAL> (WAL files for the RegionServer) ---- -See the link:http://hadoop.apache.org/common/docs/current/hdfs_user_guide.html[HDFS User Guide] for other non-shell diagnostic utilities like `fsck`. +See the link:https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html[HDFS User Guide] for other non-shell diagnostic utilities like `fsck`. [[trouble.namenode.0size.hlogs]] ==== Zero size WALs with data in them @@ -1171,7 +1174,7 @@ If you have a DNS server, you can set `hbase.zookeeper.dns.interface` and `hbase ZooKeeper is the cluster's "canary in the mineshaft". It'll be the first to notice issues if any so making sure its happy is the short-cut to a humming cluster. -See the link:http://wiki.apache.org/hadoop/ZooKeeper/Troubleshooting[ZooKeeper Operating Environment Troubleshooting] page. +See the link:https://wiki.apache.org/hadoop/ZooKeeper/Troubleshooting[ZooKeeper Operating Environment Troubleshooting] page. It has suggestions and tools for checking disk and networking performance; i.e. the operating environment your ZooKeeper and HBase are running in. @@ -1310,7 +1313,7 @@ These changes were backported to HBase 0.98.x and apply to all newer versions. == HBase and HDFS General configuration guidance for Apache HDFS is out of the scope of this guide. -Refer to the documentation available at http://hadoop.apache.org/ for extensive information about configuring HDFS. +Refer to the documentation available at https://hadoop.apache.org/ for extensive information about configuring HDFS. This section deals with HDFS in terms of HBase. In most cases, HBase stores its data in Apache HDFS. http://git-wip-us.apache.org/repos/asf/hbase/blob/2e9a55be/src/main/asciidoc/_chapters/unit_testing.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/unit_testing.adoc b/src/main/asciidoc/_chapters/unit_testing.adoc index 6131d5a..e503f81 100644 --- a/src/main/asciidoc/_chapters/unit_testing.adoc +++ b/src/main/asciidoc/_chapters/unit_testing.adoc @@ -33,7 +33,7 @@ For information on unit tests for HBase itself, see <<hbase.tests,hbase.tests>>. == JUnit -HBase uses link:http://junit.org[JUnit] 4 for unit tests +HBase uses link:http://junit.org[JUnit] for unit tests This example will add unit tests to the following example class: @@ -117,8 +117,8 @@ First, add a dependency for Mockito to your Maven POM file. <dependency> <groupId>org.mockito</groupId> - <artifactId>mockito-all</artifactId> - <version>1.9.5</version> + <artifactId>mockito-core</artifactId> + <version>2.1.0</version> <scope>test</scope> </dependency> ---- @@ -171,7 +171,7 @@ Similarly, you can now expand into other operations such as Get, Scan, or Delete == MRUnit -link:http://mrunit.apache.org/[Apache MRUnit] is a library that allows you to unit-test MapReduce jobs. +link:https://mrunit.apache.org/[Apache MRUnit] is a library that allows you to unit-test MapReduce jobs. You can use it to test HBase jobs in the same way as other MapReduce jobs. 
Given a MapReduce job that writes to an HBase table called `MyTest`, which has one column family called `CF`, the reducer of such a job could look like the following: http://git-wip-us.apache.org/repos/asf/hbase/blob/2e9a55be/src/main/asciidoc/_chapters/upgrading.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/upgrading.adoc b/src/main/asciidoc/_chapters/upgrading.adoc index d07766b..fd8a86a 100644 --- a/src/main/asciidoc/_chapters/upgrading.adoc +++ b/src/main/asciidoc/_chapters/upgrading.adoc @@ -67,7 +67,7 @@ In addition to the usual API versioning considerations HBase has other compatibi .File format compatibility * Support file formats backward and forward compatible -* Example: File, ZK encoding, directory layout is upgraded automatically as part of an HBase upgrade. User can rollback to the older version and everything will continue to work. +* Example: File, ZK encoding, directory layout is upgraded automatically as part of an HBase upgrade. User can downgrade to the older version and everything will continue to work. .Client API compatibility * Allow changing or removing existing client APIs. @@ -75,7 +75,7 @@ In addition to the usual API versioning considerations HBase has other compatibi * APIs available in a patch version will be available in all later patch versions. However, new APIs may be added which will not be available in earlier patch versions. * New APIs introduced in a patch version will only be added in a source compatible way footnote:[See 'Source Compatibility' https://blogs.oracle.com/darcy/entry/kinds_of_compatibility]: i.e. code that implements public APIs will continue to compile. ** Example: A user using a newly deprecated API does not need to modify application code with HBase API calls until the next major version. -* +* .Client Binary compatibility * Client code written to APIs available in a given patch release can run unchanged (no recompilation needed) against the new jars of later patch versions. @@ -111,7 +111,7 @@ for warning about incompatible changes). All effort will be made to provide a de | | Major | Minor | Patch |Client-Server wire Compatibility| N |Y |Y |Server-Server Compatibility |N |Y |Y -|File Format Compatibility | N footnote:[comp_matrix_offline_upgrade_note,Running an offline upgrade tool without rollback might be needed. We will typically only support migrating data from major version X to major version X+1.] | Y |Y +|File Format Compatibility | N footnote:[comp_matrix_offline_upgrade_note,Running an offline upgrade tool without downgrade might be needed. We will typically only support migrating data from major version X to major version X+1.] | Y |Y |Client API Compatibility | N | Y |Y |Client Binary Compatibility | N | N |Y 4+|Server-Side Limited API Compatibility @@ -125,10 +125,23 @@ for warning about incompatible changes). All effort will be made to provide a de [[hbase.client.api.surface]] ==== HBase API Surface -HBase has a lot of API points, but for the compatibility matrix above, we differentiate between Client API, Limited Private API, and Private API. HBase uses a version of link:https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html[Hadoop's Interface classification]. HBase's Interface classification classes can be found link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/classification/package-summary.html[here]. 
+HBase has a lot of API points, but for the compatibility matrix above, we differentiate between Client API, Limited Private API, and Private API. HBase uses link:https://yetus.apache.org/documentation/0.5.0/interface-classification/[Apache Yetus Audience Annotations] to guide downstream expectations for stability. -* InterfaceAudience: captures the intended audience, possible values are Public (for end users and external projects), LimitedPrivate (for other Projects, Coprocessors or other plugin points), and Private (for internal use). Notice that, you may find that the classes which are declared as IA.Private are used as parameter or return value for the interfaces which are declared as IA.LimitedPrivate. This is possible. You should treat the IA.Private object as a monolithic object, which means you can use it as a parameter to call other methods, or return it, but you should never try to access its methods or fields. -* InterfaceStability: describes what types of interface changes are permitted. Possible values are Stable, Evolving, Unstable, and Deprecated. Notice that this annotation is only valid for classes which are marked as IA.LimitedPrivate. The stability of IA.Public classes is only related to the upgrade type(major, minor or patch). And for IA.Private classes, there is no guarantee on the stability between releases. Refer to the Compatibility Matrix above for more details. +* InterfaceAudience (link:https://yetus.apache.org/documentation/0.5.0/audience-annotations-apidocs/org/apache/yetus/audience/InterfaceAudience.html[javadocs]): captures the intended audience, possible values include: + - Public: safe for end users and external projects + - LimitedPrivate: used for internals we expect to be pluggable, such as coprocessors + - Private: strictly for use within HBase itself +Classes which are defined as `IA.Private` may be used as parameters or return values for interfaces which are declared `IA.LimitedPrivate`. Treat the `IA.Private` object as opaque; do not try to access its methods or fields directly. +* InterfaceStability (link:https://yetus.apache.org/documentation/0.5.0/audience-annotations-apidocs/org/apache/yetus/audience/InterfaceStability.html[javadocs]): describes what types of interface changes are permitted. Possible values include: + - Stable: the interface is fixed and is not expected to change + - Evolving: the interface may change in future minor versions + - Unstable: the interface may change at any time + +Please keep in mind the following interactions between the `InterfaceAudience` and `InterfaceStability` annotations within the HBase project: + +* `IA.Public` classes are inherently stable and adhere to our stability guarantees relating to the type of upgrade (major, minor, or patch). +* `IA.LimitedPrivate` classes should always be annotated with one of the given `InterfaceStability` values. If they are not, you should presume they are `IS.Unstable`. +* `IA.Private` classes should be considered implicitly unstable, with no guarantee of stability between releases. [[hbase.client.api]] HBase Client API:: @@ -146,9 +159,9 @@ HBase Private API:: === Pre 1.0 versions .HBase Pre-1.0 versions are all EOM -NOTE: For new installations, do not deploy 0.94.y, 0.96.y, or 0.98.y. Deploy our stable version. See link:https://issues.apache.org/jira/browse/HBASE-11642[EOL 0.96], link:https://issues.apache.org/jira/browse/HBASE-16215[clean up of EOM releases], and link:http://www.apache.org/dist/hbase/[the header of our downloads]. 
+NOTE: For new installations, do not deploy 0.94.y, 0.96.y, or 0.98.y. Deploy our stable version. See link:https://issues.apache.org/jira/browse/HBASE-11642[EOL 0.96], link:https://issues.apache.org/jira/browse/HBASE-16215[clean up of EOM releases], and link:https://www.apache.org/dist/hbase/[the header of our downloads]. -Before the semantic versioning scheme pre-1.0, HBase tracked either Hadoop's versions (0.2x) or 0.9x versions. If you are into the arcane, checkout our old wiki page on link:http://wiki.apache.org/hadoop/Hbase/HBaseVersions[HBase Versioning] which tries to connect the HBase version dots. Below sections cover ONLY the releases before 1.0. +Before the semantic versioning scheme pre-1.0, HBase tracked either Hadoop's versions (0.2x) or 0.9x versions. If you are into the arcane, checkout our old wiki page on link:https://web.archive.org/web/20150905071342/https://wiki.apache.org/hadoop/Hbase/HBaseVersions[HBase Versioning] which tries to connect the HBase version dots. Below sections cover ONLY the releases before 1.0. [[hbase.development.series]] .Odd/Even Versioning or "Development" Series Releases @@ -180,6 +193,135 @@ Unless otherwise specified, HBase point versions are binary compatible. You can In the minor version-particular sections below, we call out where the versions are wire/protocol compatible and in this case, it is also possible to do a <<hbase.rolling.upgrade>>. For example, in <<upgrade1.0.rolling.upgrade>>, we state that it is possible to do a rolling upgrade between hbase-0.98.x and hbase-1.0.0. +== Rollback + +Sometimes things don't go as planned when attempting an upgrade. This section explains how to perform a _rollback_ to an earlier HBase release. Note that this should only be needed between Major and some Minor releases. You should always be able to _downgrade_ between HBase Patch releases within the same Minor version. These instructions may require you to take steps before you start the upgrade process, so be sure to read through this section beforehand. + +=== Caveats + +.Rollback vs Downgrade +This section describes how to perform a _rollback_ on an upgrade between HBase minor and major versions. In this document, rollback refers to the process of taking an upgraded cluster and restoring it to the old version _while losing all changes that have occurred since upgrade_. By contrast, a cluster _downgrade_ would restore an upgraded cluster to the old version while maintaining any data written since the upgrade. We currently only offer instructions to rollback HBase clusters. Further, rollback only works when these instructions are followed prior to performing the upgrade. + +When these instructions talk about rollback vs downgrade of prerequisite cluster services (i.e. HDFS), you should treat leaving the service version the same as a degenerate case of downgrade. + +.Replication +Unless you are doing an all-service rollback, the HBase cluster will lose any configured peers for HBase replication. If your cluster is configured for HBase replication, then prior to following these instructions you should document all replication peers. After performing the rollback you should then add each documented peer back to the cluster. For more information on enabling HBase replication, listing peers, and adding a peer see <<hbase.replication.management>>. Note also that data written to the cluster since the upgrade may or may not have already been replicated to any peers. 
Determining which, if any, peers have seen replication data as well as rolling back the data in those peers is beyond the scope of this guide. + +.Data Locality +Unless you are doing an all-service rollback, going through a rollback procedure will likely destroy all locality for Region Servers. You should expect degraded performance until after the cluster has had time to go through compactions to restore data locality. Optionally, you can force a compaction to speed this process up at the cost of generating cluster load. + +.Configurable Locations +The instructions below assume default locations for the HBase data directory and the HBase znode. Both of these locations are configurable and you should verify the value used in your cluster before proceeding. In the event that you have a different value, just replace the default with the one found in your configuration: +* HBase data directory is configured via the key 'hbase.rootdir' and has a default value of '/hbase'. +* HBase znode is configured via the key 'zookeeper.znode.parent' and has a default value of '/hbase'. + +=== All service rollback + +If you will be performing a rollback of both the HDFS and ZooKeeper services, then HBase's data will be rolled back in the process. + +.Requirements + +* Ability to roll back HDFS and ZooKeeper + +.Before upgrade +No additional steps are needed pre-upgrade. As an extra precautionary measure, you may wish to use distcp to back up the HBase data off of the cluster to be upgraded. To do so, follow the steps in the 'Before upgrade' section of 'Rollback after HDFS downgrade' but copy to another HDFS instance instead of within the same instance. + +.Performing a rollback + +. Stop HBase +. Perform a rollback for HDFS and ZooKeeper (HBase should remain stopped) +. Change the installed version of HBase to the previous version +. Start HBase +. Verify HBase contents: use the HBase shell to list tables and scan some known values. + +=== Rollback after HDFS rollback and ZooKeeper downgrade + +If you will be rolling back HDFS but going through a ZooKeeper downgrade, then HBase will be in an inconsistent state. You must ensure the cluster is not started until you complete this process. + +.Requirements + +* Ability to roll back HDFS +* Ability to downgrade ZooKeeper + +.Before upgrade +No additional steps are needed pre-upgrade. As an extra precautionary measure, you may wish to use distcp to back up the HBase data off of the cluster to be upgraded. To do so, follow the steps in the 'Before upgrade' section of 'Rollback after HDFS downgrade' but copy to another HDFS instance instead of within the same instance. + +.Performing a rollback + +. Stop HBase +. Perform a rollback for HDFS and a downgrade for ZooKeeper (HBase should remain stopped) +. Change the installed version of HBase to the previous version +. Clean out ZooKeeper information related to HBase. WARNING: This step will permanently destroy all replication peers. Please see the section on HBase Replication under Caveats for more information. ++ +.Clean HBase information out of ZooKeeper +[source,bash] +---- +[hpnewton@gateway_node.example.com ~]$ zookeeper-client -server zookeeper1.example.com:2181,zookeeper2.example.com:2181,zookeeper3.example.com:2181 +Welcome to ZooKeeper! +JLine support is disabled +rmr /hbase +quit +Quitting... +---- +. Start HBase +. Verify HBase contents: use the HBase shell to list tables and scan some known values.
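+For the verification step at the end of each rollback procedure, a quick spot check from the HBase shell is usually enough. The commands below are a minimal sketch only: `my_table` stands in for one of your own tables, the peer settings are placeholders, and the exact `add_peer` syntax differs between HBase versions, so consult `help 'add_peer'` in the shell of the release you rolled back to.
+
+.Example spot check (and re-adding a documented replication peer) from the HBase shell
+[source,bash]
+----
+[hpnewton@gateway_node.example.com ~]$ hbase shell
+hbase> list                           # confirm the expected tables are present
+hbase> scan 'my_table', {LIMIT => 5}  # spot-check a few known rows
+hbase> list_peers                     # replication peers are typically gone after a rollback
+hbase> add_peer '1', CLUSTER_KEY => "peer-zk1.example.com,peer-zk2.example.com,peer-zk3.example.com:2181:/hbase"
+----
+
+If you want to recover data locality faster (see the Data Locality caveat above), you can also trigger a major compaction from the shell, for example `major_compact 'my_table'`, at the cost of additional cluster load.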
+ +=== Rollback after HDFS downgrade + +If you will be performing an HDFS downgrade, then you'll need to follow these instructions regardless of whether ZooKeeper goes through rollback, downgrade, or reinstallation. + +.Requirements + +* Ability to downgrade HDFS +* Pre-upgrade cluster must be able to run MapReduce jobs +* HDFS super user access +* Sufficient space in HDFS for at least two copies of the HBase data directory + +.Before upgrade +Before beginning the upgrade process, you must take a complete backup of HBase's backing data. The following instructions cover backing up the data within the current HDFS instance. Alternatively, you can use the distcp command to copy the data to another HDFS cluster. + +. Stop the HBase cluster +. Copy the HBase data directory to a backup location using the https://hadoop.apache.org/docs/current/hadoop-distcp/DistCp.html[distcp command] as the HDFS super user (shown below on a security-enabled cluster) ++ +.Using distcp to back up the HBase data directory +[source,bash] +---- + +[hpnewton@gateway_node.example.com ~]$ kinit -k -t hdfs.keytab h...@example.com +[hpnewton@gateway_node.example.com ~]$ hadoop distcp /hbase /hbase-pre-upgrade-backup + +---- +. DistCp will launch a MapReduce job to handle copying the files in a distributed fashion. Check the output of the distcp command to ensure this job completed successfully. + +.Performing a rollback + +. Stop HBase +. Perform a downgrade for HDFS and a downgrade/rollback for ZooKeeper (HBase should remain stopped) +. Change the installed version of HBase to the previous version +. Restore the HBase data directory from prior to the upgrade as the HDFS super user (shown below on a security-enabled cluster). If you backed up your data on another HDFS cluster instead of locally, you will need to use the distcp command to copy it back to the current HDFS cluster. ++ +.Restore the HBase data directory +[source,bash] +---- +[hpnewton@gateway_node.example.com ~]$ kinit -k -t hdfs.keytab h...@example.com +[hpnewton@gateway_node.example.com ~]$ hdfs dfs -mv /hbase /hbase-upgrade-rollback +[hpnewton@gateway_node.example.com ~]$ hdfs dfs -mv /hbase-pre-upgrade-backup /hbase +---- +. Clean out ZooKeeper information related to HBase. WARNING: This step will permanently destroy all replication peers. Please see the section on HBase Replication under Caveats for more information. ++ +.Clean HBase information out of ZooKeeper +[source,bash] +---- +[hpnewton@gateway_node.example.com ~]$ zookeeper-client -server zookeeper1.example.com:2181,zookeeper2.example.com:2181,zookeeper3.example.com:2181 +Welcome to ZooKeeper! +JLine support is disabled +rmr /hbase +quit +Quitting... +---- +. Start HBase +. Verify HBase contents: use the HBase shell to list tables and scan some known values. + == Upgrade Paths [[upgrade1.0]] @@ -213,10 +355,6 @@ You may have made use of this configuration if you are using BucketCache. If NOT .If you have your own custom filters. See the release notes on the issue link:https://issues.apache.org/jira/browse/HBASE-12068[HBASE-12068 [Branch-1\] Avoid need to always do KeyValueUtil#ensureKeyValue for Filter transformCell]; be sure to follow the recommendations therein. -[[dlr]] -.Distributed Log Replay -<<distributed.log.replay>> is off by default in HBase 1.0.0. Enabling it can make a big difference improving HBase MTTR. Enable this feature if you are doing a clean stop/start when you are upgrading.
You cannot rolling upgrade to this feature (caveat if you are running on a version of HBase in excess of HBase 0.98.4 -- see link:https://issues.apache.org/jira/browse/HBASE-12577[HBASE-12577 Disable distributed log replay by default] for more). - .Mismatch Of `hbase.client.scanner.max.result.size` Between Client and Server If either the client or server version is lower than 0.98.11/1.0.0 and the server has a smaller value for `hbase.client.scanner.max.result.size` than the client, scan @@ -241,9 +379,9 @@ There are no known issues running a <<hbase.rolling.upgrade,rolling upgrade>> fr In hbase-1.x, the default Scan caching 'number of rows' changed. Where in 0.98.x, it defaulted to 100, in later HBase versions, the default became Integer.MAX_VALUE. Not setting a cache size can make -for Scans that run for a long time server-side, especially if +for Scans that run for a long time server-side, especially if they are running with stringent filtering. See -link:https://issues.apache.org/jira/browse/HBASE-16973[Revisiting default value for hbase.client.scanner.caching]; +link:https://issues.apache.org/jira/browse/HBASE-16973[Revisiting default value for hbase.client.scanner.caching]; for further discussion. [[upgrade1.0.from.0.94]] http://git-wip-us.apache.org/repos/asf/hbase/blob/2e9a55be/src/main/asciidoc/_chapters/zookeeper.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/zookeeper.adoc b/src/main/asciidoc/_chapters/zookeeper.adoc index 91577da..33eeadb 100644 --- a/src/main/asciidoc/_chapters/zookeeper.adoc +++ b/src/main/asciidoc/_chapters/zookeeper.adoc @@ -106,7 +106,7 @@ The newer version, the better. ZooKeeper 3.4.x is required as of HBase 1.0.0 .ZooKeeper Maintenance [CAUTION] ==== -Be sure to set up the data dir cleaner described under link:http://zookeeper.apache.org/doc/r3.1.2/zookeeperAdmin.html#sc_maintenance[ZooKeeper +Be sure to set up the data dir cleaner described under link:https://zookeeper.apache.org/doc/r3.1.2/zookeeperAdmin.html#sc_maintenance[ZooKeeper Maintenance] else you could have 'interesting' problems a couple of months in; i.e. zookeeper could start dropping sessions if it has to run through a directory of hundreds of thousands of logs which is wont to do around leader reelection time -- a process rare but run on occasion whether because a machine is dropped or happens to hiccup. ==== @@ -135,9 +135,9 @@ ${HBASE_HOME}/bin/hbase-daemons.sh {start,stop} zookeeper Note that you can use HBase in this manner to spin up a ZooKeeper cluster, unrelated to HBase. Just make sure to set `HBASE_MANAGES_ZK` to `false` if you want it to stay up across HBase restarts so that when HBase shuts down, it doesn't take ZooKeeper down with it. -For more information about running a distinct ZooKeeper cluster, see the ZooKeeper link:http://hadoop.apache.org/zookeeper/docs/current/zookeeperStarted.html[Getting +For more information about running a distinct ZooKeeper cluster, see the ZooKeeper link:https://hadoop.apache.org/zookeeper/docs/current/zookeeperStarted.html[Getting Started Guide]. 
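+As a rough sketch of the HBase-managed approach described above, assuming a default tarball layout where `HBASE_MANAGES_ZK` is set in _conf/hbase-env.sh_ (adjust paths and configuration for your own installation):
+
+[source,bash]
+----
+# Keep ZooKeeper out of HBase's own start/stop lifecycle so that shutting down
+# HBase does not also shut down the ZooKeeper ensemble.
+echo "export HBASE_MANAGES_ZK=false" >> ${HBASE_HOME}/conf/hbase-env.sh
+
+# Start or stop only the ZooKeeper daemons using HBase's helper script.
+${HBASE_HOME}/bin/hbase-daemons.sh start zookeeper
+${HBASE_HOME}/bin/hbase-daemons.sh stop zookeeper
+----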
-Additionally, see the link:http://wiki.apache.org/hadoop/ZooKeeper/FAQ#A7[ZooKeeper Wiki] or the link:http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html#sc_zkMulitServerSetup[ZooKeeper +Additionally, see the link:https://wiki.apache.org/hadoop/ZooKeeper/FAQ#A7[ZooKeeper Wiki] or the link:https://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html#sc_zkMulitServerSetup[ZooKeeper documentation] for more information on ZooKeeper sizing. [[zk.sasl.auth]] @@ -181,7 +181,7 @@ We'll refer to this JAAS configuration file as _$CLIENT_CONF_ below. === HBase-managed ZooKeeper Configuration -On each node that will run a zookeeper, a master, or a regionserver, create a link:http://docs.oracle.com/javase/1.4.2/docs/guide/security/jgss/tutorials/LoginConfigFile.html[JAAS] configuration file in the conf directory of the node's _HBASE_HOME_ directory that looks like the following: +On each node that will run a zookeeper, a master, or a regionserver, create a link:http://docs.oracle.com/javase/7/docs/technotes/guides/security/jgss/tutorials/LoginConfigFile.html[JAAS] configuration file in the conf directory of the node's _HBASE_HOME_ directory that looks like the following: [source,java] ---- http://git-wip-us.apache.org/repos/asf/hbase/blob/2e9a55be/src/main/asciidoc/book.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/book.adoc b/src/main/asciidoc/book.adoc index e5898d5..1bc9ed7 100644 --- a/src/main/asciidoc/book.adoc +++ b/src/main/asciidoc/book.adoc @@ -19,14 +19,14 @@ */ //// -= Apache HBase (TM) Reference Guide += Apache HBase (TM) Reference Guide :Author: Apache HBase Team :Email: <hbase-...@lists.apache.org> :doctype: book :Version: {docVersion} :revnumber: {docVersion} // Logo for PDF -- doesn't render in HTML -:title-logo: hbase_logo_with_orca.png +:title-logo-image: image:hbase_logo_with_orca.png[pdfwidth=4.25in,align=center] :numbered: :toc: left :toclevels: 1 @@ -42,7 +42,7 @@ // Logo for HTML -- doesn't render in PDF ++++ <div> - <a href="http://hbase.apache.org"><img src="images/hbase_logo_with_orca.png" alt="Apache HBase Logo" /></a> + <a href="https://hbase.apache.org"><img src="images/hbase_logo_with_orca.png" alt="Apache HBase Logo" /></a> </div> ++++ @@ -62,6 +62,7 @@ include::_chapters/mapreduce.adoc[] include::_chapters/security.adoc[] include::_chapters/architecture.adoc[] include::_chapters/hbase_mob.adoc[] +include::_chapters/backup_restore.adoc[] include::_chapters/hbase_apis.adoc[] include::_chapters/external_apis.adoc[] include::_chapters/thrift_filter_language.adoc[] @@ -93,5 +94,3 @@ include::_chapters/asf.adoc[] include::_chapters/orca.adoc[] include::_chapters/tracing.adoc[] include::_chapters/rpc.adoc[] - -