HBASE-13907 Document how to deploy a coprocessor
Project: http://git-wip-us.apache.org/repos/asf/hbase/repo
Commit: http://git-wip-us.apache.org/repos/asf/hbase/commit/f8eab44d
Tree: http://git-wip-us.apache.org/repos/asf/hbase/tree/f8eab44d
Diff: http://git-wip-us.apache.org/repos/asf/hbase/diff/f8eab44d

Branch: refs/heads/hbase-12439
Commit: f8eab44dcd0d15ed5a4bf039c382f73468709a33
Parents: 7a4590d
Author: Misty Stanley-Jones <mstanleyjo...@cloudera.com>
Authored: Tue Jun 16 14:13:00 2015 +1000
Committer: Misty Stanley-Jones <mstanleyjo...@cloudera.com>
Committed: Fri Dec 18 08:35:50 2015 -0800

----------------------------------------------------------------------
 src/main/asciidoc/_chapters/cp.adoc | 707 +++++++++++++++----------------
 1 file changed, 338 insertions(+), 369 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/hbase/blob/f8eab44d/src/main/asciidoc/_chapters/cp.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/cp.adoc b/src/main/asciidoc/_chapters/cp.adoc
index a4587ec..5f50b68 100644
--- a/src/main/asciidoc/_chapters/cp.adoc
+++ b/src/main/asciidoc/_chapters/cp.adoc
@@ -27,251 +27,209 @@
 :icons: font
 :experimental:

-HBase Coprocessors are modeled after the Coprocessors which are part of Google's BigTable
-(http://research.google.com/people/jeff/SOCC2010-keynote-slides.pdf pages 41-42.). +
-Coprocessor is a framework that provides an easy way to run your custom code directly on
-Region Server.
-The information in this chapter is primarily sourced and heavily reused from:
+HBase Coprocessors are modeled after Google BigTable's coprocessor implementation
+(pages 41-42 of http://research.google.com/people/jeff/SOCC2010-keynote-slides.pdf). +
+The coprocessor framework provides mechanisms for running your custom code directly on
+the RegionServers managing your data. Efforts are ongoing to bridge gaps between HBase's
+implementation and BigTable's architecture. For more information see
+link:https://issues.apache.org/jira/browse/HBASE-4047[HBASE-4047].
+
+The information in this chapter is primarily sourced and heavily reused from the following
+resources:

 . Mingjie Lai's blog post
link:https://blogs.apache.org/hbase/entry/coprocessor_introduction[Coprocessor Introduction].
 . Gaurav Bhardwaj's blog post
link:http://www.3pillarglobal.com/insights/hbase-coprocessors[The How To Of HBase Coprocessors].

+[WARNING]
+.Use Coprocessors At Your Own Risk
+====
+Coprocessors are an advanced feature of HBase and are intended to be used by system
+developers only. Because coprocessor code runs directly on the RegionServer and has
+direct access to your data, coprocessors introduce the risk of data corruption,
+man-in-the-middle attacks, or other malicious data access. Currently, there is no
+mechanism to prevent data corruption by coprocessors, though work is underway on
+link:https://issues.apache.org/jira/browse/HBASE-4047[HBASE-4047].
+
+In addition, there is no resource isolation, so a well-intentioned but misbehaving
+coprocessor can severely degrade cluster performance and stability.
+====

+== Coprocessor Overview

-== Coprocessor Framework
-
-When working with any data store (like RDBMS or HBase) you fetch the data (in case of RDBMS you
-might use SQL query and in case of HBase you use either Get or Scan). To fetch only relevant data
-you filter it (for RDBMS you put conditions in 'WHERE' predicate and in HBase you use
-link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/Filter.html[Filter]).
-After fetching the desired data, you perform your business computation on the data.
-This scenario is close to ideal for "small data", where few thousand rows and a bunch of columns
-are returned from the data store. Now imagine a scenario where there are billions of rows and
-millions of columns and you want to perform some computation which requires all the data, like
-calculating average or sum. Even if you are interested in just few columns, you still have to
-fetch all the rows. There are a few drawbacks in this approach as described below:
-
-. In this approach the data transfer (from data store to client side) will become the bottleneck,
-and the time required to complete the operation is limited by the rate at which data transfer
-takes place.
-. It's not always possible to hold so much data in memory and perform computation.
-. Bandwidth is one of the most precious resources in any data center. Operations like this may
-saturate your data center's bandwidth and will severely impact the performance of your cluster.
-. Your client code is becoming thick as you are maintaining the code for calculating average or
-summation on client side. Not a major drawback when talking of severe issues like
-performance/bandwidth but still worth giving consideration.
-
-In a scenario like this it's better to move the computation (i.e. user's custom code) to the data
-itself (Region Server). Coprocessor helps you achieve this but you can do more than that.
-There is another advantage that your code runs in parallel (i.e. on all Regions).
-To give an idea of Coprocessor's capabilities, different people give different analogies.
-The three most famous analogies for Coprocessor are:
-[[cp_analogies]]
-Triggers and Stored Procedure:: This is the most common analogy for Coprocessor. Observer
-Coprocessor is compared to triggers because like triggers they execute your custom code when
-certain event occurs (like Get or Put etc.). Similarly Endpoints Coprocessor is compared to the
-stored procedures and you can perform custom computation on data directly inside the region server.
+In HBase, you fetch data using a `Get` or `Scan`, whereas in an RDBMS you use a SQL
+query. In order to fetch only the relevant data, you filter it using an HBase
+link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/Filter.html[Filter],
+whereas in an RDBMS you use a `WHERE` predicate.

-MapReduce:: As in MapReduce you move the computation to the data in the same way. Coprocessor
-executes your custom computation directly on Region Servers, i.e. where data resides. That's why
-some people compare Coprocessor to a small MapReduce jobs.
+After fetching the data, you perform computations on it. This paradigm works well
+for "small data" with a few thousand rows and several columns. However, when you scale
+to billions of rows and millions of columns, moving large amounts of data across your
+network will create bottlenecks at the network layer, and the client needs to be powerful
+enough and have enough memory to handle the large amounts of data and the computations.
+In addition, the client code can grow large and complex.
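+
+As a minimal sketch of that client-side paradigm, the following Java fragment fetches
+and filters rows entirely from the client. The table and column names are illustrative
+only, and `connection` is assumed to be an open HBase `Connection`:
+
+[source,java]
+----
+Table table = connection.getTable(TableName.valueOf("users"));
+Scan scan = new Scan();
+// The filter is evaluated server-side, but every matching row still travels
+// across the network to the client, where the computation has to happen.
+scan.setFilter(new SingleColumnValueFilter(
+    Bytes.toBytes("personalDet"), Bytes.toBytes("name"),
+    CompareFilter.CompareOp.EQUAL, Bytes.toBytes("admin")));
+ResultScanner scanner = table.getScanner(scan);
+for (Result res : scanner) {
+    // Client-side computation over each returned row goes here.
+}
+scanner.close();
+----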
-AOP:: Some people compare it to _Aspect Oriented Programming_ (AOP). As in AOP, you apply advice
-(on occurrence of specific event) by intercepting the request and then running some custom code
-(probably cross-cutting concerns) and then forwarding the request on its path as if nothing
-happened (or even return it back). Similarly in Coprocessor you have this facility of intercepting
-the request and running custom code and then forwarding it on its path (or returning it).
+In this scenario, coprocessors might make sense. You can put the business computation
+code into a coprocessor which runs on the RegionServer, in the same location as the
+data, and returns the result to the client.
+
+This is only one scenario where using coprocessors can provide benefit. Following
+are some analogies which may help to explain some of the benefits of coprocessors.

-Although Coprocessor derives its roots from Google's Bigtable but it deviates from it largely in
-its design. Currently there are efforts going on to bridge this gap. For more information see
-link:https://issues.apache.org/jira/browse/HBASE-4047[HBASE-4047].
+[[cp_analogies]]
+=== Coprocessor Analogies

-In HBase, to implement a Coprocessor certain steps must be followed as described below:
+Triggers and Stored Procedure::
+  An Observer coprocessor is similar to a trigger in an RDBMS in that it executes
+  your code either before or after a specific event (such as a `Get` or `Put`)
+  occurs. An endpoint coprocessor is similar to a stored procedure in an RDBMS
+  because it allows you to perform custom computations on the data on the
+  RegionServer itself, rather than on the client.

-. Either your class should extend one of the Coprocessor classes (like
-// Below URL is more than 100 characters long.
-link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.html[BaseRegionObserver]
-) or it should implement Coprocessor interfaces (like
-link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/Coprocessor.html[Coprocessor],
-// Below URL is more than 100 characters long.
-link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/CoprocessorService.html[CoprocessorService]).
+MapReduce::
+  MapReduce operates on the principle of moving the computation to the location of
+  the data. Coprocessors operate on the same principle.

-. Load the Coprocessor: Currently there are two ways to load the Coprocessor. +
-Static:: Loading from configuration
-Dynamic:: Loading via 'hbase shell' or via Java code using HTableDescriptor class). +
-For more details see <<cp_loading,Loading Coprocessors>>.
+AOP::
+  If you are familiar with Aspect Oriented Programming (AOP), you can think of a coprocessor
+  as applying advice by intercepting a request and then running some custom code,
+  before passing the request on to its final destination (or even changing the destination).

-. Finally your client-side code to call the Coprocessor. This is the easiest step, as HBase
-handles the Coprocessor transparently and you don't have to do much to call the Coprocessor.
+=== Coprocessor Implementation Overview

-The framework API is provided in the
-// Below URL is more than 100 characters long.
-link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/coprocessor/package-summary.html[coprocessor]
-package. +
-Coprocessors are not designed to be used by the end users but by developers. Coprocessors are
-executed directly on region server; therefore a faulty/malicious code can bring your region server
-down. Currently there is no mechanism to prevent this, but there are efforts going on for this.
-For more, see link:https://issues.apache.org/jira/browse/HBASE-4047[HBASE-4047]. +
-Two different types of Coprocessors are provided by the framework, based on their functionality.
+. Either your class should extend one of the Coprocessor classes, such as
+link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.html[BaseRegionObserver],
+or it should implement the link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/Coprocessor.html[Coprocessor]
+or
+link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/CoprocessorService.html[CoprocessorService]
+interface.
+. Load the coprocessor, either statically (from the configuration) or dynamically,
+using HBase Shell. For more details see <<cp_loading,Loading Coprocessors>>.
+. Call the coprocessor from your client-side code. HBase handles the coprocessor
+transparently.

-== Types of Coprocessors
+The framework API is provided in the
+link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/coprocessor/package-summary.html[coprocessor]
+package.

-Coprocessor can be broadly divided into two categories: Observer and Endpoint.
-
-=== Observer
-Observer Coprocessor are easy to understand. People coming from RDBMS background can compare them
-to the triggers available in relational databases. Folks coming from programming background can
-visualize it like advice (before and after only) available in AOP (Aspect Oriented Programming).
-See <<cp_analogies, Coprocessor Analogy>> +
-Coprocessors allows you to hook your custom code in two places during the life cycle of an event. +
-First is just _before_ the occurrence of the event (just like 'before' advice in AOP or triggers
-like 'before update'). All methods providing this kind feature will start with the prefix `pre`. +
-For example if you want your custom code to get executed just before the `Put` operation, you can
-use the override the
-// Below URL is more than 100 characters long.
-link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/RegionObserver.html#prePut%28org.apache.hadoop.hbase.coprocessor.ObserverContext,%20org.apache.hadoop.hbase.client.Put,%20org.apache.hadoop.hbase.regionserver.wal.WALEdit,%20org.apache.hadoop.hbase.client.Durability%29[`prePut`]
-method of
-// Below URL is more than 100 characters long.
-link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/RegionObserver.html[RegionCoprocessor].
-This method has following signature:
-[source,java]
-----
-public void prePut (final ObserverContext e, final Put put, final WALEdit edit,final Durability
-durability) throws IOException;
-----
+== Types of Coprocessors

-Secondly, the Observer Coprocessor also provides hooks for your code to get executed just _after_
-the occurrence of the event (similar to after advice in AOP terminology or 'after update' triggers
-). The methods giving this functionality will start with the prefix `post`. For example, if you
-want your code to be executed after the 'Put' operation, you should consider overriding
-// Below URL is more than 100 characters long.
-link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/RegionObserver.html#postPut%28org.apache.hadoop.hbase.coprocessor.ObserverContext,%20org.apache.hadoop.hbase.client.Put,%20org.apache.hadoop.hbase.regionserver.wal.WALEdit,%20org.apache.hadoop.hbase.client.Durability%29[`postPut`]
-method of
-// Below URL is more than 100 characters long.
-link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/RegionObserver.html[RegionCoprocessor]:
-[source,java]
-----
-public void postPut(final ObserverContext e, final Put put, final WALEdit edit, final Durability
-durability) throws IOException;
-----
+=== Observer Coprocessors

-In short, the following conventions are generally followed: +
-Override _preXXX()_ method if you want your code to be executed just before the occurrence of the
-event. +
-Override _postXXX()_ method if you want your code to be executed just after the occurrence of the
-event. +
+Observer coprocessors are triggered either before or after a specific event occurs.
+Observers that happen before an event override methods that start with a `pre` prefix,
+such as link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/RegionObserver.html#prePut%28org.apache.hadoop.hbase.coprocessor.ObserverContext,%20org.apache.hadoop.hbase.client.Put,%20org.apache.hadoop.hbase.regionserver.wal.WALEdit,%20org.apache.hadoop.hbase.client.Durability%29[`prePut`]. Observers that happen just after an event override methods that start
+with a `post` prefix, such as link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/RegionObserver.html#postPut%28org.apache.hadoop.hbase.coprocessor.ObserverContext,%20org.apache.hadoop.hbase.client.Put,%20org.apache.hadoop.hbase.regionserver.wal.WALEdit,%20org.apache.hadoop.hbase.client.Durability%29[`postPut`].
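+
+For example, in a class that extends
+link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.html[BaseRegionObserver],
+overriding these two hooks looks roughly like the following sketch (the class name is
+illustrative; the signatures follow the RegionObserver API):
+
+[source,java]
+----
+public class ExampleRegionObserver extends BaseRegionObserver {
+
+    @Override
+    public void prePut(final ObserverContext<RegionCoprocessorEnvironment> e,
+            final Put put, final WALEdit edit, final Durability durability)
+            throws IOException {
+        // Runs just before the Put is applied to the region.
+    }
+
+    @Override
+    public void postPut(final ObserverContext<RegionCoprocessorEnvironment> e,
+            final Put put, final WALEdit edit, final Durability durability)
+            throws IOException {
+        // Runs just after the Put has been applied.
+    }
+}
+----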
-.Use Cases for Observer Coprocessors:
-Few use cases of the Observer Coprocessor are:
+==== Use Cases for Observer Coprocessors

-. *Security*: Before performing any operation (like 'Get', 'Put') you can check for permission in
-the 'preXXX' methods.
+Security::
+  Before performing a `Get` or `Put` operation, you can check for permission using
+  `preGet` or `prePut` methods.

-. *Referential Integrity*: Unlike traditional RDBMS, HBase doesn't have the concept of referential
-integrity (foreign key). Suppose for example you have a requirement that whenever you insert a
-record in 'users' table, a corresponding entry should also be created in 'user_daily_attendance'
-table. One way you could solve this is by using two 'Put' one for each table, this way you are
-throwing the responsibility (of the referential integrity) to the user. A better way is to use
-Coprocessor and overriding 'postPut' method in which you write the code to insert the record in
-'user_daily_attendance' table. This way client code is more lean and clean.
+Referential Integrity::
+  HBase does not directly support the RDBMS concept of referential integrity, also known
+  as foreign keys. You can use a coprocessor to enforce such integrity. For instance,
+  if you have a business rule that every insert to the `users` table must be followed
+  by a corresponding entry in the `user_daily_attendance` table, you could implement
+  a coprocessor to use the `prePut` method on `users` to insert a record into `user_daily_attendance`.

-. *Secondary Index*: Coprocessor can be used to maintain secondary indexes. For more information
-see link:http://wiki.apache.org/hadoop/Hbase/SecondaryIndexing[SecondaryIndexing].
+Secondary Indexes::
+  You can use a coprocessor to maintain secondary indexes. For more information, see
+  link:http://wiki.apache.org/hadoop/Hbase/SecondaryIndexing[SecondaryIndexing].

==== Types of Observer Coprocessor

-Observer Coprocessor comes in following flavors:
-
-. *RegionObserver*: This Coprocessor provides the facility to hook your code when the events on
-region are triggered. Most common example include 'preGet' and 'postGet' for 'Get' operation and
-'prePut' and 'postPut' for 'Put' operation. For exhaustive list of supported methods (events) see
-// Below URL is more than 100 characters long.
-link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/RegionObserver.html[RegionObserver].
-
-. *Region Server Observer*: Provides hook for the events related to the RegionServer, such as
-stopping the RegionServer and performing operations before or after merges, commits, or rollbacks.
-For more details please refer
-link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/RegionServerObserver.html[RegionServerObserver].
-
-. *Master Observer*: This observer provides hooks for DDL like operation, such as create, delete,
-modify table. For entire list of available methods see
-// Below URL is more than 100 characters long.
-link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/MasterObserver.html[MasterObserver].
-
-. *WAL Observer*: Provides hooks for WAL (Write-Ahead-Log) related operation. It has only two
-method 'preWALWrite()' and 'postWALWrite()'. For more details see
-// Below URL is more than 100 characters long.
-link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/WALObserver.html[WALObserver].
-
-For example see <<cp_example,Examples>>
+RegionObserver::
+  A RegionObserver coprocessor allows you to observe events on a region, such as `Get`
+  and `Put` operations. See
+  link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/RegionObserver.html[RegionObserver].
+  Consider overriding the convenience class
+  link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.html[BaseRegionObserver],
+  which implements the `RegionObserver` interface and will not break if new methods are added.
+
+RegionServerObserver::
+  A RegionServerObserver allows you to observe events related to the RegionServer's
+  operation, such as starting, stopping, or performing merges, commits, or rollbacks.
+  See
+  link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/RegionServerObserver.html[RegionServerObserver].
+  Consider overriding the convenience class
+  link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/BaseRegionServerObserver.html[BaseRegionServerObserver],
+  which implements the `RegionServerObserver` interface and will not break if new
+  methods are added.
+
+MasterObserver::
+  A MasterObserver allows you to observe events related to the HBase Master, such
+  as table creation, deletion, or schema modification. See
+  link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/MasterObserver.html[MasterObserver].
+  Consider overriding the convenience class
+  link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/BaseMasterObserver.html[BaseMasterObserver],
+  which implements the `MasterObserver` interface and will not break if new
+  methods are added.
+
+WALObserver::
+  A WALObserver allows you to observe events related to writes to the Write-Ahead
+  Log (WAL). See
+  link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/WALObserver.html[WALObserver].
+  Consider overriding the convenience class
+  link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/BaseWALObserver.html[BaseWALObserver],
+  which implements the `WALObserver` interface and will not break if new methods are added.
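+
+As a short sketch of one of the non-region observer types, a MasterObserver hook for
+table-creation events might look like the following (the class name is illustrative;
+the signature follows the MasterObserver API):
+
+[source,java]
+----
+public class ExampleMasterObserver extends BaseMasterObserver {
+
+    @Override
+    public void postCreateTable(ObserverContext<MasterCoprocessorEnvironment> ctx,
+            HTableDescriptor desc, HRegionInfo[] regions) throws IOException {
+        // Runs on the HBase Master after a table has been created, for example
+        // to audit the new table's name.
+        System.out.println("Created table: " + desc.getTableName());
+    }
+}
+----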
+
+<<cp_example,Examples>> provides working examples of observer coprocessors.

=== Endpoint Coprocessor

-Endpoint Coprocessor can be compared to stored procedure found in RDBMS.
-See <<cp_analogies, Coprocessor Analogy>>. They help in performing computation which is not
-possible either through Observer Coprocessor or otherwise. For example, calculating average or
-summation over the entire table that spans across multiple regions. They do so by providing a hook
-for your custom code and then running it across all regions. +
-With Endpoints Coprocessor you can create your own dynamic RPC protocol and thus can provide
-communication between client and region server, hence enabling you to run your custom code on
-region server (on each region of a table). +
-Unlike observer Coprocessor (where your custom code is
-executed transparently when events like 'Get' operation occurs), in Endpoint Coprocessor you have
-to explicitly invoke the Coprocessor by using the
-// Below URL is more than 100 characters long.
+Endpoint coprocessors allow you to perform computation at the location of the data.
+See <<cp_analogies, Coprocessor Analogy>>. An example is the need to calculate a running
+average or summation for an entire table which spans hundreds of regions.
+
+In contrast to observer coprocessors, where your code is run transparently, endpoint
+coprocessors must be explicitly invoked using the
link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/client/Table.html#coprocessorService%28java.lang.Class,%20byte%5B%5D,%20byte%5B%5D,%20org.apache.hadoop.hbase.client.coprocessor.Batch.Call%29[CoprocessorService()]
method available in
-link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/client/Table.html[Table]
-(or
-// Below URL is more than 100 characters long.
-link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/client/HTableInterface.html[HTableInterface]
+link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/client/Table.html[Table],
+link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/client/HTableInterface.html[HTableInterface],
or
-link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/client/HTable.html[HTable]).
+link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/client/HTable.html[HTable].
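+
+A client-side invocation therefore looks roughly like the following sketch. It
+assumes the `SumService` endpoint developed in <<cp_example,Examples>> is deployed
+on the table, and that the generated `Sum.SumRequest` message carries the column
+family and column to sum over:
+
+[source,java]
+----
+Table table = connection.getTable(TableName.valueOf("users"));
+final Sum.SumRequest request = Sum.SumRequest.newBuilder()
+        .setFamily("salaryDet").setColumn("gross").build();
+try {
+    // Invokes the endpoint on every region of the table; one partial result per region.
+    Map<byte[], Long> results = table.coprocessorService(
+            Sum.SumService.class,
+            null, // start key; null means start from the first region
+            null, // end key; null means go through the last region
+            new Batch.Call<Sum.SumService, Long>() {
+                @Override
+                public Long call(Sum.SumService service) throws IOException {
+                    BlockingRpcCallback<Sum.SumResponse> callback =
+                            new BlockingRpcCallback<Sum.SumResponse>();
+                    service.getSum(null, request, callback);
+                    Sum.SumResponse response = callback.get();
+                    return response.hasSum() ? response.getSum() : 0L;
+                }
+            });
+    long totalSum = 0;
+    for (Long regionSum : results.values()) {
+        totalSum += regionSum;
+    }
+    System.out.println("Sum = " + totalSum);
+} catch (Throwable t) {
+    t.printStackTrace();
+}
+----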
-From version 0.96, implementing Endpoint Coprocessor is not straight forward. Now it is done with
-the help of Google's Protocol Buffer. For more details on Protocol Buffer, please see
+Starting with HBase 0.96, endpoint coprocessors are implemented using Google Protocol
+Buffers (protobuf). For more details on protobuf, see Google's
link:https://developers.google.com/protocol-buffers/docs/proto[Protocol Buffer Guide].
-Endpoints Coprocessor written in version 0.94 are not compatible with version 0.96 or later
-(for more details, see
-link:https://issues.apache.org/jira/browse/HBASE-5448[HBASE-5448]),
-so if you are upgrading your HBase cluster from version 0.94 (or before) to 0.96 (or later) you
-have to rewrite your Endpoint coprocessor.
-
-For example see <<cp_example,Examples>>
+Endpoint coprocessors written in version 0.94 are not compatible with version 0.96 or later.
+See
+link:https://issues.apache.org/jira/browse/HBASE-5448[HBASE-5448]. To upgrade your
+HBase cluster from 0.94 or earlier to 0.96 or later, you need to reimplement your
+coprocessor.
+<<cp_example,Examples>> provides working examples of endpoint coprocessors.

[[cp_loading]]
== Loading Coprocessors

-_Loading of Coprocessor refers to the process of making your custom Coprocessor implementation
-available to HBase, so that when a request comes in or an event takes place the desired
-functionality implemented in your custom code gets executed. +
-Coprocessor can be loaded broadly in two ways. One is static (loading through configuration files)
-and the other one is dynamic loading (using hbase shell or java code).
+To make your coprocessor available to HBase, it must be _loaded_, either statically
+(through the HBase configuration) or dynamically (using HBase Shell or the Java API).

=== Static Loading

-Static loading means that your Coprocessor will take effect only when you restart your HBase and
-there is a reason for it. In this you make changes 'hbase-site.xml' and therefore have to restart
-HBase for your changes to take place. +
-Following are the steps for loading Coprocessor statically.

-. Define the Coprocessor in hbase-site.xml: Define a <property> element which consist of two
-sub elements <name> and <value> respectively.
+Follow these steps to statically load your coprocessor. Keep in mind that you must
+restart HBase to unload a coprocessor that has been loaded statically.
+
+. Define the Coprocessor in _hbase-site.xml_, with a <property> element containing <name>
+and <value> sub-elements. The <name> should be one of the following:
+
-.. <name> can have one of the following values:
+- `hbase.coprocessor.region.classes` for RegionObservers and Endpoints.
+- `hbase.coprocessor.wal.classes` for WALObservers.
+- `hbase.coprocessor.master.classes` for MasterObservers.
+
-... 'hbase.coprocessor.region.classes' for RegionObservers and Endpoints.
-... 'hbase.coprocessor.wal.classes' for WALObservers.
-... 'hbase.coprocessor.master.classes' for MasterObservers.
-.. <value> must contain the fully qualified class name of your class implementing the Coprocessor.
+<value> must contain the fully-qualified class name of your coprocessor's implementation
+class.
+
For example to load a Coprocessor (implemented in class SumEndPoint.java) you have to create
following entry in RegionServer's 'hbase-site.xml' file (generally located under 'conf'
directory):
@@ -283,6 +241,7 @@ following entry in RegionServer's 'hbase-site.xml' file (generally located under
   <value>org.myname.hbase.coprocessor.endpoint.SumEndPoint</value>
 </property>
----
++
If multiple classes are specified for loading, the class names must be comma-separated.
The framework attempts to load all the configured classes using the default class loader.
Therefore, the jar file must reside on the server-side HBase classpath.

@@ -297,34 +256,32 @@ When calling out to registered observers, the framework executes their callbacks
sorted order of their priority. +
Ties are broken arbitrarily.

-. Put your code on classpath of HBase: There are various ways to do so, like adding jars on
-classpath etc. One easy way to do this is to drop the jar (containing you code and all the
-dependencies) in 'lib' folder of the HBase installation.
-
-. Restart the HBase.
+. Put your code on HBase's classpath. One easy way to do this is to drop the jar
+  (containing your code and all the dependencies) into the `lib/` directory in the
+  HBase installation.
+. Restart HBase.

-==== Unloading Static Coprocessor
-Unloading static Coprocessor is easy. Following are the steps:

-. Delete the Coprocessor's entry from the 'hbase-site.xml' i.e. remove the <property> tag.
+=== Static Unloading

-. Restart the Hbase.
+. Delete the coprocessor's <property> element, including sub-elements, from `hbase-site.xml`.
+. Restart HBase.
+. Optionally, remove the coprocessor's JAR file from the classpath or HBase's `lib/`
+  directory.

-. Optionally remove the Coprocessor jar file from the classpath (or from the lib directory if you
-copied it over there). Removing the coprocessor JARs from HBase's classpath is a good practice.

=== Dynamic Loading

-Dynamic loading refers to the process of loading Coprocessor without restarting HBase. This may
-sound better than the static loading (and in some scenarios it may) but there is a caveat, dynamic
-loaded Coprocessor applies to the table only for which it was loaded while same is not true for
-static loading as it applies to all the tables. Due to this difference sometimes dynamically
-loaded Coprocessor are also called *Table Coprocessor* (as they applies only to a single table)
-while statically loaded Coprocessor are called *System Coprocessor* (as they applies to all the
-tables). +
-To dynamically load the Coprocessor you have to take the table offline hence during this time you
-won't be able to process any request involving this table. +
-There are three ways to dynamically load Coprocessor as shown below:
+You can also load a coprocessor dynamically, without restarting HBase. This may seem
+preferable to static loading, but dynamically loaded coprocessors are loaded on a
+per-table basis, and are only available to the table for which they were loaded. For
+this reason, dynamically loaded coprocessors are sometimes called *Table Coprocessors*.
+
+In addition, dynamically loading a coprocessor acts as a schema change on the table,
+and the table must be taken offline to load the coprocessor.
+
+There are three ways to dynamically load a coprocessor.

[NOTE]
.Assumptions
====
The below mentioned instructions make the following assumptions:

* A JAR called `coprocessor.jar` contains the Coprocessor implementation along with all of its
-dependencies if any.
+dependencies.
* The JAR is available in HDFS in some location like
`hdfs://<namenode>:<port>/user/<hadoop-user>/coprocessor.jar`.
====

-. *Using Shell*: You can load the Coprocessor using the HBase shell as follows:
-
-.. Disable Table: Take table offline by disabling it. Suppose if the table name is 'users', then
-to disable it enter following command:
+==== Using HBase Shell
+
+. Disable the table using HBase Shell:
+
[source]
----
-hbase(main):001:0> disable 'users'
+hbase> disable 'users'
----

-.. Load the Coprocessor: The Coprocessor jar should be on HDFS and should be accessible to HBase,
-to load the Coprocessor use following command:
+. Load the Coprocessor, using a command like the following:
+
[source]
----
-hbase(main):002:0> alter 'users', METHOD => 'table_att', 'Coprocessor'=>'hdfs://<namenode>:<port>/
+hbase> alter 'users', METHOD => 'table_att', 'Coprocessor'=>'hdfs://<namenode>:<port>/
user/<hadoop-user>/coprocessor.jar| org.myname.hbase.Coprocessor.RegionObserverExample|1073741823|
arg1=1,arg2=2'
----
@@ -370,30 +326,25 @@ observers registered at the same hook using priorities. This field can be left b
case the framework will assign a default priority value.
* Arguments (Optional): This field is passed to the Coprocessor implementation. This is optional.

-.. Enable the table: To enable table type following command:
+. Enable the table.
+
----
hbase(main):003:0> enable 'users'
----

-.. Verification: This is optional but generally good practice to see if your Coprocessor is
-loaded successfully. Enter following command:
+. Verify that the coprocessor loaded:
+
----
hbase(main):04:0> describe 'users'
----
+
-You must see some output like this:
-+
-----
-DESCRIPTION ENABLED
-'users', {TABLE_ATTRIBUTES => {coprocessor$1 => true 'hdfs://<namenode>:<port>/user/<hadoop-user>/
-coprocessor.jar| org.myname.hbase.Coprocessor.RegionObserverExample|1073741823|'}, {NAME =>
-'personalDet'.....
-----
+The coprocessor should be listed in the `TABLE_ATTRIBUTES`.

+==== Using the Java API (all HBase versions)
+
+The following Java code shows how to use the `setValue()` method of `HTableDescriptor`
+to load a coprocessor on the `users` table.

-. *Using setValue()* method of HTableDescriptor: This is done entirely in Java as follows:
-+
[source,java]
----
TableName tableName = TableName.valueOf("users");
@@ -416,9 +367,11 @@ admin.modifyTable(tableName, hTableDescriptor);
 admin.enableTable(tableName);
----

-. *Using addCoprocessor()* method of HTableDescriptor: This method is available from 0.96 version
-onwards.
-+
+==== Using the Java API (HBase 0.96+ only)
+
+In HBase 0.96 and newer, the `addCoprocessor()` method of `HTableDescriptor` provides
+an easier way to load a coprocessor dynamically.
+
[source,java]
----
TableName tableName = TableName.valueOf("users");
@@ -439,26 +392,42 @@ admin.modifyTable(tableName, hTableDescriptor);
 admin.enableTable(tableName);
----
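+
+An overload of `addCoprocessor()` also lets you specify the JAR location, priority, and
+arguments from Java, mirroring the shell's path|class|priority|arguments syntax. A
+minimal sketch (the path and argument values are illustrative):
+
+[source,java]
+----
+Map<String, String> args = new HashMap<String, String>();
+args.put("arg1", "1");
+args.put("arg2", "2");
+hTableDescriptor.addCoprocessor(
+    "org.myname.hbase.Coprocessor.RegionObserverExample",
+    new Path("hdfs://<namenode>:<port>/user/<hadoop-user>/coprocessor.jar"),
+    Coprocessor.PRIORITY_USER, // 1073741823, as in the shell example above
+    args);
+----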
-====
WARNING: There is no guarantee that the framework will load a given Coprocessor successfully.
For example, the shell command neither guarantees a jar file exists at a particular location nor
verifies whether the given class is actually contained in the jar file.
-====

-==== Unloading Dynamic Coprocessor

-. Using shell: Run following command from HBase shell to remove Coprocessor from a table.
+=== Dynamic Unloading
+
+==== Using HBase Shell
+
+. Disable the table.
++
+[source]
+----
+hbase> disable 'users'
+----
+
+. Alter the table to remove the coprocessor.
+
[source]
----
-hbase(main):003:0> alter 'users', METHOD => 'table_att_unset',
-hbase(main):004:0* NAME => 'coprocessor$1'
+hbase> alter 'users', METHOD => 'table_att_unset', NAME => 'coprocessor$1'
----

-. Using HTableDescriptor: Simply reload the table definition _without_ setting the value of
-Coprocessor either in setValue() or addCoprocessor() methods. This will remove the Coprocessor
-attached to this table, if any. For example:
+. Enable the table.
+
+[source]
+----
+hbase> enable 'users'
+----
+
+==== Using the Java API
+
+Reload the table definition without setting the value of the coprocessor either by
+using `setValue()` or `addCoprocessor()` methods. This will remove any coprocessor
+attached to the table.
+
[source,java]
----
TableName tableName = TableName.valueOf("users");
@@ -477,26 +446,23 @@ hTableDescriptor.addFamily(columnFamily2);
 admin.modifyTable(tableName, hTableDescriptor);
 admin.enableTable(tableName);
----
-+
-Optionally you can also use removeCoprocessor() method of HTableDescriptor class.
+In HBase 0.96 and newer, you can instead use the `removeCoprocessor()` method of the
+`HTableDescriptor` class.
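+
+A minimal sketch of that approach, assuming the same `hTableDescriptor`, `tableName`,
+and `admin` objects as in the code above:
+
+[source,java]
+----
+// Remove the coprocessor from the table descriptor, then apply the change.
+hTableDescriptor.removeCoprocessor("org.myname.hbase.Coprocessor.RegionObserverExample");
+admin.modifyTable(tableName, hTableDescriptor);
+----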

[[cp_example]]
== Examples

-HBase ships Coprocessor examples for Observer Coprocessor see
-// Below URL is more than 100 characters long.
+HBase ships examples for Observer Coprocessor in
link:http://hbase.apache.org/xref/org/apache/hadoop/hbase/coprocessor/example/ZooKeeperScanPolicyObserver.html[ZooKeeperScanPolicyObserver]
-and for Endpoint Coprocessor see
-// Below URL is more than 100 characters long.
+and for Endpoint Coprocessor in
link:http://hbase.apache.org/xref/org/apache/hadoop/hbase/coprocessor/example/RowCountEndpoint.html[RowCountEndpoint].

A more detailed example is given below.

-For the sake of example let's take an hypothetical case. Suppose there is a HBase table called
-'users'. The table has two column families 'personalDet' and 'salaryDet' containing personal
-details and salary details respectively. Below is the graphical representation of the 'users'
-table.
+These examples assume a table called `users`, which has two column families `personalDet`
+and `salaryDet`, containing personal and salary details. Below is the graphical representation
+of the `users` table.

.Users Table
[width="100%",cols="7",options="header,footer"]
|====================


|====================

-
=== Observer Example

-For the purpose of demonstration of Coprocessor we are assuming that 'admin' is a special person
-and his details shouldn't be visible or returned to any client querying the 'users' table. +
-To implement this functionality we will take the help of Observer Coprocessor.
-Following are the implementation steps: +
+The following Observer coprocessor prevents the details of the user `admin` from being
+returned in a `Get` or `Scan` of the `users` table.

. Write a class that extends the
link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.html[BaseRegionObserver]
class.

-. Override the 'preGetOp()' method (Note that 'preGet()' method is now deprecated). The reason for
-overriding this method is to check if the client has queried for the rowkey with value 'admin' or
-not. If the client has queried rowkey with 'admin' value then return the call without allowing the
-system to perform the get operation thus saving on performance, otherwise process the request as
-normal.
+. Override the `preGetOp()` method (the `preGet()` method is deprecated) to check
+whether the client has queried for the rowkey with value `admin`. If so, return an
+empty result. Otherwise, process the request as normal.

-. Put your code and dependencies in the jar file.
+. Put your code and dependencies in a JAR file.

-. Place the jar in HDFS where HBase can locate it.
+. Place the JAR in HDFS where HBase can locate it.

. Load the Coprocessor.

@@ -536,8 +498,7 @@ normal.

Following is the implementation of the above steps:

-. For Step 1 and Step 2, below is the code.
-+
+
[source,java]
----
public class RegionObserverExample extends BaseRegionObserver {
@@ -568,10 +529,10 @@ public class RegionObserverExample extends BaseRegionObserver {
     }
 }
----
-Overriding the 'preGetOp()' will only work for 'Get' operation. For 'Scan' operation it won't help
-you. To deal with it you have to override another method called 'preScannerOpen()' method, and
-add a Filter explicitly for admin as shown below:
-+
+
+Overriding the `preGetOp()` will only work for `Get` operations. You also need to override
+the `preScannerOpen()` method to filter the `admin` row from scan results.
+
[source,java]
----
@Override
public RegionScanner preScannerOpen(final ObserverContext e, final Scan scan,
@@ -583,12 +544,11 @@ final RegionScanner s) throws IOException {
 return s;
}
----
-+
-This method works but there is a _side effect_. If the client has used any Filter in his scan,
-then that Filter won't have any effect because our filter has replaced it. +
-Another option you can try is to deliberately remove the admin from result. This approach is
-shown below:
-+
+
+This method works but there is a _side effect_. If the client has used a filter in
+its scan, that filter will be replaced by this filter. Instead, you can explicitly
+remove any `admin` results from the scan:
+
[source,java]
----
@Override
public boolean postScannerNext(final ObserverContext e,
final List results, final int limit, final boolean hasMore) throws IOException {
 Result result = null;
 Iterator iterator = results.iterator();
 while (iterator.hasNext()) {
-result = iterator.next();
+    result = iterator.next();
 if (Bytes.equals(result.getRow(), ROWKEY)) {
-iterator.remove();
+      iterator.remove();
 break;
 }
 }
@@ -607,76 +567,12 @@ final List results, final int limit, final boolean hasMore) throws IOException {
 }
----

-. Step 3: It's pretty convenient to export the above program in a jar file. Let's assume that was
-exported in a file called 'coprocessor.jar'.
-
-. Step 4: Copy the jar to HDFS. You may use command like this:
-+
-[source]
-----
-hadoop fs -copyFromLocal coprocessor.jar coprocessor.jar
-----
-
-. Step 5: Load the Coprocessor, see <<cp_loading,Loading of Coprocessor>>.
-
-. Step 6: Run the following program to test. The first part is testing 'Get' and second 'Scan'.
-+
-[source,java]
-----
-Configuration conf = HBaseConfiguration.create();
-// Use below code for HBase version 1.x.x or above.
-Connection connection = ConnectionFactory.createConnection(conf);
-TableName tableName = TableName.valueOf("users");
-Table table = connection.getTable(tableName);
-
-//Use below code HBase version 0.98.xx or below.
-//HConnection connection = HConnectionManager.createConnection(conf);
-//HTableInterface table = connection.getTable("users");
-
-Get get = new Get(Bytes.toBytes("admin"));
-Result result = table.get(get);
-for (Cell c : result.rawCells()) {
-    System.out.println(Bytes.toString(CellUtil.cloneRow(c))
-        + "==> " + Bytes.toString(CellUtil.cloneFamily(c))
-        + "{" + Bytes.toString(CellUtil.cloneQualifier(c))
-        + ":" + Bytes.toLong(CellUtil.cloneValue(c)) + "}");
-}
-Scan scan = new Scan();
-ResultScanner scanner = table.getScanner(scan);
-for (Result res : scanner) {
-    for (Cell c : res.rawCells()) {
-        System.out.println(Bytes.toString(CellUtil.cloneRow(c))
-            + " ==> " + Bytes.toString(CellUtil.cloneFamily(c))
-            + " {" + Bytes.toString(CellUtil.cloneQualifier(c))
-            + ":" + Bytes.toLong(CellUtil.cloneValue(c))
-            + "}");
-    }
-}
-----

=== Endpoint Example

-In our hypothetical example (See Users Table), to demonstrate the Endpoint Coprocessor we see a
-trivial use case in which we will try to calculate the total (Sum) of gross salary of all
-employees. One way of implementing Endpoint Coprocessor (for version 0.96 and above) is as follows:
+Still using the `users` table, this example implements an endpoint coprocessor to
+calculate the sum of all employee salaries.

. Create a '.proto' file defining your service.

-. Execute the 'protoc' command to generate the Java code from the above '.proto' file.
-
-. Write a class that should:
-.. Extend the above generated service class.
-.. It should also implement two interfaces Coprocessor and CoprocessorService.
-.. Override the service method.
-
-. Load the Coprocessor.
-
-. Write a client code to call Coprocessor.
-
-Implementation detail of the above steps is as follows:
-
-. Step 1: Create a 'proto' file to define your service, request and response. Let's call this file
-"sum.proto". Below is the content of the 'sum.proto' file.
+
[source]
----
@@ -700,26 +596,25 @@ service SumService {
 }
----

-. Step 2: Compile the proto file using proto compiler (for detailed instructions see the
-link:https://developers.google.com/protocol-buffers/docs/overview[official documentation]).
+. Execute the `protoc` command to generate the Java code from the above `.proto` file.
+
[source]
----
+$ mkdir src
$ protoc --java_out=src ./sum.proto
----
+
-[note]
-----
-(Note: It is necessary for you to create the src folder).
-This will generate a class call "Sum.java".
-----
+This will generate a class called `Sum.java`.

-. Step 3: Write your Endpoint Coprocessor: Firstly your class should extend the service just
-defined above (i.e. Sum.SumService). Second it should implement Coprocessor and CoprocessorService
-interfaces. Third, override the 'getService()', 'start()', 'stop()' and 'getSum()' methods.
-Below is the full code:
+. Write a class that extends the generated service class, implement the `Coprocessor`
+and `CoprocessorService` interfaces, and override the service method.
+
+WARNING: If you load a coprocessor from `hbase-site.xml` and then load the same coprocessor
+again using HBase Shell, it will be loaded a second time. The same class will
+exist twice, and the second instance will have a higher ID (and thus a lower priority).
+The effect is that the duplicate coprocessor is effectively ignored.
+
+[source, java]
----
public class SumEndPoint extends SumService implements Coprocessor, CoprocessorService {
@@ -779,15 +674,9 @@ public class SumEndPoint extends SumService implements Coprocessor, CoprocessorS
     }
 }
----
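+
+The hunk above elides most of the class body. Its essential members are sketched
+below; the `getSum()` logic assumes the `Sum.SumRequest` message defined in `sum.proto`
+carries the column family and column name to sum over:
+
+[source, java]
+----
+private RegionCoprocessorEnvironment env;
+
+@Override
+public Service getService() {
+    // Hand this protobuf service instance to the framework.
+    return this;
+}
+
+@Override
+public void start(CoprocessorEnvironment env) throws IOException {
+    // An endpoint only makes sense on a region; reject any other environment.
+    if (env instanceof RegionCoprocessorEnvironment) {
+        this.env = (RegionCoprocessorEnvironment) env;
+    } else {
+        throw new CoprocessorException("Must be loaded on a table region!");
+    }
+}
+
+@Override
+public void stop(CoprocessorEnvironment env) throws IOException {
+    // Nothing to clean up.
+}
+
+@Override
+public void getSum(RpcController controller, Sum.SumRequest request,
+        RpcCallback<Sum.SumResponse> done) {
+    Scan scan = new Scan();
+    scan.addColumn(Bytes.toBytes(request.getFamily()), Bytes.toBytes(request.getColumn()));
+    Sum.SumResponse response = null;
+    InternalScanner scanner = null;
+    try {
+        scanner = env.getRegion().getScanner(scan);
+        List<Cell> results = new ArrayList<Cell>();
+        boolean hasMore;
+        long sum = 0L;
+        do {
+            hasMore = scanner.next(results);
+            for (Cell cell : results) {
+                sum += Bytes.toLong(CellUtil.cloneValue(cell));
+            }
+            results.clear();
+        } while (hasMore);
+        response = Sum.SumResponse.newBuilder().setSum(sum).build();
+    } catch (IOException ioe) {
+        ResponseConverter.setControllerException(controller, ioe);
+    } finally {
+        if (scanner != null) {
+            try {
+                scanner.close();
+            } catch (IOException ignored) {
+                // Error already reported via the controller above.
+            }
+        }
+    }
+    done.run(response);
+}
+----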
-
-. Step 4: Load the Coprocessor. See <<cp_loading,loading of Coprocessor>>.
-
-. Step 5: Now we have to write the client code to test it. To do so in your main method, write the
-following code as shown below:
-+
-[source,java]
+[source, java]
----
-
 Configuration conf = HBaseConfiguration.create();
 // Use below code for HBase version 1.x.x or above.
 Connection connection = ConnectionFactory.createConnection(conf);
@@ -821,6 +710,86 @@ e.printStackTrace();
 }
----

+. Load the Coprocessor.
+
+. Write client code to call the Coprocessor.
+
+
+== Guidelines For Deploying A Coprocessor
+
+Bundling Coprocessors::
+  You can bundle all classes for a coprocessor into a
+  single JAR on the RegionServer's classpath, for easy deployment. Otherwise,
+  place all dependencies on the RegionServer's classpath so that they can be
+  loaded during RegionServer start-up. The classpath for a RegionServer is set
+  in the RegionServer's `hbase-env.sh` file.
+Automating Deployment::
+  You can use a tool such as Puppet, Chef, or
+  Ansible to ship the JAR for the coprocessor to the required location on your
+  RegionServers' filesystems and restart each RegionServer, to automate
+  coprocessor deployment. Details for such set-ups are out of scope of this
+  document.
+Updating a Coprocessor::
+  Deploying a new version of a given coprocessor is not as simple as disabling it,
+  replacing the JAR, and re-enabling the coprocessor. This is because you cannot
+  reload a class in a JVM unless you delete all the current references to it.
+  Since the current JVM has a reference to the existing coprocessor, you must restart
+  the JVM, by restarting the RegionServer, in order to replace it. This behavior
+  is not expected to change.
+Coprocessor Logging::
+  The Coprocessor framework does not provide an API for logging beyond standard Java
+  logging.
+Coprocessor Configuration::
+  If you do not want to load coprocessors from the HBase Shell, you can add their configuration
+  properties to `hbase-site.xml`. In <<load_coprocessor_in_shell>>, two arguments are
+  set: `arg1=1,arg2=2`. These could have been added to `hbase-site.xml` as follows:
+[source,xml]
+----
+<property>
+  <name>arg1</name>
+  <value>1</value>
+</property>
+<property>
+  <name>arg2</name>
+  <value>2</value>
+</property>
+----
+Then you can read the configuration using code like the following:
+[source,java]
+----
+Configuration conf = HBaseConfiguration.create();
+// Read the arguments configured in hbase-site.xml above.
+String arg1 = conf.get("arg1");
+String arg2 = conf.get("arg2");
+System.out.println("arg1=" + arg1 + " arg2=" + arg2);
+
+// Use below code for HBase version 1.x.x or above.
+Connection connection = ConnectionFactory.createConnection(conf);
+TableName tableName = TableName.valueOf("users");
+Table table = connection.getTable(tableName);
+
+//Use below code HBase version 0.98.xx or below.
+//HConnection connection = HConnectionManager.createConnection(conf);
+//HTableInterface table = connection.getTable("users");
+
+Get get = new Get(Bytes.toBytes("admin"));
+Result result = table.get(get);
+for (Cell c : result.rawCells()) {
+    System.out.println(Bytes.toString(CellUtil.cloneRow(c))
+        + "==> " + Bytes.toString(CellUtil.cloneFamily(c))
+        + "{" + Bytes.toString(CellUtil.cloneQualifier(c))
+        + ":" + Bytes.toLong(CellUtil.cloneValue(c)) + "}");
+}
+Scan scan = new Scan();
+ResultScanner scanner = table.getScanner(scan);
+for (Result res : scanner) {
+    for (Cell c : res.rawCells()) {
+        System.out.println(Bytes.toString(CellUtil.cloneRow(c))
+            + " ==> " + Bytes.toString(CellUtil.cloneFamily(c))
+            + " {" + Bytes.toString(CellUtil.cloneQualifier(c))
+            + ":" + Bytes.toLong(CellUtil.cloneValue(c))
+            + "}");
+    }
+}
+----



== Monitor Time Spent in Coprocessors