HBASE-20730 Add pv2 and amv2 chapters to refguide
Signed-off-by: Mike Drob <md...@apache.org>


Project: http://git-wip-us.apache.org/repos/asf/hbase/repo
Commit: http://git-wip-us.apache.org/repos/asf/hbase/commit/08254624
Tree: http://git-wip-us.apache.org/repos/asf/hbase/tree/08254624
Diff: http://git-wip-us.apache.org/repos/asf/hbase/diff/08254624

Branch: refs/heads/HBASE-20331
Commit: 082546243692a7a80760b88515fa6ec5c198f00c
Parents: 0e43abc
Author: Michael Stack <st...@apache.org>
Authored: Wed Jun 13 20:19:10 2018 -0700
Committer: Michael Stack <st...@apache.org>
Committed: Fri Jun 15 15:41:30 2018 -0700

----------------------------------------------------------------------
 src/main/asciidoc/_chapters/amv2.adoc | 173 +++++++++++++++++++++++++++++
 src/main/asciidoc/_chapters/pv2.adoc  | 163 +++++++++++++++++++++++++++
 src/main/asciidoc/book.adoc           |   2 +
 3 files changed, 338 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/hbase/blob/08254624/src/main/asciidoc/_chapters/amv2.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/amv2.adoc 
b/src/main/asciidoc/_chapters/amv2.adoc
new file mode 100644
index 0000000..49841ce
--- /dev/null
+++ b/src/main/asciidoc/_chapters/amv2.adoc
@@ -0,0 +1,173 @@
+////
+/**
+ *
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+////
+[[amv2]]
+= AMv2 Description for Devs
+:doctype: book
+:numbered:
+:toc: left
+:icons: font
+:experimental:
+
+The AssignmentManager (AM) in HBase Master manages assignment of Regions over 
a cluster of RegionServers.
+
+The AMv2 project is a redo of Assignment in an attempt at addressing the root 
cause of many of our operational issues in production, namely slow assignment 
and problematic accounting such that Regions are misplaced or stuck offline in 
the notorious _Regions-In-Transition (RIT)_ limbo state.
+
+Below are notes for devs on key aspects of AMv2 in no particular order.
+
+== Background
+
+Assignment in HBase 1.x has been problematic in operation. It is not hard to 
see why. Region state is kept at the other end of an RPC in ZooKeeper (Terminal 
states -- i.e. OPEN or CLOSED -- are published to the _hbase:meta_ table). In 
HBase-1.x.x, state has multiple writers with Master and RegionServers all able 
to make state edits concurrently (in _hbase:meta_ table and out on ZooKeeper). 
If clocks are awry or watchers missed, state changes can be skipped or 
overwritten. Locking of HBase Entities -- tables, regions -- is not 
comprehensive so a table operation -- disable/enable -- could clash with a 
region-level operation; a split or merge. Region state is distributed and hard 
to reason about and test. Assignment is slow in operation because each assign 
involves moving remote znodes through transitions. Cluster size tends to top 
out at a couple of hundred thousand regions; beyond this, cluster start/stop 
takes hours and is prone to corruption.
+
+AMv2 (AssignmentManager Version 2) is a refactor 
(https://issues.apache.org/jira/browse/HBASE-14350[HBASE-14350]) of the 
hbase-1.x AssignmentManager putting it up on a 
https://issues.apache.org/jira/browse/HBASE-12439[ProcedureV2 (HBASE-12439)] 
basis. ProcedureV2 (Pv2) is an awkwardly named system that allows 
describing and running multi-step state machines. It is performant and persists 
all state to a Store which is recoverable post crash. See the companion chapter 
on <<pv2>> to learn more about the ProcedureV2 system.
+
+In AMv2, all assignment, crash handling, splits and merges are recast as 
Procedures(v2). ZooKeeper is purged from the mix. As before, the final 
assignment state gets published to _hbase:meta_ for non-Master participants 
(i.e. all clients) to read, with intermediate state kept in the local Pv2 
WAL-based ‘store’, but only the active Master, a single writer, evolves state. 
The Master’s in-memory cluster image is the authority and, if there is 
disagreement, RegionServers are forced to comply. Pv2 adds shared/exclusive 
locking of all core HBase Entities -- namespaces, tables, and regions -- to 
ensure one-actor-at-a-time access and to prevent operations contending over 
resources (move/split, disable/assign, etc.).
+
+This redo of AM atop a purpose-built, performant state machine, with all 
operations taking on the common Procedure form and with a single state writer 
only, moves our AM to a new level of resilience and scale.
+
+== New System
+
+Each Assign or Unassign of a Region is now a Procedure. A Move (Region) 
Procedure is a compound of Procedures; it is the running of an Unassign 
Procedure followed by an Assign Procedure. The Move Procedure spawns the 
Unassign and the Assign in series and then waits on their completion.
+
+And so on. ServerCrashProcedure spawns the WAL splitting tasks and then the 
reassign of all regions that were hosted on the crashed server as subprocedures.
+
+AMv2 Procedures are run by the Master in a ProcedureExecutor instance. All 
Procedures make use of utility provided by the Pv2 framework.
+
+For example, Procedures persist each state transition to the framework’s 
Procedure Store. The default implementation is done as a WAL kept on HDFS. On 
crash, we reopen the Store and rerun all WALs of Procedure transitions to put 
the Assignment State Machine back into the attitude it had just before the 
crash. We then continue Procedure execution.
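+
+The persist-then-replay idea above can be sketched with a toy append-only log. 
This is a minimal illustration under stated assumptions, not the HBase 
ProcedureStore or its WAL format: the class _ToyProcedureLog_, its _Entry_ 
record and the use of plain strings for states are all hypothetical names made 
up for this example. Replaying the entries in write order, with the last entry 
for a pid winning, recovers the state each procedure had just before the crash.
+

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy append-only procedure log; NOT the HBase ProcedureStore/WAL API. */
final class ToyProcedureLog {
  /** One persisted state transition of one procedure (hypothetical shape). */
  static final class Entry {
    final long pid;
    final String state;

    Entry(long pid, String state) {
      this.pid = pid;
      this.state = state;
    }
  }

  private final List<Entry> entries = new ArrayList<>();

  /** Append a transition; a real store would sync this to durable storage. */
  void append(long pid, String state) {
    entries.add(new Entry(pid, state));
  }

  /** Replay all entries in write order; the last entry for a pid wins. */
  Map<Long, String> replay() {
    Map<Long, String> latest = new LinkedHashMap<>();
    for (Entry e : entries) {
      latest.put(e.pid, e.state);
    }
    return latest;
  }
}
```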
+
+In the new system, the Master is the Authority on all things Assign. 
Previously we were ambiguous; e.g. the RegionServer was in charge of Split 
operations. The Master keeps an in-memory image of Region states and servers. 
If there is disagreement, the Master always prevails; at an extreme it will 
kill the RegionServer that is in disagreement.
+
+A new RegionStateStore class takes care of publishing the terminal Region 
state, whether OPEN or CLOSED, out to the _hbase:meta_ table.
+
+RegionServers now report their run version on Connection. This version is 
available inside the AM for use when running migrations via rolling restart.
+
+== Procedures Detail
+
+=== Assign/Unassign
+
+Assign and Unassign subclass a common RegionTransitionProcedure (RTP). There 
can only be one RegionTransitionProcedure per region running at a time since 
the RTP instance takes a lock on the region. The RTP base Procedure has three 
steps: a step that stores the procedure (REGION_TRANSITION_QUEUE); a dispatch 
of the procedure open or close followed by a suspend waiting on the remote 
regionserver to report successful open or fail, or on notification that the 
server fielding the request crashed (REGION_TRANSITION_DISPATCH); and finally 
registration of the successful open/close in hbase:meta 
(REGION_TRANSITION_FINISH).
+
+Here is how the assign of a region 56f985a727afe80a184dac75fbf6860c looks in 
the logs. The assign was provoked by a Server Crash (Process ID 1176, or 
pid=1176; when it is the parent of a procedure, it is identified as 
ppid=1176). The assign is pid=1179, the second of the two regions being 
assigned by this Server Crash.
+
+[source]
+----
+2017-05-23 12:04:24,175 INFO  [ProcExecWrkr-30] procedure2.ProcedureExecutor: 
Initialized subprocedures=[{pid=1178, ppid=1176, 
state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure 
table=IntegrationTestBigLinkedList, region=bfd57f0b72fd3ca77e9d3c5e3ae48d76, 
target=ve0540.halxg.example.org,16020,1495525111232}, {pid=1179, ppid=1176, 
state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure 
table=IntegrationTestBigLinkedList, region=56f985a727afe80a184dac75fbf6860c, 
target=ve0540.halxg.example.org,16020,1495525111232}]
+----
+
+Next we start the assign by queuing (‘registering’) the Procedure with the 
framework.
+
+[source]
+----
+2017-05-23 12:04:24,241 INFO  [ProcExecWrkr-30] assignment.AssignProcedure: 
Start pid=1179, ppid=1176, state=RUNNABLE:REGION_TRANSITION_QUEUE; 
AssignProcedure table=IntegrationTestBigLinkedList, 
region=56f985a727afe80a184dac75fbf6860c, 
target=ve0540.halxg.example.org,16020,1495525111232; rit=OFFLINE, 
location=ve0540.halxg.example.org,16020,1495525111232; forceNewPlan=false, 
retain=false
+----
+
+Track the running of Procedures in logs by tracing their process id -- here 
pid=1179.
+
+Next we move to the dispatch phase where we update the hbase:meta table, 
setting the region state as OPENING on server ve0540. We then dispatch an RPC 
to ve0540 asking it to open the region. Thereafter we suspend the Assign until 
we get a message back from ve0540 on whether it has opened the region 
successfully (or not).
+
+[source]
+----
+2017-05-23 12:04:24,494 INFO  [ProcExecWrkr-38] assignment.RegionStateStore: 
pid=1179 updating hbase:meta 
row=IntegrationTestBigLinkedList,H\xE3@\x8D\x964\x9D\xDF\x8F@9\x0F\xC8\xCC\xC2,1495566261066.56f985a727afe80a184dac75fbf6860c.,
 regionState=OPENING, 
regionLocation=ve0540.halxg.example.org,16020,1495525111232
+2017-05-23 12:04:24,498 INFO  [ProcExecWrkr-38] 
assignment.RegionTransitionProcedure: Dispatch pid=1179, ppid=1176, 
state=RUNNABLE:REGION_TRANSITION_DISPATCH; AssignProcedure 
table=IntegrationTestBigLinkedList, region=56f985a727afe80a184dac75fbf6860c, 
target=ve0540.halxg.example.org,16020,1495525111232; rit=OPENING, 
location=ve0540.halxg.example.org,16020,1495525111232
+----
+
+Below we log the incoming report that the region opened successfully on 
ve0540. The Procedure is woken up (you can tell the procedure is running by the 
name of the thread; it is a ProcedureExecutor thread, ProcExecWrkr-9). The 
woken-up Procedure updates state in hbase:meta to denote the region as open on 
ve0540. It then reports finished and exits.
+
+[source]
+----
+2017-05-23 12:04:26,643 DEBUG 
[RpcServer.default.FPBQ.Fifo.handler=46,queue=1,port=16000] 
assignment.RegionTransitionProcedure: Received report OPENED seqId=11984985, 
pid=1179, ppid=1176, state=RUNNABLE:REGION_TRANSITION_DISPATCH; AssignProcedure 
table=IntegrationTestBigLinkedList, region=56f985a727afe80a184dac75fbf6860c, 
target=ve0540.halxg.example.org,16020,1495525111232; rit=OPENING, 
location=ve0540.halxg.example.org,16020,1495525111232
+2017-05-23 12:04:26,643 INFO  [ProcExecWrkr-9] assignment.RegionStateStore: pid=1179 
updating hbase:meta 
row=IntegrationTestBigLinkedList,H\xE3@\x8D\x964\x9D\xDF\x8F@9\x0F\xC8\xCC\xC2,1495566261066.56f985a727afe80a184dac75fbf6860c.,
 regionState=OPEN, openSeqNum=11984985, 
regionLocation=ve0540.halxg.example.org,16020,1495525111232
+2017-05-23 12:04:26,836 INFO  [ProcExecWrkr-9] procedure2.ProcedureExecutor: 
Finish suprocedure pid=1179, ppid=1176, state=SUCCESS; AssignProcedure 
table=IntegrationTestBigLinkedList, region=56f985a727afe80a184dac75fbf6860c, 
target=ve0540.halxg.example.org,16020,1495525111232
+----
+
+Unassign looks similar given it is based on the base 
RegionTransitionProcedure. It has the same state transitions and does basically 
the same steps but with different state names (CLOSING, CLOSED).
+
+Most other procedures are subclasses of a Pv2 StateMachine implementation. We 
have both Table- and Region-focused StateMachine types.
+
+== UI
+
+Along the top-bar on the Master, you can now find a ‘Procedures&Locks’ tab 
which takes you to a page that is ugly but useful. It dumps currently running 
procedures and framework locks. Look at this when you can’t figure out what is 
stuck; it will at least identify problematic procedures (take the pid and 
grep the logs…). Look for ROLLEDBACK or pids that have been RUNNING for a 
long time.
+
+== Logging
+
+Procedures log their process ids as pid= and their parent ids as ppid= 
everywhere. Work has been done so you can grep the pid and see the history of a 
procedure operation.
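+
+As a hedged illustration of tracing by pid, the sketch below filters log lines 
with a regular expression. The class and method names (_PidLogFilter_, 
_history_, _ownLines_) are hypothetical helpers, not part of HBase. Note that 
the text `ppid=1179` contains `pid=1179`, so a plain search also returns the 
children of a procedure; a lookbehind narrows the match to the procedure’s own 
lines, and a lookahead keeps `pid=1179` from matching `pid=11790`.
+

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper for pulling one procedure's history out of Master logs. */
final class PidLogFilter {
  /** Lines that mention the id either as pid= or as ppid= (the procedure and its children). */
  static List<String> history(List<String> lines, long pid) {
    // (?!\d) stops pid=1179 from also matching pid=11790.
    return filter(lines, Pattern.compile("pid=" + pid + "(?!\\d)"));
  }

  /** Lines where the id appears as pid= only, excluding ppid= mentions. */
  static List<String> ownLines(List<String> lines, long pid) {
    // (?<!p) rejects a match that is really the tail of "ppid=".
    return filter(lines, Pattern.compile("(?<!p)pid=" + pid + "(?!\\d)"));
  }

  private static List<String> filter(List<String> lines, Pattern pattern) {
    List<String> out = new ArrayList<>();
    for (String line : lines) {
      if (pattern.matcher(line).find()) {
        out.add(line);
      }
    }
    return out;
  }
}
```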
+
+== Implementation Notes
+
+In this section we note some idiosyncrasies of operation as an attempt at 
saving you some head-scratching.
+
+=== Region Transition RPC and RS Heartbeat can arrive at ~same time on Master
+
+Reporting Region Transition on a RegionServer is now an RPC distinct from RS 
heartbeating (‘RegionServerServices’ Service). A heartbeat and a status 
update can arrive at the Master at about the same time. The Master will update 
its internal state for a Region but this same state is checked during heartbeat 
processing. We may find the unexpected; e.g. a Region was just reported as 
CLOSED, so heartbeat processing is surprised to find the Region listed as OPEN 
in the RS report. In the new system, all slaves must bow to the Master’s 
understanding of cluster state; the Master will kill/close any misaligned 
entities.
+
+To address the above, we added a lastUpdate for in-memory Master state. Let a 
region state have some vintage before we act on it (one second currently).
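+
+The ‘vintage’ check can be pictured as a simple age test on the in-memory 
state. The sketch below is an assumption-laden toy, not the AMv2 code: the 
class name, the method name and the choice of an inclusive comparison are all 
hypothetical; only the one-second threshold comes from the text above.
+

```java
/** Toy version of the lastUpdate "vintage" check; names are hypothetical. */
final class RegionStateVintage {
  /** "One second currently", per the text; the constant name is made up. */
  static final long MIN_AGE_MS = 1000L;

  /**
   * True when the Master's in-memory state is old enough that a mismatch seen
   * during heartbeat processing should be acted on. Whether the real check is
   * inclusive or exclusive is not stated in the text; inclusive is assumed here.
   */
  static boolean oldEnoughToAct(long lastUpdateMs, long nowMs) {
    return nowMs - lastUpdateMs >= MIN_AGE_MS;
  }
}
```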
+
+=== Master as RegionServer or as RegionServer that just does system tables
+
+AMv2 enforces the current master branch default of HMaster carrying system 
tables only; i.e. the Master in an HBase cluster also acts as a RegionServer, 
but it is the exclusive host for tables such as _hbase:meta_, 
_hbase:namespace_, etc., the core system tables. This is causing a couple of 
test failures because AMv1, though it is not supposed to, allows moving 
hbase:meta off the Master while AMv2 does not.
+
+== New Configs
+
+These configs all need doc on when you’d change them.
+
+=== hbase.procedure.remote.dispatcher.threadpool.size
+
+Default 128
+
+=== hbase.procedure.remote.dispatcher.delay.msec
+
+Default 150ms
+
+=== hbase.procedure.remote.dispatcher.max.queue.size
+
+Default 32
+
+=== hbase.regionserver.rpc.startup.waittime
+
+Default 60 seconds.
+
+== Tools
+
+HBASE-15592 Print Procedure WAL Content
+
+Patch in https://issues.apache.org/jira/browse/HBASE-18152[HBASE-18152] [AMv2] 
Corrupt Procedure WAL file; procedure data stored out of order 
https://issues.apache.org/jira/secure/attachment/12871066/reading_bad_wal.patch[https://issues.apache.org/jira/secure/attachment/12871066/reading_bad_wal.patch]
+
+=== MasterProcedureSchedulerPerformanceEvaluation
+
+Tool to test performance of locks and queues in procedure scheduler 
independently from other framework components. Run this after any substantial 
changes in proc system. Prints nice output:
+
+----
+******************************************
+Time - addBack     : 5.0600sec
+Ops/sec - addBack  : 1.9M
+Time - poll        : 19.4590sec
+Ops/sec - poll     : 501.9K
+Num Operations     : 10000000
+
+Completed          : 10000006
+Yield              : 22025876
+
+Num Tables         : 5
+Regions per table  : 10
+Operations type    : both
+Threads            : 10
+******************************************
+Raw format for scripts
+
+RESULT [num_ops=10000000, ops_type=both, num_table=5, regions_per_table=10, 
threads=10, num_yield=22025876, time_addback_ms=5060, time_poll_ms=19459]
+----

http://git-wip-us.apache.org/repos/asf/hbase/blob/08254624/src/main/asciidoc/_chapters/pv2.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/pv2.adoc 
b/src/main/asciidoc/_chapters/pv2.adoc
new file mode 100644
index 0000000..5ecad3f
--- /dev/null
+++ b/src/main/asciidoc/_chapters/pv2.adoc
@@ -0,0 +1,163 @@
+////
+/**
+ *
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+////
+[[pv2]]
+= Procedure Framework (Pv2): 
link:https://issues.apache.org/jira/browse/HBASE-12439[HBASE-12439]
+:doctype: book
+:numbered:
+:toc: left
+:icons: font
+:experimental:
+
+
+_Procedure v2 ...aims to provide a unified way to build...multi-step 
procedures with a rollback/roll-forward ability in case of failure (e.g. 
create/delete table) -- Matteo Bertozzi, the author of Pv2._
+
+With Pv2 you can build and run state machines. It was built by Matteo to make 
distributed state transitions in HBase resilient in the face of process 
failures. Previous to Pv2, state transition handling was spread about the 
codebase with implementation varying by transition-type and context. Pv2 was 
inspired by 
link:https://accumulo.apache.org/1.8/accumulo_user_manual.html#_fault_tolerant_executor_fate[FATE],
 of Apache Accumulo. +
+
+Early Pv2 aspects have been shipping in HBase for a good while now but it has 
continued to evolve as it takes on more involved scenarios. What we have now is 
powerful but intricate in operation and incomplete, in need of cleanup and 
hardening. In this doc we give an overview of the system so you can make use 
of it (and help with its polishing).
+
+This system has the awkward name of Pv2 because HBase already had the notion 
of a Procedure used in snapshots (see hbase-server 
_org.apache.hadoop.hbase.procedure_ as opposed to hbase-procedure 
_org.apache.hadoop.hbase.procedure2_). Pv2 supersedes and is to replace 
Procedure.
+
+== Procedures
+
+A Procedure is a transform made on an HBase entity. Examples of HBase entities 
would be Regions and Tables. +
+Procedures are run by a ProcedureExecutor instance. Procedure current state is 
kept in the ProcedureStore. +
+The ProcedureExecutor has but a primitive view on what goes on inside a 
Procedure. From its PoV, Procedures are submitted and then the 
ProcedureExecutor keeps calling _#execute(Object)_ until the Procedure is done. 
Execute may be called multiple times in the case of failure or restart, so 
Procedure code must be idempotent, yielding the same result each time it runs. 
Procedure code can also implement _rollback_ so steps can be undone on failure. 
A call to _execute()_ can result in one of the following possibilities:
+
+* _execute()_ returns
+** _null_: indicates we are done.
+** _this_: indicates there is more to do, so persist current procedure state 
and re-_execute()_.
+** _Array_ of sub-procedures: indicates a set of procedures that need to be 
run to completion before we can proceed (after which we expect the framework to 
call our execute again).
+* _execute()_ throws exception
+** _suspend_: indicates execution of procedure is suspended and can be resumed 
due to some external event. The procedure state is persisted.
+** _yield_: procedure is added back to scheduler. The procedure state is not 
persisted.
+** _interrupted_: currently same as _yield_.
+** Any _exception_ not listed above: Procedure _state_ is changed to _FAILED_ 
(after which we expect the framework will attempt rollback).
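+
+The return-value half of this contract can be sketched as a small executor 
loop. This is a toy model under stated assumptions, not the HBase 
ProcedureExecutor: _ToyProcedure_ and _ToyExecutor_ are hypothetical names, 
‘more to do’ is modelled as returning a one-element array holding the procedure 
itself, persistence is reduced to a trace entry, and suspend, yield and rollback 
are left out entirely.
+

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the execute() contract; NOT the HBase Procedure API. */
abstract class ToyProcedure {
  /** null = done; {this} = more to do; anything else = children to run first. */
  abstract ToyProcedure[] execute() throws Exception;
}

/** Toy single-threaded executor; suspend, yield and rollback are omitted. */
final class ToyExecutor {
  /** Records what a real framework would persist or schedule. */
  final List<String> trace = new ArrayList<>();

  /** Drives one procedure to completion; returns "SUCCESS" or "FAILED". */
  String run(ToyProcedure proc) {
    while (true) {
      ToyProcedure[] next;
      try {
        next = proc.execute();
      } catch (Exception e) {
        // Any exception marks the procedure FAILED (rollback not modelled here).
        trace.add("failed:" + e.getMessage());
        return "FAILED";
      }
      if (next == null) {
        return "SUCCESS";
      }
      if (next.length == 1 && next[0] == proc) {
        // More to do: persist state, then call execute() again.
        trace.add("persist-and-reexecute");
        continue;
      }
      // Sub-procedures: all must finish before the parent is executed again.
      for (ToyProcedure child : next) {
        if (!"SUCCESS".equals(run(child))) {
          return "FAILED";
        }
      }
      trace.add("children-done");
    }
  }
}
```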
+
+The ProcedureExecutor stamps the framework’s notion of Procedure State into 
the Procedure itself; e.g. it marks Procedures as INITIALIZING on submit. It 
moves the state to RUNNABLE when it goes to execute. When done, a Procedure 
gets marked FAILED or SUCCESS depending on the outcome. Here is the list of all 
states as of this writing:
+
+* *_INITIALIZING_* Procedure in construction, not yet added to the executor
+* *_RUNNABLE_* Procedure added to the executor, and ready to be executed.
+* *_WAITING_* The procedure is waiting on children (subprocedures) to be 
completed
+* *_WAITING_TIMEOUT_* The procedure is waiting on a timeout or an external 
event
+* *_ROLLEDBACK_* The procedure failed and was rolled back.
+* *_SUCCESS_* The procedure execution completed successfully.
+* *_FAILED_* The procedure execution failed, may need to rollback.
+
+After each execute, the Procedure state is persisted to the ProcedureStore. 
Hooks are invoked on Procedures so they can preserve custom state. Post-fault, 
the ProcedureExecutor re-hydrates its pre-crash state by replaying the content 
of the ProcedureStore. This makes the Procedure Framework resilient against 
process failure.
+
+=== Implementation
+
+In implementation, Procedures tend to divide transforms into finer-grained 
tasks and while some of these work items are handed off to sub-procedures,
+the bulk are done as processing _steps_ in-Procedure; each invocation of 
execute is used to perform a single step, and then the Procedure relinquishes 
control, returning to the framework. The Procedure does its own tracking of 
where it is in the processing.
+
+What comprises a sub-task, or _step_, in the execution is up to the Procedure 
author but generally it is a small piece of work that cannot be further 
decomposed and that moves the processing forward toward its end state. Having 
procedures made of many small steps rather than a few large ones allows the 
Procedure framework to give insight on where we are in the processing. It also 
allows the framework to be more fair in its execution. As stated above, each 
step may be called multiple times (failure/restart) so steps must be 
implemented to be idempotent. +
+It is easy to confuse the state that the Procedure itself is keeping with that 
of the Framework itself. Try to keep them distinct. +
+
+=== Rollback
+
+Rollback is called when the procedure or one of the sub-procedures has failed. 
The rollback step is supposed to clean up the resources created during the 
execute() step. In case of failure and restart, rollback() may be called 
multiple times, so again the code must be idempotent.
+
+=== Metrics
+
+There are hooks for collecting metrics on submit of the procedure and on 
finish.
+
+* updateMetricsOnSubmit()
+* updateMetricsOnFinish()
+
+Individual procedures can override these methods to collect procedure-specific 
metrics. The default implementations of these methods try to get an object 
implementing the ProcedureMetrics interface, which encapsulates the following 
set of generic metrics:
+
+* SubmittedCount (Counter): Total number of procedure instances submitted of a 
type.
+* Time (Histogram): Histogram of runtime for procedure instances.
+* FailedCount (Counter): Total number of failed procedure instances.
+
+Individual procedures can implement this interface and define this generic set 
of metrics.
+
+=== Baggage
+
+Procedures can carry baggage. One example is the _step_ the procedure last 
attained (see previous section); procedures persist the enum that marks where 
they are currently. Other examples might be the Region or Server name the 
Procedure is currently working on. After each call to execute, 
Procedure#serializeStateData is called. Procedures can persist whatever state 
they need.
+
+=== Result/State and Queries
+
+(From Matteo’s 
https://issues.apache.org/jira/secure/attachment/12693273/Procedurev2Notification-Bus.pdf[ProcedureV2
 and Notification Bus] doc) +
+In the case of asynchronous operations, the result must be kept around until 
the client asks for it. Once we receive a “get” of the result we can 
schedule the delete of the record. For some operations the result may be 
“unnecessary”, especially in case of failure (e.g. if the create table fails, 
we can query the operation result or we can just do a list table to see if it 
was created), so in some cases we can schedule the delete after a timeout. On 
the client side the operation will return a “Procedure ID”; this ID can be 
used to wait until the procedure is completed and get the result/exception. +
+
+[source]
+----
+Admin.doOperation() {
+  long procId = master.doOperation();
+  master.waitCompletion(procId);
+}
+----
+
+If the master goes down while performing the operation, the backup master will 
pick up the half-in-progress operation and complete it. The client will not 
notice the failure.
+
+== Subprocedures
+
+Subprocedures are _Procedure_ instances created and returned by the 
_#execute(Object)_ method of a procedure instance (the parent procedure). As 
subprocedures are of type _Procedure_, they can instantiate their own 
subprocedures. As this is recursive, a procedure stack is maintained by the 
framework. The framework makes sure that the parent procedure does not proceed 
till all sub-procedures and their subprocedures in a procedure stack have 
successfully finished.
+
+== ProcedureExecutor
+
+_ProcedureExecutor_ uses _ProcedureStore_ and _ProcedureScheduler_ and 
executes procedures submitted to it. Some of the basic operations supported are:
+
+* _abort(procId)_: aborts the specified procedure if it is not finished
+* _submit(Procedure)_: submits procedure for execution
+* _retrieve:_ list of get methods to get _Procedure_ instances and results
+* _register/ unregister_ listeners: for listening on Procedure related 
notifications
+
+When _ProcedureExecutor_ starts it loads procedure instances persisted in 
_ProcedureStore_ from previous run. All unfinished procedures are resumed from 
the last stored state.
+
+== Nonces
+
+You can pass the nonce that came in with the RPC to the Procedure on submit at 
the executor. This nonce will then be serialized along with the Procedure on 
persist. After a crash, on reload, the nonce will be put back into a map of 
nonces to pid in case a client tries to run the same procedure a second time 
(it will be rejected). See the base Procedure and how nonce is a base data 
member.
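+
+A minimal sketch of the nonce-to-pid map follows, assuming a toy registry that 
hands out pids itself. _ToyNonceRegistry_ and its methods are hypothetical 
names, not the Pv2 classes, and returning -1 to signal a rejected resubmission 
is a choice made only for this example. The point is that the map survives a 
restart because it is rebuilt from what was persisted with each procedure.
+

```java
import java.util.HashMap;
import java.util.Map;

/** Toy nonce registry; NOT the HBase nonce handling code. */
final class ToyNonceRegistry {
  private final Map<Long, Long> nonceToPid = new HashMap<>();
  private long nextPid = 1L;

  /** Returns the new pid, or -1 when the nonce was already used (resubmission rejected). */
  long submit(long nonce) {
    if (nonceToPid.containsKey(nonce)) {
      return -1L;
    }
    long pid = nextPid++;
    nonceToPid.put(nonce, pid);
    return pid;
  }

  /** What would be serialized alongside the procedures (copy of the map). */
  Map<Long, Long> persisted() {
    return new HashMap<>(nonceToPid);
  }

  /** Rebuild the registry from persisted (nonce, pid) pairs after a restart. */
  static ToyNonceRegistry reload(Map<Long, Long> persisted) {
    ToyNonceRegistry registry = new ToyNonceRegistry();
    for (Map.Entry<Long, Long> e : persisted.entrySet()) {
      registry.nonceToPid.put(e.getKey(), e.getValue());
      // Keep handing out pids above anything already used.
      registry.nextPid = Math.max(registry.nextPid, e.getValue() + 1);
    }
    return registry;
  }
}
```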
+
+== Wait/Wake/Suspend/Yield
+
+‘suspend’ means stop processing a procedure because we can make no more 
progress until a condition changes; e.g. we sent an RPC and need to wait on the 
response. The way this works is that a Procedure throws a suspend exception 
from down in its guts as a GOTO to the end of the current processing step. 
Suspend also puts the Procedure back on the scheduler. Problematic is that we 
do some accounting on our way out even on suspend, so exiting can take time 
(we have to update state in the WAL).
+
+RegionTransitionProcedure#reportTransition is called on receipt of a report 
from a RS. For Assign and Unassign, this response from the server to which we 
sent an RPC wakes up the suspended Assign/Unassign.
+
+== Locking
+
+Procedure Locks are not about concurrency! They are about giving a Procedure 
read/write access to an HBase Entity such as a Table or Region so that it is 
possible to shut out other Procedures from making modifications to an HBase 
Entity’s state while the current one is running.
+
+Locking is optional, up to the Procedure implementor but if an entity is being 
operated on by a Procedure, all transforms need to be done via Procedures using 
the same locking scheme else havoc.
+
+Two ProcedureExecutor Worker threads can actually end up both processing the 
same Procedure instance. If it happens, the threads are meant to be running 
different parts of the one Procedure -- changes that do not stamp on each other 
(this gets awkward around the procedure framework’s notion of ‘suspend’. 
More on this below).
+
+Locks optionally may be held for the life of a Procedure. For example, if 
moving a Region, you probably want to have exclusive access to the HBase Region 
until the move completes (or fails). This is done via _#holdLock(Object)_. If 
_#holdLock(Object)_ returns true, the procedure executor will call 
acquireLock() once and thereafter not call _#releaseLock(Object)_ until the 
Procedure is done (normally, it calls acquire/release around each invocation of 
_#execute(Object)_).
+
+Locks also may live the life of a procedure; i.e. once an Assign Procedure 
starts, we do not want another procedure meddling w/ the region under 
assignment. Procedures that hold the lock for the life of the procedure set 
Procedure#holdLock to true. AssignProcedure does this as do Split and Move (If 
in the middle of a Region move, you do not want it Splitting).
+
+Locking can be for life of Procedure.
+
+Some locks have a hierarchy. For example, taking a region lock also takes 
(read) lock on its containing table and namespace to prevent another Procedure 
obtaining an exclusive lock on the hosting table (or namespace).
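+
+The region/table part of this hierarchy can be sketched as follows. This is a 
toy under stated assumptions, not the Pv2 scheduler’s locking code: 
_ToyEntityLocks_ and its method names are hypothetical, namespaces are left out 
for brevity, and lock attempts simply return true or false rather than queueing 
the caller. Taking a region lock bumps a shared count on the table, and an 
exclusive table lock is refused while that count is non-zero.
+

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy region/table lock hierarchy; NOT the HBase procedure scheduler. */
final class ToyEntityLocks {
  /** table -> number of shared (read) holders, i.e. regions currently locked in it. */
  private final Map<String, Integer> sharedHolders = new HashMap<>();
  private final Set<String> exclusiveTables = new HashSet<>();
  private final Set<String> lockedRegions = new HashSet<>();

  /** Exclusive lock on the region plus a shared lock on its table. */
  boolean tryRegionLock(String table, String region) {
    if (exclusiveTables.contains(table) || lockedRegions.contains(region)) {
      return false;
    }
    lockedRegions.add(region);
    sharedHolders.merge(table, 1, Integer::sum);
    return true;
  }

  /** Releases the region lock and drops one shared hold on the table. */
  void releaseRegionLock(String table, String region) {
    if (lockedRegions.remove(region)) {
      // Returning null from the remapping function removes the table entry.
      sharedHolders.computeIfPresent(table, (t, n) -> n == 1 ? null : n - 1);
    }
  }

  /** Exclusive table lock; refused while any region in the table is locked. */
  boolean tryTableExclusiveLock(String table) {
    if (exclusiveTables.contains(table) || sharedHolders.containsKey(table)) {
      return false;
    }
    exclusiveTables.add(table);
    return true;
  }

  void releaseTableExclusiveLock(String table) {
    exclusiveTables.remove(table);
  }
}
```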
+
+== Procedure Types
+
+=== StateMachineProcedure
+
+One can consider each call to the _#execute(Object)_ method as transitioning 
from one state to another in a state machine. The abstract class 
_StateMachineProcedure_ is a wrapper around the base _Procedure_ class which 
provides constructs for implementing a state machine as a _Procedure_. After 
each state transition the current state is persisted so that, in case of 
crash/restart, the state transition can be resumed from the state the procedure 
was in before the crash/restart. Individual procedures need to define initial 
and terminus states; the hooks _executeFromState()_ and _setNextState()_ are 
provided for state transitions.
+
+=== RemoteProcedureDispatcher
+
+A new RemoteProcedureDispatcher (+ subclass RSProcedureDispatcher) primitive 
takes care of running the ‘remote’ component of Procedure-based Assignments. 
This dispatcher knows about ‘servers’. It aggregates assignments on a 
time/count basis so it can send procedures in batches rather than one per RPC. 
Procedure status comes back on the back of the RegionServer heartbeat reporting 
online/offline regions (no more notifications via ZK). The response is passed 
to the AMv2 to ‘process’. It will check against the in-memory state. If there 
is a mismatch, it fences out the RegionServer on the assumption that something 
went wrong on the RS side. Timeouts trigger retries (Not Yet Implemented!). The 
Procedure machine ensures only one operation at a time on any one Region/Table 
using entity _locking_ and smarts about what is serial and what can be run 
concurrently (locking was zk-based -- you’d put a znode in zk for a table -- 
but has now been converted to be procedure-based as part of this project).
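+
+Batching on a time/count basis can be sketched as a buffer that flushes when it 
is full or when its oldest entry has waited long enough. The toy below is an 
illustration only, not the RemoteProcedureDispatcher: _ToyBatchBuffer_ and its 
methods are hypothetical, operations are plain strings, the caller supplies the 
clock, and it models the buffer for a single server. The two thresholds play 
the role of a maximum queue size and a dispatch delay.
+

```java
import java.util.ArrayList;
import java.util.List;

/** Toy time/count batching buffer for one server; NOT the HBase dispatcher. */
final class ToyBatchBuffer {
  private final int maxQueueSize;
  private final long delayMs;
  private final List<String> pending = new ArrayList<>();
  /** Time the oldest pending operation was added. */
  private long firstAddMs;

  ToyBatchBuffer(int maxQueueSize, long delayMs) {
    this.maxQueueSize = maxQueueSize;
    this.delayMs = delayMs;
  }

  /** Adds one operation; returns the batch to send if a threshold was hit, else an empty list. */
  List<String> add(String op, long nowMs) {
    if (pending.isEmpty()) {
      firstAddMs = nowMs;
    }
    pending.add(op);
    return flushIfDue(nowMs);
  }

  /** Flushes when the buffer is full or its oldest entry has waited delayMs. */
  List<String> flushIfDue(long nowMs) {
    boolean full = pending.size() >= maxQueueSize;
    boolean aged = !pending.isEmpty() && nowMs - firstAddMs >= delayMs;
    if (!full && !aged) {
      return new ArrayList<>();
    }
    List<String> batch = new ArrayList<>(pending);
    pending.clear();
    return batch;
  }
}
```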
+
+== References
+
+* Matteo had a slide deck on what the Procedure Framework would look like 
and the problems it addresses, initially 
link:https://issues.apache.org/jira/secure/attachment/12845124/ProcedureV2b.pdf[attached
 to the Pv2 issue.]
+* 
link:https://issues.apache.org/jira/secure/attachment/12693273/Procedurev2Notification-Bus.pdf[A
 good doc by Matteo] on the problem and how Pv2 addresses it, with a roadmap 
(from the Pv2 JIRA). We should go back to the roadmap to do the Notification 
Bus, conversion of log splitting to Pv2, etc.

http://git-wip-us.apache.org/repos/asf/hbase/blob/08254624/src/main/asciidoc/book.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/book.adoc b/src/main/asciidoc/book.adoc
index c05eed3..84526d9 100644
--- a/src/main/asciidoc/book.adoc
+++ b/src/main/asciidoc/book.adoc
@@ -76,6 +76,8 @@ include::_chapters/ops_mgt.adoc[]
 include::_chapters/developer.adoc[]
 include::_chapters/unit_testing.adoc[]
 include::_chapters/protobuf.adoc[]
+include::_chapters/pv2.adoc[]
+include::_chapters/amv2.adoc[]
 include::_chapters/zookeeper.adoc[]
 include::_chapters/community.adoc[]
 
