These slides have some info. http://people.apache.org/~kturner/accumulo14_15.pdf
FATE makes it very easy to write fault-tolerant, multi-step Accumulo operations, like create table and delete table. Create table has to modify ZooKeeper and the Accumulo metadata table. It's many steps. If there is a machine failure at any step, it could leave Accumulo in an inconsistent state.

An operation like create table is broken into lots of little FATE operations. FATE operations are called Repos (repeatable persisted operations). These operations must be written in such a way that it's safe to call them one or more times. A repo is submitted to FATE. FATE pushes the repo into ZooKeeper before executing it. Then FATE executes the repo; if the repo returns another repo, FATE pushes that into ZooKeeper and then executes it. If a repo does not return another repo, FATE considers the transaction done. If a repo throws an exception, then FATE pops all of the repo operations for the transaction out of ZooKeeper, calling undo on each one.

A repo has an isReady method. This is provided so a FATE operation can wait for a condition on the cluster to become true without tying up a thread. If a repo is not ready then it will not be executed; later it will be read from ZooKeeper and isReady called again.

Clients submit FATE operations by first requesting a FATE transaction id (txid). If the server fails while the client is making this call, it's safe to request a txid again. After the client gets a txid, they seed it with a repo. It's safe for the client to attempt to seed the FATE operation multiple times in the case of server failures. After the operation is seeded, the client waits for the txid to finish. If the server fails while they are waiting, they can wait on the txid again.

FATE operations acquire persistent read/write locks in ZooKeeper on Accumulo tables. These locks are slightly different from the ZooKeeper lock recipe. First, the locks use persistent sequential nodes instead of ephemeral nodes.
There is no need to use an ephemeral node because the lock is not related to a process, but rather a persistent FATE op that will continue to execute even if a process dies. The lock data is the FATE txid plus W or R for read or write lock. This type of locking is so much easier to deal with than ephemeral locks where you are never quite sure if the other process is really dead.

Keith

On Fri, Feb 24, 2012 at 5:24 PM, Mubarak Seyed <[email protected]> wrote:
> Hi Dev,
>
> Can someone please explain how does FATE (Fault tolerant executor) framework
> work in Accumulo?
>
> Thanks,
> Mubarak
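
To make the repo execution model described in the reply above a bit more concrete, here is a rough, in-memory sketch of the loop: push a repo, run it, push whatever it returns, and on an exception pop everything and call undo. This is only an illustration under assumptions. The names FateSketch, Repo, LoggingRepo, Status, seed and run are made up for this example and are not Accumulo's actual classes or signatures, and an ArrayDeque stands in for the repo stack that real FATE persists in ZooKeeper.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy in-memory model of the FATE loop. All names are hypothetical, not Accumulo's API. */
final class FateSketch {

  /** Stand-in for a repeatable persisted operation; call() and undo() must be safe to repeat. */
  interface Repo {
    boolean isReady();            // false: skip for now and check again later
    Repo call() throws Exception; // returns the next step, or null when the transaction is done
    void undo();                  // invoked during rollback
  }

  enum Status { IN_PROGRESS, SUCCESSFUL, FAILED }

  // Stands in for the per-transaction stack of repos that FATE persists in ZooKeeper.
  private final Deque<Repo> persisted = new ArrayDeque<>();
  private Status status = Status.IN_PROGRESS;

  /** Seeds the transaction; repeated seeding is ignored, so a client may retry safely. */
  void seed(Repo first) {
    if (persisted.isEmpty() && status == Status.IN_PROGRESS) {
      persisted.push(first);
    }
  }

  /** Runs steps until the transaction finishes, fails, or reaches a repo that is not ready. */
  Status run() {
    while (status == Status.IN_PROGRESS && !persisted.isEmpty()) {
      Repo top = persisted.peek();
      if (!top.isReady()) {
        return status; // no thread is tied up; a later run() asks isReady again
      }
      try {
        Repo next = top.call();
        if (next == null) {
          status = Status.SUCCESSFUL;
        } else {
          persisted.push(next); // "persist" the next step before executing it
        }
      } catch (Exception e) {
        while (!persisted.isEmpty()) {
          persisted.pop().undo(); // roll back, most recently pushed step first
        }
        status = Status.FAILED;
      }
    }
    return status;
  }

  /** Example repo that records what happened to it in a shared log. */
  static final class LoggingRepo implements Repo {
    final String name;
    final List<String> log;
    final Repo next;
    final boolean fail;
    boolean ready = true;

    LoggingRepo(String name, List<String> log, Repo next, boolean fail) {
      this.name = name;
      this.log = log;
      this.next = next;
      this.fail = fail;
    }

    public boolean isReady() { return ready; }

    public Repo call() throws Exception {
      log.add("call " + name);
      if (fail) {
        throw new Exception(name + " failed");
      }
      return next;
    }

    public void undo() { log.add("undo " + name); }
  }

  public static void main(String[] args) {
    List<String> log = new ArrayList<>();
    Repo c = new LoggingRepo("c", log, null, true); // the last step fails
    Repo b = new LoggingRepo("b", log, c, false);
    Repo a = new LoggingRepo("a", log, b, false);
    FateSketch tx = new FateSketch();
    tx.seed(a);
    // prints: FAILED [call a, call b, call c, undo c, undo b, undo a]
    System.out.println(tx.run() + " " + log);
  }
}
```

In the sketch, a repo whose isReady returns false simply causes run() to return IN_PROGRESS, and calling run() again later re-checks it, which mirrors the "read from ZooKeeper and isReady called again" behavior only loosely. Crash recovery, txids and table locks are left out entirely.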
