One other thing. The Accumulo master runs FATE. If the master dies, when it starts again it will start FATE. FATE will resume executing transactions. A master could die and restart multiple times on different machines during a create table operation and the client requesting the table creation would never know there was a problem.
On Fri, Feb 24, 2012 at 5:55 PM, Keith Turner <[email protected]> wrote: > These slides have some info. > > http://people.apache.org/~kturner/accumulo14_15.pdf > > FATE makes it very easy to write fault tolerant multi-step Accumulo > operations, like create table and delete table. Create table has to > modify zookeeper and the Accumulo metadata table. Its many steps. If > there is a machine failure at any step, it could leave Accumulo in an > inconsistent state. An operation like create table is broken into > lots of little FATE operations. > > FATE operations are called Repo's (repeatable persisted operation). > These operation must be written in such a way that its safe to call > them one or more times. A repo is submitted to FATE. FATE pushes the > repo into zookeeper before executing it. Then FATE executes the repo, > if the repo returns another repo it pushes that into zookeeper and > then executes it. If a repo does not return another repo, it > consideres the FATE transaction done. If a repo throws an exeception, > then it pops all of the repo operation for the transaction out of > zookeeper calling undo on each one. A repo has an isReady method, > this is provided so a FATE operation can wait for a condition on the > cluster to become true w/o waiting tying up a thread. If a repo is > not ready then it will not be executed, later it will be read from > Zookeeper and isReady called again. > > Clients submit FATE operations by first requesting a FATE transaction > id (txid). If the server fails while the client is making this call, > its safe to request a txid again. After the client gets a txid, they > seed it with a REPO. It safe for the client to attempt to seed the > FATE operation multiple times in the case of server failures. After > the operation is seeded, the client waits for txid to finish. If the > server fails while they are waiting, they can wait on the txid again. > > FATE operations acquire persistent read/write locks in zookeeper on > Accumulo tables. These locks are slightly different from the > zookeeper lock recipe. First the locks use persistent sequential > nodes instead of ephemeral nodes. There is no need to use an > ephemeral node because the lock is not related to a process, but > rather a persistent FATE op that will continue to execute even if a > process dies. The lock data is the FATE txid plus W or R for read or > write lock. This type of locking is so much easier to deal with than > ephemeral locks where you are never quite sure if the other process is > really dead. > > Keith > > > On Fri, Feb 24, 2012 at 5:24 PM, Mubarak Seyed <[email protected]> wrote: >> Hi Dev, >> >> Can someone please explain how does FATE (Fault tolerant executor) framework >> work in Accumulo? >> >> Thanks, >> Mubarak
