So, to clarify Keith's answer, I think the answer is actually no. Regulare Olde Scanners only provide a row-level snapshot isolation view if you call Scanner.enableIsolation(), but then they can throw an exception only if there is a tablet server failure.

Adam

On Wed, Dec 21, 2011 at 2:05 PM, Keith Turner <[email protected]> wrote:

> Yes. Regular scanners do provide a consistent state when there are no
> failures, if you call enableIsolation().
>
> In the case of tablet server failures you can use the IsolatedScanner
> or handle the IsolationException yourself. If you need to handle rows
> that do not fit in memory, then you can pass a user defined buffer to
> the IsolatedScanner. This user defined buffer could buffer to disk.
> The default buffer just buffers rows in memory.
>
> BTW there is also a simple isolation example.
>
> On Wed, Dec 21, 2011 at 1:57 PM, Aaron Cordova <[email protected]> wrote:
> > OK - thanks for the update.
> >
> > So, just to see if I understand - except in cases of failure, regular
> > ole Scanners will provide a consistent view of atomic mutations, and
> > if consistent rows are required in the presence of failures then one
> > should use the IsolatedScanner which includes restart semantics upon
> > detection of a failure that could threaten row consistency?
> >
> > On Dec 21, 2011, at 12:24 PM, Adam Fuchs wrote:
> >
> >> We have a bunch of rows that don't fit into memory when using some
> >> of the table design patterns we like to use on Accumulo. Having
> >> row-level isolation without requiring rows to fit in memory was
> >> important to us. However, this is not trivial, especially under
> >> failures.
> >>
> >> The basic technique we use involves keeping a mutation counter for
> >> all active scans on a tablet, writing the mutation counter with
> >> entries in the in-memory map, and keeping all of the data we need to
> >> provide a snapshot isolation view for the existing scans. The tricky
> >> part here is that if a tablet server fails then the recovery of a
> >> tablet on another tablet server doesn't include a recovery of the
> >> list of active scans. The tablet server might decide to minor
> >> compact, and the data needed to provide the row-level
> >> snapshot-isolation view might be lost when the entries flow through
> >> the iterator tree.
> >>
> >> We allow for many ways of dealing with this isolation fault. The
> >> Scanner ignores it by default. Users can also turn on the isolation
> >> exception via Scanner.enableIsolation(), resulting in the
> >> possibility of an IsolationException (subclass of RuntimeException)
> >> being thrown by the ScannerIterator. The IsolatedScanner wraps a
> >> Scanner, enables isolation on that scanner, buffers rows on the
> >> client side (possibly on disk), and can handle the
> >> IsolationException by restarting at the beginning of a row. Handling
> >> isolation without buffering is also possible by using a checkpoint
> >> and restart design that propagates through the application code, so
> >> we wanted to support that behavior by letting applications handle
> >> the exception in their own way.
> >>
> >> Sorry about the lack of documentation! We'll get working on it.
> >>
> >> Adam
> >>
> >>
> >> On Wed, Dec 21, 2011 at 11:45 AM, Aaron Cordova <[email protected]> wrote:
> >>
> >>> I'm looking over the IsolatedScanner and wondering, since you've
> >>> all probably thought more about it than I, whether loading a row
> >>> entirely into memory is required to provide row isolation, or
> >>> whether it simply makes it easier to implement.
> >>>
> >>> The BigTable paper says it makes the rows in the memtable
> >>> copy-on-write. Does this imply copying the entire row into memory
> >>> first? That would seem to make read-modify-write operations
> >>> simpler, but it doesn't seem a necessary condition for just
> >>> writes ...
> >>>
> >>> In the future, is the intention to provide row-isolation upon
> >>> request (via using the IsolatedScanner), thereby making non-atomic
> >>> reads (via the Scanner) the default?
> >>>
> >>> Aaron
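
For anyone following along, the "buffer a row on the client and restart at the beginning of the row" idea described in the thread can be sketched roughly as below. This is only a simulation under stated assumptions, not the Accumulo implementation: RowRestartSketch, IsolationFault, Entry, FlakySource, scanFrom and readIsolated are all hypothetical names made up for illustration, and the "table" is just an in-memory sorted list that throws one simulated fault. It holds back the current row until the next row starts, so a fault only ever discards a partial row.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class RowRestartSketch {

    /** Hypothetical stand-in for an isolation fault; not Accumulo's class. */
    static class IsolationFault extends RuntimeException {}

    /** One key/value entry reduced to a row id and a column name. */
    static final class Entry {
        final String row;
        final String col;
        Entry(String row, String col) { this.row = row; this.col = col; }
    }

    /** Simulated sorted table that throws a single fault after N entries. */
    static class FlakySource {
        private final List<Entry> data;
        private int entriesBeforeFault; // negative means "never fault"

        FlakySource(List<Entry> data, int entriesBeforeFault) {
            this.data = data;
            this.entriesBeforeFault = entriesBeforeFault;
        }

        /** Opens a scan at the first entry of startRow (null = table start). */
        Iterator<Entry> scanFrom(String startRow) {
            int start = 0;
            while (startRow != null && start < data.size()
                    && data.get(start).row.compareTo(startRow) < 0) {
                start++;
            }
            final int first = start;
            return new Iterator<Entry>() {
                private int i = first;
                @Override public boolean hasNext() { return i < data.size(); }
                @Override public Entry next() {
                    if (entriesBeforeFault == 0) {
                        entriesBeforeFault = -1; // fault only once
                        throw new IsolationFault();
                    }
                    if (entriesBeforeFault > 0) entriesBeforeFault--;
                    return data.get(i++);
                }
            };
        }
    }

    /** Emits only complete rows; on a fault, re-reads the unfinished row. */
    static List<Entry> readIsolated(FlakySource source) {
        List<Entry> out = new ArrayList<>();
        List<Entry> buffer = new ArrayList<>(); // entries of the current row
        String resumeRow = null;                // row to restart from
        while (true) {
            buffer.clear(); // drop any partial row left over from a fault
            try {
                Iterator<Entry> it = source.scanFrom(resumeRow);
                while (it.hasNext()) {
                    Entry e = it.next();
                    if (!buffer.isEmpty() && !buffer.get(0).row.equals(e.row)) {
                        out.addAll(buffer); // previous row is complete
                        buffer.clear();
                    }
                    if (buffer.isEmpty()) resumeRow = e.row;
                    buffer.add(e);
                }
                out.addAll(buffer); // last row is complete at end of scan
                return out;
            } catch (IsolationFault fault) {
                // Fall through: loop again and restart at resumeRow.
            }
        }
    }

    /** Runs the sketch with a fault injected in the middle of row r2. */
    static String demo() {
        List<Entry> data = new ArrayList<>();
        data.add(new Entry("r1", "a"));
        data.add(new Entry("r1", "b"));
        data.add(new Entry("r2", "a"));
        data.add(new Entry("r2", "b"));
        StringBuilder sb = new StringBuilder();
        for (Entry e : readIsolated(new FlakySource(data, 3))) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(e.row).append('/').append(e.col);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Note the cost of this approach: the buffer holds a whole row, which is exactly why rows that do not fit in memory need a buffer that can spill to disk, or a checkpoint-and-restart design in the application instead.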
