Made an inline comment below in an attempt to improve one of my comments.

On Wed, Jul 12, 2023 at 12:31 PM Keith Turner <ke...@deenlo.com> wrote:
> Responses inline below.
>
> On Thu, Jul 6, 2023 at 7:02 AM Logan Jones <lo...@codescratch.com> wrote:
>
>> Hi Keith:
>>
>> Thanks so much for the response. We will base things off the
>> RowEncodingIterator then.
>>
>> A few follow up questions out of curiosity:
>>
>>    1. It is likely that my iterator will not return very many records
>>    because I'm hoping we don't have much invalid data. Should I be
>>    worried about the fact that it's not going to return much data? I
>>    guess I should
>>
>
> When the in memory data is flushed/minor compacted while there are long
> running scans, Accumulo may copy the in memory data verbatim to a tmp file
> and transparently switch the scan to that in lieu of the in memory
> snapshot. The scan cannot be switched to the minor compacted files
> because iterators may have run on them and the snapshot behavior could not
> be maintained.
>
>>    expect long running scans. And then, if a t-server dies, it just won't
>>    know where to pick up from so entries might get re-scanned?
>>
>
> Correct, any progress would be lost.
>
>>    2. What are the criteria Accumulo uses to decide it's time to re-build
>>    an entire iterator stack? Feel free to point me at code and I can read
>>    it from there.
>>
>
> There are three conditions I can think of.
>
> One condition is that Accumulo places a SourceSwitching[1] iterator at the
> lowest levels of the iterator stack which uses the ScanDataSource[2] to
> determine when to switch. That in turn uses an atomic counter[3] that is
> incremented when files or the in memory map change to determine if a
> switch is needed[4].
>

I did not describe that very well; the terms lowest/highest/top/bottom/etc.
could be ambiguous. When a tserver reads data, it has an iterator stack that
looks like this in terms of wrapping:
"sourceSwitchingIter(userIters(systemIters(dataSources())))".
Data is read from the outer sourceSwitchingIter.
The outer sourceSwitchingIter could possibly rebuild the inner
"userIters(systemIters(dataSources()))" that it wraps after they return a
key value.

>
> Another condition is that Accumulo buffers scan data for scan[5] and batch
> scan[6], and when the buffer fills up it will send the batch of key values
> back to the client. When the client gets the batch and requests another
> batch, that will create a new iterator stack, unless it's an isolated
> scan[7]. When getting the next batch I think Accumulo will create a new
> Range where the first key is the last key seen, non-inclusive.
>
> Another condition is when a tserver dies mid scan.
>
> [1]: https://github.com/apache/accumulo/blob/d4846d407e5b28482394e2c0baa16932ae35e086/core/src/main/java/org/apache/accumulo/core/iteratorsImpl/system/SourceSwitchingIterator.java#L45
> [2]: https://github.com/apache/accumulo/blob/d4846d407e5b28482394e2c0baa16932ae35e086/server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/ScanDataSource.java#L56
> [3]: https://github.com/apache/accumulo/blob/d4846d407e5b28482394e2c0baa16932ae35e086/server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Tablet.java#L158
> [4]: https://github.com/apache/accumulo/blob/d4846d407e5b28482394e2c0baa16932ae35e086/server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/ScanDataSource.java#L113-L115
> [5]: https://github.com/apache/accumulo/blob/d4846d407e5b28482394e2c0baa16932ae35e086/server/tserver/src/main/java/org/apache/accumulo/tserver/scan/NextBatchTask.java#L78
> [6]: https://github.com/apache/accumulo/blob/d4846d407e5b28482394e2c0baa16932ae35e086/server/tserver/src/main/java/org/apache/accumulo/tserver/scan/LookupTask.java#L77
> [7]: https://github.com/apache/accumulo/blob/d4846d407e5b28482394e2c0baa16932ae35e086/server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Scanner.java#L101-L110
>
>>
>> Thanks,
>>
>> - Logan
>>
>> On Wed, Jul 5, 2023 at 7:13 PM Keith Turner <ke...@deenlo.com> wrote:
>>
>> > There are two options for this. One is to buffer the row in memory and
>> > encode it in your iterator like the WholeRowIterator does. The other is
>> > to use the isolated scanner[1][2], but this does not work for batch
>> > scans.
>> >
>> > Accumulo should not tear iterators down until after they return
>> > something; this is the behavior that the WholeRowIterator relies on. So
>> > if your iterator reads the entire row from its source iterator without
>> > returning anything, then Accumulo will not do anything to the iterator
>> > or its data sources. An iterator's data sources are the files and
>> > optionally a snapshot of the in memory map. After the top level iterator
>> > has returned a key value, it's possible that Accumulo could rebuild the
>> > iterator stack with new data sources (like new files that arrived or a
>> > new snapshot of the in memory map). This means you can use the trick of
>> > having the top level iterator not return anything until a row boundary
>> > is seen.
>> >
>> > For isolated scans Accumulo will only tear down iterators and use new
>> > data sources on row boundaries. Enabling isolation on scanner[1] will
>> > cause the scanner to throw an isolation exception if a tablet server
>> > dies while the client scanner is in the middle of reading a row. The
>> > IsolatedScanner[2] wraps a scanner and hides the isolation exception by
>> > buffering rows and rereading them when an isolation exception occurs,
>> > making it easy to use isolated scans.
>> >
>> > The WholeRowIterator handles a tablet server dying or data source
>> > changing well because it encodes the entire row as a single key value,
>> > so if the client gets it then it will not request that row again.
>> >
>> > [1]: https://accumulo.apache.org/docs/2.x/apidocs/org/apache/accumulo/core/client/Scanner.html#enableIsolation()
>> > [2]: https://accumulo.apache.org/docs/2.x/apidocs/org/apache/accumulo/core/client/IsolatedScanner.html
>> >
>> > On Wed, Jul 5, 2023 at 10:53 AM Logan Jones <lo...@codescratch.com> wrote:
>> >
>> > > Hello Mailing List:
>> > >
>> > > I have an iterator that will scan an entire table and only return keys
>> > > that match these criteria:
>> > >
>> > >    1. If a specific CF is "invalid" according to some criteria
>> > >    2. If a specific CF is missing on a row
>> > >    3. If there are multiple entries for a specific CF
>> > >
>> > > #1 would be easy to accomplish with a Filter, however #2 and #3 have
>> > > proven to be more tricky. As I understand the problem, Accumulo can, at
>> > > any point, destroy an iterator and re-call init. I am keeping some
>> > > internal state related to a row (namely a count of how many times I've
>> > > seen that specific CF).
>> > >
>> > > How can I keep the state I need for an entire row?
>> > >
>> > > I've looked at the RowEncodingIterator along with the WholeRowIterator,
>> > > but based on my understanding, it feels like Accumulo should be allowed
>> > > to destroy their state at any time and cause them to effectively break.
>> > > Is there a guarantee that an iterator won't get destroyed mid row?
>> > >
>> > > Thanks,
>> > >
>> > > - Logan
>> > >
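
For anyone reading this thread later, the row-boundary trick discussed
above (keep per-row state, return nothing until the row ends) can be
sketched in plain Java. This is only an illustration under assumptions: it
uses no Accumulo classes, and the names RowCheckSketch, Entry, and
findBadRows are hypothetical. In a real iterator the same counting would
live inside something like a RowEncodingIterator subclass, which is not
shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, Accumulo-free sketch of "emit only at row boundaries".
class RowCheckSketch {

    // Simplified stand-in for a key: just a row id and a column family.
    static final class Entry {
        final String row;
        final String cf;

        Entry(String row, String cf) {
            this.row = row;
            this.cf = cf;
        }
    }

    // Walks entries that are already sorted by row, counts how often
    // targetCf appears in each row, and records a finding only once the
    // row is complete (the next row starts or the input ends).
    static List<String> findBadRows(List<Entry> sorted, String targetCf) {
        List<String> findings = new ArrayList<>();
        String currentRow = null;
        int count = 0;
        for (Entry e : sorted) {
            if (currentRow != null && !currentRow.equals(e.row)) {
                report(currentRow, count, findings); // row boundary reached
                count = 0;
            }
            currentRow = e.row;
            if (targetCf.equals(e.cf)) {
                count++;
            }
        }
        if (currentRow != null) {
            report(currentRow, count, findings); // flush the final row
        }
        return findings;
    }

    // Criteria #2 (CF missing) and #3 (CF present more than once).
    private static void report(String row, int count, List<String> out) {
        if (count == 0) {
            out.add(row + ":missing");
        } else if (count > 1) {
            out.add(row + ":duplicate");
        }
    }
}
```

The per-row count is only trusted once the whole row has been read, which
mirrors the point made in the thread that nothing should be returned
before a row boundary is seen.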