I'm moving comments to ACCUMULO-4229 <https://issues.apache.org/jira/browse/ACCUMULO-4229>.
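For readers of the archive: the stale-cache mechanism analyzed in the quoted thread below can be modeled in a few lines of plain Java. These are simplified stand-in classes, not the real Accumulo client API; the point is only to show how an invalidation through the static collection misses a locally cached reference.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for a TabletLocator: maps rows to tablet extents.
class TabletLocator {
  private final Map<String, String> cache = new HashMap<>();
  void put(String row, String extent) { cache.put(row, extent); }
  String locate(String row)           { return cache.get(row); }
  void invalidate(String row)         { cache.remove(row); }
}

public class StaleLocatorModel {
  // Models the static collection of TabletLocators, keyed by table id.
  private static Map<String, TabletLocator> locators = new HashMap<>();

  static TabletLocator getLocator(String tableId) {
    return locators.computeIfAbsent(tableId, t -> new TabletLocator());
  }

  // Models the hourly clear: the next getLocator() allocates a fresh one.
  static void clearLocators() { locators = new HashMap<>(); }

  public static void main(String[] args) {
    TabletLocator local = getLocator("vb"); // cached once by the writer
    local.put("row1", "T1");                // binning resolves row1 -> T1

    clearLocators();                        // hourly clear of the static cache

    // The failure path invalidates via the *new* locator from the static
    // collection...
    getLocator("vb").invalidate("row1");

    // ...but the writer still consults its stale local copy:
    System.out.println(local.locate("row1")); // prints T1
  }
}
```

When run, the final lookup still returns the closed tablet, which is exactly the re-queue loop described in step 6 of the analysis below.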
I created two test cases that establish a baseline and reproduce the bug. I will push a fix PR shortly.

~Dylan

On Fri, Apr 22, 2016 at 1:16 PM, Dylan Hutchison <[email protected]> wrote:

> You caught me, Shawn =) In my case, the tserver is also the client. I do
> evil things like open BatchWriters inside iterators.
>
> So if the cache is indeed the problem, then it would only affect clients
> that live within the same JVM as the tserver. This is good news, because
> not every client would see this problem. I'm not sure if this covers
> William's MapReduce case.
>
> I still need to write a test case that reproduces this problem to be
> totally sure.
>
> Cheers, Dylan

On Fri, Apr 22, 2016 at 9:31 AM, Shawn Walker <[email protected]> wrote:

> Dylan,
>
> You pastebin'd a jstack of the tserver, though from the little I can see,
> I'd think the issue was on the client. Do you have a jstack of the client?
>
> You say that "every hour the static collection of TabletLocators clears
> itself", but my read is that this statement only applies to the client
> used by the tserver.
>
> -- Shawn Walker

On Fri, Apr 22, 2016 at 12:19 AM, Dylan Hutchison <[email protected]> wrote:

> Hi all,
>
> Check out ACCUMULO-4229
> <https://issues.apache.org/jira/browse/ACCUMULO-4229>. I copy the text
> below to include it in our discussion.
>
> This issue came up again during a similar long-lasting BatchWrite. It had
> the same circumstances: the failures started happening after a split
> event.
>
> I turned on TRACE logs and I think I pinned it down: the TabletLocator
> cached by a BatchWriter gets out of sync with the static cache of
> TabletLocators.
>
> 1. The TabletServerBatchWriter caches a TabletLocator from the static
>    collection of TabletLocators when it starts writing. Suppose it is
>    writing to tablet T1.
> 2. The TabletServerBatchWriter uses its locally cached TabletLocator
>    inside its `binMutations` method for its entire lifespan; this cache
>    is never refreshed or updated to sync up with the static collection
>    of TabletLocators.
> 3. Every hour, the static collection of TabletLocators clears itself.
>    The next call to get a TabletLocator from the static collection
>    allocates a new TabletLocator. Unfortunately, the
>    TabletServerBatchWriter does not reflect this change and continues
>    to use the old, locally cached TabletLocator.
> 4. Tablet T1 splits into T2 and T3, which closes T1. T1 no longer
>    exists, so all the entries the tablet server receives that were
>    meant for T1 fail to write because T1 is closed.
> 5. The TabletServerBatchWriter receives the response from the tablet
>    server that all entries failed to write. It invalidates the cache of
>    the *new* TabletLocator obtained from the static collection of
>    TabletLocators. The old TabletLocator that is cached locally does
>    not get invalidated.
> 6. The TabletServerBatchWriter re-queues the failed entries and tries
>    to write them to the same closed tablet T1, because it is still
>    looking up tablets using the old TabletLocator.
>
> This behavior subsumes the circumstances William wrote about in the
> thread
> <https://mail-archives.apache.org/mod_mbox/accumulo-user/201406.mbox/%3ccamz+duvmmhegon9ejehr9h_rrpp50l2qz53bbdruvo0pira...@mail.gmail.com%3E>
> he mentioned. The problem would occur as a result of either splits or
> major compactions. It would only stop the BatchWriter if its entire
> memory filled up with writes to the same tablet that was closed as a
> result of a majc or split; otherwise it would just slow down the
> BatchWriter by failing to write some number of entries with every RPC.
>
> There are a few solutions we can think of.
>
> 1. Do not have the MutationWriter inside the TabletServerBatchWriter
>    locally cache TabletLocators. I suspect this was done for
>    performance reasons, so it's probably not a good solution.
> 2. Have all the MutationWriters clear their caches at the same time the
>    static TabletLocator cache clears. I like this one. We could store a
>    reference to the Map that each MutationWriter has inside a static
>    synchronized WeakHashMap. The only times the weak map needs to be
>    accessed are:
>    1. When a MutationWriter is constructed (from constructing a
>       TabletServerBatchWriter), add its new local TabletLocator cache
>       to the weak map.
>    2. When the static TabletLocator cache is cleared, also clear every
>       map in the weak map.
> 3. Another solution is to make the invalidate calls on the local
>    TabletLocator cache rather than the global static one. If we go this
>    route, we should double-check the idea to make sure it does not
>    impact the correctness of any other pieces of code that use the
>    cache. I like the previous idea better.
>
> The TimeoutTabletLocator does not help when no timeout is set on the
> BatchWriter (the default behavior).

On Tue, Apr 19, 2016 at 8:09 PM, William Slacum <[email protected]> wrote:

> Good digs, Dylan. I don't think it's too rare to matter. I notice it
> often during MR jobs, and there's usually a point where I give up and
> just start writing RFiles.
>
> It could possibly be related to what I saw back in the day with:
> https://mail-archives.apache.org/mod_mbox/accumulo-user/201406.mbox/%3ccamz+duvmmhegon9ejehr9h_rrpp50l2qz53bbdruvo0pira...@mail.gmail.com%3E

On Tue, Apr 19, 2016 at 6:26 PM, Josh Elser <[email protected]> wrote:

> Nice findings. Sorry I haven't had any cycles to dig into this myself.
> I look forward to hearing what you find :)

Dylan Hutchison wrote:

> I investigated a bit more and I am pretty sure the problem is that the
> BatchWriter is not recognizing that the tablet vb<< split into vb;2436<
> and vb<;2436. It keeps trying to update the closed tablet vb<<. Each
> update writes 0 mutations and records a failure at the tablet server
> UpdateSession because vb<< is closed.
>
> I'm not sure why this is happening, because the BatchWriter should have
> invalidated its tablet locator cache upon recognizing a failure. Then it
> would recognize that the entries it wants to write fall into the new
> tablets vb;2436< and vb<;2436. I think there is a timing bug for this
> edge case, when a table split occurs during heavy writes.
>
> I will write this up if I can reproduce it. Maybe it is too rare to
> matter.
>
> Cheers, Dylan

On Mon, Apr 18, 2016 at 2:38 PM, Dylan Hutchison <[email protected]> wrote:

> Hi devs,
>
> I'd like to ask your help in figuring out what is happening to a
> BatchWriter. The following gives my reasoning so far.
>
> In Accumulo 1.7.1, I have a BatchWriter that is stuck in WAITING status
> in its addMutation method. I saw that it is stuck by jstack'ing the
> Accumulo client. It's been stuck like this for 16 hours.
>
> The BatchWriter is supposed to wait when a mutation is added if no
> failures have been recorded and either (a) the total memory used exceeds
> the maximum allowed for the BatchWriter, or (b) the BatchWriter is
> currently being flushed.
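The blocking condition just described amounts to a simple predicate. The sketch below uses invented field names (the real TabletServerBatchWriter tracks this state in its own internals) just to make the two-clause condition explicit:

```java
// Hypothetical model of the addMutation() wait condition: block only
// while no failure has been recorded AND (memory is over the cap OR a
// flush is in progress). Field names are invented for illustration.
public class WaitCondition {
  static long totalMemUsed = 120_000_000L; // bytes buffered (hypothetical)
  static long maxMem       = 100_000_000L; // configured max memory
  static boolean flushing        = false;
  static boolean somethingFailed = false;

  static boolean mustWait() {
    return !somethingFailed && (totalMemUsed > maxMem || flushing);
  }

  public static void main(String[] args) {
    System.out.println(mustWait()); // memory over the cap, no failures -> true

    somethingFailed = true;         // a recorded failure should break the wait
    System.out.println(mustWait()); // -> false
  }
}
```

This also shows why a failure that is never recorded (because it is reported against the wrong, stale locator) can leave the caller parked forever.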
> So we conclude that one of (a) or (b) occurred and no failures were
> recorded at the time addMutation was called. I think (a) is likely.
>
> The BatchWriter is supposed to notify itself when (1) a flush finishes,
> (2) a constraint violation, authorization failure, server error, or
> unknown error occurs, or (3) memory usage decreases, which happens when
> entries are successfully sent to the tablet server. Since the
> BatchWriter is stuck on WAITING, none of these conditions is occurring.
>
> The BatchWriter has 3 write threads (the default number). All three
> have status TIMED_WAITING (parked) in jstack. Their stack traces don't
> give useful information.
>
> Here's what I can tell from the tserver logs. A new table (and tablet)
> was created successfully. The BatchWriter started writing to this
> tablet steadily. The logs show that the tablet (vb<<) flushed every 5
> seconds or so and major compacted at a steady periodic rate.
>
> Everything looked good until vb<< grew large enough that it needed
> splitting. This occurred about 42 minutes after the BatchWriter started
> writing entries. The logs show a failure in an UpdateSession that
> popped up in the middle of the split operation. This failure continues
> to show for the next 15 hours.
>
> I copied the portion of the tserver logs that looks relevant to the
> split below. I highlighted the line reporting the first failure. It
> occurs between when the split starts and when it finishes.
>
> Any idea what could have caused this?
> I don't know if the failure is related to the BatchWriter being stuck
> in WAITING. It seems likely. I think it is weird that the 3 write
> threads are all idle; at least one of them should be doing something if
> the thread calling addMutation() is waiting.
>
> Here is a pastebin of the jstack <http://pastebin.com/TLfdrDwX>, though
> I think I wrote the useful parts from it.
>
> 2016-04-17 22:38:06,436 [tablet.Tablet] TABLET_HIST: vb<< closed
> 2016-04-17 22:38:06,439 [tablet.Tablet] DEBUG: Files for low split vb;2436<
>   [hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C00008lh.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C00009iz.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C0000a08.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C0000a4t.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C0000a7m.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C0000a8f.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C0000a8n.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/F0000a8p.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/F0000a8q.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000a93.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000a9b.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000a9g.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000aa7.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000aap.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000abe.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000abn.rf]
> 2016-04-17 22:38:06,439 [tablet.Tablet] DEBUG: Files for high split vb<;2436
>   [hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C00008lh.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C00009iz.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C0000a08.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C0000a4t.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C0000a7m.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C0000a8f.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C0000a8n.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/F0000a8p.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/F0000a8q.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000a93.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000a9b.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000a9g.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000aa7.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000aap.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000abe.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000abn.rf]
> 2016-04-17 22:38:06,440 [tserver.TabletServer] DEBUG: ScanSess tid 127.0.0.1:42302 !0 0 entries in 0.00 secs, nbTimes = [0 0 0.00 1]
> 2016-04-17 22:38:06,472 [tablet.Tablet] TABLET_HIST: vb<< split vb;2436< vb<;2436
> 2016-04-17 22:38:06,472 [tablet.Tablet] DEBUG: offline split time : 3.22 secs
> *2016-04-17 22:38:06,472 [tserver.TabletServer] INFO : Starting split: vb<<*
> 2016-04-17 22:38:06,472 [tserver.TabletServer] DEBUG: UpSess 127.0.0.1:41090 0 in 3.059s, at=[0 0 0.00 1] ft=3.010s(pt=3.010s lt=0.000s ct=0.000s)
> 2016-04-17 22:38:06,472 [tserver.NativeMap] DEBUG: Allocated native map 0x0000000009a20e60
> *2016-04-17 22:38:06,473 [tserver.TabletServer] DEBUG: Failures: 1, first extent vb<< successful commits: 0*
> 2016-04-17 22:38:06,473 [tablet.Tablet] DEBUG: Reloading constraints for extent: vb;2436<
> 2016-04-17 22:38:06,473 [constraints.ConstraintChecker] DEBUG: Loaded constraint org.apache.accumulo.core.constraints.DefaultKeySizeConstraint for vb
> 2016-04-17 22:38:06,474 [tablet.Tablet] TABLET_HIST: vb;2436< opened
> 2016-04-17 22:38:06,474 [tserver.NativeMap] DEBUG: Allocated native map 0x0000000003edf090
> 2016-04-17 22:38:06,474 [tablet.Tablet] DEBUG: Reloading constraints for extent: vb<;2436
> 2016-04-17 22:38:06,474 [constraints.ConstraintChecker] DEBUG: Loaded constraint org.apache.accumulo.core.constraints.DefaultKeySizeConstraint for vb
> 2016-04-17 22:38:06,475 [tablet.Tablet] TABLET_HIST: vb<;2436 opened
> *2016-04-17 22:38:06,475 [tserver.TabletServer] INFO : Tablet split: vb<< size0 539900144 size1 539900148 time 8268ms*
> 2016-04-17 22:38:06,594 [tserver.TabletServer] DEBUG: MultiScanSess 127.0.0.1:51092 0 entries in 0.00 secs (lookup_time:0.00 secs tablets:1 ranges:1)
> 2016-04-17 22:38:06,597 [tserver.TabletServer] DEBUG: MultiScanSess 127.0.0.1:51092 0 entries in 0.00 secs (lookup_time:0.00 secs tablets:1 ranges:1)
> 2016-04-17 22:38:07,568 [tserver.TabletServer] DEBUG: ScanSess tid 127.0.0.1:42307 !0 0 entries in 0.00 secs, nbTimes = [1 1 1.00 1]
> 2016-04-17 22:38:07,571 [zookeeper.DistributedWorkQueue] DEBUG: Looking for work in /accumulo/c7998e9d-129d-4c61-9732-45bb6bcab3f0/bulk_failed_copyq
> 2016-04-17 22:38:07,940 [tserver.TabletServer] DEBUG: UpSess 127.0.0.1:41090 0 in 0.067s, at=[1 1 1.00 1] ft=0.000s(pt=0.000s lt=0.000s ct=0.000s)
> 2016-04-17 22:38:07,941 [tserver.TabletServer] DEBUG: Failures: 1, first extent vb<< successful commits: 0
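To make solution 2 from the ticket discussion concrete, here is a minimal sketch of the proposed WeakHashMap registry. All class and method names are invented for illustration; the real fix would live in the TabletLocator / TabletServerBatchWriter code paths:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.WeakHashMap;

// Sketch of solution 2: register each MutationWriter's local locator
// cache in a static synchronized weak set, so the hourly clear of the
// static TabletLocator cache can also clear every local copy.
public class SharedClearModel {
  // Weak references: once a MutationWriter (and its cache map) becomes
  // unreachable, its entry disappears on its own -- no explicit removal.
  private static final Set<Map<String, String>> LOCAL_CACHES =
      Collections.newSetFromMap(new WeakHashMap<>());

  // Access point 1: called when a MutationWriter is constructed.
  static Map<String, String> registerLocalCache() {
    Map<String, String> cache = new HashMap<>();
    synchronized (LOCAL_CACHES) { LOCAL_CACHES.add(cache); }
    return cache;
  }

  // Access point 2: called whenever the static TabletLocator cache
  // clears itself, so no stale local copy survives the clear.
  static void clearAllLocalCaches() {
    synchronized (LOCAL_CACHES) {
      for (Map<String, String> c : LOCAL_CACHES) c.clear();
    }
  }

  public static void main(String[] args) {
    Map<String, String> writerCache = registerLocalCache();
    writerCache.put("row1", "T1");   // stale binning state accumulates here

    clearAllLocalCaches();           // the hourly clear now reaches it

    System.out.println(writerCache.isEmpty()); // prints true
  }
}
```

The weak references match the requirement in the thread that the registry only needs touching at MutationWriter construction and at the static clear; dead writers drop out of the set without any bookkeeping.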
