I'm moving comments to ACCUMULO-4229 <https://issues.apache.org/jira/browse/ACCUMULO-4229>.
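For readers of the archive: the stale-cache mechanism analyzed in the quoted thread below can be modeled in a few lines of plain Java. These are simplified stand-in classes, not the real Accumulo client API; the point is only to show how an invalidation through the static collection misses a locally cached reference.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for a TabletLocator: maps rows to tablet extents.
class TabletLocator {
  private final Map<String, String> cache = new HashMap<>();
  void put(String row, String extent) { cache.put(row, extent); }
  String locate(String row)           { return cache.get(row); }
  void invalidate(String row)         { cache.remove(row); }
}

public class StaleLocatorModel {
  // Models the static collection of TabletLocators, keyed by table id.
  private static Map<String, TabletLocator> locators = new HashMap<>();

  static TabletLocator getLocator(String tableId) {
    return locators.computeIfAbsent(tableId, t -> new TabletLocator());
  }

  // Models the hourly clear: the next getLocator() allocates a fresh one.
  static void clearLocators() { locators = new HashMap<>(); }

  public static void main(String[] args) {
    TabletLocator local = getLocator("vb"); // cached once by the writer
    local.put("row1", "T1");                // binning resolves row1 -> T1

    clearLocators();                        // hourly clear of the static cache

    // The failure path invalidates via the *new* locator from the static
    // collection...
    getLocator("vb").invalidate("row1");

    // ...but the writer still consults its stale local copy:
    System.out.println(local.locate("row1")); // prints T1
  }
}
```

When run, the final lookup still returns the closed tablet, which is exactly the re-queue loop described in step 6 of the analysis below.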
I created two test cases that establish a baseline and reproduce the bug. I will push a fix PR shortly.

~Dylan

On Fri, Apr 22, 2016 at 1:16 PM, Dylan Hutchison <[email protected]> wrote:

> You caught me, Shawn =) In my case, the tserver is also the client. I do
> evil things like open BatchWriters inside iterators.
>
> So if the cache is indeed the problem, then it would only affect clients
> that live within the same JVM as the tserver. This is good news, because
> not every client would see this problem. I'm not sure if this covers
> William's MapReduce case.
>
> I still need to write a test case that reproduces this problem to be
> totally sure.
>
> Cheers, Dylan

On Fri, Apr 22, 2016 at 9:31 AM, Shawn Walker <[email protected]> wrote:

> Dylan,
>
> You pastebin'd a jstack of the tserver, though from the little I can see,
> I'd think the issue was on the client. Do you have a jstack of the client?
>
> You say that "every hour the static collection of TabletLocators clears
> itself", but my read is that this statement only applies to the client
> used by the tserver.
>
> -- Shawn Walker

On Fri, Apr 22, 2016 at 12:19 AM, Dylan Hutchison <[email protected]> wrote:

> Hi all,
>
> Check out ACCUMULO-4229
> <https://issues.apache.org/jira/browse/ACCUMULO-4229>. I copy the text
> below to include it in our discussion.
>
> This issue came up again during a similar long-lasting BatchWrite. It had
> the same circumstances: the failures started happening after a split
> event.
>
> I turned on TRACE logs and I think I pinned it down: the TabletLocator
> cached by a BatchWriter gets out of sync with the static cache of
> TabletLocators.
>
> 1. The TabletServerBatchWriter caches a TabletLocator from the static
>    collection of TabletLocators when it starts writing. Suppose it is
>    writing to tablet T1.
> 2. The TabletServerBatchWriter uses its locally cached TabletLocator
>    inside its `binMutations` method for its entire lifespan; this cache
>    is never refreshed or updated to sync up with the static collection
>    of TabletLocators.
> 3. Every hour, the static collection of TabletLocators clears itself.
>    The next call to get a TabletLocator from the static collection
>    allocates a new TabletLocator. Unfortunately, the
>    TabletServerBatchWriter does not reflect this change and continues
>    to use the old, locally cached TabletLocator.
> 4. Tablet T1 splits into T2 and T3, which closes T1. T1 no longer
>    exists, so all the entries the tablet server receives that were
>    meant for T1 fail to write because T1 is closed.
> 5. The TabletServerBatchWriter receives the response from the tablet
>    server that all entries failed to write. It invalidates the cache of
>    the *new* TabletLocator obtained from the static collection of
>    TabletLocators. The old TabletLocator that is cached locally does
>    not get invalidated.
> 6. The TabletServerBatchWriter re-queues the failed entries and tries
>    to write them to the same closed tablet T1, because it is still
>    looking up tablets using the old TabletLocator.
>
> This behavior subsumes the circumstances William wrote about in the
> thread
> <https://mail-archives.apache.org/mod_mbox/accumulo-user/201406.mbox/%3ccamz+duvmmhegon9ejehr9h_rrpp50l2qz53bbdruvo0pira...@mail.gmail.com%3E>
> he mentioned. The problem would occur as a result of either splits or
> major compactions. It would only stop the BatchWriter if its entire
> memory filled up with writes to the same tablet that was closed as a
> result of a majc or split; otherwise it would just slow down the
> BatchWriter by failing to write some number of entries with every RPC.
>
> There are a few solutions we can think of.
>
> 1. Do not have the MutationWriter inside the TabletServerBatchWriter
>    locally cache TabletLocators. I suspect this was done for
>    performance reasons, so it's probably not a good solution.
> 2. Have all the MutationWriters clear their caches at the same time the
>    static TabletLocator cache clears. I like this one. We could store a
>    reference to the Map that each MutationWriter has inside a static
>    synchronized WeakHashMap. The only times the weak map needs to be
>    accessed are:
>    1. When a MutationWriter is constructed (from constructing a
>       TabletServerBatchWriter), add its new local TabletLocator cache
>       to the weak map.
>    2. When the static TabletLocator cache is cleared, also clear every
>       map in the weak map.
> 3. Another solution is to make the invalidate calls on the local
>    TabletLocator cache rather than the global static one. If we go this
>    route, we should double-check the idea to make sure it does not
>    impact the correctness of any other pieces of code that use the
>    cache. I like the previous idea better.
>
> The TimeoutTabletLocator does not help when no timeout is set on the
> BatchWriter (the default behavior).

On Tue, Apr 19, 2016 at 8:09 PM, William Slacum <[email protected]> wrote:

> Good digs, Dylan. I don't think it's too rare to matter. I notice it
> often during MR jobs, and there's usually a point where I give up and
> just start writing RFiles.
>
> It could possibly be related to what I saw back in the day with:
> https://mail-archives.apache.org/mod_mbox/accumulo-user/201406.mbox/%3ccamz+duvmmhegon9ejehr9h_rrpp50l2qz53bbdruvo0pira...@mail.gmail.com%3E

On Tue, Apr 19, 2016 at 6:26 PM, Josh Elser <[email protected]> wrote:

> Nice findings. Sorry I haven't had any cycles to dig into this myself.
> I look forward to hearing what you find :)

Dylan Hutchison wrote:

> I investigated a bit more and I am pretty sure the problem is that the
> BatchWriter is not recognizing that the tablet vb<< split into vb;2436<
> and vb<;2436. It keeps trying to update the closed tablet vb<<. Each
> update writes 0 mutations and records a failure at the tablet server
> UpdateSession because vb<< is closed.
>
> I'm not sure why this is happening, because the BatchWriter should have
> invalidated its tablet locator cache upon recognizing a failure. Then it
> would recognize that the entries it wants to write fall into the new
> tablets vb;2436< and vb<;2436. I think there is a timing bug for this
> edge case, when a table split occurs during heavy writes.
>
> I will write this up if I can reproduce it. Maybe it is too rare to
> matter.
>
> Cheers, Dylan

On Mon, Apr 18, 2016 at 2:38 PM, Dylan Hutchison <[email protected]> wrote:

> Hi devs,
>
> I'd like to ask your help in figuring out what is happening to a
> BatchWriter. The following gives my reasoning so far.
>
> In Accumulo 1.7.1, I have a BatchWriter that is stuck in WAITING status
> in its addMutation method. I saw that it is stuck by jstack'ing the
> Accumulo client. It's been stuck like this for 16 hours.
>
> The BatchWriter is supposed to wait when a mutation is added if no
> failures have been recorded and either (a) the total memory used exceeds
> the maximum allowed for the BatchWriter, or (b) the BatchWriter is
> currently being flushed.
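The blocking condition just described amounts to a simple predicate. The sketch below uses invented field names (the real TabletServerBatchWriter tracks this state in its own internals) just to make the two-clause condition explicit:

```java
// Hypothetical model of the addMutation() wait condition: block only
// while no failure has been recorded AND (memory is over the cap OR a
// flush is in progress). Field names are invented for illustration.
public class WaitCondition {
  static long totalMemUsed = 120_000_000L; // bytes buffered (hypothetical)
  static long maxMem       = 100_000_000L; // configured max memory
  static boolean flushing        = false;
  static boolean somethingFailed = false;

  static boolean mustWait() {
    return !somethingFailed && (totalMemUsed > maxMem || flushing);
  }

  public static void main(String[] args) {
    System.out.println(mustWait()); // memory over the cap, no failures -> true

    somethingFailed = true;         // a recorded failure should break the wait
    System.out.println(mustWait()); // -> false
  }
}
```

This also shows why a failure that is never recorded (because it is reported against the wrong, stale locator) can leave the caller parked forever.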
> So we conclude that one of (a) or (b) occurred and no failures were
> recorded at the time addMutation was called. I think (a) is likely.
>
> The BatchWriter is supposed to notify itself when (1) a flush finishes,
> (2) a constraint violation, authorization failure, server error, or
> unknown error occurs, or (3) memory usage decreases, which happens when
> entries are successfully sent to the tablet server. Since the
> BatchWriter is stuck on WAITING, none of these conditions is occurring.
>
> The BatchWriter has 3 write threads (the default number). All three
> have status TIMED_WAITING (parked) in jstack. Their stack traces don't
> give useful information.
>
> Here's what I can tell from the tserver logs. A new table (and tablet)
> was created successfully. The BatchWriter started writing to this
> tablet steadily. The logs show that the tablet (vb<<) flushed every 5
> seconds or so and major compacted at a steady periodic rate.
>
> Everything looked good until vb<< grew large enough that it needed
> splitting. This occurred about 42 minutes after the BatchWriter started
> writing entries. The logs show a failure in an UpdateSession that
> popped up in the middle of the split operation. This failure continues
> to show for the next 15 hours.
>
> I copied the portion of the tserver logs that looks relevant to the
> split below. I highlighted the line reporting the first failure. It
> occurs between when the split starts and when it finishes.
>
> Any idea what could have caused this?
> I don't know if the failure is related to the BatchWriter being stuck
> in WAITING. It seems likely. I think it is weird that the 3 write
> threads are all idle; at least one of them should be doing something if
> the thread calling addMutation() is waiting.
>
> Here is a pastebin of the jstack <http://pastebin.com/TLfdrDwX>, though
> I think I wrote the useful parts from it.
>
> 2016-04-17 22:38:06,436 [tablet.Tablet] TABLET_HIST: vb<< closed
> 2016-04-17 22:38:06,439 [tablet.Tablet] DEBUG: Files for low split vb;2436<
>   [hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C00008lh.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C00009iz.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C0000a08.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C0000a4t.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C0000a7m.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C0000a8f.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C0000a8n.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/F0000a8p.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/F0000a8q.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000a93.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000a9b.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000a9g.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000aa7.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000aap.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000abe.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000abn.rf]
> 2016-04-17 22:38:06,439 [tablet.Tablet] DEBUG: Files for high split vb<;2436
>   [hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C00008lh.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C00009iz.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C0000a08.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C0000a4t.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C0000a7m.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C0000a8f.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C0000a8n.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/F0000a8p.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/F0000a8q.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000a93.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000a9b.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000a9g.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000aa7.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000aap.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000abe.rf,
>    hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000abn.rf]
> 2016-04-17 22:38:06,440 [tserver.TabletServer] DEBUG: ScanSess tid 127.0.0.1:42302 !0 0 entries in 0.00 secs, nbTimes = [0 0 0.00 1]
> 2016-04-17 22:38:06,472 [tablet.Tablet] TABLET_HIST: vb<< split vb;2436< vb<;2436
> 2016-04-17 22:38:06,472 [tablet.Tablet] DEBUG: offline split time : 3.22 secs
> *2016-04-17 22:38:06,472 [tserver.TabletServer] INFO : Starting split: vb<<*
> 2016-04-17 22:38:06,472 [tserver.TabletServer] DEBUG: UpSess 127.0.0.1:41090 0 in 3.059s, at=[0 0 0.00 1] ft=3.010s(pt=3.010s lt=0.000s ct=0.000s)
> 2016-04-17 22:38:06,472 [tserver.NativeMap] DEBUG: Allocated native map 0x0000000009a20e60
> *2016-04-17 22:38:06,473 [tserver.TabletServer] DEBUG: Failures: 1, first extent vb<< successful commits: 0*
> 2016-04-17 22:38:06,473 [tablet.Tablet] DEBUG: Reloading constraints for extent: vb;2436<
> 2016-04-17 22:38:06,473 [constraints.ConstraintChecker] DEBUG: Loaded constraint org.apache.accumulo.core.constraints.DefaultKeySizeConstraint for vb
> 2016-04-17 22:38:06,474 [tablet.Tablet] TABLET_HIST: vb;2436< opened
> 2016-04-17 22:38:06,474 [tserver.NativeMap] DEBUG: Allocated native map 0x0000000003edf090
> 2016-04-17 22:38:06,474 [tablet.Tablet] DEBUG: Reloading constraints for extent: vb<;2436
> 2016-04-17 22:38:06,474 [constraints.ConstraintChecker] DEBUG: Loaded constraint org.apache.accumulo.core.constraints.DefaultKeySizeConstraint for vb
> 2016-04-17 22:38:06,475 [tablet.Tablet] TABLET_HIST: vb<;2436 opened
> *2016-04-17 22:38:06,475 [tserver.TabletServer] INFO : Tablet split: vb<< size0 539900144 size1 539900148 time 8268ms*
> 2016-04-17 22:38:06,594 [tserver.TabletServer] DEBUG: MultiScanSess 127.0.0.1:51092 0 entries in 0.00 secs (lookup_time:0.00 secs tablets:1 ranges:1)
> 2016-04-17 22:38:06,597 [tserver.TabletServer] DEBUG: MultiScanSess 127.0.0.1:51092 0 entries in 0.00 secs (lookup_time:0.00 secs tablets:1 ranges:1)
> 2016-04-17 22:38:07,568 [tserver.TabletServer] DEBUG: ScanSess tid 127.0.0.1:42307 !0 0 entries in 0.00 secs, nbTimes = [1 1 1.00 1]
> 2016-04-17 22:38:07,571 [zookeeper.DistributedWorkQueue] DEBUG: Looking for work in /accumulo/c7998e9d-129d-4c61-9732-45bb6bcab3f0/bulk_failed_copyq
> 2016-04-17 22:38:07,940 [tserver.TabletServer] DEBUG: UpSess 127.0.0.1:41090 0 in 0.067s, at=[1 1 1.00 1] ft=0.000s(pt=0.000s lt=0.000s ct=0.000s)
> 2016-04-17 22:38:07,941 [tserver.TabletServer] DEBUG: Failures: 1, first extent vb<< successful commits: 0
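To make solution 2 from the ticket discussion concrete, here is a minimal sketch of the proposed WeakHashMap registry. All class and method names are invented for illustration; the real fix would live in the TabletLocator / TabletServerBatchWriter code paths:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.WeakHashMap;

// Sketch of solution 2: register each MutationWriter's local locator
// cache in a static synchronized weak set, so the hourly clear of the
// static TabletLocator cache can also clear every local copy.
public class SharedClearModel {
  // Weak references: once a MutationWriter (and its cache map) becomes
  // unreachable, its entry disappears on its own -- no explicit removal.
  private static final Set<Map<String, String>> LOCAL_CACHES =
      Collections.newSetFromMap(new WeakHashMap<>());

  // Access point 1: called when a MutationWriter is constructed.
  static Map<String, String> registerLocalCache() {
    Map<String, String> cache = new HashMap<>();
    synchronized (LOCAL_CACHES) { LOCAL_CACHES.add(cache); }
    return cache;
  }

  // Access point 2: called whenever the static TabletLocator cache
  // clears itself, so no stale local copy survives the clear.
  static void clearAllLocalCaches() {
    synchronized (LOCAL_CACHES) {
      for (Map<String, String> c : LOCAL_CACHES) c.clear();
    }
  }

  public static void main(String[] args) {
    Map<String, String> writerCache = registerLocalCache();
    writerCache.put("row1", "T1");   // stale binning state accumulates here

    clearAllLocalCaches();           // the hourly clear now reaches it

    System.out.println(writerCache.isEmpty()); // prints true
  }
}
```

The weak references match the requirement in the thread that the registry only needs touching at MutationWriter construction and at the static clear; dead writers drop out of the set without any bookkeeping.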
