Re: [HACKERS] Write Ahead Logging for Hash Indexes

Jeff Janes Sun, 11 Sep 2016 23:00:03 -0700

On Sun, Sep 11, 2016 at 7:40 PM, Amit Kapila <[email protected]>
wrote:


> On Mon, Sep 12, 2016 at 7:00 AM, Jeff Janes <[email protected]> wrote:
> > On Thu, Sep 8, 2016 at 12:09 PM, Jeff Janes <[email protected]> wrote:
> >
> >>
> >> I plan to do testing using my own testing harness after changing it to
> >> insert a lot of dummy tuples (ones with negative values in the pseudo-pk
> >> column, which are never queried by the core part of the harness) and
> >> delete them at random intervals.  I think that none of pgbench's
> >> built-in tests are likely to give the bucket splitting and squeezing
> >> code very much exercise.
> >
> >
> >
> > I've implemented this, by adding lines 197 through 202 to the count.pl
> > script.  (I'm reattaching the test case)
> >
> > Within a few minutes of testing, I start getting Errors like these:
> >
> > 29236 UPDATE XX000 2016-09-11 17:21:25.893 PDT:ERROR:  buffer 2762 is not
> > owned by resource owner Portal
> > 29236 UPDATE XX000 2016-09-11 17:21:25.893 PDT:STATEMENT:  update foo set
> > count=count+1 where index=$1
> >
> >
> > In one test, I also got an error from my test harness itself indicating
> > tuples are transiently missing from the index, starting an hour into a
> > test:
> >
> > child abnormal exit update did not update 1 row: key 9555 updated 0E0 at
> > count.pl line 194.\n  at count.pl line 208.
> > child abnormal exit update did not update 1 row: key 8870 updated 0E0 at
> > count.pl line 194.\n  at count.pl line 208.
> > child abnormal exit update did not update 1 row: key 8453 updated 0E0 at
> > count.pl line 194.\n  at count.pl line 208.
> >
> > Those key values should always find exactly one row to update.
> >
> > If the tuples were permanently missing from the index, I would keep
> > getting errors on the same key values very frequently.  But I don't get
> > that; the errors remain infrequent and are on a different value each
> > time, so I think the tuples are in the index but the scan somehow misses
> > them, either while the bucket is being split or while it is being
> > squeezed.
> >
> > This is on a build without enable-asserts.
> >
> > Any ideas on how best to go about investigating this?
> >
>
> I think these symptoms indicate a bug in the concurrent hash index
> patch, but it could be that the problem is only revealed with the WAL
> patch.  Is it possible to try this with just the concurrent hash index
> patch?  In any case, thanks for testing it; I will look into these
> issues.
>

My test program (as posted) injects crashes and then checks the
post-crash-recovery system for consistency, so it cannot be run as-is
without the WAL patch.  I also ran the test with crashing turned off (just
change the JJ* variables at the top of do.sh so they are all set to the
empty string), and in that case I didn't see either problem, but it
could just be that I didn't run it long enough.

It should have been long enough to detect the rather common "buffer <x> is
not owned by resource owner Portal" problem, so I think that one is
specific to the WAL patch (probably the part that tries to complete a bucket
split when it detects that one was started but not completed?).
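
For the "did not update 1 row" errors quoted above, the transient-versus-permanent
argument can be checked mechanically by tallying which keys show up in the
harness output.  The following is only a rough sketch in Python, not part of
count.pl: the function names and the repeat threshold are made up for
illustration, and the regex assumes the message format matches the lines
quoted in this thread.

```python
import re
from collections import Counter

# Matches harness lines such as (format taken from the errors quoted above):
#   child abnormal exit update did not update 1 row: key 9555 updated 0E0 at
# The pattern is an assumption about the message layout, not taken from count.pl.
KEY_RE = re.compile(r"did not update 1 row: key (-?\d+) updated")


def tally_missed_keys(lines):
    """Count how many times each key appears in a 'did not update 1 row' error."""
    counts = Counter()
    for line in lines:
        match = KEY_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def repeated_keys(counts, threshold=2):
    """Return keys reported at least `threshold` times, in ascending order.

    The threshold is a hypothetical cutoff: a tuple that is permanently
    missing from the index should be reported again and again, while a
    transient miss should mostly show up once per key.
    """
    return sorted(key for key, n in counts.items() if n >= threshold)
```

Feeding the harness log to tally_missed_keys() and getting an empty list back
from repeated_keys() would be consistent with the scan transiently missing
tuples; the same keys recurring would point toward tuples actually being lost
from the index.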


Cheers,

Jeff
