On Thu, Jul 16, 2015 at 12:03 AM, Jeff Janes <jeff.ja...@gmail.com> wrote:

> On Wed, Jul 15, 2015 at 8:44 AM, Heikki Linnakangas <hlinn...@iki.fi>
> wrote:
>
>>
>> Both. Here's the patch.
>>
>> Previously, LWLockAcquireWithVar set the variable associated with the
>> lock atomically with acquiring it. Before the lwlock-scalability changes,
>> that was straightforward because you held the spinlock anyway, but it's a
>> lot harder/expensive now. So I changed the way acquiring a lock with a
>> variable works. There is now a separate flag, LW_FLAG_VAR_SET, which
>> indicates that the current lock holder has updated the variable. The
>> LWLockAcquireWithVar function is gone - you now just use LWLockAcquire(),
>> which always clears the LW_FLAG_VAR_SET flag, and you can call
>> LWLockUpdateVar() after that if you want to set the variable immediately.
>> LWLockWaitForVar() always waits if the flag is not set, i.e. it will not
>> return regardless of the variable's value, if the current lock-holder has
>> not updated it yet.
>>
>>
> I ran this for a while without casserts and it seems to work.  But with
> casserts, I get failures in the autovac process on the GIN index.
>
> I don't see how this is related to the LWLock issue, but I didn't see it
> without your patch.  Perhaps the system just didn't survive long enough to
> uncover it without the patch (although it shows up pretty quickly).  It
> could just be an overzealous Assert, since the casserts off didn't show
> problems.
>

> bt and bt full are shown below.
>
> Cheers,
>
> Jeff
>
> #0  0x0000003dcb632625 in raise () from /lib64/libc.so.6
> #1  0x0000003dcb633e05 in abort () from /lib64/libc.so.6
> #2  0x0000000000930b7a in ExceptionalCondition (
>     conditionName=0x9a1440 "!(((PageHeader) (page))->pd_special >= (__builtin_offsetof (PageHeaderData, pd_linp)))",
>     errorType=0x9a12bc "FailedAssertion",
>     fileName=0x9a12b0 "ginvacuum.c", lineNumber=713) at assert.c:54
> #3  0x00000000004947cf in ginvacuumcleanup (fcinfo=0x7fffee073a90) at ginvacuum.c:713
>

It now looks like this *is* unrelated to the LWLock issue.  The assert it
trips over was added only recently (302ac7f27197855afa8c), so I had not
been testing with it in place until now.  It appears to be finding all-zero
pages (perhaps the index was extended, but a crash occurred before the page
was initialized?), and it rejects them.

(gdb) f 3
(gdb) p *(char[8192]*)(page)
$11 = '\000' <repeats 8191 times>

Presumably, before this assert was added, such pages would simply have
stayed permanently orphaned.
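
As a rough illustration of why a zero-filled page fails that check, here is
a minimal standalone sketch.  It does not use the PostgreSQL headers: the
SketchPageHeader struct is only a simplified stand-in for PageHeaderData,
and SKETCH_BLCKSZ and the sketch_* functions are hypothetical names made up
for this example.  The 8192-byte page size is taken from the cast in the gdb
session above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed page size, matching the char[8192] cast in the gdb session. */
#define SKETCH_BLCKSZ 8192

/* Simplified stand-in for PageHeaderData; the layout is illustrative only. */
typedef struct SketchPageHeader
{
    uint64_t pd_lsn;
    uint16_t pd_checksum;
    uint16_t pd_flags;
    uint16_t pd_lower;
    uint16_t pd_upper;
    uint16_t pd_special;
    uint16_t pd_pagesize_version;
    uint32_t pd_prune_xid;
    uint32_t pd_linp[1];        /* start of the line pointer array */
} SketchPageHeader;

/* Returns true if every byte of the page is zero. */
bool
sketch_page_is_all_zero(const unsigned char *page)
{
    for (size_t i = 0; i < SKETCH_BLCKSZ; i++)
    {
        if (page[i] != 0)
            return false;
    }
    return true;
}

/*
 * Mirrors the shape of the failed assertion's condition:
 * pd_special >= offsetof(header, pd_linp).
 */
bool
sketch_special_offset_is_sane(const unsigned char *page)
{
    SketchPageHeader hdr;

    /* Copy the header out to avoid alignment assumptions about the buffer. */
    memcpy(&hdr, page, sizeof(hdr));
    return hdr.pd_special >= offsetof(SketchPageHeader, pd_linp);
}

/* Usage sketch: a never-initialized (zero-filled) page fails the check. */
void
sketch_demo(void)
{
    unsigned char page[SKETCH_BLCKSZ] = {0};

    assert(sketch_page_is_all_zero(page));
    assert(!sketch_special_offset_is_sane(page));
}
```

On a zero-filled page pd_special is 0, which is below the offset of the line
pointer array, so the condition is false.  A vacuum pass that wanted to
tolerate such pages would need to test for them before reaching the check;
whether that is the right fix here is a separate question.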

Cheers,

Jeff
