> On Sep 10, 2021, at 7:36 AM, Amul Sul <sula...@gmail.com> wrote:
> 
>> v33-0005
>> 
>> This patch makes bool XLogInsertAllowed() more complicated than before.  The 
>> result used to depend mostly on the value of LocalXLogInsertAllowed except 
>> that when that value was negative, the result was determined by 
>> RecoveryInProgress(). There was an arcane rule that LocalXLogInsertAllowed 
>> must have the non-negative values binary coercible to boolean "true" and 
>> "false", with the basis for that rule being the coding of 
>> XLogInsertAllowed().  Now that the function is more complicated, this rule 
>> seems even more arcane.  Can we change the logic to not depend on casting an 
>> integer to bool?
>> 
> 
> We can't use a boolean variable because LocalXLogInsertAllowed
> represents three states as, 1 means "wal is allowed'', 0 for "wal is
> disallowed", and -1 is for "need to check".

I'm complaining that we're using an integer rather than an enum.  I'm ok if we 
define it so that WAL_ALLOWABLE_UNKNOWN = -1, WAL_DISALLOWED = 0, WAL_ALLOWED = 
1 or such, but the logic of the function has gotten complicated enough that 
having to remember which number represents which logical condition has become a 
(small) mental burden.  Given how hard the WAL code is to read and fully grok, 
I'd rather avoid any unnecessary burden, even small ones.

>> The new function CheckWALPermitted() seems to test the current state of 
>> variables but not lock any of them, and the new function comment says:
>> 
> 
> CheckWALPermitted() calls XLogInsertAllowed() does check the
> LocalXLogInsertAllowed flag which is local to that process only, and
> nobody else reads that concurrently.
> 
>> /*
>> * In opposite to the above assertion if a transaction doesn't have valid XID
>> * (e.g. VACUUM) then it won't be killed while changing the system state to 
>> WAL
>> * prohibited.  Therefore, we need to explicitly error out before entering 
>> into
>> * the critical section.
>> */
>> 
>> This suggests to me that a vacuum process can check whether wal is 
>> prohibited, then begin a critical section which needs wal to be allowed, and 
>> concurrently somebody else might disable wal without killing the vacuum 
>> process.  I'm given to wonder what horrors await when the vacuum process 
>> does something that needs to be wal logged but cannot be.  Does it trigger a 
>> panic?  I don't like the idea that calling pg_prohibit_wal durning a vacuum 
>> might panic the cluster.  If there is some reason this is not a problem, I 
>> think the comment should explain it.  In particular, why is it sufficient to 
>> check whether wal is prohibited before entering the critical section and not 
>> necessary to be sure it remains allowed through the lifetime of that 
>> critical section?
>> 
> 
> Hm, interrupts absorption are disabled inside the critical section.
> The wal prohibited state for that process (here vacuum) will never get
> set until it sees the interrupts & the system will not be said wal
> prohibited until every process sees that interrupts. I am not sure we
> should explain the characteristics of the critical section at this
> place, if want, we can add a brief saying that inside the critical
> section we should not worry about the state change which never happens
> because interrupts are disabled there.

I think the fact that interrupts are disabled during critical sections is 
understood, so there is no need to mention that.  The problem is that the 
method for taking the system read-only is less generally known, and readers of 
other sections of code need to jump to the definition of CheckWALPermitted to 
read the comments and understand what it does.  Take for example a code stanza 
from heapam.c:

    if (needwal)
        CheckWALPermitted();

    /* NO EREPORT(ERROR) from here till changes are logged */
    START_CRIT_SECTION();

Now, I know that interrupts won't be processed after starting the critical 
section, but I can see plain as day that an interrupt might get processed 
*during* CheckWALPermitted, since that function isn't atomic.  It might happen 
after the check is meaningfully finished but before the function actually 
returns.  So I'm not inclined to believe that the way this all works is 
dependent on interrupts being blocked.  So I think, maybe this is all protected 
by some other scheme.  But what?  It's not clear from the code comments for 
CheckWALPermitted, so I'm left having to reverse engineer the system to 
understand it.

One interpretation is that the signal handler will exit() my backend if it 
receives a signal saying that the system is going read-only, so there is no 
race condition.  But then why the call to CheckWALPermitted()?  If this 
interpretation were correct, we'd happily enter the critical section without 
checking, secure in the knowledge that as long as we haven't exited yet, all is 
ok.

Another interpretation is that the whole thing is just a performance trick.  
Maybe we're ok with the idea that we will occasionally miss the fact that wal 
is prohibited, do whatever work we need in the critical section, and then fail 
later.  But if that is true, it had better not be a panic, because designing 
the system to panic 1% of the time (or whatever percent it works out to be) 
isn't project style.  So looking into the critical section in the heapam.c 
code, I see:

        XLogBeginInsert();
        XLogRegisterData((char *) &xlrec, SizeOfHeapInplace);
    
        XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
        XLogRegisterBufData(0, (char *) htup + htup->t_hoff, newlen);

And jumping to the definition of XLogBeginInsert() I see

    /*  
     * WAL permission must have checked before entering the critical section.
     * Otherwise, WAL prohibited error will force system panic.
     */

So now I'm flummoxed.  Is it that the code is broken, or is it that I don't 
know what the strategy behind all this is?  If there were a code comment saying 
how this all works, I'd be in a better position to either know that it is truly 
safe or alternately know that the strategy is wrong.

Even if my analysis that this is all flawed is incorrect, I still think that a 
code comment would help.

>> v33-0007:
>> 
>> I don't really like what the documentation has to say about pg_prohibit_wal. 
>>  Why should pg_prohibit_wal differ from other signal sending functions in 
>> whether it returns a boolean?  If you believe it must always succeed, you 
>> can still define it as returning a boolean and always return true.  That 
>> leaves the door open to future code changes which might need to return false 
>> for some reason.
>> 
> 
> Ok, I am fine to always return true.

Ok.

>> But I also don't like the idea that existing transactions with xids are 
>> immediately killed.  Shouldn't this function take an optional timeout, 
>> perhaps defaulting to none, but otherwise allowing the user to put the 
>> system into WALPROHIBIT_STATE_GOING_READ_ONLY for a period of time before 
>> killing remaining transactions?
>> 
> 
> Ok, will check.
> 
>> Why is this function defined to take a boolean such that 
>> pg_prohibit_wal(true) means to prohibit wal and pg_prohibit_wal(false) means 
>> to allow wal.  Wouldn't a different function named pg_allow_wal() make it 
>> more clear?  This also would be a better interface if taking the system 
>> read-only had a timeout as I suggested above, as such a timeout parameter 
>> when allowing wal is less clearly useful.
>> 
> 
> Like Robert, I am too inclined to have a single function that is easy
> to remember.

For C language functions that take a bool argument, I can jump to the 
definition using ctags, and I assume most other developers can do so using 
whatever IDE they like.  For SQL functions, it's a bit harder to jump to the 
definition, particularly if you are logged into a production server where 
non-essential software is intentionally missing.  Then you have to wonder, what 
exactly is the boolean argument toggling here?

I don't feel strongly about this, though, and you don't need to change it.

> Apart from this, recently while testing this patch with
> pgbench where I have exhausted the connection limit and want to change
> the system's prohibited state in between but I was unable to do that,
> I wish I could do that using the pg_clt option. How about having a
> pg_clt option to alter wal prohibited state?

I'd have to review the implementation, but sure, that sounds like a useful 
ability.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Reply via email to