Re: pgsql: Improve handling of parameter differences in physical replicatio

2020-04-14 Thread Peter Eisentraut

On 2020-03-30 20:10, Andres Freund wrote:

Also, shouldn't dynahash be adjusted as well? There's e.g. the
following HASH_ENTER path:
/* report a generic message */
if (hashp->isshared)
ereport(ERROR,

(errcode(ERRCODE_OUT_OF_MEMORY),
 errmsg("out of shared 
memory")));


Could you explain further what you mean by this?  I don't understand how 
this is related.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pgsql: Improve handling of parameter differences in physical replicatio

2020-04-04 Thread Peter Eisentraut
On 2020-04-03 19:55, Robert Haas wrote:> I think this patch needs to be 
reverted. The only places where it

changes anything are places where we were about to throw some error
anyway. But as Andres's analysis shows, that's not nearly good enough.


OK, reverted.




Re: pgsql: Improve handling of parameter differences in physical replicatio

2020-04-03 Thread Robert Haas
On Mon, Mar 30, 2020 at 2:10 PM Andres Freund  wrote:
> One important issue seems to me to be the size of the array that
> TransactionIdIsInProgress() allocates:
> /*
>  * If first time through, get workspace to remember main XIDs in. We
>  * malloc it permanently to avoid repeated palloc/pfree overhead.
>  */
> if (xids == NULL)
> {
> /*
>  * In hot standby mode, reserve enough space to hold all xids 
> in the
>  * known-assigned list. If we later finish recovery, we no 
> longer need
>  * the bigger array, but we don't bother to shrink it.
>  */
> int maxxids = RecoveryInProgress() ? 
> TOTAL_MAX_CACHED_SUBXIDS : arrayP->maxProcs;
>
> xids = (TransactionId *) malloc(maxxids * 
> sizeof(TransactionId));
> if (xids == NULL)
> ereport(ERROR,
> (errcode(ERRCODE_OUT_OF_MEMORY),
>  errmsg("out of memory")));
> }
>
> Which I think means we'll just overrun the xids array in some cases,
> e.g. if KnownAssignedXids overflowed.  Obviously we should have a
> crosscheck in the code (which we don't), but it was previously a
> supposedly unreachable path.

I think this patch needs to be reverted. The only places where it
changes anything are places where we were about to throw some error
anyway. But as Andres's analysis shows, that's not nearly good enough.
I am kind of surprised that Peter thought that would be good enough.
It is necessary, for something like this, to investigate all the
places where the code may be relying on a certain assumption, not just
assume that there's an error check everywhere that we rely on that
assumption and change only those places.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: pgsql: Improve handling of parameter differences in physical replicatio

2020-03-30 Thread Fujii Masao




On 2020/03/31 3:10, Andres Freund wrote:

Hi,

On 2020-03-30 19:41:43 +0900, Fujii Masao wrote:

On 2020/03/30 16:58, Peter Eisentraut wrote:

Improve handling of parameter differences in physical replication

When certain parameters are changed on a physical replication primary,
this is communicated to standbys using the XLOG_PARAMETER_CHANGE WAL
record.  The standby then checks whether its own settings are at least
as big as the ones on the primary.  If not, the standby shuts down
with a fatal error.

The correspondence of settings between primary and standby is required
because those settings influence certain shared memory sizings that
are required for processing WAL records that the primary might send.
For example, if the primary sends a prepared transaction, the standby
must have had max_prepared_transaction set appropriately or it won't
be able to process those WAL records.

However, fatally shutting down the standby immediately upon receipt of
the parameter change record might be a bit of an overreaction.  The
resources related to those settings are not required immediately at
that point, and might never be required if the activity on the primary
does not exhaust all those resources.  If we just let the standby roll
on with recovery, it will eventually produce an appropriate error when
those resources are used.

So this patch relaxes this a bit.  Upon receipt of
XLOG_PARAMETER_CHANGE, we still check the settings but only issue a
warning and set a global flag if there is a problem.  Then when we
actually hit the resource issue and the flag was set, we issue another
warning message with relevant information.


I find it somewhat hostile that we don't display the actual resource
error once the problem is hit - we just pause. Sure, there's going to be
some previous log entry explaining what the actual parameter difference
is - but that could have been weeks ago. So either hard to find, or even
rotated out.



At that point we pause recovery, so a hot standby remains usable.
We also repeat the last warning message once a minute so it is
harder to miss or ignore.



I can't really imagine that the adjustments made in this patch are
sufficient.

One important issue seems to me to be the size of the array that
TransactionIdIsInProgress() allocates:
/*
 * If first time through, get workspace to remember main XIDs in. We
 * malloc it permanently to avoid repeated palloc/pfree overhead.
 */
if (xids == NULL)
{
/*
 * In hot standby mode, reserve enough space to hold all xids 
in the
 * known-assigned list. If we later finish recovery, we no 
longer need
 * the bigger array, but we don't bother to shrink it.
 */
int maxxids = RecoveryInProgress() ? 
TOTAL_MAX_CACHED_SUBXIDS : arrayP->maxProcs;

xids = (TransactionId *) malloc(maxxids * 
sizeof(TransactionId));
if (xids == NULL)
ereport(ERROR,
(errcode(ERRCODE_OUT_OF_MEMORY),
 errmsg("out of memory")));
}

Which I think means we'll just overrun the xids array in some cases,
e.g. if KnownAssignedXids overflowed.  Obviously we should have a
crosscheck in the code (which we don't), but it was previously a
supposedly unreachable path.

Similarly, the allocation in GetSnapshotData() will be too small, I
think:
if (snapshot->xip == NULL)
{
/*
 * First call for this snapshot. Snapshot is same size whether 
or not
 * we are in recovery, see later comments.
 */
snapshot->xip = (TransactionId *)
malloc(GetMaxSnapshotXidCount() * 
sizeof(TransactionId));
if (snapshot->xip == NULL)
ereport(ERROR,
(errcode(ERRCODE_OUT_OF_MEMORY),
 errmsg("out of memory")));
Assert(snapshot->subxip == NULL);
snapshot->subxip = (TransactionId *)
malloc(GetMaxSnapshotSubxidCount() * 
sizeof(TransactionId));
if (snapshot->subxip == NULL)
ereport(ERROR,
(errcode(ERRCODE_OUT_OF_MEMORY),
 errmsg("out of memory")));
}

I think basically all code using TOTAL_MAX_CACHED_SUBXIDS,
GetMaxSnapshotSubxidCount(), PROCARRAY_MAXPROCS needs to be reviewed
much more carefully than done here.


Also, shouldn't dynahash be adjusted as well? There's e.g. the
following HASH_ENTER path:
/* report a generic message */
if (hashp->isshared)
ereport(ERROR,

Re: pgsql: Improve handling of parameter differences in physical replicatio

2020-03-30 Thread Andres Freund
Hi,

On 2020-03-30 19:41:43 +0900, Fujii Masao wrote:
> On 2020/03/30 16:58, Peter Eisentraut wrote:
> > Improve handling of parameter differences in physical replication
> > 
> > When certain parameters are changed on a physical replication primary,
> > this is communicated to standbys using the XLOG_PARAMETER_CHANGE WAL
> > record.  The standby then checks whether its own settings are at least
> > as big as the ones on the primary.  If not, the standby shuts down
> > with a fatal error.
> > 
> > The correspondence of settings between primary and standby is required
> > because those settings influence certain shared memory sizings that
> > are required for processing WAL records that the primary might send.
> > For example, if the primary sends a prepared transaction, the standby
> > must have had max_prepared_transaction set appropriately or it won't
> > be able to process those WAL records.
> > 
> > However, fatally shutting down the standby immediately upon receipt of
> > the parameter change record might be a bit of an overreaction.  The
> > resources related to those settings are not required immediately at
> > that point, and might never be required if the activity on the primary
> > does not exhaust all those resources.  If we just let the standby roll
> > on with recovery, it will eventually produce an appropriate error when
> > those resources are used.
> > 
> > So this patch relaxes this a bit.  Upon receipt of
> > XLOG_PARAMETER_CHANGE, we still check the settings but only issue a
> > warning and set a global flag if there is a problem.  Then when we
> > actually hit the resource issue and the flag was set, we issue another
> > warning message with relevant information.

I find it somewhat hostile that we don't display the actual resource
error once the problem is hit - we just pause. Sure, there's going to be
some previous log entry explaining what the actual parameter difference
is - but that could have been weeks ago. So either hard to find, or even
rotated out.


> > At that point we pause recovery, so a hot standby remains usable.
> > We also repeat the last warning message once a minute so it is
> > harder to miss or ignore.


I can't really imagine that the adjustments made in this patch are
sufficient.

One important issue seems to me to be the size of the array that
TransactionIdIsInProgress() allocates:
/*
 * If first time through, get workspace to remember main XIDs in. We
 * malloc it permanently to avoid repeated palloc/pfree overhead.
 */
if (xids == NULL)
{
/*
 * In hot standby mode, reserve enough space to hold all xids 
in the
 * known-assigned list. If we later finish recovery, we no 
longer need
 * the bigger array, but we don't bother to shrink it.
 */
int maxxids = RecoveryInProgress() ? 
TOTAL_MAX_CACHED_SUBXIDS : arrayP->maxProcs;

xids = (TransactionId *) malloc(maxxids * 
sizeof(TransactionId));
if (xids == NULL)
ereport(ERROR,
(errcode(ERRCODE_OUT_OF_MEMORY),
 errmsg("out of memory")));
}

Which I think means we'll just overrun the xids array in some cases,
e.g. if KnownAssignedXids overflowed.  Obviously we should have a
crosscheck in the code (which we don't), but it was previously a
supposedly unreachable path.

Similarly, the allocation in GetSnapshotData() will be too small, I
think:
if (snapshot->xip == NULL)
{
/*
 * First call for this snapshot. Snapshot is same size whether 
or not
 * we are in recovery, see later comments.
 */
snapshot->xip = (TransactionId *)
malloc(GetMaxSnapshotXidCount() * 
sizeof(TransactionId));
if (snapshot->xip == NULL)
ereport(ERROR,
(errcode(ERRCODE_OUT_OF_MEMORY),
 errmsg("out of memory")));
Assert(snapshot->subxip == NULL);
snapshot->subxip = (TransactionId *)
malloc(GetMaxSnapshotSubxidCount() * 
sizeof(TransactionId));
if (snapshot->subxip == NULL)
ereport(ERROR,
(errcode(ERRCODE_OUT_OF_MEMORY),
 errmsg("out of memory")));
}

I think basically all code using TOTAL_MAX_CACHED_SUBXIDS,
GetMaxSnapshotSubxidCount(), PROCARRAY_MAXPROCS needs to be reviewed
much more carefully than done here.


Also, shouldn't dynahash be adjusted as well? There's e.g. the
following HASH_ENTER path:
/* report a generic message */
if (hashp->isshared)