Re: making relfilenodes 56 bits

Dilip Kumar Wed, 21 Sep 2022 03:09:49 -0700

On Tue, Sep 20, 2022 at 10:44 PM Robert Haas <[email protected]> wrote:


Thanks for the review, please see my response inline for some of the
comments, rest all are accepted.

> On Fri, Sep 9, 2022 at 6:02 AM Dilip Kumar <[email protected]> wrote:
> > [ new patch ]
>
> +typedef pg_int64 RelFileNumber;
>
> This seems really random to me. First, why isn't this an unsigned
> type? OID is unsigned and I don't see a reason to change to a signed
> type. But even if we were going to change to a signed type, why
> pg_int64? That is declared like this:
>
> /* Define a signed 64-bit integer type for use in client API declarations. */
> typedef PG_INT64_TYPE pg_int64;
>
> Surely this is not a client API declaration....
>
> Note that if we change this a lot of references to INT64_FORMAT will
> need to become UINT64_FORMAT.
>
> I think we should use int64 at the SQL level, because we don't have an
> unsigned 64-bit SQL type, and a signed 64-bit type can hold 56 bits.
> So it would still be Int64GetDatum((int64) rd_rel->relfilenode) or
> similar. But internally I think using unsigned is cleaner.

Yeah you are right we can make it uint64.  With respect to this, we
can not directly use uint64 because that is declared in c.h and that
can not be used in
postgres_ext.h IIUC.  So what are the other option maybe we can
typedef the RelFIleNumber similar to what c.h done for uint64 i.e.

#ifdef HAVE_LONG_INT_64
typedef unsigned long int uint64;
#elif defined(HAVE_LONG_LONG_INT_64)
typedef long long int int64;
#endif

And maybe same for UINT64CONST ?

I am not liking duplicating this logic but is there any better
alternative for doing this?  Can we move the existing definitions from
c.h file to some common file (common for client and server)?

>
> +    if (relnumber >= ShmemVariableCache->loggedRelFileNumber)
>
> It probably doesn't make any difference, but to me it seems better to
> test flushedRelFileNumber rather than logRelFileNumber here. What do
> you think?

Actually based on this condition are logging more so it make more
sense to check w.r.t loggedRelFileNumber, but OTOH technically,
without flushing log we are not supposed to use the relfilenumber so
make more sense to test flushedRelFileNumber.  But since both are the
same I am fine with flushedRelFileNumber.

>      /*
>       * We set up the lockRelId in case anything tries to lock the dummy
> -     * relation.  Note that this is fairly bogus since relNumber may be
> -     * different from the relation's OID.  It shouldn't really matter though.
> -     * In recovery, we are running by ourselves and can't have any lock
> -     * conflicts.  While syncing, we already hold AccessExclusiveLock.
> +     * relation.  Note we are setting relId to just FirstNormalObjectId which
> +     * is completely bogus.  It shouldn't really matter though. In recovery,
> +     * we are running by ourselves and can't have any lock conflicts.  While
> +     * syncing, we already hold AccessExclusiveLock.
>       */
>      rel->rd_lockInfo.lockRelId.dbId = rlocator.dbOid;
> -    rel->rd_lockInfo.lockRelId.relId = rlocator.relNumber;
> +    rel->rd_lockInfo.lockRelId.relId = FirstNormalObjectId;
>
> Boy, this makes me uncomfortable. The existing logic is pretty bogus,
> and we're replacing it with some other bogus thing. Do we know whether
> anything actually does try to use this for locking?
>
> One notable difference between the existing logic and your change is
> that, with the existing logic, we use a bogus value that will differ
> from one relation to the next, whereas with this change, it will
> always be the same value. Perhaps el->rd_lockInfo.lockRelId.relId =
> (Oid) rlocator.relNumber would be a more natural adaptation?
>
> +#define CHECK_RELFILENUMBER_RANGE(relfilenumber)                \
> +do {                                                                \
> +    if ((relfilenumber) < 0 || (relfilenumber) > MAX_RELFILENUMBER) \
> +        ereport(ERROR,                                              \
> +                errcode(ERRCODE_INVALID_PARAMETER_VALUE),           \
> +                errmsg("relfilenumber %lld is out of range",    \
> +                        (long long) (relfilenumber))); \
> +} while (0)
>
> Here, you take the approach of casting the relfilenumber to long long
> and then using %lld. But elsewhere, you use
> INT64_FORMAT/UINT64_FORMAT. If we're going to use this technique, we
> ought to use it everywhere.

Based on the discussion [1], it seems we can not use
INT64_FORMAT/UINT64_FORMAT while using ereport.  But all other places
I am using INT64_FORMAT/UINT64_FORMAT.  Does this make sense?

[1] 
https://www.postgresql.org/message-id/20220730113922.qd7qmenwcmzyacje%40alvherre.pgsql

>  typedef struct
>  {
> -    Oid         reltablespace;
> -    RelFileNumber relfilenumber;
> -} RelfilenumberMapKey;
> -
> -typedef struct
> -{
> -    RelfilenumberMapKey key;    /* lookup key - must be first */
> +    RelFileNumber relfilenumber;    /* lookup key - must be first */
>      Oid         relid;          /* pg_class.oid */
>  } RelfilenumberMapEntry;
>
> This feels like a bold change. Are you sure it's safe? i.e. Are you
> certain that there's no way that a relfilenumber could repeat within a
> database?

IIUC, as of now, CREATE DATABASE is the only option which can create
the duplicate relfilenumber but that would be in different databases.
So based on that theory I think it should be safe.

If we're going to bank on that, we could adapt this more
> heavily, e.g. RelidByRelfilenumber() could lose the reltablespace
> parameter.

Yeah we might, although we need a bool to identify whether it is
shared relation or not.

I think maybe we should push this change into an 0002 patch
> (or later) and have 0001 just do a minimal adaptation for the changed
> data type.

Yeah that make sense.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: making relfilenodes 56 bits

Reply via email to