Re: [HACKERS] Sequence Access Method WIP

Heikki Linnakangas Tue, 04 Nov 2014 04:12:26 -0800

On 10/13/2014 01:01 PM, Petr Jelinek wrote:

Hi,


I rewrote the patch with a different API along the lines of what was
discussed.


Thanks, that's better.

It would be good to see an alternative seqam to implement this API, to see how it really works. The "local" one is too dummy to expose any possible issues.

The API now consists of the following functions:
sequence_alloc - allocating range of new values
The function receives the sequence relation, current value, number of
requested values, amdata, and relevant sequence options like min/max, and
returns new amdata, new current value, number of values allocated, and
also whether it needs a WAL write (that should be returned if amdata has
changed, plus other reasons the AM might have to force the WAL update).

sequence_setval - notification that setval is happening
This function gets sequence relation, previous value and new value plus
the amdata and returns amdata (I can imagine some complex sequence AMs
will want to throw an error that setval can't be done on them).

sequence_request_update/sequence_update - used for background processing
Basically the AM can call sequence_request_update and the backend will
then call the sequence_update method of the AM with the current amdata
and write the updated amdata to disk.

sequence_seqparams - function to process/validate the standard sequence
options like start position, min/max, increment by, etc. by the AM; it's
called in addition to the standard processing

sequence_reloptions - this is the only thing that remained unchanged
from the previous patch, it's meant to pass custom options to the AM

Only the alloc and reloptions methods are required (and implemented by
the local AM).
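
To make the shape of this API concrete, here is a minimal C sketch of a method table plus a trivial "local" allocator. It is only an illustration under stated assumptions: the names SeqAmData, SeqAllocResult, SeqAmRoutine, local_alloc and local_reloptions are hypothetical and not taken from the patch, the sequence relation argument is omitted, and the allocator handles only ascending, non-cycling sequences.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Opaque AM-private data (hypothetical); NULL stands for "no amdata". */
typedef struct SeqAmData SeqAmData;

/* Everything the alloc method hands back to the backend. */
typedef struct SeqAllocResult
{
    int64_t     current;     /* new current value */
    int64_t     nallocated;  /* number of values actually allocated */
    bool        need_wal;    /* AM asks the backend to write WAL */
    SeqAmData  *amdata;      /* possibly updated AM-private data */
} SeqAllocResult;

/* Hypothetical method table; only alloc and reloptions are required. */
typedef struct SeqAmRoutine
{
    SeqAllocResult (*alloc)(int64_t current, int64_t nrequested,
                            int64_t increment, int64_t minv, int64_t maxv,
                            SeqAmData *amdata);
    SeqAmData  *(*setval)(int64_t prev, int64_t next, SeqAmData *amdata);
    SeqAmData  *(*update)(SeqAmData *amdata);
    void        (*seqparams)(int64_t start, int64_t minv, int64_t maxv,
                             int64_t increment);
    void        (*reloptions)(const char *options);
} SeqAmRoutine;

/*
 * Trivial local allocator: hand out up to nrequested values after
 * "current", stopping at maxv. Assumes an ascending sequence and that
 * maxv - current does not overflow.
 */
SeqAllocResult
local_alloc(int64_t current, int64_t nrequested, int64_t increment,
            int64_t minv, int64_t maxv, SeqAmData *amdata)
{
    SeqAllocResult res;
    int64_t     available;

    assert(increment > 0 && minv <= current && current <= maxv);
    assert(nrequested >= 0);

    available = (maxv - current) / increment;
    res.nallocated = nrequested < available ? nrequested : available;
    res.current = current + res.nallocated * increment;
    res.amdata = amdata;      /* the local AM keeps no private data */
    res.need_wal = false;     /* amdata unchanged, nothing to force */
    return res;
}

/* The local AM accepts no custom options in this sketch. */
void
local_reloptions(const char *options)
{
    (void) options;
}

/* Optional methods are simply left NULL. */
const SeqAmRoutine local_seqam = {
    .alloc = local_alloc,
    .reloptions = local_reloptions,
};
```

A caller would check the optional entries for NULL before invoking them, e.g. skip the setval notification when local_seqam.setval is NULL.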

The caching, xlog writing, updating the page, etc. is handled by the
backend, the AM does not see the tuple at all. I decided to not pass even
the struct around and just pass the relevant options because I think if
we want to abstract the storage properly then the AM should not care
about what pg_sequence looks like at all, even if it means that the
sequence_alloc parameter list is a bit long.

Hmm. The division of labour between the seqam and commands/sequence.c still feels a bit funny. sequence.c keeps track of how many values have been WAL-logged, and thus usable immediately, but we still call sequence_alloc even when using up those already WAL-logged values. If you think of using this for something like a centralized sequence server in a replication cluster, you certainly don't want to make a call to the remote server for every value - you'll want to cache them.

With the "local" seqam, there are two levels of caching. Each backend caches some values (per the CACHE <value> option in CREATE SEQUENCE). In addition to that, the server WAL-logs 32 values at a time. If you have a remote seqam, it would most likely add a third cache, but it would interact in strange ways with the second cache.
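
The two cache levels can be illustrated with a small stand-alone simulation. This is a simplified model, not the actual sequence.c logic: SharedSeq, BackendCache and sim_nextval are hypothetical names, locking is ignored, and the sequence is assumed to start at 1 with increment 1.

```c
#include <assert.h>
#include <stdint.h>

#define SEQ_LOG_VALS 32         /* values WAL-logged ahead of use */

/* State shared by all backends (the sequence page, in effect). */
typedef struct SharedSeq
{
    int64_t     last_value;     /* last value handed to any backend */
    int64_t     logged_upto;    /* highest value already covered by WAL */
    int         wal_writes;     /* number of simulated WAL records */
} SharedSeq;

/* Per-backend cache, sized by the CACHE option. */
typedef struct BackendCache
{
    int64_t     next;           /* next value to return */
    int64_t     cached_upto;    /* last value in the backend cache */
} BackendCache;

int64_t
sim_nextval(BackendCache *be, SharedSeq *seq, int64_t cache_size)
{
    assert(cache_size >= 1);

    if (be->next > be->cached_upto)
    {
        /* Level 1 is empty: fetch cache_size values from shared state. */
        int64_t     first = seq->last_value + 1;
        int64_t     last = seq->last_value + cache_size;

        if (last > seq->logged_upto)
        {
            /* Level 2 is exhausted: log this batch plus 32 more values. */
            seq->logged_upto = last + SEQ_LOG_VALS;
            seq->wal_writes++;
        }
        seq->last_value = last;
        be->next = first;
        be->cached_upto = last;
    }
    return be->next++;
}
```

Starting from SharedSeq {0, 0, 0} and BackendCache {1, 0}, a CACHE of 1 produces one WAL write for the first 33 calls and a second one on call 34. A remote seqam with its own cache would sit next to these two counters, which is where the strange interactions would come from.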

Considering a non-local seqam, the locking is also a bit strange. The server keeps the sequence page locked throughout nextval(). But if the actual state of the sequence is maintained elsewhere, there's no need to serialize the calls to the remote allocator, i.e. the sequence_alloc() calls.

I'm not exactly sure what to do about that. One option is to completely move the maintenance of the "current" value, i.e. sequence.last_value, to the seqam. That makes sense from an abstraction point of view. For example with a remote server managing the sequence, storing the "current" value in the local catalog table makes no sense as it's always going to be out-of-date. The local seqam would store it as part of the am-private data. However, you would need to move the responsibility of locking and WAL-logging to the seqam. Maybe that's OK, but we'll need to provide an API that the seqam can call to do that. Perhaps just let the seqam call heap_inplace_update on the sequence relation.

For the amdata handling (which is the AM's private data variable) the
API assumes that (Datum) 0 is NULL; this seems to work well for
reloptions so should work here also and it simplifies things a little
compared to passing pointers to pointers around and making sure
everything is allocated, etc.

Sadly the fact that amdata is not fixed size and can be NULL made the
page updates of the sequence relation considerably more complex than
they used to be.

It would be nice if the seqam could define exactly the columns it needs, with any datatypes. There would be a set of common attributes: sequence_name, start_value, cache_value, increment_by, max_value, min_value, is_cycled. The local seqam would add "last_value", "log_cnt" and "is_called" to that. A remote seqam that calls out to some other server might store the remote server's hostname etc.

There could be a seqam function that returns a TupleDesc with the required columns, for example.
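
Roughly, such a function could look like the following stand-alone sketch. It does not use the real TupleDesc machinery: SeqColumn, seqam_build_columns and local_seqam_columns are hypothetical names, and a plain array of name/type pairs stands in for the tuple descriptor. The type names are only illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for one attribute of a tuple descriptor. */
typedef struct SeqColumn
{
    const char *name;
    const char *type;
} SeqColumn;

/* Attributes every sequence has, regardless of the seqam. */
static const SeqColumn common_columns[] = {
    {"sequence_name", "name"},
    {"start_value", "bigint"},
    {"cache_value", "bigint"},
    {"increment_by", "bigint"},
    {"max_value", "bigint"},
    {"min_value", "bigint"},
    {"is_cycled", "boolean"},
};

/* Extra attributes the local seqam would add. */
static const SeqColumn local_columns[] = {
    {"last_value", "bigint"},
    {"log_cnt", "bigint"},
    {"is_called", "boolean"},
};

#define N_COMMON (sizeof(common_columns) / sizeof(common_columns[0]))
#define N_LOCAL  (sizeof(local_columns) / sizeof(local_columns[0]))

/*
 * Write the common columns followed by the AM-specific ones into "out".
 * Returns the total column count, or 0 if "out" is too small.
 */
size_t
seqam_build_columns(const SeqColumn *am_cols, size_t n_am,
                    SeqColumn *out, size_t capacity)
{
    size_t      total = N_COMMON + n_am;

    assert(out != NULL);
    if (total > capacity)
        return 0;

    memcpy(out, common_columns, sizeof(common_columns));
    if (n_am > 0)
        memcpy(out + N_COMMON, am_cols, n_am * sizeof(SeqColumn));
    return total;
}

/* What the local seqam's "describe my columns" method might return. */
size_t
local_seqam_columns(SeqColumn *out, size_t capacity)
{
    return seqam_build_columns(local_columns, N_LOCAL, out, capacity);
}
```

A remote seqam would pass its own array instead of local_columns, for example a single text column holding the remote server's hostname.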


- Heikki

