[HACKERS] Advices on custom data type and extension development

Luciano Coutinho Barcellos Mon, 18 Jan 2016 19:39:44 -0800

Dear friends,

I'm planning to develop an extension, and I'm here to get some help. But first I would like to share the problem I intend to solve. Maybe my desired solution is not a good option.


    What I have:

* a lot of data being generated every day, which is mainly queried by an immutable column of type date or timestamp;
* as a standard, almost every table has a bigserial id column as a primary key;
* data is huge enough to demand table partitioning, which is implemented as suggested in the Postgres documentation, by using triggers and table inheritance. A function called by cron deals with the creation of new partitions.

What I would like to develop first is a custom type (let's call it datedserial) to replace bigserial as the primary key:

* the type would be 8 bytes long, with 4 bytes dedicated to storing the date and 4 dedicated to storing a serial number within that day;
* the text representation of the type would show its date and its serial number (something like '2015-10-02.0000007296' as a canonical form, but it could also accept inputs like '20151002.0000007296');
* as a consequence of this internal representation, the serial part could not be greater than about 4 billion (2^32 - 1, if stored unsigned);
* support for operator classes allowing the type to be used in GIN and GiST indexes would be optional for now.
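
A minimal sketch of such a layout, in plain C and outside the PostgreSQL type machinery, might look like the following. The names (ds_make, ds_format, ds_parse and so on), the choice of counting days from 1970-01-01, and putting the date in the high 32 bits are all assumptions made for illustration; a real type would also need input validation and the usual input/output function wrappers.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical packed form: day number in the high 32 bits, serial in the low 32. */
typedef uint64_t datedserial;

/* Day number (days since 1970-01-01, an assumed epoch) of a Gregorian date. */
static int32_t days_from_civil(int y, int m, int d)
{
    y -= m <= 2;                                   /* treat Jan/Feb as end of previous year */
    int era = (y >= 0 ? y : y - 399) / 400;
    int yoe = y - era * 400;                       /* year of era, 0..399 */
    int doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1;
    int doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return (int32_t)(era * 146097 + doe - 719468);
}

/* Inverse of days_from_civil. */
static void civil_from_days(int32_t z, int *y, int *m, int *d)
{
    z += 719468;
    int era = (z >= 0 ? z : z - 146096) / 146097;
    int doe = z - era * 146097;
    int yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
    int doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
    int mp = (5 * doy + 2) / 153;
    *d = doy - (153 * mp + 2) / 5 + 1;
    *m = mp + (mp < 10 ? 3 : -9);
    *y = yoe + era * 400 + (*m <= 2);
}

static datedserial ds_make(int y, int m, int d, uint32_t serial)
{
    uint32_t days = (uint32_t)days_from_civil(y, m, d);
    return ((uint64_t)days << 32) | serial;
}

static int32_t  ds_days(datedserial v)   { return (int32_t)(uint32_t)(v >> 32); }
static uint32_t ds_serial(datedserial v) { return (uint32_t)(v & 0xFFFFFFFFu); }

/* Writes the canonical form 'YYYY-MM-DD.NNNNNNNNNN'; returns snprintf's result. */
static int ds_format(datedserial v, char *buf, size_t len)
{
    int y, m, d;
    assert(buf != NULL);
    civil_from_days(ds_days(v), &y, &m, &d);
    return snprintf(buf, len, "%04d-%02d-%02d.%010" PRIu32, y, m, d, ds_serial(v));
}

/* Accepts 'YYYY-MM-DD.N' or 'YYYYMMDD.N'; returns 1 on success, 0 otherwise.
 * No range checking of month, day or serial is done in this sketch. */
static int ds_parse(const char *s, datedserial *out)
{
    int y, m, d;
    uint32_t serial;
    if (sscanf(s, "%4d-%2d-%2d.%" SCNu32, &y, &m, &d, &serial) != 4 &&
        sscanf(s, "%4d%2d%2d.%" SCNu32, &y, &m, &d, &serial) != 4)
        return 0;
    *out = ds_make(y, m, d, serial);
    return 1;
}
```

With this sketch, ds_make(2015, 10, 2, 7296) formatted by ds_format gives the canonical text from the example above, and ds_parse accepts both spellings of it.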

That would allow me to have a compact primary key which I could use to partition the table based on the object's date. That would also allow me to partition detail tables on the foreign key column having this data type. Besides that, just by examining the value, mainly when used as a foreign key, I could infer where the record belongs.
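
As an illustration of inferring the partition from the key alone, the sketch below looks up a partition by the date part of a key. It assumes the key is packed as (day number << 32) | serial and that each partition is described by the first day number it covers; both the layout and the names key_days and partition_for_key are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Date part of a key packed as (day number << 32) | serial (assumed layout). */
static int32_t key_days(uint64_t key)
{
    return (int32_t)(uint32_t)(key >> 32);
}

/*
 * lower[] holds the first day number covered by each partition, in
 * ascending order. Returns the index of the partition whose range
 * contains the key's date, or -1 if the date precedes every partition.
 */
static int partition_for_key(uint64_t key, const int32_t *lower, size_t n)
{
    int32_t days = key_days(key);
    size_t lo = 0, hi = n;              /* binary search over [lo, hi) */

    assert(lower != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (lower[mid] <= days)
            lo = mid + 1;               /* bound is not after the date */
        else
            hi = mid;
    }
    return (int)lo - 1;                 /* last bound that is <= days */
}
```

The same lookup works for a detail table, since only the foreign key value is needed to decide where the row goes.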

Once I have a working custom data type, I would move on to the next and harder part. I would like to create a new structure like a sequence, which should behave exactly like a sequence but be separated by a date space. So I would have functions similar to the following:

* createsequencegroup(sequence_group_name text): create a new named structure for managing the sequence group;
* nextval(sequence_group_name text, context_date date): return the next value of the sequence (as a datedserial) belonging to the sequence group and associated with the context date. The value returned has the context_date in its date part and the next value for that date in the sequence part. The first call for a specific date would return 1 for the sequence part. Regarding concurrency and transactions, the function behaves exactly like nextval(sequence_group_name text);
* currval(sequence_group_name text, context_date date): the currval function counterpart;
* setval(sequence_group_name text, context_date date, int4 value): the setval function counterpart;
* freeze_before(sequence_group_name text, freeze_date date): disallow using the sequence group with context dates before the freeze_date.
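
The intended behaviour of these functions can be modelled with a small in-memory sketch in C. It is only a toy under stated assumptions: a single process, a fixed-capacity table, dates given as day numbers, and values packed as (day number << 32) | serial. The sg_* names and the seq_group structure are hypothetical, and nothing here addresses persistence, concurrency or session-local currval, which real sequences provide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SG_MAX_DATES 64          /* arbitrary capacity for this sketch */

typedef struct {
    int32_t  days;               /* context date as a day number */
    uint32_t last;               /* last serial handed out for that date */
} sg_slot;

typedef struct {
    sg_slot slots[SG_MAX_DATES];
    size_t  used;
    int     has_freeze;          /* nonzero once freeze_before was called */
    int32_t frozen_before;       /* dates earlier than this are rejected */
} seq_group;

static void sg_init(seq_group *g)
{
    assert(g != NULL);
    g->used = 0;
    g->has_freeze = 0;
    g->frozen_before = 0;
}

/* Finds the counter for a date, optionally creating it starting at 0. */
static sg_slot *sg_find(seq_group *g, int32_t days, int create)
{
    for (size_t i = 0; i < g->used; i++)
        if (g->slots[i].days == days)
            return &g->slots[i];
    if (!create || g->used == SG_MAX_DATES)
        return NULL;
    g->slots[g->used].days = days;
    g->slots[g->used].last = 0;
    return &g->slots[g->used++];
}

static int sg_allowed(const seq_group *g, int32_t days)
{
    return !g->has_freeze || days >= g->frozen_before;
}

static uint64_t sg_pack(int32_t days, uint32_t serial)
{
    return ((uint64_t)(uint32_t)days << 32) | serial;
}

/* Each function returns 1 on success and 0 on failure. */
static int sg_nextval(seq_group *g, int32_t days, uint64_t *out)
{
    sg_slot *s = sg_allowed(g, days) ? sg_find(g, days, 1) : NULL;
    if (s == NULL || s->last == UINT32_MAX)
        return 0;                /* frozen, table full, or serial exhausted */
    *out = sg_pack(days, ++s->last);
    return 1;
}

static int sg_currval(seq_group *g, int32_t days, uint64_t *out)
{
    sg_slot *s = sg_find(g, days, 0);
    if (s == NULL)
        return 0;                /* no value has been set for that date yet */
    *out = sg_pack(days, s->last);
    return 1;
}

static int sg_setval(seq_group *g, int32_t days, uint32_t value)
{
    sg_slot *s = sg_allowed(g, days) ? sg_find(g, days, 1) : NULL;
    if (s == NULL)
        return 0;
    s->last = value;             /* the next sg_nextval returns value + 1 */
    return 1;
}

static void sg_freeze_before(seq_group *g, int32_t freeze_days)
{
    g->has_freeze = 1;
    g->frozen_before = freeze_days;
}
```

In this model, the first sg_nextval call for a given date yields serial 1, separate dates count independently, and after sg_freeze_before any call with an earlier date fails.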

I would consider extending the data type to include information about the cluster which generated the values. This way, the user could set a configuration entry defining a byte value identifying the cluster among the others involved in replication, so that the sequence group could have different sequences not only for different dates, but for different nodes as well.

As I've said, I would like to package the resulting work as an extension.

For now, I would like some help about where to start. I've downloaded the Postgres source code and have successfully compiled it on my Ubuntu desktop, although I have not tested the resulting binary. Should I create a folder in the contrib directory and use another extension as a starting point? Is this the recommended path? Or is this too much, and should I create a separate project instead?


    Thanks in advance.

    Best regards,
    Luciano Barcellos



