Dear friends,

I'm planning to develop an extension, and I'm here for getting some help. But I would like to share the problem I intend to solve. Maybe my desired solution is not a good option.

    What I have:

* a lot of data being generated every day, which are mainly queried by an immutable column of type date or timestamp; * as a standard, almost every table has a bigserial id column as a primary key; * data is huge enough to demand table partitioning, which is implemented as suggested in Postgres documentation, by using triggers and table inheritance. A function called by cron deal with creation of new partitions.

What I would like to develop first is a custom type (let's call it datedserial) for replacing bigserial as the primary key:

* the type would be 8 bytes long, being 4 dedicated to storing the Date, and 4 dedicated to storing a serial within that day; * the text representation of the type would show its date and its serial number (something like '2015-10-02.0000007296' as a canonical form, but which could accept inputs like '20151002.0000007296'); * as a consequence of this internal representation, the serial part could not be greater than 4 billion and some; * support for operator classes allowing the type being used in GIN and GIST indexes would be optional for now.

That would allow me to have a compact primary key which I could use to partition the table based on the object's date. That would also allow me to partition detail tables on the foreign key column having this data type. Besides that, just by examining the value, mainly when used as a foreign key, I could infer where the record belongs to.

When I have a working custom data type, I would go to the next and harder part. I would like to create a new structure like a sequence, and it should behave exactly like sequences, but separated by a date space. So I would have functions similar to the following:

* createsequencegroup(sequence_group_name text): create a new named structure for managing the sequence group; * nextval(sequence_group_name text, context_date date): return next value of the sequence (as a datedserial) belonging to the sequence group and associated with the context date. The value returned have the context_date in its date part and the next value for that date in the sequence part. The first call for a specific date would return 1 for the sequence part. Concerning to concurrency and transactions, the function behaves exactly like nextval(sequence_group_name text); * currval(sequence_group_name text, context_date date): the currval function counterpart; * setval(sequence_group_name text, context_date date, int4 value): the setval function counterpart; * freeze_before(sequence_group_name text, freeze_date date): disallow using the sequence group with context dates before the freeze_date.

I would consider extending the data type to allow including information about the cluster which generated the values. This way, the user could set a configuration entry defining a byte value for identifying the cluster among others involved in replication, so that the sequence group could have different sequences not only for different dates, but for different nodes as well.

As I've said, I would like to package the resulting work as an extension.

For now, I would like some help about where to start. I've downloaded the postgres source code and have successfully compiled it using my Ubuntu desktop, although have not tested the resulting binary. Should I create a folder in the contrib directory and use another extension as a starting point? Is this the recommended path? Or is this too much and I should create a separate project?

    Thanks in advance.

    Best regards,
    Luciano Barcellos



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to