I'm spinning up a new PostgreSQL 14 database where I'll have to store a
couple of years' worth of time series data arriving at a rate of
single-digit millions of rows per day.  Since this has to run in AWS
Aurora, I can't use TimescaleDB.

I've been soliciting advice on best practices for building this.

One person I talked to said to try not to have more than 100 partitions:
even with the latest PostgreSQL, you'll supposedly end up with a lot of
lock contention if you go over that.  This person also recommended
manually kicking off vacuums on a regular schedule rather than trusting
autovacuum to work reliably on the partitioned tables.
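
If I do end up kicking off vacuums myself, I imagine something along
these lines run from psql on a schedule (the parent table name
"readings" is just a placeholder):

    -- generate and run a VACUUM for every leaf partition of the parent
    SELECT format('VACUUM (ANALYZE) %s', inhrelid::regclass)
      FROM pg_inherits
     WHERE inhparent = 'readings'::regclass
    \gexec

    -- or simply vacuum the parent, which processes all leaf partitions
    VACUUM (ANALYZE) readings;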

I've got several keys, besides the obvious time key, that I could
partition on.   I could do a multi-key partitioning scheme.  Since the
data arrives at a relatively steady rate, if I partition on time I can
keep the partitions reasonably similar in size.  What is a good
partition size?
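
For concreteness, by partitioning on time I mean plain declarative
range partitioning on the timestamp, roughly like this (table and
column names are made up for illustration):

    CREATE TABLE readings (
        ts        timestamptz      NOT NULL,
        device_id bigint           NOT NULL,
        value     double precision
    ) PARTITION BY RANGE (ts);

    -- one partition per day, e.g.:
    CREATE TABLE readings_2024_05_01
        PARTITION OF readings
        FOR VALUES FROM ('2024-05-01') TO ('2024-05-02');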

Are there any tunables I should experiment with in particular for a
database with only 2 or 3 tables in it but many partitions, each with
millions of rows?
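
The sort of thing I mean is per-partition storage parameters or
instance-wide settings, e.g. (values are only placeholders, not
recommendations):

    -- make autovacuum more aggressive on a hot leaf partition
    ALTER TABLE readings_2024_05_01 SET (
        autovacuum_vacuum_scale_factor  = 0.05,
        autovacuum_analyze_scale_factor = 0.02
    );

    -- or server-level knobs such as max_locks_per_transaction,
    -- work_mem, and enable_partition_pruning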

Since the most frequently queried data would be recent (say the past
month or so), would it make sense to build an archiving strategy that
rolls up older partitions into larger ones?  I.e., keep daily partitions
for the first four weeks, then have a process that rolls them up into
monthly partitions for the next few months, then maybe quarterly
partitions for data older than a year?  (I'm thinking about ways to keep
the partition count low - if that advice is justified.)
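
Concretely, I was picturing a rollup process along these lines (same
made-up names as above; this takes heavy locks, so it would run in a
quiet window):

    -- detach the daily partitions for the month being rolled up
    ALTER TABLE readings DETACH PARTITION readings_2024_05_01;
    -- ... repeat for each day in May ...

    -- build a monthly table and copy the detached data into it
    CREATE TABLE readings_2024_05 (LIKE readings INCLUDING ALL);
    INSERT INTO readings_2024_05 SELECT * FROM readings_2024_05_01;
    -- ... repeat for each detached daily, then drop the dailies ...

    -- attach the result as a single monthly partition
    ALTER TABLE readings ATTACH PARTITION readings_2024_05
        FOR VALUES FROM ('2024-05-01') TO ('2024-06-01');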

Or, should I just have a single ~7 billion row table with a BRIN index
on the timestamp and not mess with partitions at all?
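
That is, a plain unpartitioned table with something like this
(pages_per_range is just an example value I'd have to tune):

    CREATE INDEX readings_ts_brin
        ON readings USING brin (ts)
        WITH (pages_per_range = 64);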
