I'm spinning up a new PostgreSQL 14 database where I'll have to store a couple of years' worth of time series data arriving at a rate of single-digit millions of rows per day. Since this has to run in AWS Aurora, I can't use TimescaleDB.
I've been soliciting advice on best practices for building this. One person I talked to said to try not to have more than 100 partitions: even with the latest PostgreSQL, you'll supposedly end up with a lot of lock contention if you go over that. The same person also recommended manually kicking off vacuums on a regular schedule rather than trusting autovacuum to work reliably on the partitioned tables.

Besides the obvious time key, I have several other keys I could partition on, and I could even use a multi-key partitioning scheme. Since the data arrives at a relatively steady rate, partitioning on time would give me reasonably similarly sized partitions. What is a good partition size? Are there any tunables I should experiment with in particular for a database with only 2 or 3 tables, but many partitions each holding millions of rows? (Rough sketches of the DDL I have in mind are at the end of this post.)

Since the most frequently queried data would be recent (say the past month or so), would it make sense to build an archiving strategy that rolls older partitions up into larger ones? I.e., daily partitions for the first four weeks, then a process that rolls them up into monthly partitions for the next few months, then maybe quarterly partitions for data older than a year? (I'm thinking about ways to keep the partition count low, if that advice is justified.)

Or should I just have a single table with a few billion rows and a BRIN index on the timestamp, and not mess with partitions at all?
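For concreteness, here's roughly the daily-partitioned layout I'm considering. Table and column names (readings, device_id, recorded_at) are placeholders, and the per-partition autovacuum settings are just example values for the kind of knob I'm asking about, not something I've validated:

```sql
-- Parent table, range-partitioned on the timestamp. All names are placeholders.
CREATE TABLE readings (
    device_id   bigint           NOT NULL,
    recorded_at timestamptz      NOT NULL,
    value       double precision
) PARTITION BY RANGE (recorded_at);

-- One partition per day for the "hot" window.
CREATE TABLE readings_2024_06_01 PARTITION OF readings
    FOR VALUES FROM ('2024-06-01') TO ('2024-06-02');

-- Declaring the index on the parent creates a matching index on every partition.
CREATE INDEX ON readings (recorded_at);

-- Example of the per-partition autovacuum tuning I've been told to look at
-- (values are guesses, not recommendations).
ALTER TABLE readings_2024_06_01 SET (
    autovacuum_vacuum_scale_factor  = 0.05,
    autovacuum_analyze_scale_factor = 0.02
);
```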
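And this is the kind of rollup process I have in mind for older data: detach a month's worth of daily partitions, copy the rows into a single monthly partition, then drop the dailies. Again just a sketch with made-up names, and it glosses over locking and error handling:

```sql
BEGIN;

-- Detach each daily partition so it becomes a plain table. (PostgreSQL 14
-- also offers DETACH ... CONCURRENTLY, but not inside a transaction block.)
ALTER TABLE readings DETACH PARTITION readings_2024_06_01;
-- ... repeat for the remaining days of the month ...

-- Once no daily partitions overlap the range, create the coarser monthly one.
CREATE TABLE readings_2024_06 PARTITION OF readings
    FOR VALUES FROM ('2024-06-01') TO ('2024-07-01');

-- Move the detached rows into the monthly partition, then drop the daily tables.
INSERT INTO readings_2024_06 SELECT * FROM readings_2024_06_01;
DROP TABLE readings_2024_06_01;
-- ... repeat for the remaining days ...

COMMIT;
```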
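For comparison, the no-partitioning alternative I mention at the end would be something like this (pages_per_range is the one BRIN knob I know of; 128 is the default):

```sql
-- Single unpartitioned table with a BRIN index on the timestamp. This relies
-- on rows arriving in roughly timestamp order so the heap stays correlated
-- with recorded_at.
CREATE TABLE readings_flat (
    device_id   bigint           NOT NULL,
    recorded_at timestamptz      NOT NULL,
    value       double precision
);

CREATE INDEX readings_flat_recorded_at_brin
    ON readings_flat USING brin (recorded_at)
    WITH (pages_per_range = 128);  -- default value; one tunable I'd experiment with
```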