I have two partitioning questions I am hoping somebody can help me with.

I have a fairly busy metric(ish) table. It gets a few million records per
day, the data is transactional for a while but then settles down and is
used for analytical purposes later.

When a metric is reported both the UTC time and the local times are stored
along with the other data belonging to the metric.  I want to partition
this table to both make it faster to query and also to spread out the
writes.  Ideally the partitions would be based on the UTC timestamp and the
sending location. For example

metrics_location_XXXXX_2015_01_01

First problem with this approach is that there could be tens of thousands
of locations so this is going to result hundreds of thousands of tables.
I know there are no upper limits to how many tables there are but I am
thinking this might really get me into trouble later.

Second and possibly more vexing problem is that often the local time is
queried.  Ideally I would like to put three constraints on the child
tables. Location id, UTC timestamp and the local time but obviously the
local timestamps would overlap with other locations in the same timezone
 Even if I was to only partition by UTC the local timestamps would overlap
between tables.

So the questions in a nutshell are.

1. Should I be worried about having possibly hundreds of thousands of
shards.
2. Is PG smart enough to handle overlapping constraints on table and limit
it's querying to only those tables that have the correct time constraint.

Thanks.

Reply via email to