> Hmm.... 2^32 times aprox. 2kB (as per usual heuristics, ~4 rows per heap 
> page) is 8796093022208 (~9e13) bytes
> ... which results in 8192 1GB segments :O

8192 1GB segments is just 8TB; it’s not _that_ large.  At TripAdvisor we’ve been 
using a NoSQL solution for session storage.  We are probably going to swap 
that out for Postgres (every other db backing the site is Postgres).  
Essentially, what I’m building is a system with 1 logical table that maps a 
session id to a 2KB+ grab bag of ever-changing session attributes, partially 
normalized and partially JSON.  315 million uniques a month multiplied 
by the retention policy means I need to hold 2-4 billion session objects (and 
somehow expire old ones).  Additionally, most HTTP calls can update the 
session, so between maintenance windows I expect to take around 20 billion 
upserts.  Obviously, I will have to shard and partition the table in 
practice, but this weekend I ran a test that demonstrated that a single table 
on a 9.4 server + logical replication + Dell 730xd can handle 4x that workload. 
 Well, it can for 38 hours… until you wrap xids on the toast table.  :P  I’ll 
be the first to admit that isn’t the normal use case, though.  I’m happy to have 
found this thread, however, because I’m going to have to build around the 
global oid counter, specifically to prevent the problem I describe below 
regarding clustering.

> Anybody actually reaching this limit out there?

Well, it’s not the 4 billion row limit that concerns me; it’s the global shared 
counter in conjunction with pg_restore/clustering that is actually pretty 
concerning.

I just checked through all of TripAdvisor’s normal databases, and the most 
tuples I see in a single toast table is 17,000,000, so that is still a couple 
of orders of magnitude short of the limit (though close enough that it’ll be a 
concern in a few years).

However, I do have active databases where the current oid is between 1 billion 
and 2 billion.  They were last dump-restored for a hardware upgrade a couple of 
years ago and were a bit more than half their current size at the time.  Since 
a restore hands out toast chunk ids more or less consecutively from the shared 
counter, I can imagine that I have toast tables which are keyed by ~8,000,000 
consecutive oids.

I would argue that when the counter wraps around into one of those ranges, a 
single insert will probably block for 2-5 minutes while it performs ~8,000,000 
index scans inside GetNewOidWithIndex.  Even partitioning doesn’t protect you 
from this potential problem.
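
For anyone who hasn’t looked at it, the loop in question is roughly the 
following (paraphrased from src/backend/catalog/catalog.c, so treat it as a 
sketch rather than the verbatim 9.4 source).  Every candidate value drawn from 
the shared counter costs one full index probe, and the loop only exits once a 
probe comes up empty, so landing at the start of a dense run of ~8,000,000 
assigned oids means ~8,000,000 probes inside one insert:

    /* Paraphrased sketch of GetNewOidWithIndex(), not the verbatim source */
    Oid
    GetNewOidWithIndex(Relation relation, Oid indexId, AttrNumber oidcolumn)
    {
        Oid         newOid;
        SysScanDesc scan;
        ScanKeyData key;
        bool        collides;

        do
        {
            CHECK_FOR_INTERRUPTS();

            /* bump the cluster-wide shared oid counter */
            newOid = GetNewObjectId();

            /*
             * Probe the index (here, the toast table's chunk_id index)
             * for an existing tuple that already uses this oid.
             */
            ScanKeyInit(&key, oidcolumn, BTEqualStrategyNumber, F_OIDEQ,
                        ObjectIdGetDatum(newOid));
            scan = systable_beginscan(relation, indexId, true,
                                      SnapshotAny, 1, &key);
            collides = HeapTupleIsValid(systable_getnext(scan));
            systable_endscan(scan);
        } while (collides);     /* one iteration per already-used oid it hits */

        return newOid;
    }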

What’s even weirder is that this issue can be triggered by consuming too many 
oids in a different database in the same cluster (e.g. by creating large 
numbers of temp tables), since the counter is shared cluster-wide.

> The problem with changing the id from 32 to 64 bits is that the storage *for 
> everybody else* doubles, making the implementation slower for most.... though 
> this might be actually not that important.

Well, you aren’t doubling the storage.  Even if you have to store the key in 4 
places, going from 4-byte to 8-byte keys adds only 16 bytes per TOAST tuple.  
If we work off the 2KB estimate for each TOAST tuple, then you are only 
increasing the storage by ~0.8%.  I’m sure there are more hidden costs, but we 
are really only talking about a low single digit percent increase.  In 
exchange, you get to drop one index scan per toast insert (the uniqueness probe 
that lands in the only hot part of the index), since a 64-bit counter 
effectively never wraps and so never needs to re-check for collisions.

That being said, I’d be perfectly happy merely giving each TOAST table its own 
sequence, as that almost entirely mitigates the risk of an unexpected lockup on 
reasonably sized tables/partitions, and provides a functional workaround for 
those of us with larger-than-average installs.
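
To make that concrete, here is a purely hypothetical sketch of the shape I have 
in mind.  GetNewToastChunkId, NextToastChunkId and toastid_exists are made-up 
names, not existing Postgres functions; as far as I can tell, toast_save_datum 
currently gets its chunk ids from GetNewOidWithIndex on the shared counter 
instead:

    /*
     * Hypothetical sketch only: per-toast-table chunk id assignment.
     * NextToastChunkId and toastid_exists do not exist today.
     */
    static Oid
    GetNewToastChunkId(Relation toastrel, Oid toastidxid)
    {
        Oid     newId;
        bool    collides;

        do
        {
            /* per-relation counter instead of the cluster-wide oid counter */
            newId = NextToastChunkId(toastrel);

            /*
             * The uniqueness probe stays, but it can only collide with this
             * table's own ids; oids burned by temp tables or by other
             * databases in the cluster can no longer force extra scans here.
             */
            collides = toastid_exists(toastrel, toastidxid, newId);
        } while (collides);

        return newId;
    }

You still keep the probe for the (eventual) wrap of the per-table counter, but 
an idle or modestly sized table can no longer be pushed toward that wrap by 
activity elsewhere in the cluster.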

- Matt K