Bruce Momjian <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> Either way, I think it would be interesting to consider
>>
>> (a) length word either one or two bytes, not four.  You can't need more
>> than 2 bytes for a datum that fits in a disk page ...
> That is an interesting observation, though could compressed inline
> values exceed two bytes?

After expansion, perhaps, but it's the on-disk footprint that concerns us
here.

I thought a bit more about this and came up with a zeroth-order sketch:

The "length word" for an on-disk datum could be either 1 or 2 bytes;
in the 2-byte case we'd need to be prepared to fetch the bytes
separately to avoid alignment issues.  The high bits of the first byte
say what's up:

* First two bits 00: 2-byte length word, uncompressed inline data follows.
  This allows a maximum on-disk size of 16K for an uncompressed datum,
  so we lose nothing at all for standard-size disk pages and not much for
  32K pages (remember the toaster will try to compress any tuple exceeding
  1/4 page anyway ... this just makes it mandatory).

* First two bits 01: 2-byte length word, compressed inline data follows.
  Again, hard limit of 16K, so if your data exceeds that you have to push
  it out to the toast table.  Again, this policy costs zero for standard
  size disk pages and not much for 32K pages.

* First two bits 10: 1-byte length word, zero to 62 bytes of uncompressed
  inline data follows.  This is the case that wins for short values.

* First two bits 11: 1-byte length word, pointer to out-of-line toast data
  follows.  We may as well let the low 6 bits of the length word be the
  size of the toast pointer, same as it works now.  Since the toast pointer
  is not guaranteed aligned anymore, we'd have to memcpy it somewhere before
  using it ... but compared to the other costs of fetching a toast value,
  that's surely down in the noise.  The distinction between compressed and
  uncompressed toast data would need to be indicated in the body of the
  toast pointer, not in the length word as today, but nobody outside of
  tuptoaster.c would care.

Notice that heap_deform_tuple only sees 2 cases here: high bit 0 means
2-byte length word, high bit 1 means 1-byte.  It doesn't care whether
the data is compressed or toasted, same as today.

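To make the bit assignments above concrete, here is a rough C sketch of
packing and unpacking such a length word.  None of these names exist in the
PostgreSQL source; ShortVarlenaKind, ShortVarlenaHeader, sv_encode_header
and sv_decode_header are made up for illustration.  It also assumes two
things the sketch above leaves open: that the stored length counts the
header bytes themselves (which is what makes a 6-bit field top out at 62
data bytes), and that the 2-byte form keeps the high-order length bits in
the first byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tags, numbered to match the top two bits of the first byte. */
typedef enum
{
    SV_INLINE_PLAIN = 0,        /* 00: 2-byte header, uncompressed inline */
    SV_INLINE_COMPRESSED = 1,   /* 01: 2-byte header, compressed inline */
    SV_SHORT_PLAIN = 2,         /* 10: 1-byte header, short uncompressed */
    SV_TOAST_POINTER = 3        /* 11: 1-byte header, toast pointer follows */
} ShortVarlenaKind;

typedef struct
{
    ShortVarlenaKind kind;
    size_t      header_len;     /* 1 or 2 */
    size_t      total_len;      /* header plus payload (an assumption) */
} ShortVarlenaHeader;

/* Write the header for a payload of payload_len bytes; returns header size. */
static size_t
sv_encode_header(uint8_t *out, ShortVarlenaKind kind, size_t payload_len)
{
    if (kind == SV_SHORT_PLAIN || kind == SV_TOAST_POINTER)
    {
        size_t      total = payload_len + 1;

        assert(total <= 0x3F);  /* 6 length bits: at most 62 payload bytes */
        out[0] = (uint8_t) (((unsigned) kind << 6) | total);
        return 1;
    }
    else
    {
        size_t      total = payload_len + 2;

        assert(total <= 0x3FFF);    /* 14 length bits: just under 16K */
        out[0] = (uint8_t) (((unsigned) kind << 6) | (total >> 8));
        out[1] = (uint8_t) (total & 0xFF);
        return 2;
    }
}

/* Read the header one byte at a time, so no alignment is required. */
static ShortVarlenaHeader
sv_decode_header(const uint8_t *p)
{
    ShortVarlenaHeader h;

    h.kind = (ShortVarlenaKind) (p[0] >> 6);
    if (p[0] & 0x80)
    {
        /* high bit 1: 1-byte length word */
        h.header_len = 1;
        h.total_len = p[0] & 0x3F;
    }
    else
    {
        /* high bit 0: 2-byte length word */
        h.header_len = 2;
        h.total_len = ((size_t) (p[0] & 0x3F) << 8) | p[1];
    }
    return h;
}
```

A caller in the position of heap_deform_tuple would only need header_len
and total_len to step over the datum; the kind field matters only to code
that actually has to decompress or detoast.
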
There are other ways we could divvy up the bit assignments of course.
The main issue is keeping track of whether any given Datum is in this
compressed-for-disk format or in the uncompressed 4-byte-length-word
format.

			regards, tom lane