Simon Riggs wrote:
On Fri, 2008-01-11 at 11:34 +0000, Richard Huxton wrote:
Is the following basically the same as option #3 (multiple RelFileNodes)?
1. Make an on-disk "chunk" much smaller (e.g. 64MB). Each chunk is a
contiguous range of blocks.
2. Make a table-partition (implied or explicit constraints) map to
multiple "chunks".
That would reduce fragmentation (you'd have on average 32MB's worth of
blocks wasted per partition) and allow for stretchy partitions at the
cost of an extra layer of indirection.
For the single-partition case you'd not need to split the file of
course, so it would end up looking much like the current arrangement.
We need to think about the "data model" of the storage layer. Space
itself isn't the issue; it's the assumptions that all of the other
subsystems currently make about how a table is structured, indexed,
accessed and manipulated.
Which was why I was thinking you'd want indexes etc. to carry on
thinking in terms of a table being a contiguous set of blocks, with the
mapping to an actual on-disk block taking place below that level. (If
I've understood you.)
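Something like the sketch below is what I had in mind. It's purely
illustrative: the names, the fixed-size chunk array and the 64MB figure
are all invented for the example, not a description of any existing code.

/* Illustrative sketch only: all names and sizes are invented. */
#include <assert.h>
#include <stdint.h>

#define BLCKSZ        8192                          /* 8KB blocks, as now */
#define CHUNK_BLOCKS  (64 * 1024 * 1024 / BLCKSZ)   /* 64MB chunk = 8192 blocks */

typedef uint32_t BlockNumber;   /* table-relative block number, as indexes see it */

typedef struct ChunkId
{
    uint32_t    fileno;         /* which physical file holds this chunk */
    uint32_t    offset;         /* block offset of the chunk within that file */
} ChunkId;

typedef struct ChunkMap
{
    uint32_t    nchunks;        /* chunks currently allocated to the table */
    ChunkId     chunks[64];     /* fixed size just to keep the sketch simple */
} ChunkMap;

/*
 * Indexes, the executor etc. keep using plain table-relative block
 * numbers; only the storage-manager layer below them consults the map.
 */
static void
translate_block(const ChunkMap *map, BlockNumber blkno,
                uint32_t *fileno, uint32_t *fileblock)
{
    uint32_t    chunkno = blkno / CHUNK_BLOCKS;
    uint32_t    within  = blkno % CHUNK_BLOCKS;

    assert(chunkno < map->nchunks);
    *fileno    = map->chunks[chunkno].fileno;
    *fileblock = map->chunks[chunkno].offset + within;
}

The point being that growing a partition is just appending another chunk
to its map, rather than extending one contiguous file.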
Currently: Table 1:M Segments
Option 1: Table 1:M Segments and *separately* Table 1:M Partitions, so
partitions always have a maximum size. The chosen size only changes the
impact; it doesn't remove the holes, the maximum-size limit etc.
e.g. empty table with 10 partitions would be
a) 0 bytes in 1 file
b) 0 bytes in 1 file, plus 9GB in 9 files all full of empty blocks
Well, presumably 0GB in 10 files, but 10GB-worth of block-numbers
"pre-allocated".
e.g. table with 10 partitions each of 1.5GB would be
a) 15 GB in 15 files
With the limitation that any given partition might contain a mix of
data-ranges (e.g. 2005 lies half in partition 2 and half in partition 3).
b) hit max size limit of partition: ERROR
In the case of 1b, you could have a segment mapping to more than 1
partition, avoiding the error. So 2004 data is in partition 1, 2005 is
in partitions 2,3 (where 3 is half empty), 2006 is in partition 4.
However, this does mean you've got a lot of wasted block numbers. If you
were using explicit (fixed) partitioning and chose a bad set of criteria,
your maximum table size could be substantially reduced.
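To put a rough number on that: block numbers are 32 bits, so with 8KB
blocks a table tops out at 32TB today, and pre-allocating a fixed block
range per partition eats into that budget. A back-of-the-envelope sketch
(the 1GB partition size just matches the example above; everything here
is illustrative):

/* Back-of-the-envelope only; names and sizes are for illustration. */
#include <stdint.h>
#include <stdio.h>

#define BLCKSZ            8192ULL               /* 8KB blocks */
#define MAX_BLOCKS        (1ULL << 32)          /* 32-bit block numbers */
#define PARTITION_BYTES   (1ULL << 30)          /* 1GB fixed partition size, as in 1b */
#define PARTITION_BLOCKS  (PARTITION_BYTES / BLCKSZ)

int
main(void)
{
    /* With fixed-size partitions, the partition is implicit in the block number. */
    uint64_t blkno = 3 * PARTITION_BLOCKS + 17;

    printf("block %llu falls in partition %llu\n",
           (unsigned long long) blkno,
           (unsigned long long) (blkno / PARTITION_BLOCKS));

    /* Hard ceiling on the number of partitions, however empty they are. */
    printf("partitions available before block numbers run out: %llu\n",
           (unsigned long long) (MAX_BLOCKS / PARTITION_BLOCKS));

    /* The addressable size stays 32TB, so if a bad choice of criteria
     * leaves partitions, say, 10% full on average, only ~3.2TB of that
     * is ever usable for data. */
    printf("addressable table size: %lluTB\n",
           (unsigned long long) ((MAX_BLOCKS * BLCKSZ) >> 40));
    return 0;
}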
Option 2: Table 1:M Child Tables 1:M Segments
e.g. empty table with 10 partitions would be
0 bytes in each of 10 files
e.g. table with 10 partitions each of 1.5GB would be
15GB in 10 groups of 2 files
Cross-table indexes and constraints would be useful outside of the
current scenario.
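For what it's worth, the reason cross-table indexes matter here: once the
partitions are separate relations, each index entry has to say which
child it points into, not just a block/offset. A sketch with invented
names, not the real index tuple layout:

/* Sketch only: invented names, not the existing index tuple layout. */
#include <stdint.h>

typedef uint32_t Oid;

/* Today an index entry points at (block, offset) within the single heap
 * the index was built on. */
typedef struct TidSketch
{
    uint32_t    block;
    uint16_t    offset;
} TidSketch;

/* A cross-table ("global") index over Option 2's child tables would also
 * have to record which child relation the heap tuple lives in. */
typedef struct GlobalIndexEntrySketch
{
    Oid         child_relid;    /* which child table */
    TidSketch   tid;            /* block/offset within that child */
} GlobalIndexEntrySketch;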
Option 3: Table 1:M Nodes 1:M Segments
e.g. empty table with 10 partitions would be
0 bytes in each of 10 files
e.g. table with 10 partitions each of 1.5GB would be
15GB in 10 groups of 2 files
Ah, so this does seem to be roughly the same as what I was rambling about.
This would presumably mean that rather than (table, block #) specifying
the location of a row you'd need (table, node #, block #).
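Something along these lines, in other words (again just a sketch with
invented names, not the existing ItemPointer/RelFileNode layout):

/* Sketch only: invented names. */
#include <stdint.h>

typedef struct HeapBlockAddrSketch
{
    uint32_t    relid;          /* the table, as now */
    uint16_t    nodeno;         /* which of the table's nodes */
    uint32_t    blockno;        /* block within that node's segment chain */
} HeapBlockAddrSketch;

/* Every place that currently carries a bare block number (index entries,
 * buffer tags, WAL records, ...) would need room for the node number too,
 * which is where the worrying implications come from. */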
So 1b) seems definitely out.
The implications of 2 and 3 are what I'm worried about, which is why the
shortcomings of 1a) seem acceptable currently.
--
Richard Huxton
Archonet Ltd