Simon Riggs wrote:
On Fri, 2008-01-11 at 11:34 +0000, Richard Huxton wrote:
Is the following basically the same as option #3 (multiple RelFileNodes)?
1. Make an on-disk "chunk" much smaller (e.g. 64MB). Each chunk is a
contiguous range of blocks.
2. Make a table-partition (implied or explicit constraints) map to
multiple "chunks".
That would reduce fragmentation (you'd have on average 32MB's worth of
blocks wasted per partition) and allow for stretchy partitions at the
cost of an extra layer of indirection.
For the single-partition case you'd not need to split the file of
course, so it would end up looking much like the current arrangement.
We need to think about the "data model" of the storage layer. Space
itself isn't the issue; it's the assumptions that all of the other
subsystems currently make about how a table is structured, indexed,
accessed and manipulated.
Which was why I was thinking you'd want indexes etc. to carry on
thinking in terms of a table being a contiguous set of blocks, with the
mapping to an actual on-disk block taking place below that level. (If
I've understood you.)
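Something like the sketch below is what I had in mind. It's purely
illustrative: the names, the fixed-size chunk array and the 64MB figure
are all invented for the example, not a description of any existing code.

/* Illustrative sketch only: all names and sizes are invented. */
#include <assert.h>
#include <stdint.h>

#define BLCKSZ        8192                          /* 8KB blocks, as now */
#define CHUNK_BLOCKS  (64 * 1024 * 1024 / BLCKSZ)   /* 64MB chunk = 8192 blocks */

typedef uint32_t BlockNumber;   /* table-relative block number, as indexes see it */

typedef struct ChunkId
{
    uint32_t    fileno;         /* which physical file holds this chunk */
    uint32_t    offset;         /* block offset of the chunk within that file */
} ChunkId;

typedef struct ChunkMap
{
    uint32_t    nchunks;        /* chunks currently allocated to the table */
    ChunkId     chunks[64];     /* fixed size just to keep the sketch simple */
} ChunkMap;

/*
 * Indexes, the executor etc. keep using plain table-relative block
 * numbers; only the storage-manager layer below them consults the map.
 */
static void
translate_block(const ChunkMap *map, BlockNumber blkno,
                uint32_t *fileno, uint32_t *fileblock)
{
    uint32_t    chunkno = blkno / CHUNK_BLOCKS;
    uint32_t    within  = blkno % CHUNK_BLOCKS;

    assert(chunkno < map->nchunks);
    *fileno    = map->chunks[chunkno].fileno;
    *fileblock = map->chunks[chunkno].offset + within;
}

The point being that growing a partition is just appending another chunk
to its map, rather than extending one contiguous file.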
Currently: Table 1:M Segments
Option 1: Table 1:M Segments and *separately* Table 1:M Partitions, so
partitions always have a maximum size. The chosen size only changes the
impact; it doesn't remove the holes, the maximum-size limit etc.
e.g. empty table with 10 partitions would be
a) 0 bytes in 1 file
b) 0 bytes in 1 file, plus 9GB in 9 files all full of empty blocks
Well, presumably 0GB in 10 files, but 10GB-worth of block-numbers
"pre-allocated".
e.g. table with 10 partitions each of 1.5GB would be
a) 15 GB in 15 files
With the limitation that any given partition might contain a mix of
data-ranges (e.g. 2005 lies half in partition 2 and half in partition 3).
b) hit max size limit of partition: ERROR
In the case of 1b, you could have a segment mapping to more than 1
partition, avoiding the error. So 2004 data is in partition 1, 2005 is
in partitions 2,3 (where 3 is half empty), 2006 is in partition 4.
However, this does mean you've got a lot of wasted block numbers. If you
were using explicit (fixed) partitioning and chose a bad set of criteria,
your maximum table size could be substantially reduced.
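To put a rough number on that: block numbers are 32 bits, so with 8KB
blocks a table tops out at 32TB today, and pre-allocating a fixed block
range per partition eats into that budget. A back-of-the-envelope sketch
(the 1GB partition size just matches the example above; everything here
is illustrative):

/* Back-of-the-envelope only; names and sizes are for illustration. */
#include <stdint.h>
#include <stdio.h>

#define BLCKSZ            8192ULL               /* 8KB blocks */
#define MAX_BLOCKS        (1ULL << 32)          /* 32-bit block numbers */
#define PARTITION_BYTES   (1ULL << 30)          /* 1GB fixed partition size, as in 1b */
#define PARTITION_BLOCKS  (PARTITION_BYTES / BLCKSZ)

int
main(void)
{
    /* With fixed-size partitions, the partition is implicit in the block number. */
    uint64_t blkno = 3 * PARTITION_BLOCKS + 17;

    printf("block %llu falls in partition %llu\n",
           (unsigned long long) blkno,
           (unsigned long long) (blkno / PARTITION_BLOCKS));

    /* Hard ceiling on the number of partitions, however empty they are. */
    printf("partitions available before block numbers run out: %llu\n",
           (unsigned long long) (MAX_BLOCKS / PARTITION_BLOCKS));

    /* The addressable size stays 32TB, so if a bad choice of criteria
     * leaves partitions, say, 10% full on average, only ~3.2TB of that
     * is ever usable for data. */
    printf("addressable table size: %lluTB\n",
           (unsigned long long) ((MAX_BLOCKS * BLCKSZ) >> 40));
    return 0;
}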
Option 2: Table 1:M Child Tables 1:M Segments
e.g. empty table with 10 partitions would be
0 bytes in each of 10 files
e.g. table with 10 partitions each of 1.5GB would be
15GB in 10 groups of 2 files
Cross-table indexes and constraints would be useful outside of the
current scenario.
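For what it's worth, the reason cross-table indexes matter here: once the
partitions are separate relations, each index entry has to say which
child it points into, not just a block/offset. A sketch with invented
names, not the real index tuple layout:

/* Sketch only: invented names, not the existing index tuple layout. */
#include <stdint.h>

typedef uint32_t Oid;

/* Today an index entry points at (block, offset) within the single heap
 * the index was built on. */
typedef struct TidSketch
{
    uint32_t    block;
    uint16_t    offset;
} TidSketch;

/* A cross-table ("global") index over Option 2's child tables would also
 * have to record which child relation the heap tuple lives in. */
typedef struct GlobalIndexEntrySketch
{
    Oid         child_relid;    /* which child table */
    TidSketch   tid;            /* block/offset within that child */
} GlobalIndexEntrySketch;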
Option 3: Table 1:M Nodes 1:M Segments
e.g. empty table with 10 partitions would be
0 bytes in each of 10 files
e.g. table with 10 partitions each of 1.5GB would be
15GB in 10 groups of 2 files
Ah, so this does seem to be roughly the same as what I was rambling about.
This would presumably mean that rather than (table, block #) specifying
the location of a row you'd need (table, node #, block #).
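Something along these lines, in other words (again just a sketch with
invented names, not the existing ItemPointer/RelFileNode layout):

/* Sketch only: invented names. */
#include <stdint.h>

typedef struct HeapBlockAddrSketch
{
    uint32_t    relid;          /* the table, as now */
    uint16_t    nodeno;         /* which of the table's nodes */
    uint32_t    blockno;        /* block within that node's segment chain */
} HeapBlockAddrSketch;

/* Every place that currently carries a bare block number (index entries,
 * buffer tags, WAL records, ...) would need room for the node number too,
 * which is where the worrying implications come from. */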
So 1b) seems definitely out.
The implications of 2 and 3 are what I'm worried about, which is why the
shortcomings of 1a) seem acceptable currently.
--
Richard Huxton
Archonet Ltd