Reading what you're describing, don't you think PG 9 goes a long way toward helping you out?
On Sat, Nov 13, 2010 at 12:53 AM, Craig Ringer <cr...@postnewspapers.com.au> wrote:
> On 11/12/2010 02:25 AM, Kyriacos Kyriacou wrote:
>> The result is to have huge fragmentation on table space, unnecessary
>> updates in all affected indexes, unnecessary costly I/O operations, poor
>> performance on SELECTs that retrieve big record sets (i.e. reports etc.),
>> and slower updates.
>
> Yep. It's all about trade-offs. For some workloads the in-table MVCC
> storage setup works pretty darn poorly, but for most it seems to work
> quite well.
>
> There are various other methods of implementing relational storage with
> ACID properties. You can exclude all other transactions while making a
> change to a table, ensuring that nobody else can see "old" or "new" rows,
> so there's no need to keep them around. You can use an out-of-line redo
> log (a la Oracle). Many other methods exist, too.
>
> They all have advantages and disadvantages for different workloads. It's
> far from trivial to mix multiple schemes within a single database, so
> mixing and matching schemes for different parts of your DB isn't
> generally practical.
>
>> 1) When a row UPDATE is performed, store all "new row versions" either
>> in a separate temporary table space, or in a reserved space at the end
>> of each table (which can be allocated dynamically), etc.
>
> OK, so you want a redo log a la Oracle?
>
>> 2) Any SELECT queries within the same session will again be accessing
>> the new version of the row.
>> 3) Any SELECT queries from other users will still be accessing the old
>> version.
>
> ... and incurring horrible random I/O penalties if the redo log doesn't
> fit in RAM. Again, a la Oracle.
>
> Even read-only transactions have to hit the undo log if there's an
> update in progress, because rows they need may have been moved out to
> the undo log as they're updated in the main table storage.
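To make the in-table trade-off Craig describes concrete, here's a toy sketch in plain Python. It is nothing like PostgreSQL's real tuple format or visibility rules (the class, fields and method names are all made up for illustration), but it shows the mechanism: an UPDATE writes a whole new row version into the same storage, the old version lingers as a "dead" tuple that every scan must step over, and only a vacuum pass reclaims it.

```python
# Hypothetical, highly simplified model of in-table MVCC (PostgreSQL-style).
# Real tuple headers, snapshot visibility and VACUUM are far more involved.

class InTableMVCC:
    def __init__(self):
        self.tuples = []           # every row version ever written lives here
        self.next_xid = 1

    def insert(self, key, value):
        self.tuples.append({"key": key, "value": value,
                            "xmin": self.next_xid, "xmax": None})
        self.next_xid += 1

    def update(self, key, value):
        # Mark the live version dead and append a new version in the same
        # storage -- no out-of-line undo/redo area.
        xid = self.next_xid
        self.next_xid += 1
        for t in self.tuples:
            if t["key"] == key and t["xmax"] is None:
                t["xmax"] = xid
        self.tuples.append({"key": key, "value": value,
                            "xmin": xid, "xmax": None})

    def seq_scan(self):
        # A scan has to walk the dead versions too, just to skip them.
        return [t for t in self.tuples if t["xmax"] is None]

    def vacuum(self):
        # Vacuum reclaims dead versions so scans shrink again.
        self.tuples = [t for t in self.tuples if t["xmax"] is None]

table = InTableMVCC()
table.insert("a", 1)
for v in range(2, 7):
    table.update("a", v)       # five updates -> five dead tuples
print(len(table.tuples))       # 6 stored versions for 1 logical row
print(len(table.seq_scan()))   # 1 visible row
table.vacuum()
print(len(table.tuples))       # back to 1 after vacuum
```

Update-heavy workloads make `tuples` grow much faster than the logical row count, which is exactly the bloat and scan-cost complaint at the top of the thread.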
> [snip description]
>
>> I understand that my suggestion seems to be too simplified, and also
>> that there are many implementation details and difficulties that I am
>> not aware of.
>
> It sounds like you're describing Oracle-style MVCC, using redo logs:
>
> http://blogs.sybase.com/database/2009/04/mvcc-dispelling-some-oracle-fudunderstanding-the-cost/
> http://en.wikipedia.org/wiki/Multiversion_concurrency_control
>
> Oracle's MVCC approach has its own costs. Like Pg's, those costs
> increase with update/delete frequency. Instead of table bloat, Oracle
> suffers from redo log growth (or redo log size management issues).
> Instead of increased table scan costs from dead rows, Oracle suffers
> from random I/O costs as it looks up the out-of-line redo log for old
> rows. Instead of long-running writer transactions causing table bloat,
> Oracle can have problems with long-running reader transactions aborting
> when the redo log runs out of space.
>
> Personally, I don't know enough to say which is "better". I suspect
> they're just different, with different trade-offs. If redo logs allowed
> you to do without write-ahead logging, that'd be interesting -- but the
> WAL is useful for all sorts of replication options, and the use of
> linear WALs means that write ordering in the tables doesn't need to be
> as strict, which has performance advantages.
>
> --
> Craig Ringer
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
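For contrast, here's the same kind of toy sketch for the out-of-line scheme being proposed (again hypothetical and hugely simplified -- all names are made up, and real Oracle keeps old versions in undo segments with much more machinery). The table holds only the newest version; old versions get copied to a separate, bounded area, so an old-snapshot reader has to chase into it -- the random I/O cost Craig mentions -- and if that area has been recycled, the long-running reader fails rather than the table bloating.

```python
# Hypothetical, highly simplified model of out-of-line MVCC (Oracle-style).

class OutOfLineMVCC:
    def __init__(self, undo_capacity=100):
        self.table = {}             # key -> (version_xid, value), newest only
        self.undo = []              # (key, version_xid, value) tuples
        self.undo_capacity = undo_capacity
        self.next_xid = 1

    def write(self, key, value):
        xid = self.next_xid
        self.next_xid += 1
        if key in self.table:
            # The old version is moved out of the table into the undo area.
            old_xid, old_val = self.table[key]
            self.undo.append((key, old_xid, old_val))
            if len(self.undo) > self.undo_capacity:
                self.undo.pop(0)    # recycle the oldest undo entry
        self.table[key] = (xid, value)
        return xid

    def read(self, key, snapshot_xid):
        xid, value = self.table[key]
        if xid <= snapshot_xid:
            return value            # fast path: answer is in the table
        # Slow path: search the undo area for the newest version this
        # snapshot is still allowed to see.
        best = None
        for k, vxid, v in self.undo:
            if k == key and vxid <= snapshot_xid:
                if best is None or vxid > best[0]:
                    best = (vxid, v)
        if best is None:
            # Undo was recycled out from under a long-running reader.
            raise RuntimeError("snapshot too old")
        return best[1]

db = OutOfLineMVCC(undo_capacity=2)
db.write("a", "v1")
snap = db.next_xid - 1         # snapshot taken right after v1
db.write("a", "v2")
db.write("a", "v3")
print(db.read("a", snap))      # old reader still sees "v1" via undo
db.write("a", "v4")            # pushes v1 out of the bounded undo area
# db.read("a", snap) would now raise "snapshot too old"
```

One more write than the undo area can hold and the old snapshot becomes unreadable -- the mirror image of Pg's long-transaction bloat, as the quoted comparison says.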