Re: [HACKERS] WORM and Read Only Tables (v0.1)

2007-12-19 Thread Hannu Krosing

On Mon, 2007-12-17 at 09:20, Simon Riggs wrote:
 On Sat, 2007-12-15 at 13:32 +0100, Albert Cervera i Areny wrote:
   Read-Only Tables
   
   Postgres supports the concept of freezing tuples, so they can live
   forever within the database without needing further writes. Currently
   there is no command that will guarantee that a table has been completely
   frozen. This makes it difficult to reliably write data files to WORM
   media for longer term archiving. (WORM means Write-Once, Read-Many).
   It's also a pain having to VACUUM a large table again just because a
   small number of rows need to be frozen.
  
  
  I'm not an expert at all, but I'd like to understand this: do you plan that
  READ-ONLY tables wouldn't even store transaction information? That should
  save quite a lot of space. Maybe when the table would be moved to the
  compressed tablespace, MVCC information could be dropped too? Of course
  that would preclude future insert & update possibilities, though.
 
 It could, but it's a lot of work for little gain. The tuple headers look
 like they will compress fairly well, so why bother to remove them at
 all?

One place for removing them would be if we do column-stores where there
would be one header per column instead of one per tuple.

--
Hannu




---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] WORM and Read Only Tables (v0.1)

2007-12-17 Thread Simon Riggs
On Sat, 2007-12-15 at 13:32 +0100, Albert Cervera i Areny wrote:
  Read-Only Tables
  
  Postgres supports the concept of freezing tuples, so they can live
  forever within the database without needing further writes. Currently
  there is no command that will guarantee that a table has been completely
  frozen. This makes it difficult to reliably write data files to WORM
  media for longer term archiving. (WORM means Write-Once, Read-Many).
  It's also a pain having to VACUUM a large table again just because a
  small number of rows need to be frozen.
 
 
 I'm not an expert at all, but I'd like to understand this: do you plan that
 READ-ONLY tables wouldn't even store transaction information? That should
 save quite a lot of space. Maybe when the table would be moved to the
 compressed tablespace, MVCC information could be dropped too? Of course
 that would preclude future insert & update possibilities, though.

It could, but it's a lot of work for little gain. The tuple headers look
like they will compress fairly well, so why bother to remove them at
all?

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com




Re: [HACKERS] WORM and Read Only Tables (v0.1)

2007-12-15 Thread Albert Cervera i Areny
 Read-Only Tables
 
 Postgres supports the concept of freezing tuples, so they can live
 forever within the database without needing further writes. Currently
 there is no command that will guarantee that a table has been completely
 frozen. This makes it difficult to reliably write data files to WORM
 media for longer term archiving. (WORM means Write-Once, Read-Many).
 It's also a pain having to VACUUM a large table again just because a
 small number of rows need to be frozen.


I'm not an expert at all, but I'd like to understand this: do you plan that
READ-ONLY tables wouldn't even store transaction information? That should
save quite a lot of space. Maybe when the table would be moved to the
compressed tablespace, MVCC information could be dropped too? Of course
that would preclude future insert & update possibilities, though.
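
For context on the freezing machinery being discussed: since 8.2 the catalogs
do expose how far each table is from fully frozen, though (as the proposal
notes) nothing guarantees a table *stays* frozen afterwards. A hedged sketch
of the kind of check a DBA can run today:

```sql
-- age(relfrozenxid) roughly counts transactions since the table was last
-- fully scanned by a freezing vacuum; after VACUUM FREEZE it starts near
-- zero and grows again as soon as new transactions touch the cluster.
SELECT relname, age(relfrozenxid) AS xacts_since_freeze
FROM pg_class
WHERE relkind = 'r'
ORDER BY age(relfrozenxid) DESC;
```

The gap the proposed command would close is turning this after-the-fact
estimate into a hard guarantee recorded in pg_class.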

-- 
Albert Cervera i Areny
http://www.NaN-tic.com



Re: [HACKERS] WORM and Read Only Tables (v0.1)

2007-12-12 Thread Josh Berkus
Simon,

 Use Case: VLDB with tons of (now) read only data, some not. Data needs
 to be accessible, but data itself is rarely touched, allowing storage
 costs to be minimised via a storage hierarchy of progressively cheaper
 storage.

There's actually 2 cases to optimize for:
1) write-once-read-many (WORM)
2) write-once-read-seldom (WORS)

The 2nd case is becoming extremely popular due to the presence of 
government-mandated records databases.  For example, I'm currently working on 
one call completion records database which will hold 75TB of data, of which 
we expect less than 1% to *ever* be queried.

One of the other things I'd like to note is that for WORM, conventional 
storage is never going to approach column-store DBs for general performance.  
So, should we be working on incremental improvements like the ones you 
propose, or should we be working on integrating a c-store into PostgreSQL on 
a per-table basis?


-- 
Josh the Fuzzy Berkus
San Francisco



Re: [HACKERS] WORM and Read Only Tables (v0.1)

2007-12-12 Thread Simon Riggs
On Tue, 2007-12-11 at 20:30 -0800, Josh Berkus wrote:
 Simon,
 
  Use Case: VLDB with tons of (now) read only data, some not. Data needs
  to be accessible, but data itself is rarely touched, allowing storage
  costs to be minimised via a storage hierarchy of progressively cheaper
  storage.
 
 There's actually 2 cases to optimize for:
 1) write-once-read-many (WORM)
 2) write-once-read-seldom (WORS)
 
 The 2nd case is becoming extremely popular due to the presence of 
 government-mandated records databases.  For example, I'm currently working on 
 one call completion records database which will hold 75TB of data, of which 
 we expect less than 1% to *ever* be queried.

Well, that's exactly the use case I'm writing for. I called that an
archival data store in my post on VLDB Features.

WORM is a type of storage that might be used, so it would be somewhat
confusing if we use it as the name of a specific use case. 

Getting partitioning/read-only right will allow 70+TB of that to be on
tape or similar, which with compression can be reduced to maybe 20TB? I
don't want to promise any particular compression ratio, but it will make
a substantial difference, as I'm sure you realise.

 One of the other things I'd like to note is that for WORM, conventional 
 storage is never going to approach column-store DBs for general performance.  
 So, should we be working on incremental improvements like the ones you 
 propose, or should we be working on integrating a c-store into PostgreSQL on 
 a per-table basis?

What I'm saying is that there are some features that all VLDBs need. If
we had a column store DB we would still need partitioning as well or the
data structures would become unmanageable. Plus partitioning can allow
the planner to avoid de-archiving/spinning up data and help reduce
storage costs.

Radical can be good, but it can also take more time, and I dare say it
would be harder for the community to accept. So I look for worthwhile
change in acceptably sized chunks.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com




Re: [HACKERS] WORM and Read Only Tables (v0.1)

2007-12-12 Thread Zeugswetter Andreas ADI SD

 There are a number of nasty limitations for partitions currently (not the
 least of which is that real uniqueness guarantees are impractical),

Just to add another opinion to this statement, because it imho sounds
overly pessimistic:

Uniqueness is currently perfectly practical when the unique index contains
the column[s] that is/are used in a non-overlapping partitioning scheme.

If you cannot create separate unique indexes on each partition that
guarantee global uniqueness because of the chosen partitioning scheme, you
can often reconsider your scheme (e.g. use natural keys instead of serials).
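
The point can be made concrete with a small sketch (table and column names
invented for illustration) using the inheritance-based partitioning available
today: because the CHECK constraints do not overlap, a per-partition unique
index is enough to guarantee global uniqueness of the key.

```sql
-- Parent table plus range partitions on the unique key itself.
CREATE TABLE calls (call_id bigint NOT NULL, placed_at timestamptz);

CREATE TABLE calls_old (
    CHECK (call_id >= 0 AND call_id < 1000000)
) INHERITS (calls);
CREATE TABLE calls_current (
    CHECK (call_id >= 1000000)
) INHERITS (calls);

-- Each index is local, but the disjoint CHECK ranges make the pair
-- equivalent to one global unique constraint on call_id.
CREATE UNIQUE INDEX calls_old_pk ON calls_old (call_id);
CREATE UNIQUE INDEX calls_current_pk ON calls_current (call_id);
```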

Other db software allows creating global indexes, or indexes with separate
partitioning schemes, but this is then often a pain. When you
drop/attach/detach a partition such an index needs to be recreated or
reorganized, which turns attach/detach partition into a large, slow
transaction. Even if you don't need to attach/detach, there is still one
other argument against a huge global index: fault isolation.

There is imho large room to make this better than others do :-)
And I think we should not regard them as positive examples,
because that narrows the view.

Andreas



Re: [HACKERS] WORM and Read Only Tables (v0.1)

2007-12-12 Thread Zeugswetter Andreas ADI SD
 Getting partitioning/read-only right will allow 70+TB of that to be on
 tape or similar, which with compression can be reduced to maybe 20TB? I
 don't want to promise any particular compression ratio, but it will make
 a substantial difference, as I'm sure you realise.

Wouldn't one very substantial requirement of such storage be to
have it independent of db version, or even db product? Keeping
old hardware and software around can be quite expensive.

So, wouldn't a virtual table interface be a better match for such a
problem? Such a virtual table should be allowed to be part of a
partitioning scheme, have native or virtual indexes, ...

Andreas



Re: [HACKERS] WORM and Read Only Tables (v0.1)

2007-12-12 Thread Andrew Sullivan
On Wed, Dec 12, 2007 at 12:14:43PM +0100, Zeugswetter Andreas ADI SD wrote:
 Uniqueness is currently perfectly practical, when the unique index
 contains
 the column[s] that is/are used in a non overlapping partitioning scheme.

Well, yes, assuming you have no bugs.  Part of the reason I want the
database to handle this for me is because, where I've come from, the only
thing I can be sure of is that there will be bugs.  There'll even be bugs
before there is running code.  One bug I can easily imagine is that the
non-overlapping partitioning scheme has a bug in it, such that it turns out
there _is_ an overlap some time.

All of that said, I agree with you, particularly about the alternative ways
things can suck instead :-/

A




Re: [HACKERS] WORM and Read Only Tables (v0.1)

2007-12-12 Thread Andrew Sullivan
On Wed, Dec 12, 2007 at 12:58:11PM +0100, Zeugswetter Andreas ADI SD wrote:
 Wouldn't one very substantial requirement of such storage be to
 have it independent of db version, or even db product? Keeping
 old hardware and software around can be quite expensive.

This was one of the explicit requirements I had when I wrote my pie in the
sky outline.  Hrm.  I wonder if I can get permission to post it.  Let me
find out.

The requirement was, anyway, that we be able to read old versions of
archived rows.  IIRC there was an implementation choice, whether we would
_never_ allow such rows to be SET READ WRITE or whether they'd be
immediately upgraded to the present format on SET READ WRITE.

A




Re: [HACKERS] WORM and Read Only Tables (v0.1)

2007-12-12 Thread Simon Riggs
On Wed, 2007-12-12 at 11:22 -0500, Andrew Sullivan wrote:
 On Wed, Dec 12, 2007 at 12:14:43PM +0100, Zeugswetter Andreas ADI SD wrote:
  Uniqueness is currently perfectly practical, when the unique index
  contains
  the column[s] that is/are used in a non overlapping partitioning scheme.
 
 Well, yes, assuming you have no bugs.  Part of the reason I want the
 database to handle this for me is because, where I've come from, the only
 thing I can be sure of is that there will be bugs.  There'll even be bugs
 before there is running code.  One bug I can easily imagine is that the
 non-overlapping partitioning scheme has a bug in it, such that it turns out
 there _is_ an overlap some time.

Enforcing uniqueness with a global index has a number of disadvantages.

The worst of these is that the index continues to get bigger and bigger
as the total data volume increases. You have to index all partitions,
plus each index entry needs to include a partition id as well as the
index key. So not only is it big, its huge. Huge indexes are slow, so an
index with terabytes of data in it is going to be almost unusable.

The best thing to do would be to sit down and work out exactly how big
and deep such an index would be in the case you're thinking of so we can
tell whether it is very bad or merely bad.

I seem to be the only one saying global indexes are bad, so if people
that want them can do the math and honestly say they want them, then I
will listen.
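
Taking up the "do the math" invitation, a rough back-of-envelope sketch
(entry size and row count are my assumptions, extrapolated from the 75TB
example upthread, not figures anyone has stated):

```sql
-- Assume the 75TB store holds ~500 billion call records (~150 bytes each),
-- and each global-index leaf entry costs ~24 bytes (key + item pointer +
-- partition id + overhead). The leaf level alone would then be:
SELECT pg_size_pretty((500 * 10^9 * 24)::bigint);  -- roughly 11 TB
```

With fill factor below 100% and the upper tree levels added, the real index
would be larger still, so the "huge and slow" concern seems well founded
under these assumptions.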

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com




Re: [HACKERS] WORM and Read Only Tables (v0.1)

2007-12-12 Thread Simon Riggs
On Wed, 2007-12-12 at 10:48 -0800, Josh Berkus wrote:
 Andrew,
 
  The requirement was, anyway, that we be able to read old versions of
  archived rows.  IIRC there was an implementation choice, whether we would
  _never_ allow such rows to be SET READ WRITE or whether they'd be
  immediately upgraded to the present format on SET READ WRITE.
 
 Well, in theory we need this capability for upgrade-in-place too.  While that 
 project has kind of stalled for the moment, we'll pick it back up again soon.

Who was working on it?

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com




Re: [HACKERS] WORM and Read Only Tables (v0.1)

2007-12-12 Thread Josh Berkus
Andrew,

 The requirement was, anyway, that we be able to read old versions of
 archived rows.  IIRC there was an implementation choice, whether we would
 _never_ allow such rows to be SET READ WRITE or whether they'd be
 immediately upgraded to the present format on SET READ WRITE.

Well, in theory we need this capability for upgrade-in-place too.  While that 
project has kind of stalled for the moment, we'll pick it back up again soon.

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco



Re: [HACKERS] WORM and Read Only Tables (v0.1)

2007-12-12 Thread Joshua D. Drake

On Wed, 12 Dec 2007 19:07:57 +
Simon Riggs [EMAIL PROTECTED] wrote:

 I seem to be the only one saying global indexes are bad, so if people
 that want them can do the math and honestly say they want them, then I
 will listen.

Global indexes are bad for certain situations; for others they are
required. Constraint exclusion/partitioning is not only for ginormous
tables.

It can also be used for maintenance efficiency, micro-optimizations and
just general data architecture.

Joshua D. Drake
 


-- 
The PostgreSQL Company: Since 1997, http://www.commandprompt.com/ 
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
SELECT 'Training', 'Consulting' FROM vendor WHERE name = 'CMD'





Re: [HACKERS] WORM and Read Only Tables (v0.1)

2007-12-12 Thread Josh Berkus
Simon,

 Who was working on it?

Zdenec and Dhanaraj from Sun, and someone from EDB (I'm not sure who, maybe 
Korry?).  Unfortunately, both companies have shifted staff around and we need 
to re-start work.

Of course, if hackers other than those from EDB & Sun want to attack the 
problem, the more the merrier.

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco



Re: [HACKERS] WORM and Read Only Tables (v0.1)

2007-12-12 Thread Andrew Sullivan
On Wed, Dec 12, 2007 at 07:07:57PM +, Simon Riggs wrote:
 
 Enforcing uniqueness with a global index has a number of disadvantages.

This is why I was trying to talk about constraints rather than global
indexes.  Just because we happen to implement them that way today does not
mean that such constraints need be implemented that way in every case.

I think especially for the sort of detached rows scenario I was dreaming
about, a global index is never going to be good.

A




Re: [HACKERS] WORM and Read Only Tables (v0.1)

2007-12-11 Thread Gregory Stark
Simon Riggs [EMAIL PROTECTED] writes:

 So... VACUUM FREEZE table SET READ ONLY;

 would be my first thought, but I'm guessing everybody will press me
 towards supporting the more obvious

 ALTER TABLE table SET READ ONLY;

 This command will place a ShareLock (only) on the table, preventing
 anybody from writing to the table while we freeze it. The ShareLock is
 incompatible with any transaction that has written to the table, so when
 we acquire the lock all writers to the table will have completed. We
 then run the equivalent of a VACUUM FREEZE which will then be able to
 freeze *all* rows in one pass (rather than all except the most recent).
 On completion of the freeze pass we will then update the pg_class entry
 to show that it is now read-only, so we will emulate the way VACUUM does
 this.

To be clear: if it meets a block in which a tuple is not freezable -- that
is, the tuple has an xmin or xmax more recent than the global xmin -- then
it needs to block waiting for the backend holding that recent xmin. Then
presumably it needs to update its concept of the recent global xmin going
forward.

You might be best off grabbing a list of txid-xmin when you start and
sorting them by xmin, so you can loop through them, sleeping until you
reach the first txid with an xmin large enough to continue.

 Reversing the process is simpler, since we only have to turn off the
 flag in pg_class:

I'm not sure how this interacts with:

 Requests for tuple-level SHARE locks (e.g. FOR SHARE) against read-only
 tables will be ignored, since they are effectively already there. So we
 don't need to change the internals of the locking, nor edit the RI code
 to remove the call to SHARE lock referenced tables. Do this during
 post-parse analysis.

Queries which think they hold FOR SHARE tuple locks would magically lose
their share locks if you turn off the read-only flag. Do you need to
obtain an exclusive lock on the table to turn it read-write?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL 
training!



Re: [HACKERS] WORM and Read Only Tables (v0.1)

2007-12-11 Thread Csaba Nagy
On Tue, 2007-12-11 at 11:12 +, Simon Riggs wrote:
 Features
 - Read Only Tables
 - Compressed Tablespaces

I wonder if, instead of read-only tables, it wouldn't be better to have
some kind of automatic partitioning which permits having different
chunks of the table data in different tablespaces, and a freeze command
which effectively moves the data from the (normally small) active chunk
to the archive chunk when its transaction id is older than a predefined
threshold?

Then put the active chunk on a high-performance file system and the
archive tablespace on a compressed/slow/cheap file system and you're
done. Allow even the archive chunk to be updatable, and put new tuple
data in the active chunk. It would work just fine for cases where the
old data is rarely updated/deleted...
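
The static half of this idea can be sketched with today's tablespace and
inheritance machinery (paths and names invented for illustration); the
automatic freeze-and-move step is the part that doesn't exist yet:

```sql
-- Two tablespaces on different storage tiers.
CREATE TABLESPACE fast_disk LOCATION '/mnt/fast/pgdata';
CREATE TABLESPACE cheap_disk LOCATION '/mnt/cheap/pgdata';

-- Active and archive chunks of one logical table.
CREATE TABLE events (id bigint, payload text);
CREATE TABLE events_active  () INHERITS (events) TABLESPACE fast_disk;
CREATE TABLE events_archive () INHERITS (events) TABLESPACE cheap_disk;

-- The missing piece: a command that migrates rows older than some xid
-- threshold from events_active to events_archive, freezing them on the way.
```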

Another advantage I guess would be that active data would more likely
stay in cache, as updated records would stay together and not spread
over the inactive.

Cheers,
Csaba.





Re: [HACKERS] WORM and Read Only Tables (v0.1)

2007-12-11 Thread Csaba Nagy
On Tue, 2007-12-11 at 13:44 +0100, Csaba Nagy wrote:
 Another advantage I guess would be that active data would more likely
 stay in cache, as updated records would stay together and not spread
 over the inactive.

And I forgot to mention that vacuum could mostly skip the archive part
and only vacuum the active part, which would drastically reduce the cost
of vacuuming big & active tables.

Cheers,
Csaba.





Re: [HACKERS] WORM and Read Only Tables (v0.1)

2007-12-11 Thread Hannu Krosing

On Tue, 2007-12-11 at 13:44, Csaba Nagy wrote:
 On Tue, 2007-12-11 at 11:12 +, Simon Riggs wrote:
  Features
  - Read Only Tables
  - Compressed Tablespaces
 
 I wonder if instead of read-only tables wouldn't it be better to have
 some kind of automatic partitioning which permits to have different
 chunks of the table data in different tablespaces, and a freeze command
 which effectively moves the data from the (normally small) active chunk
 to the archive chunk when its transaction id is older than a predefined
 threshold ?

This would be doable using Simon's proposed commands.

 Then put the active chunk on a high performance file system and the
 archive tablespace on a compressed/slow/cheap file system and you're
 done. Allow even the archive chunk to be updateable, and put new tuple
 data in the active chunk. It would work just fine for cases where the
 old data is rarely updated/deleted...

You can't update a table on a read-only (write-once) partition, at least
not with current header structure.

 Another advantage I guess would be that active data would more likely
 stay in cache, as updated records would stay together and not spread
 over the inactive.





Re: [HACKERS] WORM and Read Only Tables (v0.1)

2007-12-11 Thread Csaba Nagy
On Tue, 2007-12-11 at 14:58 +0200, Hannu Krosing wrote:
 On Tue, 2007-12-11 at 13:44, Csaba Nagy wrote:
  Then put the active chunk on a high performance file system and the
  archive tablespace on a compressed/slow/cheap file system and you're
  done. Allow even the archive chunk to be updateable, and put new tuple
  data in the active chunk. It would work just fine for cases where the
  old data is rarely updated/deleted...
 
 You can't update a table on a read-only (write-once) partition, at least
 not with current header structure.

OK, but that's what I'm challenging: why do you need a write-once
partition? Do you mean tapes by that? OK, then I was thinking of
completely different usage scenarios...

Cheers,
Csaba.





Re: [HACKERS] WORM and Read Only Tables (v0.1)

2007-12-11 Thread Trevor Talbot
On 12/11/07, Simon Riggs [EMAIL PROTECTED] wrote:

 Compressed Tablespaces

 Using a streaming library like zlib, it will be easy to read/write data
 files into a still-usable form but with much reduced size. Access to a
 compressed table only makes sense as a SeqScan. That would be handled by
 introducing tablespace-specific access costs, discussed below. Indexes
 on compressed tables would still be allowed, but would hardly ever be
 used.

I've actually been wanting this lately, for a couple reasons. One is
reduced disk footprint, but the other is reduced I/O, similar to how
TOAST helps with large fields now. (In my particular scenario, TOAST
can't help due to small field sizes.) It would be useful to have
available even on read/write data. To that end, it would probably make
more sense to use a block compression algorithm rather than a
streaming one. Block-based algorithms can generally get better
compression than streaming ones as well, at least when fed large
enough blocks.

I'm not familiar with the implementation issues, other than the obvious
one that variable block sizes make the I/O subsystem look very different,
so I don't know if there's a major tradeoff between the two strategies
(even just for read-only).

 I'm open to arguments that we don't need this at all because filesystem
 utilities exist that do everything we need. Your experience will be
 good to hear about in regard to this feature.

Some filesystems do support transparent compression, but they're not
always available. It would be nice to have compression on
unsophisticated systems with cheap hardware.



Re: [HACKERS] WORM and Read Only Tables (v0.1)

2007-12-11 Thread Peter Childs
On 11/12/2007, Csaba Nagy [EMAIL PROTECTED] wrote:

 On Tue, 2007-12-11 at 14:58 +0200, Hannu Krosing wrote:
  On Tue, 2007-12-11 at 13:44, Csaba Nagy wrote:
   Then put the active chunk on a high performance file system and the
   archive tablespace on a compressed/slow/cheap file system and you're
   done. Allow even the archive chunk to be updateable, and put new tuple
   data in the active chunk. It would work just fine for cases where the
   old data is rarely updated/deleted...
 
  You can't update a table on a read-only (write-once) partition, at least
  not with current header structure.

 OK, but that's what I'm challenging: why do you need a write-once
 partition? Do you mean tapes by that? OK, then I was thinking of
 completely different usage scenarios...

 Cheers,
 Csaba.



I think DVD or CD would make sense; tapes have the added limitation of
being sequential-access only.

Peter Childs


Re: [HACKERS] WORM and Read Only Tables (v0.1)

2007-12-11 Thread Peter Childs
On 11/12/2007, Simon Riggs [EMAIL PROTECTED] wrote:

 Attach
 --

 Writing tables on one system and then moving that data to other systems
 is fairly common. If we supported read-only tables then you might
 consider how you would publish new versions to people.

 For now, we just want to consider how we will upgrade from one release
 to another without needing to unload and reload potentially many
 Terabytes of data. We can't delete the old data until the new data is
 successfully loaded, so we will have a huge temporary storage cost. This
 could go very slowly if we use cheaper storage, plus reloading the data
 means we have to re-freeze it again also.

 So we need a way of attaching the old tables to the new database. We
 might call this binary upgrade, or we might be slightly less ambitious
 and talk about just moving the old read-only data. That's all I want to
 do at this stage.

 I'm mentioning this here now to see what comes out in debate, and what
 others are planning to work on in this area.


This sounds like allowing new versions to read the old version's file
structure, probably on a tablespace-by-tablespace basis.

Another advantage might be being able to load an old PITR backup into a
new version across major versions, but I'm not sure that would work.

Maybe we need a command to upgrade a tablespace to a new version's file
format?

We need to be careful we don't stunt future progress by fixing the file
format too firmly.

Peter Childs


Re: [HACKERS] WORM and Read Only Tables (v0.1)

2007-12-11 Thread Simon Riggs
On Tue, 2007-12-11 at 11:49 +, Gregory Stark wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
 
  So... VACUUM FREEZE table SET READ ONLY;
 
  would be my first thought, but I'm guessing everybody will press me
  towards supporting the more obvious
 
  ALTER TABLE table SET READ ONLY;
 
  This command will place a ShareLock (only) on the table, preventing
  anybody from writing to the table while we freeze it. The ShareLock is
  incompatible with any transaction that has written to the table, so when
  we acquire the lock all writers to the table will have completed. We
  then run the equivalent of a VACUUM FREEZE which will then be able to
  freeze *all* rows in one pass (rather than all except the most recent).
  On completion of the freeze pass we will then update the pg_class entry
  to show that it is now read-only, so we will emulate the way VACUUM does
  this.
 
 To be clear: if it meets a block in which a tuple is not freezable -- that
 is, the tuple has an xmin or xmax more recent than the global xmin -- then
 it needs to block waiting for the backend holding that recent xmin. Then
 presumably it needs to update its concept of the recent global xmin going
 forward.
 
 You might be best off grabbing a list of txid-xmin when you start and
 sorting them by xmin, so you can loop through them, sleeping until you
 reach the first txid with an xmin large enough to continue.

D'oh. Completely agreed. Mea culpa.

I had that bit in my original design, but I was clearly looking elsewhere
here. I'd been trying to think about how to do this since about 2 years
ago, and it was only the CREATE INDEX CONCURRENTLY stuff that showed me
how. Thanks for nudging me.

  Reversing the process is simpler, since we only have to turn off the
  flag in pg_class:
 
 I'm not sure how this interacts with:
 
  Requests for tuple-level SHARE locks (e.g. FOR SHARE) against read-only
  tables will be ignored, since they are effectively already there. So we
  don't need to change the internals of the locking, nor edit the RI code
  to remove the call to SHARE lock referenced tables. Do this during
  post-parse analysis.
 
 Queries which think they hold FOR SHARE tuple locks would magically lose
 their share locks if you turn off the read-only flag. Do you need to
 obtain an exclusive lock on the table to turn it read-write?

Agreed. I wasn't suggesting implementing it without that, just noting that
it might have been possible, but it seems not, as you say. I don't think
it's important to be able to do that with less than AccessExclusiveLock.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com




Re: [HACKERS] WORM and Read Only Tables (v0.1)

2007-12-11 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 This command will place a ShareLock (only) on the table, preventing
 anybody from writing to the table while we freeze it. The ShareLock is
 incompatible with any transaction that has written to the table, so when
 we acquire the lock all writers to the table will have completed. We
 then run the equivalent of a VACUUM FREEZE which will then be able to
 freeze *all* rows in one pass (rather than all except the most
 recent).

This breaks MVCC.  The fact that a transaction has completed is not
license to discard tuple xmin immediately.

regards, tom lane



Re: [HACKERS] WORM and Read Only Tables (v0.1)

2007-12-11 Thread Simon Riggs
On Tue, 2007-12-11 at 10:19 -0500, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  This command will place a ShareLock (only) on the table, preventing
  anybody from writing to the table while we freeze it. The ShareLock is
  incompatible with any transaction that has written to the table, so when
  we acquire the lock all writers to the table will have completed. We
  then run the equivalent of a VACUUM FREEZE which will then be able to
  freeze *all* rows in one pass (rather than all except the most
  recent).
 
 This breaks MVCC.  The fact that a transaction has completed is not
 license to discard tuple xmin immediately.

Yeh, agreed. I knew I'd solved that bit, so I was focused elsewhere.
Sloppy, so apologies.

I was originally planning to put a wait in at the beginning, as is used
by CREATE INDEX CONCURRENTLY, though I prefer Greg's variant because
it's more forgiving.
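To make the MVCC point above concrete: freezing rewrites a tuple's xmin to
FrozenTransactionId, which makes the tuple visible to every snapshot,
including snapshots taken before the inserting transaction committed. A toy
visibility check (illustrative names only, not PostgreSQL's actual code):

```python
# Toy model of MVCC snapshot visibility, illustrating why freezing a
# tuple too early breaks older snapshots. Names are illustrative only.

FROZEN_XID = 2  # stands in for PostgreSQL's FrozenTransactionId

def tuple_visible(xmin, snapshot_xmin, running_xids):
    """A tuple is visible if its inserting transaction committed before
    the snapshot was taken (simplified: xmin < snapshot_xmin and not
    still running). Frozen tuples are visible to everyone."""
    if xmin == FROZEN_XID:
        return True
    return xmin < snapshot_xmin and xmin not in running_xids

# Transaction 100 inserts a row; an older snapshot (which still lists
# xid 100 as in progress) must NOT see it:
assert tuple_visible(100, 100, {100}) is False

# If we freeze the row the moment transaction 100 completes, that same
# old snapshot suddenly sees it -- this is the MVCC violation:
assert tuple_visible(FROZEN_XID, 100, {100}) is True
```

Waiting until all snapshots that could see the pre-commit state are gone
(as CREATE INDEX CONCURRENTLY does) avoids the problem.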

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com




Re: [HACKERS] WORM and Read Only Tables (v0.1)

2007-12-11 Thread Simon Riggs
On Tue, 2007-12-11 at 13:44 +0100, Csaba Nagy wrote:
 On Tue, 2007-12-11 at 11:12 +, Simon Riggs wrote:
  Features
  - Read Only Tables
  - Compressed Tablespaces
 
 I wonder if, instead of read-only tables, it wouldn't be better to have
 some kind of automatic partitioning

That's definitely on my list of requirements for partitioning.

 which permits having different
 chunks of the table data in different tablespaces, and a freeze command
 which effectively moves the data from the (normally small) active chunk
 to the archive chunk when its transaction id is older than a predefined
 threshold?

As Hannu says, this is exactly what the other features will allow, so
Yes!

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com




Re: [HACKERS] WORM and Read Only Tables (v0.1)

2007-12-11 Thread Andrew Sullivan
On Tue, Dec 11, 2007 at 11:12:46AM +, Simon Riggs wrote:
 
 Read-Only Tables
 

In the past when this topic came up, there was some discussion of doing this
at a level somewhere below the table horizon.  There are a number of nasty
limitations for partitions currently (not the least of which is that real
uniqueness guarantees are impractical), so allowing users to specify some
segment of the table to be read only without imposing it on the whole
table would be awful nice.  I seem to recall Jan had an idea on how to do
it, but I could be wrong.

Also, doing this at the tuple, rather than table-wide, level might lead to
additional capabilities in this area:

 Attach
 --
 
 Writing tables on one system and then moving that data to other systems
 is fairly common. If we supported read-only tables then you might
 consider how you would publish new versions to people.

Some time ago I was speculating on pie-in-the-sky features I might like in
Postgres, and it was something like this "attach".  But the idea was somehow
related to the read-only tuples.

In my specific case, I have piles and piles of mostly useless data.
Sometimes, however, some of that data is possibly useful in retrospect.  So
the suggestion was to have tables that could be mostly offline -- archived
somewhere -- but for which we had enough metadata online to say, "You have
some data that might match in catalog C.  Go mount it, and I'll check."  I
think this is subtly different from the attach case you're outlining?

A



Re: [HACKERS] WORM and Read Only Tables (v0.1)

2007-12-11 Thread Simon Riggs
On Tue, 2007-12-11 at 14:25 -0500, Andrew Sullivan wrote:
 On Tue, Dec 11, 2007 at 11:12:46AM +, Simon Riggs wrote:
  
  Read-Only Tables
  
 
 In the past when this topic came up, there was some discussion of doing this
 at a level somewhere below the table horizon.  There are a number of nasty
 limitations for partitions currently (not the least of which is that real
 uniqueness guarantees are impractical), so allowing users to specify some
 segment of the table to be read only without imposing it on the whole
 table would be awful nice.  I seem to recall Jan had an idea on how to do
 it, but I could be wrong.
 
 Also, doing this at the tuple, rather than table-wide, level might lead to
 additional capabilities in this area:

Seems fair comment.

I'll move forward my thoughts on partitioning, so we can decide whether
to do things this way, or below the table horizon as you say. I've got
some detailed notes on this already which showed it was roughly balanced
between the two ways. I'll write it up so we can see; the answer might
change during the writing.

  Attach
  --
  
  Writing tables on one system and then moving that data to other systems
  is fairly common. If we supported read-only tables then you might
  consider how you would publish new versions to people.
 
 Some time ago I was speculating on pie-in-the-sky features I might like in
 Postgres, and it was something like this "attach".  But the idea was somehow
 related to the read-only tuples.
 
 In my specific case, I have piles and piles of mostly useless data.
 Sometimes, however, some of that data is possibly useful in retrospect.  So
 the suggestion was to have tables that could be mostly offline -- archived
 somewhere -- but for which we had enough metadata online to say, "You have
 some data that might match in catalog C.  Go mount it, and I'll check."  I
 think this is subtly different from the attach case you're outlining?

Yes it is, but I had hoped that what you're asking for is catered for
here.

If you have a hierarchical storage manager, you can just access the
file, whereupon the actual file will be de-archived to allow access. Or
maybe you have it on MAID storage, so we spin up the disks to allow
access to the files. So I guess I was expecting the de-archive to be
automated at the file system level.
http://en.wikipedia.org/wiki/Hierarchical_Storage_Management
Regrettably, I'm not aware of an open source HSM, though XFS has some
hooks in it that mention this. That's an old IBM term, so some people
might refer to this concept as tiered storage. 

Since I was planning to modify smgr to allow different kinds of
tablespaces, it should be possible to make opening the file issue some
kind of callout to mount the appropriate files. What would that callout
look like? Would that be a DMAPI/XDSM implementation, something simpler
like what was used for PITR, or a roll-your-own plug-in hook?
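As a sketch of the "something simpler, as was used for PITR" option: a
restore_command-style shell template that the server runs when a relation
file is not online. Everything here (the callout parameter, the %p
substitution) is hypothetical, meant only to illustrate the shape such a
hook could take:

```python
import os
import shlex
import subprocess

def open_relation_file(path, callout):
    """Open a relation file, first running a restore_command-style
    shell callout (with %p replaced by the file path) if the file is
    not currently online, e.g. because it lives on HSM/MAID media.
    Hypothetical sketch -- not an existing PostgreSQL hook."""
    if not os.path.exists(path):
        cmd = callout.replace("%p", shlex.quote(path))
        subprocess.run(cmd, shell=True, check=True)
    return open(path, "rb")
```

A DMAPI/XDSM implementation would instead let the filesystem trap the read
itself; the shell-template approach keeps the policy in userspace, much as
archive_command does for WAL.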

I can see I'll have to re-wire smgr_nblocks() for non-md smgrs to access
pg_class.relpages rather than issue a seek, which would need to issue an
open. That way we can do planning without actually accessing the table.
(Maybe that's presuming we're dealing with tables, oh well).
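The relpages idea might look roughly like this; BLCKSZ and the names mirror
PostgreSQL conventions, but the function and its catalog argument are
hypothetical, not server code:

```python
import os

BLCKSZ = 8192  # PostgreSQL's default block size

def smgr_nblocks(relpath, relpages, archived=False):
    """Return a relation's size in blocks. For an archived (offline)
    tablespace, trust the cached pg_class.relpages estimate so the
    planner never forces a de-archive just to plan a query; otherwise
    stat the file, as the md storage manager effectively does via
    lseek. Hypothetical sketch only."""
    if archived:
        return relpages[relpath]
    return os.path.getsize(relpath) // BLCKSZ
```

The trade-off is that relpages is only an estimate maintained by
VACUUM/ANALYZE, but for a frozen, read-only relation it can no longer
drift, so it is safe to plan from.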

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com

