[GENERAL] I don't want to back up index files

2009-03-10 Thread Glen Parker

I am wondering the feasibility of having PG continue to work even if
non-essential indexes are gone or corrupt.  I brought this basic concept
up at some point in the past, but now I have a different motivation, so
I want to strike up discussion about it again.  This time around, I
simply don't want to back up indexes if I don't have to.  Because
indexes contain essentially redundant data, losing one does not equate
to losing real data.  Therefore, backing them up represents a lot of
overhead for very little benefit.

Here's the basic idea:

1) New field to pg_index (indvalid boolean).
2) Query planner skips indexes where indvalid = false.
3) Executer does not update indexes where indvalid = false.
4) Executer refuses insert or update to unique columns where indvalid =
false, throwing an error.
5) WAL roll forward marks indvalid = false if index file(s) are missing,
rather than panicking.
6) REINDEX recognizes syntax to only build indexes with indvalid =
false, marks indvalid = true.

Close to 25% of the on disk bulk of my database is index files.  It
would save a significant amount of the system resources used during the
backup, if I didn't have to archive the index files.  In the unlikely
event that a restore/roll forward becomes necessary, I could simply
issue something like "REINDEX DATABASE foo INVALID;" to restore all the
missing indexes and return the database to full function.  Prior to a
reindex, the database would perform poorly and refuse to do certain
inserts and updates, but the data would be available.  Backup files
would be smaller, and the restore/roll forward would be faster.

No down sides jump out at me, and it seems to me that for a regular PG
code hacker this could actually be fairly simple to implement.

Any chance of something like this being done in the future?


-Glen



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] I don't want to back up index files

2009-03-10 Thread Glen Parker
I am wondering the feasibility of having PG continue to work even if 
non-essential indexes are gone or corrupt.  I brought this basic concept 
up at some point in the past, but now I have a different motivation, so 
I want to strike up discussion about it again.  This time around, I 
simply don't want to back up indexes if I don't have to.  Because 
indexes contain essentially redundant data, losing one does not equate 
to losing real data.  Therefore, backing them up represents a lot of 
overhead for very little benefit.


Here's the basic idea:

1) New field to pg_index (indvalid boolean).
2) Query planner skips indexes where indvalid = false.
3) Executer does not update indexes where indvalid = false.
4) Executer refuses insert or update to unique columns where indvalid = 
false, throwing an error.
5) WAL roll forward marks indvalid = false if index file(s) are missing, 
rather than panicking.
6) REINDEX recognizes syntax to only build indexes with indvalid = 
false, marks indvalid = true.


Close to 25% of the on disk bulk of my database is index files.  It 
would save a significant amount of the system resources used during the 
backup, if I didn't have to archive the index files.  In the unlikely 
event that a restore/roll forward becomes necessary, I could simply 
issue something like "REINDEX DATABASE foo INVALID;" to restore all the 
missing indexes and return the database to full function.  Prior to a 
reindex, the database would perform poorly and refuse to do certain 
inserts and updates, but the data would be available.  Backup files 
would be smaller, and the restore/roll forward would be faster.


No down sides jump out at me, and it seems to me that for a regular PG 
code hacker this could actually be fairly simple to implement.


Any chance of something like this being done in the future?


-Glen


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] I don't want to back up index files

2009-03-11 Thread Joshua D. Drake
On Tue, 2009-03-10 at 18:54 -0700, Glen Parker wrote:
> I am wondering the feasibility of having PG continue to work even if
> non-essential indexes are gone or corrupt.  I brought this basic concept
> up at some point in the past, but now I have a different motivation, so
> I want to strike up discussion about it again.  This time around, I
> simply don't want to back up indexes if I don't have to.  Because
> indexes contain essentially redundant data, losing one does not equate
> to losing real data.  Therefore, backing them up represents a lot of
> overhead for very little benefit.

Hello,

I am sorry but this seems very silly. If you don't want to back up
indexes use pg_dump. 

> 
> Any chance of something like this being done in the future?
> 

I am going to go out on a limb here and say, "no".

Joshua D. Drake

-- 
PostgreSQL - XMPP: jdr...@jabber.postgresql.org
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] I don't want to back up index files

2009-03-11 Thread Grzegorz Jaśkiewicz
On Wed, Mar 11, 2009 at 3:03 PM, Joshua D. Drake  wrote:

>> Any chance of something like this being done in the future?
>>
>
> I am going to go out on a limb here and say, "no".

That would probably be possible, by placing all indicies in a separate
directory in data, but
Well, that would create whole another dimension of mess and
requirements on restore.

So like JD said, if you don't want to dump indicies - just use pg_dump...



-- 
GJ

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] I don't want to back up index files

2009-03-11 Thread Glen Parker

Grzegorz Jaśkiewicz wrote:

On Wed, Mar 11, 2009 at 3:03 PM, Joshua D. Drake  wrote:


Any chance of something like this being done in the future?


I am going to go out on a limb here and say, "no".


That would probably be possible, by placing all indicies in a separate
directory in data, but


I could simply use the system catalog to determine which files to backup 
and which to ignore.  I suppose I would prefer to backup all unique 
indexes, and perhaps all system indexes.  It's the proliferation of 
large indexes that serve only to enhance select performance that I would 
want to ignore.



So like JD said, if you don't want to dump indicies - just use pg_dump...


If pg_dump were an acceptable backup tool, we wouldn't need PITR, would 
we?  We used pg_dump for years.  There's a very good reason we no longer 
do.  That suggestion is silly.



-Glen



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] I don't want to back up index files

2009-03-11 Thread Joshua D. Drake
On Wed, 2009-03-11 at 14:25 -0700, Glen Parker wrote:
> Grzegorz Jaśkiewicz wrote:

> > So like JD said, if you don't want to dump indicies - just use pg_dump...
> 
> If pg_dump were an acceptable backup tool, we wouldn't need PITR, would 
> we?  We used pg_dump for years.  There's a very good reason we no longer 
> do.  That suggestion is silly.

*shrug* you can consider it silly. It doesn't change the outcome.

Joshua D. Drake

-- 
PostgreSQL - XMPP: jdr...@jabber.postgresql.org
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] I don't want to back up index files

2009-03-11 Thread Scott Marlowe
On Wed, Mar 11, 2009 at 3:25 PM, Glen Parker  wrote:
> Grzegorz Jaśkiewicz wrote:
>> So like JD said, if you don't want to dump indicies - just use pg_dump...
>
> If pg_dump were an acceptable backup tool, we wouldn't need PITR, would we?
>  We used pg_dump for years.  There's a very good reason we no longer do.
>  That suggestion is silly.

pg_dump is a perfectly acceptable backup tool, as is PITR.  They have
different ways of operating based on what you need.  Trying to make
PITR act more like pg_dump seems kind of silly to me.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] I don't want to back up index files

2009-03-11 Thread Glen Parker

Scott Marlowe wrote:


pg_dump is a perfectly acceptable backup tool, as is PITR.  They have
different ways of operating based on what you need.  Trying to make
PITR act more like pg_dump seems kind of silly to me.



pg_dump is not acceptable to us because of the potential to lose many 
hours of valuable data.  Why would pg_dump even be relevant to this 
discussion?  PITR offers a benefit that pg_dump does not, a benefit that 
we, and countless other organizations, obviously find useful. 
Suggesting that a person who's been managing PG in a commercial setting 
since version 6.4 should just use pg_dump as an alternative to PITR is, 
well, rather insulting.


That's two people now who have called the idea "silly" without even a 
hint of a supporting argument.  Why would it be "silly" to improve the 
performance of a highly valuable tool set without compromising its 
utility?  Am I missing something here?  That's certainly possible, but 
the idea didn't just hatch last night; I've put enough thought into this 
to have reason to believe it's more than just "silly".


-Glen


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] I don't want to back up index files

2009-03-11 Thread Alvaro Herrera
Glen Parker escribió:

> That's two people now who have called the idea "silly" without even a  
> hint of a supporting argument.  Why would it be "silly" to improve the  
> performance of a highly valuable tool set without compromising its  
> utility?  Am I missing something here?  That's certainly possible, but  
> the idea didn't just hatch last night; I've put enough thought into this  
> to have reason to believe it's more than just "silly".

FWIW I don't think this idea is silly at all.  It's so not-silly, in
fact, that we already have some access methods that do this if an index
cannot be recovered (I think at least GiST does it).

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] I don't want to back up index files

2009-03-11 Thread Tom Lane
Alvaro Herrera  writes:
> FWIW I don't think this idea is silly at all.  It's so not-silly, in
> fact, that we already have some access methods that do this if an index
> cannot be recovered (I think at least GiST does it).

Well, there's a difference between "rebuild the index when it can't be
recovered" and "lose the index anytime the system burps".  AFAICS what
Glen is proposing is to not WAL-log index changes, and with that any
crash no matter how minor would have to invalidate indexes.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] I don't want to back up index files

2009-03-11 Thread Joshua D. Drake
On Wed, 2009-03-11 at 16:57 -0700, Glen Parker wrote:
> Scott Marlowe wrote:

> That's two people now who have called the idea "silly" without even a 
> hint of a supporting argument.  Why would it be "silly" to improve the 
> performance of a highly valuable tool set without compromising its 
> utility?  Am I missing something here?  That's certainly possible, but 
> the idea didn't just hatch last night; I've put enough thought into this 
> to have reason to believe it's more than just "silly".

O.k. a couple of things:

1. You could put all your indexes into a table space, this would allow
you to "try" different things with the indexes.

2. Even though my peer Alvaro doesn't think the idea is silly, I still
do and here is why. If you can invalidate the indexes you will have to
reindex (or recreate) to make them valid (as you mentioned). That is an
exclusive lock.

If your database has any level of concurrency the cost to
recreate/reindex those indexes right when you are attempting to get your
standby into production is going to be very high. Basically you are
trading 25% hard disk space for a longer, possibly excessively longer
outage. Hard disk space is so darn cheap that it doesn't seem to make
any sense.

Creating indexes concurrently is also out because while you are creating
those indexes your performance will tank because everything is
sequential scanning and there is a possibility that the concurrent
creation will fail.

Sincerely,

Joshua D. Drake


> 
> -Glen
> 
> 
-- 
PostgreSQL - XMPP: jdr...@jabber.postgresql.org
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] I don't want to back up index files

2009-03-11 Thread Glen Parker

Tom Lane wrote:

Alvaro Herrera  writes:

FWIW I don't think this idea is silly at all.  It's so not-silly, in
fact, that we already have some access methods that do this if an index
cannot be recovered (I think at least GiST does it).


Well, there's a difference between "rebuild the index when it can't be
recovered" and "lose the index anytime the system burps".  AFAICS what
Glen is proposing is to not WAL-log index changes, and with that any
crash no matter how minor would have to invalidate indexes.


Nooo...!  This has nothing to do with WAL logging index changes.  What I 
propose would have no effect on an end user that continues to back up 
indexes.  It would give people the *option* to not back up indexes.


This is about disaster recovery and the backups required to recover 
sanely, not about hiccups that cause only the last handful of 
transactions to be redone.


Causing invalidation of indexes after a crash might be acceptable in 
some settings, but not in mine, and that isn't what I'm after.


-Glen


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] I don't want to back up index files

2009-03-11 Thread Glen Parker

Joshua D. Drake wrote:

1. You could put all your indexes into a table space, this would allow
you to "try" different things with the indexes.


Most of them are, but I still have to back them up in order to have a 
valid backup, because the PITR code would choke if any are missing.



2. Even though my peer Alvaro doesn't think the idea is silly, I still
do and here is why. If you can invalidate the indexes you will have to
reindex (or recreate) to make them valid (as you mentioned). That is an
exclusive lock.

If your database has any level of concurrency the cost to
recreate/reindex those indexes right when you are attempting to get your
standby into production is going to be very high. Basically you are
trading 25% hard disk space for a longer, possibly excessively longer
outage. Hard disk space is so darn cheap that it doesn't seem to make
any sense.


We have yet to recover from a PG disaster.  We back up every night, and 
never use the back ups for anything.  To me, it seems perfectly 
reasonable to get a quicker back up every night, with the remote 
possibility of ever having to pay the price for it.  And IMO, the price 
on recovery would be minor, if not negative.  The time to rebuild the 
indexes would be offset by the shorter time needed to restore the 
database cluster and allow the PITR stuff to work its magic.  And of 
course keep in mind, that while indexes are building, all the tables not 
being indexed are accessible.  Nothing is accessible during a PITR 
operation.  I would also be able to rebuild only the most important 
indexes and leave the system to operate with a few missing until things 
quiet down, and then finish reindexing.



-Glen


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] I don't want to back up index files

2009-03-11 Thread Tom Lane
Glen Parker  writes:
> Tom Lane wrote:
>> ... AFAICS what
>> Glen is proposing is to not WAL-log index changes, and with that any
>> crash no matter how minor would have to invalidate indexes.

> Nooo...!  This has nothing to do with WAL logging index changes.

How so?  In any PITR-based situation it seems to me you need to worry
about the WAL bulk a lot more than the bulk of the base backup.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] I don't want to back up index files

2009-03-11 Thread Tom Lane
Glen Parker  writes:
> We have yet to recover from a PG disaster.  We back up every night, and 
> never use the back ups for anything.  To me, it seems perfectly 
> reasonable to get a quicker back up every night, with the remote 
> possibility of ever having to pay the price for it.

Why don't you just switch to a less frequent full-backup schedule?

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] I don't want to back up index files

2009-03-11 Thread Scott Marlowe
On Wed, Mar 11, 2009 at 5:57 PM, Glen Parker  wrote:
> Scott Marlowe wrote:
>
>> pg_dump is a perfectly acceptable backup tool, as is PITR.  They have
>> different ways of operating based on what you need.  Trying to make
>> PITR act more like pg_dump seems kind of silly to me.
>
> pg_dump is not acceptable to us because of the potential to lose many hours
> of valuable data.

That's why we use pg_dump and slony.  We lose no data without a giant
hosting center fire.

>  Why would pg_dump even be relevant to this discussion?

Because it's one method of backing up a database?

>  PITR offers a benefit that pg_dump does not, a benefit that we, and
> countless other organizations, obviously find useful.

And this benefit is?  Continuous backup I assume.  Something other
forms of replication can do, and which pg_dump then becomes a useful
backup tool.

> Suggesting that a
> person who's been managing PG in a commercial setting since version 6.4
> should just use pg_dump as an alternative to PITR is, well, rather
> insulting.

Darn, I've only been around since 6.5.2, and have about four years on
an airline reservation system.  I guess my opinions just don't count.
How could I possibly understand your advanced thought processes.

> That's two people now who have called the idea "silly" without even a hint
> of a supporting argument.  Why would it be "silly" to improve the
> performance of a highly valuable tool set without compromising its utility?

Because it's the size of the WAL files that kills most people, and not
putting the index updates into WAL files would be a hack I wouldn't
trust, and having them on the otherside but not adding them is just
wasting space?  Cause maybe, you didn't explain everything as clearly
as you could, and I made assumptions based on your incomplete
description?

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] I don't want to back up index files

2009-03-11 Thread Joshua D. Drake
On Wed, 2009-03-11 at 20:59 -0600, Scott Marlowe wrote:
> On Wed, Mar 11, 2009 at 5:57 PM, Glen Parker  wrote:
> > Scott Marlowe wrote:

> > Suggesting that a
> > person who's been managing PG in a commercial setting since version 6.4
> > should just use pg_dump as an alternative to PITR is, well, rather
> > insulting.
> 
> Darn, I've only been around since 6.5.2, and have about four years on
> an airline reservation system.  I guess my opinions just don't count.
> How could I possibly understand your advanced thought processes.
> 

Alright guys, let's not get into a my elephant trunk is bigger than
yours fight. 

Joshua D. Drake

-- 
PostgreSQL - XMPP: jdr...@jabber.postgresql.org
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] I don't want to back up index files

2009-03-12 Thread Alan Hodgson
On Wednesday 11 March 2009, Glen Parker  wrote:
> We have yet to recover from a PG disaster.  We back up every night, and
> never use the back ups for anything.  To me, it seems perfectly
> reasonable to get a quicker back up every night, with the remote
> possibility of ever having to pay the price for it.  

This isn't totally relevant, but as a way to speed up base backups, I keep a 
copy of the whole database rsync'd on the backup machine. The base backup 
consists of refreshing the rsync, and then tarring it offline (of course 
with the PITR backup commands in the sequence). My database is currently 
750GB on disk and gets many tens of millions of updates a day, and the 
rsync still runs in less than an hour per night.

I have done PITR recoveries (unfortunately). The log replay time dwarfs the 
time it takes to copy the index files back over (it is, honestly, slower 
than the original transactions were). Additionally, copying them is faster 
than rebuilding them would be. Also, I can't imagine bringing a database 
online without the majority of the indexes in place. The first dozen 
non-indexed queries against large tables would kill the machine; not only 
would you not be servicing users, but the rest of your restore would be 
slowed immensely.

-- 
Even a sixth-grader can figure out that you can’t borrow money to pay off 
your debt

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] I don't want to back up index files

2009-03-12 Thread Glen Parker

Tom Lane wrote:

Glen Parker  writes:

Tom Lane wrote:

... AFAICS what
Glen is proposing is to not WAL-log index changes, and with that any
crash no matter how minor would have to invalidate indexes.



Nooo...!  This has nothing to do with WAL logging index changes.


How so?  In any PITR-based situation it seems to me you need to worry
about the WAL bulk a lot more than the bulk of the base backup.



It isn't the bulk so much as the amount of time, and the impact to the 
running system during that time, that it takes to execute the base backup.


I haven't noticed any real impact related to compressing and exporting 
WAL files.


Anyway, more to the point, I'm not knowingly proposing anything that 
should cause reduced system reliability in the event of a crash.


-Glen


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] I don't want to back up index files

2009-03-12 Thread Glen Parker

Tom Lane wrote:

Glen Parker  writes:
We have yet to recover from a PG disaster.  We back up every night, and 
never use the back ups for anything.  To me, it seems perfectly 
reasonable to get a quicker back up every night, with the remote 
possibility of ever having to pay the price for it.


Why don't you just switch to a less frequent full-backup schedule?



Paranoia.  Others in the organization have pushed to keep up with the 
nightly back ups, so that decision is mostly out of my hands.


-Glen


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] I don't want to back up index files

2009-03-12 Thread Joshua D. Drake
On Thu, 2009-03-12 at 12:59 -0700, Glen Parker wrote:
> Tom Lane wrote:
> > Glen Parker  writes:
> >> We have yet to recover from a PG disaster.  We back up every night, and 
> >> never use the back ups for anything.  To me, it seems perfectly 
> >> reasonable to get a quicker back up every night, with the remote 
> >> possibility of ever having to pay the price for it.
> > 
> > Why don't you just switch to a less frequent full-backup schedule?
> 
> 
> Paranoia.  Others in the organization have pushed to keep up with the 
> nightly back ups, so that decision is mostly out of my hands.

Why not just take a backup of the pitr slave instead? If you need to do
it nightly, shut down the standby process, tar, start standby process.
You never have to  bother the master at all.

Sincerely,

Joshua D. Drake


> 
> -Glen
> 
> 
-- 
PostgreSQL - XMPP: jdr...@jabber.postgresql.org
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] I don't want to back up index files

2009-03-12 Thread Glen Parker

Scott Marlowe wrote:

That's two people now who have called the idea "silly" without even a hint
of a supporting argument.  Why would it be "silly" to improve the
performance of a highly valuable tool set without compromising its utility?


Because it's the size of the WAL files that kills most people, and not
putting the index updates into WAL files would be a hack I wouldn't
trust, and having them on the otherside but not adding them is just
wasting space?  Cause maybe, you didn't explain everything as clearly
as you could, and I made assumptions based on your incomplete
description?


I've re-read my original email and I just can't see how anybody got the 
idea I was suggesting to not WAL record index changes.  That would be 
insanity IMO.


So, to be clear, I'm not proposing any change to the way data is written 
to the WAL files.


-Glen


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] I don't want to back up index files

2009-03-12 Thread Alvaro Herrera
Glen Parker escribió:

> I've re-read my original email and I just can't see how anybody got the  
> idea I was suggesting to not WAL record index changes.

The subject says that.


> That would be  insanity IMO.

Agreed :-)

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] I don't want to back up index files

2009-03-12 Thread Tom Lane
Glen Parker  writes:
> I've re-read my original email and I just can't see how anybody got the 
> idea I was suggesting to not WAL record index changes.

Mainly because the idea doesn't seem to make sense unless that's part
of the package.  If you don't cut index changes out of the WAL load
then the savings on the base backup alone aren't going to be all that
exciting when you consider the total cost of PITR backup.

Furthermore, you would need some very ugly hacks on the recovery process
to make it ignore (rather than try to apply) WAL records relating to
indexes.  I believe there are a fair number of cases where the recovery
process doesn't even know that a particular file is an index, because
the WAL stream doesn't tell it.  The live backends generating the WAL
log entries typically know that (and could suppress the entries) but the
recovery process has only a very limited view of reality.  It cannot,
for example, trust the system catalogs to be in a correct/consistent
state, so it couldn't look up the info for itself.

BTW, there's a related problem with the idea, which is that the
tools normally used to take base backups haven't got any way to
distinguish indexes from any other kind of relation.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] I don't want to back up index files

2009-03-12 Thread Tom Lane
Alan Hodgson  writes:
> I have done PITR recoveries (unfortunately). The log replay time dwarfs the 
> time it takes to copy the index files back over (it is, honestly, slower 
> than the original transactions were).

Yeah :-( ... the problem is that recovery is serialized while the
original transactions might have run in parallel.  There is work
going on to improve that, but I don't know how much better it's
likely to get.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] I don't want to back up index files

2009-03-12 Thread Glen Parker

Tom Lane wrote:

Glen Parker  writes:
Mainly because the idea doesn't seem to make sense unless that's part
of the package.  If you don't cut index changes out of the WAL load
then the savings on the base backup alone aren't going to be all that
exciting when you consider the total cost of PITR backup.


In our setting, I think it might be more exciting than you think.  As I 
said, I've not noticed any real impact to the system related to WAL 
exporting, but the nightly backup does indeed have a significant impact 
because of how long it runs.  WAL export is a couple seconds ever few 
minutes, which nobody ever notices.  The backup runs for a minimum of an 
hour and fifteen minutes, which people definitely notice.



Furthermore, you would need some very ugly hacks on the recovery process
to make it ignore (rather than try to apply) WAL records relating to
indexes.  I believe there are a fair number of cases where the recovery
process doesn't even know that a particular file is an index, because
the WAL stream doesn't tell it.  The live backends generating the WAL
log entries typically know that (and could suppress the entries) but the
recovery process has only a very limited view of reality.  It cannot,
for example, trust the system catalogs to be in a correct/consistent
state, so it couldn't look up the info for itself.


Could the live backends label the log entries with "hints" to be used by 
the replay process?  In this case, I would think a simple flag 
indicating whether replay is critical or not would suffice.



BTW, there's a related problem with the idea, which is that the
tools normally used to take base backups haven't got any way to
distinguish indexes from any other kind of relation.


Yes there's no doubt it would increase the complexity of the base 
backup, IF a person chooses to ignore indexes.  The up side is that 
people who are happy with the backup as it is would have to do nothing 
at all, it would just continue to work as it does now.  To ignore 
indexes (and only certain indexes at that), you'd have to examine the 
system catalog as part of each backup.  I already do that to some 
extent, in order to discover all the extra tablespaces that need to be 
backed up.


I guess the biggest problem I see with this is that it would have rather 
a small target audience.



-Glen


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] I don't want to back up index files

2009-03-13 Thread Simon Riggs

On Thu, 2009-03-12 at 12:57 -0700, Glen Parker wrote: 
> Tom Lane wrote:
> > Glen Parker  writes:
> >> Tom Lane wrote:
> >>> ... AFAICS what
> >>> Glen is proposing is to not WAL-log index changes, and with that any
> >>> crash no matter how minor would have to invalidate indexes.
> > 
> >> Nooo...!  This has nothing to do with WAL logging index changes.
> > 
> > How so?  In any PITR-based situation it seems to me you need to worry
> > about the WAL bulk a lot more than the bulk of the base backup.
> 
> 
> It isn't the bulk so much as the amount of time, and the impact to the 
> running system during that time, that it takes to execute the base backup.
> 
> I haven't noticed any real impact related to compressing and exporting 
> WAL files.
> 
> Anyway, more to the point, I'm not knowingly proposing anything that 
> should cause reduced system reliability in the event of a crash.

Glen,

Thanks for bringing your ideas to the list.

The idea of auto rebuilding indexes following recovery has already been
proposed, so is under consideration. It hasn't been proposed in relation
to the use case you mention, so that is new.

If we did as you suggest then it would speed up the base backup but
would also add index rebuild time onto the end of any recovery. If the
database is so large than base backup time is significant then rebuild
time for *all* indexes will be very significant. It seems an attractive
proposition but one that could lead to significant regret if disaster
did strike.

So I think it's a good idea, but on balance not one that enough people
would use to make it worth implementing of itself. Auto rebuilding
damaged indexes is on the radar however, so if we're close enough we may
be able to do something useful along the lines you suggest.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] I don't want to back up index files

2009-03-13 Thread Jeremy Harris

Simon Riggs wrote:

The idea of auto rebuilding indexes following recovery has already been
proposed, so is under consideration. It hasn't been proposed in relation
to the use case you mention, so that is new.

If we did as you suggest then it would speed up the base backup but
would also add index rebuild time onto the end of any recovery.


Forgive me for being stupid, but isn't a time when all the data for
a table is being streamed in, during restore, the *perfect* time
to build an index?   Why wait until after the restore?

Cheers,
   Jeremy

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] I don't want to back up index files

2009-03-13 Thread Glen Parker

Jeremy Harris wrote:

Simon Riggs wrote:

The idea of auto rebuilding indexes following recovery has already been
proposed, so is under consideration. It hasn't been proposed in relation
to the use case you mention, so that is new.

If we did as you suggest then it would speed up the base backup but
would also add index rebuild time onto the end of any recovery.


Forgive me for being stupid, but isn't a time when all the data for
a table is being streamed in, during restore, the *perfect* time
to build an index?   Why wait until after the restore?


Not all the data streams in.  Most of it is there before the WAL replay 
begins.


-Glen


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] I don't want to back up index files

2009-03-21 Thread Lew

Glen Parker wrote:
pg_dump is not acceptable to us because of the potential to lose many 
hours of valuable data.  Why would pg_dump even be relevant to this 
discussion?  PITR offers a benefit that pg_dump does not, a benefit that 
we, and countless other organizations, obviously find useful. Suggesting 
that a person who's been managing PG in a commercial setting since 
version 6.4 should just use pg_dump as an alternative to PITR is, well, 
rather insulting.


Dude, you were the one who started with calling other people's ideas "silly". 
  To be aggrieved when they respond in kind is, well, silly.


--
Lew

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general