I am wondering about the feasibility of having PG continue to work even if non-essential indexes are missing or corrupt. I brought this basic concept up at some point in the past, but now I have a different motivation, so I want to strike up the discussion again. This time around, I simply don't want to back up indexes if I don't have to. Because indexes contain essentially redundant data, losing one does not equate to losing real data, so backing them up represents a lot of overhead for very little benefit.

Here's the basic idea:

1) New field to pg_index (indvalid boolean).
2) Query planner skips indexes where indvalid = false.
3) Executor does not update indexes where indvalid = false.
4) Executor refuses inserts or updates to columns covered by a unique index where indvalid = false, throwing an error.
5) WAL roll forward marks indvalid = false if an index's file(s) are missing, rather than panicking.
6) REINDEX recognizes syntax to rebuild only the indexes with indvalid = false, then marks them indvalid = true.
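To make that concrete, here is a rough sketch of what the workflow might look like at the SQL level. None of this exists today; the indvalid column and the INVALID keyword are part of the proposal, and the index name foo_pkey is just an example:

    -- 1. WAL roll forward finds an index file missing and, instead of
    --    panicking, flags the index (hypothetical column, not in current PG):
    UPDATE pg_index SET indvalid = false WHERE indexrelid = 'foo_pkey'::regclass;

    -- 2. The database comes up; the planner and executor ignore foo_pkey,
    --    and inserts/updates that would need its unique check error out.

    -- 3. The administrator later rebuilds only the flagged indexes
    --    (proposed syntax):
    REINDEX DATABASE foo INVALID;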

Close to 25% of the on-disk bulk of my database is index files. If I didn't have to archive the index files, it would save a significant amount of the system resources used during backup. In the unlikely event that a restore/roll forward becomes necessary, I could simply issue something like "REINDEX DATABASE foo INVALID;" to rebuild all the missing indexes and return the database to full function. Prior to the reindex, the database would perform poorly and refuse certain inserts and updates, but the data would be available. Backup files would be smaller, and the restore/roll forward would be faster.
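For anyone who wants to check the equivalent number on their own database, something along these lines (using the standard size functions, available in reasonably recent releases) should show the heap/index split:

    -- On-disk size of all indexes vs. total size of ordinary tables
    SELECT pg_size_pretty(sum(pg_indexes_size(c.oid))::bigint)        AS index_bytes,
           pg_size_pretty(sum(pg_total_relation_size(c.oid))::bigint) AS total_bytes
    FROM pg_class c
    WHERE c.relkind = 'r';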

No downsides jump out at me, and it seems to me that for a regular PG code hacker this could actually be fairly simple to implement.

Any chance of something like this being done in the future?


-Glen


