Re: [GENERAL] Incremental backups, and backup history

2003-06-20 Thread Nigel J. Andrews
On Thu, 19 Jun 2003, Matthew Nuzum wrote:

> Regarding backup history:
> 
> I have an application designed for novices.  Apparently it's easy to hit the
> "Delete" button, and then say yes to the "Are you sure you want to delete
> this?" question even when they don't want to. Therefore I simply mark a
> record as deleted.  For example, 
> UPDATE table SET deleted='t' WHERE something=true;
> 
> Then my application logic pretends it doesn't really exist until two days
> later the user decides they want it back.
> 
> It works very well for me.
> 

But are you also taking care of the referential integrity issues, i.e. only
disallowing tuples with a deleted = true from being referenced to and ensuring
nothing references them at the time they are marked as deleted.

It is a useful idea but as I know from a current project it requires
reimplementing foreign key functionality. In this case the middleware only uses
functions, one per statement, and nothing else, so I have been able to do much
of this in those functions but it's still a pain. I even wrote a utility to
take some of the leg work out of generating and maintaining quite a few
functions but if I'd had time [and thought about these basically being foreign
key constraints] I'd have looked at the existing foreign key code and seen if I
could copy and amend it or just amend it in place.


--
Nigel Andrews



---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [GENERAL] Incremental backups, and backup history

2003-06-19 Thread Matthew Nuzum
Regarding backup history:

I have an application designed for novices.  Apparently it's easy to hit the
"Delete" button, and then say yes to the "Are you sure you want to delete
this?" question even when they don't want to. Therefore I simply mark a
record as deleted.  For example, 
UPDATE table SET deleted='t' WHERE something=true;

Then my application logic pretends it doesn't really exist until two days
later the user decides they want it back.

It works very well for me.

--
Matthew Nuzum
www.bearfruit.org
[EMAIL PROTECTED]
 

> -Original Message-
> From: Bruno Wolff III [mailto:[EMAIL PROTECTED]
> Sent: Thursday, June 19, 2003 6:59 AM
> To: Antonios Christofides
> Cc: [EMAIL PROTECTED]
> Subject: Re: Incremental backups, and backup history
> 
> On Thu, Jun 19, 2003 at 11:42:28 +0300,
>   Antonios Christofides <[EMAIL PROTECTED]> wrote:
> >
> > My second question is a general relational database backup question, not
> > specifically related to pgsql. Sometimes a user accidentally
> > deletes/corrupts a file, and discovers it three days later. After they
> > come panicing to me, I can give them their file as it was three days
> > ago, because of the backup tape rotation. Now suppose a user deletes
> > ten employees from the database, and three days later they understand
> > that this was a bad thing. Now what? I can restore the entire database
> > and make it as it was three days ago, but I can't restore the particular
> > deleted records in the current database, as the relations make the
> > database behave as a single unit.
> 
> This isn't a good situation to be in. If you log all of the transactions,
> one option may be to restore the database as of the last time it was
> valid, remove the bogus transactions from the transaction log and then
> replay
> the transactions from the time of the backup to the present.
> This will mean downtime for the database and it is possible that removing
> the bogus tranasctions will affect things as you reapply other
> transactions
> in ways you don't want.
> 
> Another option is to figure out what the bogus tranasctions did (which
> keeping
> a history will help out with) and try to undo them as best as possible
> with
> new transactions.
> 
> Either of the above are going to need manual oversight.


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [GENERAL] Incremental backups, and backup history

2003-06-19 Thread Dennis Gearon
On your second question:

Keeping old data helps with data analysis, i.e., data mining. I would do the fired date as transactions. To see if an employee is still and employee, look for the latest transation, hired, rehired, contracted with as a temp/consultant, fired, laid off, etc.

Antonios Christofides wrote:

Hi,

I have two backup questions, not much related to each other; here they
are.
First: With PostgreSQL, I can't do incremental backups. pg_dump will
dump the entire database. Thus, if I want to keep daily backups on tape,
I'm in trouble because I'll have to do a full backup every day, which
may need several hours and several tapes.
One workaround I'm thinking is to not store BLOBs in the database, but
store them in the filesystem and store the filenames in the database
instead.  This needs some additional work (for example, a garbage
collector to periodically delete unreferenced files), but will move a
large amount of space from the database into the filesystem, which is
capable of incremental backups. Only BLOBs, that is; for some tables
that will have several tens of millions of small rows, I can't think of
any workaround.
Is this filenames-instead-of-BLOBs for easier backup common practice?
Any other ideas or comments?
My second question is a general relational database backup question, not
specifically related to pgsql. Sometimes a user accidentally
deletes/corrupts a file, and discovers it three days later. After they
come panicing to me, I can give them their file as it was three days
ago, because of the backup tape rotation. Now suppose a user deletes
ten employees from the database, and three days later they understand
that this was a bad thing. Now what? I can restore the entire database
and make it as it was three days ago, but I can't restore the particular
deleted records in the current database, as the relations make the
database behave as a single unit.
A colleague suggested, instead of updating or deleting rows, to only
insert rows with a timestamp; for example, instead of updating the
employee row with id=55, you insert a new row with id=55 with the
updated data, and change the primary key to (id, dateinserted). You then
always select the row with max dateinserted. A garbage collector is also
needed to periodically delete obsolete rows older than, say, six months.
Improvements can be made (such as using dateobsoleted instead of
dateinserted or moving the old rows to another table), but even in the
simplest cases I think it will be extremely hard to implement such a
system, again because of the relations.
So, it is a matter of database design? Do I have to design the database
so that it keeps the history of what happened?
Thanks everyone for the answers.

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html


Re: [GENERAL] Incremental backups, and backup history

2003-06-19 Thread Greg Stark

Antonios Christofides <[EMAIL PROTECTED]> writes:

> Is this filenames-instead-of-BLOBs for easier backup common practice?
> Any other ideas or comments?

This is a major point of contention. Some people think keeping all data in the
database is a better approach, others think data that isn't inherently
relational and doesn't need the protection of transactions doesn't really
belong in the database.

I happen to be firmly in the camp against putting such files in the database.
But it depends a lot on what your needs are. In every case I've faced this it
was simply useful to have the files accessible without piping them through the
database protocol every time. I could point the web server at the directory
and serve them up directly, or rsync them to the machines that could do that.
They could even be served up from a separate lightweight web server without
any database access altogether, which would have been impossible if the only
way to access them was via the database.

If you need to be able to update portions of your blobs, or if you need
transactional safety then you may need the database.

> My second question is a general relational database backup question, not
> specifically related to pgsql. Sometimes a user accidentally
> deletes/corrupts a file, and discovers it three days later. After they
> come panicing to me, I can give them their file as it was three days
> ago, because of the backup tape rotation. Now suppose a user deletes
> ten employees from the database, and three days later they understand
> that this was a bad thing. Now what? I can restore the entire database
> and make it as it was three days ago, but I can't restore the particular
> deleted records in the current database, as the relations make the
> database behave as a single unit.

What's worse is you may have several related changes to multiple tables. And
then you won't know if any other changes to other tables were later done that
depended on that data. There will be no way to do this perfectly in general.

If your data is particularly amenable to this form of normalization then it
can be useful though. For example, instead of storing counters that are
incremented, even when the total is the only interesting statistic, I normally
insert new records for every event. If ever it turns out something was wrong
and the events should be disregarded for a particular time period, or with
particular other properties, I have the complete history and can do that.
Inserts are also more efficient for the database to do than updates.

But for something like an employee table you're probably going to be stuck
with restoring the table to a new name, and having a human look over the old
data and the current data and updating the current data appropriately.

You may want to consider having a "deleted" flag column and not ever deleting
records. So at least undelete can be an easy operation that doesn't even need
restoring backups.

-- 
greg


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]