Re: [HACKERS] Suggestion for --truncate-tables to pg_restore

2016-05-31 Thread Hendrik Visage
Hi there,

 Refering to https://www.postgresql.org/message-id/1352742344.21373.4@mofo


 I'm running into situations where I'd need to bulk transfer of data
tables across servers, but a drop and recreate schema isn't feasible
as we are running different permissions etc. on the two databases.

Thus my question: Anybody working on getting this into pg_restore
proper, and any advice on getting this feature incorporated?

HEndrik


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Suggestion for --truncate-tables to pg_restore

2012-12-04 Thread Josh Kupershmidt
Sorry for the delay in following up here.

On Mon, Nov 26, 2012 at 8:30 PM, Karl O. Pinc k...@meme.com wrote:
 On 11/26/2012 08:45:08 PM, Josh Kupershmidt wrote:
 On Mon, Nov 26, 2012 at 3:42 PM, Robert Haas robertmh...@gmail.com
 wrote:
  On Mon, Nov 26, 2012 at 4:51 PM, Karl O. Pinc k...@meme.com wrote:
  P.S.  An outstanding question regards --truncate-tables
  is whether it should drop indexes before truncate
  and re-create them after restore.  Sounds like it should.
 
  Well, that would improve performance, but it also makes the
 behavior
  of object significantly different from what one might expect from
 the
  name.  One of the problems here is that there seem to be a number
 of
  slightly-different things that one might want to do, and it's not
  exactly clear what all of them are, or whether a reasonable number
 of
  options can cater to all of them.

 Another problem: attempting to drop a unique constraint or primary
 key
 (if we're counting these as indexes to be dropped and recreated,
 which
 they should be if the goal is reasonable restore performance) which
 is
 referenced by another table's foreign key will cause:
   ERROR:  cannot drop constraint xxx on table yyy
   because other objects depend on it

 and as discussed upthread, it would be impolite for pg_restore to
 presume it should monkey with dropping+recreating other tables'
 constraints to work around this problem, not to mention impossible
 when pg_restore is not connected to the target database.

 I'm thinking impossible because it's impossible to know
 what the existing FKs are without a db connection.  Impossible is
 a problem.  You may have another reason why it's impossible.

Yes, that's what I meant.

 Meanwhile it sounds like the --truncate-tables patch
 is looking less and less desirable.  I'm ready for
 rejection, but will soldier on in the interest of
 not wasting other people work on this, if given
 direction to move forward.

Well, as far as I was able to tell, the use-case where this patch
worked without trouble was limited to restoring a table, or schema
with table(s), that:
 a.) has some view(s) dependent on it
 b.) has no other tables with FK references to it, so that we don't run into:
ERROR:  cannot truncate a table referenced in a foreign key constraint
 c.) is not so large that it takes forever for data to be restored
with indexes and constraints left intact
 d.) and whose admin does not want to use --clean plus a list-file
which limits pg_restore to the table and its views

I was initially hoping that the patch would be more useful for
restoring a table with FKs pointing to it, but it seems the only
reliable way to do this kind of selective restore with pg_restore is
with --clean and editing the list-file. Editing the list-file is
certainly tedious and prone to manual error, but I'm not sure this
particular patch has a wide enough use-case to alleviate that pain
significantly.

Josh


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Suggestion for --truncate-tables to pg_restore

2012-12-04 Thread Karl O. Pinc
On 12/04/2012 09:26:47 PM, Josh Kupershmidt wrote:
 Sorry for the delay in following up here.

No problem at all.

 Well, as far as I was able to tell, the use-case where this patch
 worked without trouble was limited to restoring a table, or schema
 with table(s), that:
  a.) has some view(s) dependent on it
  b.) has no other tables with FK references to it, so that we don't
 run into:
 ERROR:  cannot truncate a table referenced in a foreign key 
 constraint
  c.) is not so large that it takes forever for data to be restored
 with indexes and constraints left intact
  d.) and whose admin does not want to use --clean plus a list-file
 which limits pg_restore to the table and its views
 
 I was initially hoping that the patch would be more useful for
 restoring a table with FKs pointing to it, but it seems the only
 reliable way to do this kind of selective restore with pg_restore is
 with --clean and editing the list-file. Editing the list-file is
 certainly tedious and prone to manual error, but I'm not sure this
 particular patch has a wide enough use-case to alleviate that pain
 significantly.

I think there must be a reliable way to restore tables with FKs 
pointing to them, but getting pg_restore to do it seems perilous; at 
least given your expectations for the behavior of pg_restore in the
context of the existing option set.

As with you, I am not inclined to add another option to pg_restore
unless it's really useful.  (Pg_restore already has gobs of options, 
to the point where it's getting sticky.)  I don't think this 
patch meets the utility bar.  It might be able to be re-worked into 
something useful, or it might need to evolve into some sort of new 
restore utility per your thoughts.  For now better to reject it until
the right/comprehensive way to proceed is figured out.

Thanks for all your work.  I hope that this has at least
moved the discussion forward and not been a waste of everybody's
time.  I would like to see a better way of restoring
tables that are FK reference targets.

Regards,

Karl k...@meme.com
Free Software:  You don't pay back, you pay forward.
 -- Robert A. Heinlein



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Suggestion for --truncate-tables to pg_restore

2012-11-26 Thread Robert Haas
On Wed, Nov 21, 2012 at 12:53 AM, Josh Kupershmidt schmi...@gmail.com wrote:
 TBH, I didn't find the example above particularly compelling for
 demonstrating the need for this feature. If you've just got one table
 with dependent views which needs to be restored, it's pretty easy to
 manually TRUNCATE and have pg_restore --data-only reload the table.
 (And easy enough to combine the truncate and restore into a single
 transaction in case anything goes wrong, if need be.)

 But I'm willing to grant that this proposed feature is potentially as
 useful as existing restore-jiggering options like --disable-triggers.
 And I guess I could see that if you're really stuck having to perform
 a --data-only restore of many tables, this feature could come in
 handy.

I think I would come down on the other side of this.  We've never
really been able to get --clean work properly in all scenarios, and it
seems likely that a similar fate will befall this option.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Suggestion for --truncate-tables to pg_restore

2012-11-26 Thread Karl O. Pinc
On 11/26/2012 12:06:56 PM, Robert Haas wrote:
 On Wed, Nov 21, 2012 at 12:53 AM, Josh Kupershmidt
 schmi...@gmail.com wrote:
  TBH, I didn't find the example above particularly compelling for
  demonstrating the need for this feature. If you've just got one
 table
  with dependent views which needs to be restored, it's pretty easy 
 to
  manually TRUNCATE and have pg_restore --data-only reload the table.
  (And easy enough to combine the truncate and restore into a single
  transaction in case anything goes wrong, if need be.)
 
  But I'm willing to grant that this proposed feature is potentially
 as
  useful as existing restore-jiggering options like
 --disable-triggers.
  And I guess I could see that if you're really stuck having to
 perform
  a --data-only restore of many tables, this feature could come in
  handy.
 
 I think I would come down on the other side of this.  We've never
 really been able to get --clean work properly in all scenarios, and 
 it
 seems likely that a similar fate will befall this option.


Where I would like to go with this is to first introduce,
as a new patch, an ALTER TABLE option to disable a 
constraint.  Something like

  ALTER TABLE foo UNVALIDATE CONSTRAINT constraintname;

This would mark the constraint NOT VALID, as if the
constraint were created with the NOT VALID option.
After a constraint is UNVALIDATEd the existing 

  ALTER TABLE foo VALIDATE CONSTRAINT constraintname;

feature would turn the constraint on and check the data.

With UNVALIDATE CONSTRAINT, pg_restore could first turn
off all the constraints concerning tables to be restored,
truncate the tables, restore the data, turn the
constraints back on and re-validate the constraints.
No need to worry about ordering based on a FK referential
dependency graph or loops in such a graph (due to
DEFERRABLE INITIALLY DEFERRED).

This approach would allow the content of a table or
tables to be restored regardless of dependent objects
or FK references and preserve FK referential integrity.
Right?  I need some guidance here from someone who
knows more than I do.

There would likely need to be a pg_restore option like
--disable-constraints to invoke this functionality,
but that can be worked out later.
Likewise, I see an update and a delete trigger in
pg_triggers associated with the referenced table
in REFERENCES constraints, but no trigger for
truncate.  So making a constraint NOT VALID may
not be as easy as it seems.

I don't know what the problems are with --clean
but I would like to know if this appears 
a promising approach.  If so I can pursue it,
although I make no promises.  (I sent in
the --disable-triggers patch because it seemed
easy and I'm not sure where a larger project fits
into my life.)

Regards,

Karl k...@meme.com
Free Software:  You don't pay back, you pay forward.
 -- Robert A. Heinlein

P.S.  An outstanding question regards --truncate-tables
is whether it should drop indexes before truncate
and re-create them after restore.  Sounds like it should.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Suggestion for --truncate-tables to pg_restore

2012-11-26 Thread Robert Haas
On Mon, Nov 26, 2012 at 4:51 PM, Karl O. Pinc k...@meme.com wrote:
 Where I would like to go with this is to first introduce,
 as a new patch, an ALTER TABLE option to disable a
 constraint.  Something like

   ALTER TABLE foo UNVALIDATE CONSTRAINT constraintname;

This doesn't really make sense, because constraints that are not
validated are still enforced for new rows.  This thus wouldn't save
anything performance-wise.  We should perhaps have two more states:
not enforced but blindly assumed true, and not enforced and not
assumed true either.  But currently, we don't.

 I don't know what the problems are with --clean
 but I would like to know if this appears
 a promising approach.  If so I can pursue it,
 although I make no promises.  (I sent in
 the --disable-triggers patch because it seemed
 easy and I'm not sure where a larger project fits
 into my life.)

I'm not really sure what the issues were any more; but I think they
may have had to do with dependencies between different objects messing
things up, which I think is likely to be a problem for this proposal
as well.

 P.S.  An outstanding question regards --truncate-tables
 is whether it should drop indexes before truncate
 and re-create them after restore.  Sounds like it should.

Well, that would improve performance, but it also makes the behavior
of object significantly different from what one might expect from the
name.  One of the problems here is that there seem to be a number of
slightly-different things that one might want to do, and it's not
exactly clear what all of them are, or whether a reasonable number of
options can cater to all of them.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Suggestion for --truncate-tables to pg_restore

2012-11-26 Thread Josh Kupershmidt
On Mon, Nov 26, 2012 at 3:42 PM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Nov 26, 2012 at 4:51 PM, Karl O. Pinc k...@meme.com wrote:
 P.S.  An outstanding question regards --truncate-tables
 is whether it should drop indexes before truncate
 and re-create them after restore.  Sounds like it should.

 Well, that would improve performance, but it also makes the behavior
 of object significantly different from what one might expect from the
 name.  One of the problems here is that there seem to be a number of
 slightly-different things that one might want to do, and it's not
 exactly clear what all of them are, or whether a reasonable number of
 options can cater to all of them.

Another problem: attempting to drop a unique constraint or primary key
(if we're counting these as indexes to be dropped and recreated, which
they should be if the goal is reasonable restore performance) which is
referenced by another table's foreign key will cause:
  ERROR:  cannot drop constraint xxx on table yyy
  because other objects depend on it

and as discussed upthread, it would be impolite for pg_restore to
presume it should monkey with dropping+recreating other tables'
constraints to work around this problem, not to mention impossible
when pg_restore is not connected to the target database.

It is a common administrative task to selectively restore some
existing tables' contents from a backup, and IIRC was the impetus for
this patch. Instead of adding a bunch of options to pg_restore,
perhaps a separate tool specific to this task would be the way to go.
It could handle the minutiae of truncating, dropping and recreating
constraints and indexes of the target tables, and dealing with FKs
sensibly, without worrying about conflicts with existing pg_restore
options and behavior.

Josh


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Suggestion for --truncate-tables to pg_restore

2012-11-26 Thread Karl O. Pinc
On 11/26/2012 08:45:08 PM, Josh Kupershmidt wrote:
 On Mon, Nov 26, 2012 at 3:42 PM, Robert Haas robertmh...@gmail.com
 wrote:
  On Mon, Nov 26, 2012 at 4:51 PM, Karl O. Pinc k...@meme.com wrote:
  P.S.  An outstanding question regards --truncate-tables
  is whether it should drop indexes before truncate
  and re-create them after restore.  Sounds like it should.
 
  Well, that would improve performance, but it also makes the 
 behavior
  of object significantly different from what one might expect from
 the
  name.  One of the problems here is that there seem to be a number 
 of
  slightly-different things that one might want to do, and it's not
  exactly clear what all of them are, or whether a reasonable number
 of
  options can cater to all of them.
 
 Another problem: attempting to drop a unique constraint or primary 
 key
 (if we're counting these as indexes to be dropped and recreated, 
 which
 they should be if the goal is reasonable restore performance) which 
 is
 referenced by another table's foreign key will cause:
   ERROR:  cannot drop constraint xxx on table yyy
   because other objects depend on it
 
 and as discussed upthread, it would be impolite for pg_restore to
 presume it should monkey with dropping+recreating other tables'
 constraints to work around this problem, not to mention impossible
 when pg_restore is not connected to the target database.

I'm thinking impossible because it's impossible to know
what the existing FKs are without a db connection.  Impossible is 
a problem.  You may have another reason why it's impossible.

 It is a common administrative task to selectively restore some
 existing tables' contents from a backup, and IIRC was the impetus for
 this patch.

Yes.  (And aside from listing tables individually it'd be nice
to restore tables per schema.)

It's also a bit surprising that restoring table content
is so hard/unsupported, given a db of more than minimal
complexity.

 Instead of adding a bunch of options to pg_restore,
 perhaps a separate tool specific to this task would be the way to go.
 It could handle the minutiae of truncating, dropping and recreating
 constraints and indexes of the target tables, and dealing with FKs
 sensibly, without worrying about conflicts with existing pg_restore
 options and behavior.

Per above, the tool would then either require a db connection
or at least a dump which contains the system catalogs.

I'm afraid I don't have a clear picture of what such a tool
would look like, if it does not look a lot like pg_restore.
I would like to have such a tool.  I'm not certain how
much I'd be able to contribute toward making one.

Meanwhile it sounds like the --truncate-tables patch
is looking less and less desirable.  I'm ready for
rejection, but will soldier on in the interest of
not wasting other people work on this, if given
direction to move forward.

Regards,

Karl k...@meme.com
Free Software:  You don't pay back, you pay forward.
 -- Robert A. Heinlein



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Suggestion for --truncate-tables to pg_restore

2012-11-26 Thread Karl O. Pinc
On 11/26/2012 09:30:48 PM, Karl O. Pinc wrote:
 On 11/26/2012 08:45:08 PM, Josh Kupershmidt wrote:

  It is a common administrative task to selectively restore some
  existing tables' contents from a backup, and IIRC was the impetus
 for
  this patch.
 
 Yes.  (And aside from listing tables individually it'd be nice
 to restore tables per schema.)

As long as I'm daydreaming it'd be nice to be able to
restore a table, data and schema, and have available
the various combinations of: new table name, different
owner, different schema, different db.  Without having
to edit a file by hand.

Of course I've not done the brain work involved in
figuring out just what this would mean in terms
of related objects like triggers, constraints,
indexes and so forth.  But then who doesn't want
a pony?  :-)

Regards,

Regards,

Karl k...@meme.com
Free Software:  You don't pay back, you pay forward.
 -- Robert A. Heinlein



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Suggestion for --truncate-tables to pg_restore

2012-11-23 Thread Josh Kupershmidt
On Wed, Nov 21, 2012 at 5:48 AM, Karl O. Pinc k...@meme.com wrote:

 On Fri, Sep 21, 2012 at 8:54 AM, Karl O. Pinc k...@meme.com wrote:
  On 09/20/2012 12:24:49 PM, Karl O. Pinc wrote:

 OT:
 After looking at the code I found a number of  conflicting
 option combinations are not tested for or rejected.   I can't
 recall what they are now.  The right way to fix these would be
 to send in a separate patch for each conflict all attached
 to one email/commitfest item?  Or one patch that just gets
 adjusted until it's right?

Typically I think it's easiest for the reviewer+committer to consider
a bunch of such similar changes altogether in one patch, rather than
in a handful of separate patches, though that's just my own
preference.

  
 
  More serious objections were raised regarding semantics.
 
  What if, instead, the initial structure looked like:
 
  CREATE TABLE schemaA.foo
(id PRIMARY KEY, data INT);
 
  CREATE TABLE schemaB.bar
(id INT CONSTRAINT bar_on_foo REFERENCES foo
   , moredata INT);
 
  With a case like this, in most real-world situations, you'd
  have to use pg_restore with --disable-triggers if you wanted
  to use --data-only and --truncate-tables.  The possibility of
  foreign key referential integrity corruption is obvious.

 Why would --disable-triggers be used in this example? I don't think
 you could use --truncate-tables to restore only table foo, because
 you would get:

   ERROR:  cannot truncate a table referenced in a foreign key
 constraint

 (At least, I think so, not having tried with the actual patch.) You
 could use --disable-triggers when restoring bar.

 I tried it and you're quite right.  (I thought I'd tried this
 before, but clearly I did not -- proving the utility of the review
 process.  :-)  My assumption was that since triggers
 were turned off that constraints, being triggers, would be off as
 well and therefore tables with foreign keys could be truncated.
 Obviously not, since the I get the above error.

 I just tried it.  --disable-triggers does not disable constraints.

Just to be clear, I believe the problem in this example is that
--disable-triggers does not disable any foreign key constraints of
other tables when you are restoring a single table. So with:

  pg_restore -1 --truncate-tables --disable-triggers --table=foo \
  --data-only my.dump ...

you would get the complaint

  ERROR:  cannot truncate a table referenced in a foreign key constraint

which is talking about bar's referencing foo, and there was no

  ALTER TABLE bar DISABLE TRIGGER ALL

done, since bar isn't being restored. I don't have a quibble with
this existing behavior -- you are instructing pg_restore to only mess
with bar, after all. See below, though, for a comparison of --clean
and --truncate-tables when restoring foo and bar together.

 For your first example, --truncate-tables seems to have some use, so
 that the admin isn't forced to recreate various views which may be
 dependent on the table. (Though it's not too difficult to work around
 this case today.)

 As an aside: I never have an issue with this, having planned ahead.
 I'm curious what the not-too-difficult work-around is that you have
 in mind.  I'm not coming up with a tidy way to, e.g, pull a lot
 of views out of a dump.

Well, for the first example, there was one table and only one view
which depended on the table, so manually editing the list file like
so:

  pg_restore --list --file=my.dump  output.list
  # manually edit file output.list, select only entries pertaining
  # to the table and dependent view
  pg_restore -1 --clean --use-list=output.list ...

isn't too arduous, though it would become so if you had more dependent
views to worry about.

I'm willing to count this use-case as a usability win for
--truncate-tables, since with that option the restore can be boiled
down to a short and sweet:

  pg_restore -1 --data-only --truncate-tables --table=mytable my.dump ...

Though note this may not prove practical for large tables, since
--data-only leaves constraints and indexes intact during the restore,
and can be massively slower compared to adding the constraints only
after the data has been COPYed in, as pg_restore otherwise does.

 For the second example involving FKs, I'm a little bit more hesitant
 about  endorsing the use of --truncate-tables combined with
 --disable-triggers to potentially allow bogus FKs. I know this is
 possible to some extent today using the --disable-triggers option,
 but
 it makes me nervous to be adding a mode to pg_restore if it's
 primarily designed to work together with --disable-triggers as a
 larger foot-gun.

 This is the crux of the issue, and where I was thinking of
 taking this patch.  I very often am of the mindset that
 foreign keys are no more or less special than other
 more complex data integrity rules enforced with triggers.
 (I suppose others might say the same regards to integrity
 enforced at the application level.)  This naturally
 inclines me to think that 

Re: [HACKERS] Suggestion for --truncate-tables to pg_restore

2012-11-21 Thread Karl O. Pinc
Hi Josh,

On 11/20/2012 11:53:23 PM, Josh Kupershmidt wrote:
 Hi Karl,
 I signed on to review this patch for the current CF.

I noticed.  Thanks very much.


 On Fri, Sep 21, 2012 at 8:54 AM, Karl O. Pinc k...@meme.com wrote:
  On 09/20/2012 12:24:49 PM, Karl O. Pinc wrote:

  First, the problem:
 
  Begin with the following structure:
 
  CREATE TABLE schemaA.foo (id PRIMARY KEY, data INT);
 
  CREATE VIEW schemaB.bar AS SELECT * FROM schemaA.foo;
 
  Then, by accident, somebody does:
 
  UPDATE schemaA.foo SET data = data + (RANDOM() * 1000)::INT;
 
  So, you want to restore the data into schemaA.foo.
  But schemaA.foo has (bad) data in it that must first
  be removed. It would seem that using
 
pg_restore --clean -n schemaA -t foo my_pg_dump_backup
 
  would solve the problem, it would drop schemaA.foo,
  recreate it, and then restore the data.  But this does
  not work.  schemaA.foo does not drop because it's
  got a dependent database object, schemaB.bar.

 TBH, I didn't find the example above particularly compelling for
 demonstrating the need for this feature. If you've just got one table
 with dependent views which needs to be restored, it's pretty easy to
 manually TRUNCATE and have pg_restore --data-only reload the table.
 (And easy enough to combine the truncate and restore into a single
 transaction in case anything goes wrong, if need be.)

I was not clear in stating the problem.  (But see below
regarding foreign keys.)  The dependent view
was an example, but there can also be REFERENCES constraints and
trigger related constraints involving other tables in other schemas.
The easiest work-around I can think of here is destroying all the
triggers and constraints, either everything in the whole db
or doing some work to be more selective, truncating all the schema's
tables. doing a data-only restore of the
schema, and then pg_restore --data-only, and then re-creating
all the triggers and constraints.

 
 But I'm willing to grant that this proposed feature is potentially as
 useful as existing restore-jiggering options like --disable-triggers.
 And I guess I could see that if you're really stuck having to perform
 a --data-only restore of many tables, this feature could come in
 handy.

I think so.  See above.

 
  So, the idea here is to be able to do a data-only
  restore, first truncating the data in the tables
  being restored to remove the existing corrupted data.
 
  The proposal is to add a --truncate-tables option
  to pg_restore.
 
  

  (I tested pg_restore with 9.1 and when --data-only is
  used --clean is ignored, it does not even produce a warning.
  This is arguably a bug.)
 
 +1 for having pg_restore bail out with an error if the user specifies
 --data-only --clean. By the same token, --clean and --truncate-tables
 together should also raise a not allowed error.

OT:
After looking at the code I found a number of  conflicting
option combinations are not tested for or rejected.   I can't
recall what they are now.  The right way to fix these would be
to send in a separate patch for each conflict all attached
to one email/commitfest item?  Or one patch that just gets
adjusted until it's right?

 
  
 
  More serious objections were raised regarding semantics.
 
  What if, instead, the initial structure looked like:
 
  CREATE TABLE schemaA.foo
(id PRIMARY KEY, data INT);
 
  CREATE TABLE schemaB.bar
(id INT CONSTRAINT bar_on_foo REFERENCES foo
   , moredata INT);
 
  With a case like this, in most real-world situations, you'd
  have to use pg_restore with --disable-triggers if you wanted
  to use --data-only and --truncate-tables.  The possibility of
  foreign key referential integrity corruption is obvious.
 
 Why would --disable-triggers be used in this example? I don't think
 you could use --truncate-tables to restore only table foo, because
 you would get:
 
   ERROR:  cannot truncate a table referenced in a foreign key
 constraint
 
 (At least, I think so, not having tried with the actual patch.) You
 could use --disable-triggers when restoring bar. 

I tried it and you're quite right.  (I thought I'd tried this
before, but clearly I did not -- proving the utility of the review
process.  :-)  My assumption was that since triggers
were turned off that constraints, being triggers, would be off as 
well and therefore tables with foreign keys could be truncated.
Obviously not, since the I get the above error.

I just tried it.  --disable-triggers does not disable constraints.

 
   Recovering some data and being left with referential
  integrity problems is better than having no data.
 
 Well, this is really a judgment call, and I have a hunch that many
 admins would actually choose none of the above. And I think this
 point gets to the crux of whether the --truncate-tables option will 
 be
 useful as-is.

Well, yes.  None of the above is best.  :)  In my case I had munged
the data in the one important schema and wanted to restore.  The 
setting is an academic one 

Re: [HACKERS] Suggestion for --truncate-tables to pg_restore

2012-11-20 Thread Josh Kupershmidt
Hi Karl,
I signed on to review this patch for the current CF. Most of the
background for the patch seems to be in the message below, so I'm
going to respond to this one first.

On Fri, Sep 21, 2012 at 8:54 AM, Karl O. Pinc k...@meme.com wrote:
 On 09/20/2012 12:24:49 PM, Karl O. Pinc wrote:

 I've had problems using pg_restore --data-only when
 restoring individual schemas (which contain data which
 has had bad things done to it).  --clean does not work
 well because of dependent objects in other schemas.

OK.

 

 First, the problem:

 Begin with the following structure:

 CREATE TABLE schemaA.foo (id PRIMARY KEY, data INT);

 CREATE VIEW schemaB.bar AS SELECT * FROM schemaA.foo;

 Then, by accident, somebody does:

 UPDATE schemaA.foo SET data = data + (RANDOM() * 1000)::INT;

 So, you want to restore the data into schemaA.foo.
 But schemaA.foo has (bad) data in it that must first
 be removed. It would seem that using

   pg_restore --clean -n schemaA -t foo my_pg_dump_backup

 would solve the problem, it would drop schemaA.foo,
 recreate it, and then restore the data.  But this does
 not work.  schemaA.foo does not drop because it's
 got a dependent database object, schemaB.bar.

Right.

 Of course there are manual work-arounds.  One of these
 is truncating schemaA.foo and then doing a pg_restore
 with --data-only.

Simply doing TRUNCATE manually was the first thought that occurred to
me when I saw your example.

 The manual work-arounds become
 increasingly burdensome as you need to restore more
 tables.  The case that motivated me was an attempt
 to restore the data in an entire schema, one which
 contained a significant number of tables.

TBH, I didn't find the example above particularly compelling for
demonstrating the need for this feature. If you've just got one table
with dependent views which needs to be restored, it's pretty easy to
manually TRUNCATE and have pg_restore --data-only reload the table.
(And easy enough to combine the truncate and restore into a single
transaction in case anything goes wrong, if need be.)

But I'm willing to grant that this proposed feature is potentially as
useful as existing restore-jiggering options like --disable-triggers.
And I guess I could see that if you're really stuck having to perform
a --data-only restore of many tables, this feature could come in
handy.

 So, the idea here is to be able to do a data-only
 restore, first truncating the data in the tables
 being restored to remove the existing corrupted data.

 The proposal is to add a --truncate-tables option
 to pg_restore.

 

 There were some comments on syntax.

 I proposed to use -u as a short option.  This was
 thought confusing, given it's use in other
 Unix command line programs (mysql).   Since there's
 no obvious short option, forget it.  Just have
 a long option.

Agreed.

 Another choice is to avoid introducing yet another
 option and instead overload --clean so that when
 doing a --data-only restore --clean truncates tables
 and otherwise --clean retains the existing behavior of
 dropping and re-creating the restored objects.

I like the --truncate-tables flag idea better than overloading
--clean, since it makes the purpose immediately apparent.

 (I tested pg_restore with 9.1 and when --data-only is
 used --clean is ignored, it does not even produce a warning.
 This is arguably a bug.)

+1 for having pg_restore bail out with an error if the user specifies
--data-only --clean. By the same token, --clean and --truncate-tables
together should also raise a not allowed error.

 

 More serious objections were raised regarding semantics.

 What if, instead, the initial structure looked like:

 CREATE TABLE schemaA.foo
   (id PRIMARY KEY, data INT);

 CREATE TABLE schemaB.bar
   (id INT CONSTRAINT bar_on_foo REFERENCES foo
  , moredata INT);

 With a case like this, in most real-world situations, you'd
 have to use pg_restore with --disable-triggers if you wanted
 to use --data-only and --truncate-tables.  The possibility of
 foreign key referential integrity corruption is obvious.

Why would --disable-triggers be used in this example? I don't think
you could use --truncate-tables to restore only table foo, because
you would get:

  ERROR:  cannot truncate a table referenced in a foreign key constraint

(At least, I think so, not having tried with the actual patch.) You
could use --disable-triggers when restoring bar. Though if you're
just enabling that option for performance purposes, and are unable to
guarantee that the restore will leave the tables in a consistent
state, well then it seems like you shouldn't use the option.

 Aside:  Unless you're restoring databases in their entirety
 the pg_restore --disable-triggers option makes it easy to
 introduce foreign key referential integrity corruption.

Yup, and I think the docs could do more to warn users about
--disable-triggers in particular. And I see you've submitted a
separate patch along those lines.

 In fact, since pg_restore 

Re: [HACKERS] Suggestion for --truncate-tables to pg_restore

2012-11-12 Thread Karl O. Pinc
Hi,

Attached is version 4.  Version 3 no longer
built against head.

On 10/16/2012 09:48:06 PM, Karl O. Pinc wrote:
 
 On 09/23/2012 08:52:07 PM, Karl O. Pinc wrote:
 
  On 09/23/2012 12:24:27 AM, Karl O. Pinc wrote:
 
   On 09/23/2012 12:19:07 AM, Karl O. Pinc wrote:
On 09/21/2012 10:54:05 AM, Karl O. Pinc wrote:
 On 09/20/2012 12:24:49 PM, Karl O. Pinc wrote:
 
  I've had problems using pg_restore --data-only when
  restoring individual schemas (which contain data which
  has had bad things done to it).  --clean does not work
  well because of dependent objects in other schemas.

Since there wasn't much more to do I've gone ahead
and written the patch.  Works for me.

Against git master.
Passes regression tests, but there's no regression
tests for pg_restore so this does not say much.
Since there's no regression tests I've not written one.

Since this is a real patch for application I've given
it a new name (it's not a v2).

Truncate done right before COPY, since that's what
the parallel restores do.
 
 
 Karl k...@meme.com
 Free Software:  You don't pay back, you pay forward.
  -- Robert A. Heinlein
 
 

--quoted attachment--
 
 -- 
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers
 




Karl k...@meme.com
Free Software:  You don't pay back, you pay forward.
 -- Robert A. Heinlein

diff --git a/doc/src/sgml/ref/pg_restore.sgml b/doc/src/sgml/ref/pg_restore.sgml
index 1b9db6b..23b0d16 100644
--- a/doc/src/sgml/ref/pg_restore.sgml
+++ b/doc/src/sgml/ref/pg_restore.sgml
@@ -546,6 +546,26 @@
  /varlistentry
 
  varlistentry
+  termoption--truncate-tables//term
+  listitem
+   para
+This option is only relevant when performing a data-only
+restore.  It instructs applicationpg_restore/application
+to execute commands to truncate the target tables while the
+data is reloaded.  Use this when restoring tables or schemas
+and option--clean/option cannot be used because dependent
+objects would be destroyed.
+   /para
+
+   para
+ The option--disable-triggers/option will almost always
+ always need to be used in conjunction with this option to
+ disable check constraints on foreign keys.
+   /para
+  /listitem
+ /varlistentry
+
+ varlistentry
   termoption--use-set-session-authorization/option/term
   listitem
para
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index 3b49395..0aaf1d3 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -101,6 +101,8 @@ typedef struct _restoreOptions
 	int			noTablespace;	/* Don't issue tablespace-related commands */
 	int			disable_triggers;		/* disable triggers during data-only
 		 * restore */
+	int			truncate_tables;		/* truncate tables during data-only
+		 * restore */
 	int			use_setsessauth;/* Use SET SESSION AUTHORIZATION commands
  * instead of OWNER TO */
 	int			no_security_labels;		/* Skip security label entries */
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 1fead28..c7fdc79 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -309,6 +309,11 @@ RestoreArchive(Archive *AHX)
 	if (ropt-createDB  ropt-single_txn)
 		exit_horribly(modulename, -C and -1 are incompatible options\n);
 
+	/* When the schema is dropped and re-created then no point
+	 * truncating tables. */
+	if (ropt-dropSchema  ropt-truncate_tables)
+		exit_horribly(modulename, -c and --truncate-tables are incompatible options\n);
+
 	/*
 	 * If we're going to do parallel restore, there are some restrictions.
 	 */
@@ -403,6 +408,10 @@ RestoreArchive(Archive *AHX)
 		}
 	}
 
+	/* Truncate tables only when restoring data. */
+	if (!ropt-dataOnly  ropt-truncate_tables)
+		exit_horribly(modulename, --truncate-tables requires the --data-only option\n);
+
 	/*
 	 * Setup the output file if necessary.
 	 */
@@ -562,6 +571,7 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te,
 	int			retval = 0;
 	teReqs		reqs;
 	bool		defnDumped;
+	bool		truncate;
 
 	AH-currentTE = te;
 
@@ -696,14 +706,21 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te,
 		 * server, so no need to see if we should issue BEGIN.
 		 */
 		StartTransaction(AH);
+		truncate = 1;
+	} else
+		/* Truncate the table when asked to. */
+		truncate = ropt-truncate_tables;
 
+	if (truncate) {
 		/*
 		 * If the server version is = 8.4, make sure we issue
 		 * TRUNCATE with ONLY so that child tables are not
-		 * wiped.
+		 * wiped.  If we don't know the server version
+		 * then err on the side of safety.
 		 */
 		ahprintf(AH, TRUNCATE TABLE %s%s;\n\n,
- 

Re: [HACKERS] Suggestion for --truncate-tables to pg_restore

2012-10-16 Thread Karl O. Pinc
Hi,

Attached is version 3.

The convention seems to be to leave the operator at the
end of the line when breaking long lines, so do that.
Add extra () -- make operator precedence explicit and
have indentation reflect operator precedence.

On 09/23/2012 08:52:07 PM, Karl O. Pinc wrote:

 On 09/23/2012 12:24:27 AM, Karl O. Pinc wrote:

  On 09/23/2012 12:19:07 AM, Karl O. Pinc wrote:
   On 09/21/2012 10:54:05 AM, Karl O. Pinc wrote:
On 09/20/2012 12:24:49 PM, Karl O. Pinc wrote:

 I've had problems using pg_restore --data-only when
 restoring individual schemas (which contain data which
 has had bad things done to it).  --clean does not work
 well because of dependent objects in other schemas.
   
   Since there wasn't much more to do I've gone ahead
   and written the patch.  Works for me.
   
   Against git master.
   Passes regression tests, but there's no regression
   tests for pg_restore so this does not say much.
   Since there's no regression tests I've not written one.
   
   Since this is a real patch for application I've given
   it a new name (it's not a v2).
   
   Truncate done right before COPY, since that's what
   the parallel restores do.


Karl k...@meme.com
Free Software:  You don't pay back, you pay forward.
 -- Robert A. Heinlein

diff --git a/doc/src/sgml/ref/pg_restore.sgml b/doc/src/sgml/ref/pg_restore.sgml
index b276da6..488d8dc 100644
--- a/doc/src/sgml/ref/pg_restore.sgml
+++ b/doc/src/sgml/ref/pg_restore.sgml
@@ -539,6 +539,26 @@
  /varlistentry
 
  varlistentry
+  termoption--truncate-tables//term
+  listitem
+   para
+This option is only relevant when performing a data-only
+restore.  It instructs applicationpg_restore/application
+to execute commands to truncate the target tables while the
+data is reloaded.  Use this when restoring tables or schemas
+and option--clean/option cannot be used because dependent
+objects would be destroyed.
+   /para
+
+   para
+ The option--disable-triggers/option will almost always
+ always need to be used in conjunction with this option to
+ disable check constraints on foreign keys.
+   /para
+  /listitem
+ /varlistentry
+
+ varlistentry
   termoption--use-set-session-authorization/option/term
   listitem
para
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index 3b49395..0aaf1d3 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -101,6 +101,8 @@ typedef struct _restoreOptions
 	int			noTablespace;	/* Don't issue tablespace-related commands */
 	int			disable_triggers;		/* disable triggers during data-only
 		 * restore */
+	int			truncate_tables;		/* truncate tables during data-only
+		 * restore */
 	int			use_setsessauth;/* Use SET SESSION AUTHORIZATION commands
  * instead of OWNER TO */
 	int			no_security_labels;		/* Skip security label entries */
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 722b3e9..43b5806 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -311,6 +311,11 @@ RestoreArchive(Archive *AHX)
 	if (ropt-createDB  ropt-dropSchema)
 		exit_horribly(modulename, -C and -c are incompatible options\n);
 
+	/* When the schema is dropped and re-created then no point
+	 * truncating tables. */
+	if (ropt-dropSchema  ropt-truncate_tables)
+		exit_horribly(modulename, -c and --truncate-tables are incompatible options\n);
+
 	/*
 	 * -C is not compatible with -1, because we can't create a database inside
 	 * a transaction block.
@@ -412,6 +417,10 @@ RestoreArchive(Archive *AHX)
 		}
 	}
 
+	/* Truncate tables only when restoring data. */
+	if (!ropt-dataOnly  ropt-truncate_tables)
+		exit_horribly(modulename, --truncate-tables requires the --data-only option\n);
+
 	/*
 	 * Setup the output file if necessary.
 	 */
@@ -553,6 +562,7 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te,
 	int			retval = 0;
 	teReqs		reqs;
 	bool		defnDumped;
+	bool		truncate;
 
 	AH-currentTE = te;
 
@@ -687,15 +697,22 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te,
 		 * server, so no need to see if we should issue BEGIN.
 		 */
 		StartTransaction(AH);
+		truncate = 1;
+	} else
+		/* Truncate the table when asked to. */
+		truncate = ropt-truncate_tables;
 
+	if (truncate) {
 		/*
 		 * If the server version is = 8.4, make sure we issue
 		 * TRUNCATE with ONLY so that child tables are not
-		 * wiped.
+		 * wiped.  If we don't know the server version
+		 * then err on the side of safety.
 		 */
 		ahprintf(AH, TRUNCATE TABLE %s%s;\n\n,
- (PQserverVersion(AH-connection) = 80400 ?
-  ONLY  : ),
+ ((!AH-connection ||
+   PQserverVersion(AH-connection) = 80400) ?
+  ONLY  : ),
  fmtId(te-tag));
 	}
 

Re: [HACKERS] Suggestion for --truncate-tables to pg_restore

2012-09-23 Thread Karl O. Pinc
Attached is version 2.  The sgml did not build.

On 09/23/2012 12:24:27 AM, Karl O. Pinc wrote:
 Whoops.  Do over.  Sent the wrong file.
 
 On 09/23/2012 12:19:07 AM, Karl O. Pinc wrote:
  On 09/21/2012 10:54:05 AM, Karl O. Pinc wrote:
   On 09/20/2012 12:24:49 PM, Karl O. Pinc wrote:
   
I've had problems using pg_restore --data-only when
restoring individual schemas (which contain data which
has had bad things done to it).  --clean does not work
well because of dependent objects in other schemas.
  
  Since there wasn't much more to do I've gone ahead
  and written the patch.  Works for me.
  
  Against git master.
  Passes regression tests, but there's no regression
  tests for pg_restore so this does not say much.
  Since there's no regression tests I've not written one.
  
  Since this is a real patch for application I've given
  it a new name (it's not a v2).
  
  Truncate done right before COPY, since that's what
  the parallel restores do.
 
 
 Karl k...@meme.com
 Free Software:  You don't pay back, you pay forward.
  -- Robert A. Heinlein
 
 

--quoted attachment--
 
 -- 
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers
 




Karl k...@meme.com
Free Software:  You don't pay back, you pay forward.
 -- Robert A. Heinlein

diff --git a/doc/src/sgml/ref/pg_restore.sgml b/doc/src/sgml/ref/pg_restore.sgml
index b276da6..488d8dc 100644
--- a/doc/src/sgml/ref/pg_restore.sgml
+++ b/doc/src/sgml/ref/pg_restore.sgml
@@ -539,6 +539,26 @@
  /varlistentry
 
  varlistentry
+  termoption--truncate-tables//term
+  listitem
+   para
+This option is only relevant when performing a data-only
+restore.  It instructs applicationpg_restore/application
+to execute commands to truncate the target tables while the
+data is reloaded.  Use this when restoring tables or schemas
+and option--clean/option cannot be used because dependent
+objects would be destroyed.
+   /para
+
+   para
+ The option--disable-triggers/option will almost always
+ always need to be used in conjunction with this option to
+ disable check constraints on foreign keys.
+   /para
+  /listitem
+ /varlistentry
+
+ varlistentry
   termoption--use-set-session-authorization/option/term
   listitem
para
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index 3b49395..0aaf1d3 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -101,6 +101,8 @@ typedef struct _restoreOptions
 	int			noTablespace;	/* Don't issue tablespace-related commands */
 	int			disable_triggers;		/* disable triggers during data-only
 		 * restore */
+	int			truncate_tables;		/* truncate tables during data-only
+		 * restore */
 	int			use_setsessauth;/* Use SET SESSION AUTHORIZATION commands
  * instead of OWNER TO */
 	int			no_security_labels;		/* Skip security label entries */
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 722b3e9..43b5806 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -311,6 +311,11 @@ RestoreArchive(Archive *AHX)
 	if (ropt-createDB  ropt-dropSchema)
 		exit_horribly(modulename, -C and -c are incompatible options\n);
 
+	/* When the schema is dropped and re-created then no point
+	 * truncating tables. */
+	if (ropt-dropSchema  ropt-truncate_tables)
+		exit_horribly(modulename, -c and --truncate-tables are incompatible options\n);
+
 	/*
 	 * -C is not compatible with -1, because we can't create a database inside
 	 * a transaction block.
@@ -412,6 +417,10 @@ RestoreArchive(Archive *AHX)
 		}
 	}
 
+	/* Truncate tables only when restoring data. */
+	if (!ropt-dataOnly  ropt-truncate_tables)
+		exit_horribly(modulename, --truncate-tables requires the --data-only option\n);
+
 	/*
 	 * Setup the output file if necessary.
 	 */
@@ -553,6 +562,7 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te,
 	int			retval = 0;
 	teReqs		reqs;
 	bool		defnDumped;
+	bool		truncate;
 
 	AH-currentTE = te;
 
@@ -687,15 +697,22 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te,
 		 * server, so no need to see if we should issue BEGIN.
 		 */
 		StartTransaction(AH);
+		truncate = 1;
+	} else
+		/* Truncate the table when asked to. */
+		truncate = ropt-truncate_tables;
 
+	if (truncate) {
 		/*
 		 * If the server version is = 8.4, make sure we issue
 		 * TRUNCATE with ONLY so that child tables are not
-		 * wiped.
+		 * wiped.  If we don't know the server version
+		 * then err on the side of safety.
 		 */
 		ahprintf(AH, TRUNCATE TABLE %s%s;\n\n,
- (PQserverVersion(AH-connection) = 80400 ?
-  ONLY  : ),
+ (!AH-connection
+  || 

Re: [HACKERS] Suggestion for --truncate-tables to pg_restore

2012-09-22 Thread Karl O. Pinc
On 09/21/2012 10:54:05 AM, Karl O. Pinc wrote:
 On 09/20/2012 12:24:49 PM, Karl O. Pinc wrote:
 
  I've had problems using pg_restore --data-only when
  restoring individual schemas (which contain data which
  has had bad things done to it).  --clean does not work
  well because of dependent objects in other schemas.

Since there wasn't much more to do I've gone ahead
and written the patch.  Works for me.

Against git master.
Passes regression tests, but there's no regression
tests for pg_restore so this does not say much.
Since there's no regression tests I've not written one.

Since this is a real patch for application I've given
it a new name (it's not a v2).

Truncate done right before COPY, since that's what
the parallel restores do.

Karl k...@meme.com
Free Software:  You don't pay back, you pay forward.
 -- Robert A. Heinlein


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Suggestion for --truncate-tables to pg_restore

2012-09-22 Thread Karl O. Pinc
Whoops.  Do over.  Sent the wrong file.

On 09/23/2012 12:19:07 AM, Karl O. Pinc wrote:
 On 09/21/2012 10:54:05 AM, Karl O. Pinc wrote:
  On 09/20/2012 12:24:49 PM, Karl O. Pinc wrote:
  
   I've had problems using pg_restore --data-only when
   restoring individual schemas (which contain data which
   has had bad things done to it).  --clean does not work
   well because of dependent objects in other schemas.
 
 Since there wasn't much more to do I've gone ahead
 and written the patch.  Works for me.
 
 Against git master.
 Passes regression tests, but there's no regression
 tests for pg_restore so this does not say much.
 Since there's no regression tests I've not written one.
 
 Since this is a real patch for application I've given
 it a new name (it's not a v2).
 
 Truncate done right before COPY, since that's what
 the parallel restores do.


Karl k...@meme.com
Free Software:  You don't pay back, you pay forward.
 -- Robert A. Heinlein

diff --git a/doc/src/sgml/ref/pg_restore.sgml b/doc/src/sgml/ref/pg_restore.sgml
index b276da6..11cba8e 100644
--- a/doc/src/sgml/ref/pg_restore.sgml
+++ b/doc/src/sgml/ref/pg_restore.sgml
@@ -539,6 +539,25 @@
  /varlistentry
 
  varlistentry
+  termoption--truncate-tables//term
+  listitem
+   para
+This option is only relevant when performing a data-only
+restore.  It instructs applicationpg_restore/application
+to execute commands to truncate the target tables while the
+data is reloaded.  Use this when restoring tables or schemas
+and option--clean/clean cannot be used because dependent
+objects would be destroyed.
+   /para
+
+   para
+ The option--disable-triggers/option will almost always
+ always need to be used in conjunction with this option to
+ disable check constraints on foreign keys.
+   /para
+ /varlistentry
+
+ varlistentry
   termoption--use-set-session-authorization/option/term
   listitem
para
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index 3b49395..0aaf1d3 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -101,6 +101,8 @@ typedef struct _restoreOptions
 	int			noTablespace;	/* Don't issue tablespace-related commands */
 	int			disable_triggers;		/* disable triggers during data-only
 		 * restore */
+	int			truncate_tables;		/* truncate tables during data-only
+		 * restore */
 	int			use_setsessauth;/* Use SET SESSION AUTHORIZATION commands
  * instead of OWNER TO */
 	int			no_security_labels;		/* Skip security label entries */
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 722b3e9..43b5806 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -311,6 +311,11 @@ RestoreArchive(Archive *AHX)
 	if (ropt-createDB  ropt-dropSchema)
 		exit_horribly(modulename, -C and -c are incompatible options\n);
 
+	/* When the schema is dropped and re-created then no point
+	 * truncating tables. */
+	if (ropt-dropSchema  ropt-truncate_tables)
+		exit_horribly(modulename, -c and --truncate-tables are incompatible options\n);
+
 	/*
 	 * -C is not compatible with -1, because we can't create a database inside
 	 * a transaction block.
@@ -412,6 +417,10 @@ RestoreArchive(Archive *AHX)
 		}
 	}
 
+	/* Truncate tables only when restoring data. */
+	if (!ropt-dataOnly  ropt-truncate_tables)
+		exit_horribly(modulename, --truncate-tables requires the --data-only option\n);
+
 	/*
 	 * Setup the output file if necessary.
 	 */
@@ -553,6 +562,7 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te,
 	int			retval = 0;
 	teReqs		reqs;
 	bool		defnDumped;
+	bool		truncate;
 
 	AH-currentTE = te;
 
@@ -687,15 +697,22 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te,
 		 * server, so no need to see if we should issue BEGIN.
 		 */
 		StartTransaction(AH);
+		truncate = 1;
+	} else
+		/* Truncate the table when asked to. */
+		truncate = ropt-truncate_tables;
 
+	if (truncate) {
 		/*
 		 * If the server version is = 8.4, make sure we issue
 		 * TRUNCATE with ONLY so that child tables are not
-		 * wiped.
+		 * wiped.  If we don't know the server version
+		 * then err on the side of safety.
 		 */
 		ahprintf(AH, TRUNCATE TABLE %s%s;\n\n,
- (PQserverVersion(AH-connection) = 80400 ?
-  ONLY  : ),
+ (!AH-connection
+  || PQserverVersion(AH-connection)
+	 = 80400 ? ONLY  : ),
  fmtId(te-tag));
 	}
 
diff --git a/src/bin/pg_dump/pg_restore.c b/src/bin/pg_dump/pg_restore.c
index f6c835b..c0b0bfc 100644
--- a/src/bin/pg_dump/pg_restore.c
+++ b/src/bin/pg_dump/pg_restore.c
@@ -77,6 +77,7 @@ main(int argc, char **argv)
 	static int	disable_triggers = 0;
 	static int	no_data_for_failed_tables = 0;
 	static int	outputNoTablespaces = 0;
+	static int	

Re: [HACKERS] Suggestion for --truncate-tables to pg_restore

2012-09-21 Thread Karl O. Pinc
On 09/20/2012 12:24:49 PM, Karl O. Pinc wrote:

 I've had problems using pg_restore --data-only when
 restoring individual schemas (which contain data which
 has had bad things done to it).  --clean does not work
 well because of dependent objects in other schemas.

Before doing any more work I want to report on the
discussions that took place at the code sprint at
Postgres Open in Chicago.  Because I'm going to add
in additional thoughts I've had and to avoid mis-representing
anybody's opinion I'll not mention who said what.
Feel free to step forward and claim Ingenious Ideas
as your own.  Likewise I apologize if lack of attribution
makes it more difficult to discern (my) uninformed drivel
from intelligent insight.



First, the problem:

Begin with the following structure:

CREATE TABLE schemaA.foo (id PRIMARY KEY, data INT);

CREATE VIEW schemaB.bar AS SELECT * FROM schemaA.foo;

Then, by accident, somebody does:

UPDATE schemaA.foo SET data = data + (RANDOM() * 1000)::INT;

So, you want to restore the data into schemaA.foo.
But schemaA.foo has (bad) data in it that must first
be removed.  It would seem that using 

  pg_restore --clean -n schemaA -t foo my_pg_dump_backup

would solve the problem, it would drop schemaA.foo,
recreate it, and then restore the data.  But this does
not work.  schemaA.foo does not drop because it's
got a dependent database object, schemaB.bar.

Of course there are manual work-arounds.  One of these
is truncating schemaA.foo and then doing a pg_restore
with --data-only.  The manual work-arounds become
increasingly burdensome as you need to restore more
tables.  The case that motivated me was an attempt
to restore the data in an entire schema, one which
contained a significant number of tables.

So, the idea here is to be able to do a data-only
restore, first truncating the data in the tables
being restored to remove the existing corrupted data.

The proposal is to add a --truncate-tables option
to pg_restore.



There were some comments on syntax.

I proposed to use -u as a short option.  This was
thought confusing, given it's use in other
Unix command line programs (mysql).   Since there's
no obvious short option, forget it.  Just have
a long option.

Another choice is to avoid introducing yet another
option and instead overload --clean so that when
doing a --data-only restore --clean truncates tables
and otherwise --clean retains the existing behavior of
dropping and re-creating the restored objects.

(I tested pg_restore with 9.1 and when --data-only is
used --clean is ignored, it does not even produce a warning.
This is arguably a bug.)



More serious objections were raised regarding semantics.

What if, instead, the initial structure looked like:

CREATE TABLE schemaA.foo
  (id PRIMARY KEY, data INT);

CREATE TABLE schemaB.bar
  (id INT CONSTRAINT bar_on_foo REFERENCES foo
 , moredata INT);

With a case like this, in most real-world situations, you'd
have to use pg_restore with --disable-triggers if you wanted
to use --data-only and --truncate-tables.  The possibility of
foreign key referential integrity corruption is obvious.

Aside:  Unless you're restoring databases in their entirety
the pg_restore --disable-triggers option makes it easy to
introduce foreign key referential integrity corruption.
In fact, since pg_restore does not wrap it's operations
in one big transaction, it's easy to attempt restoration
of a portion of a database, have part of the process
succeed and part of it fail (due to either schema
or data dependencies), and be left off worse
than before you started.  The pg_restore docs might
benefit from a big fat warning regarding
attempts to restore less than an entire database.

So, the discussion went, pg_restore is just another
application and introducing more options
which could lead to corruption of referential integrity is
a bad idea.

But pg_restore should not be thought of as just another
front-end.  It should be thought of as a data recovery
tool.  Recovering some data and being left with referential
integrity problems is better than having no data.  This
is true even if, due to different users owning different
schemas and so forth, nobody knows exactly what
might be broken.

Yes, but we can do better.  (The unstated sub-text being that
we don't want to introduce an inferior feature which
will then need to be supported forever.)

How could we do better:

Here I will record only the ideas related to restore,
although there was some mention of dump as well.

There has apparently been some discussion of writing
a foreign data wrapper which would operate on a database
dump.  This might (in ways that are not immediately
obvious to me) address this issue.

The restore process could, based on what table data needs
restoration, look at foreign key dependencies and produce a
list of the tables which all must be restored into order to
ensure foreign key referential integrity.  In the case of
restoration into a empty database the foreign key

[HACKERS] Suggestion for --truncate-tables to pg_restore

2012-09-20 Thread Karl O. Pinc
Hi,

I've had problems using pg_restore --data-only when
restoring individual schemas (which contain data which
has had bad things done to it).  --clean does not work
well because of dependent objects in other schemas.

Patch to the docs attached (before I go and do any
real coding.)


Karl k...@meme.com
Free Software:  You don't pay back, you pay forward.
 -- Robert A. Heinlein

diff --git a/doc/src/sgml/ref/pg_restore.sgml b/doc/src/sgml/ref/pg_restore.sgml
index b276da6..94d359d 100644
--- a/doc/src/sgml/ref/pg_restore.sgml
+++ b/doc/src/sgml/ref/pg_restore.sgml
@@ -539,6 +539,26 @@
  /varlistentry
 
  varlistentry
+  termoption-u//term
+  termoption--truncate-tables//term
+  listitem
+   para
+This option is only relevant when performing a data-only
+restore.  It instructs applicationpg_restore/application
+to execute commands to truncate the target tables while the
+data is reloaded.  Use this when restoring tables or schemas
+and option--clean/clean cannot be used because dependent
+objects would be destroyed.
+   /para
+
+   para
+ The option--disable-triggers/option will almost always
+ always need to be used in conjunction with this option to
+ disable check constraints on foreign keys.
+   /para
+ /varlistentry
+
+ varlistentry
   termoption--use-set-session-authorization/option/term
   listitem
para


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers