Re: [HACKERS] pg_dump exclusion switches and functions/types

2006-10-10 Thread Tom Lane
Greg Sabino Mullane [EMAIL PROTECTED] writes:
 Sure, but the question is whether that incremental gain in capability
 is worth the extra logical complexity.  I'm inclined to think that many
 more users would get burned by the complexity than would have use for
 it.

 I disagree - we lose a lot of flexibility by taking out the ordering,

We lose some flexibility, but it's not clear to me that it's so
essential as all that.  Even the restricted patch is tremendously
more flexible than pg_dump has ever been, and I just don't see the
argument that there's a market demand for doing more at the cost
of clarity.

 I'm also not sure why the regex
 should be changed to something even more non-standard than the current
 POSIX ones. Finally, I'm surprised at the apparent willingness at this
 point to shatter backwards-compatibility with previous -t scripts, as
 this was an option I raised early on but met strong resistance, thus
 the current compromise of allowing existing scripts to run unaltered,
 while adding in the ability to do some regular expressions.

That's a fair point, but the way that the patch was preserving exact
backward compatibility was by making it a discontinuous corner case,
which is a decision I think we'd regret in the long run.  Andrew was
already suggesting upthread that we drop the anchoring (and lose
compatibility to a much greater extent than what this does) in order
to make the behavior more self-consistent.  Also, insisting on straight
regexps amounts to failing to learn from experience: before 7.3 the psql
\d commands used patterns that *were* straight regexps, and that just
did not work all that conveniently.

 The regex stuff was discussed in January, and the patch submitted in
 July, so it seems a little rushed to be changing the underlying behavior
 so quickly right now

Well, the problem is that once we ship 8.2 we'll be stuck with whatever
behavior we've defined --- it's unlikely that it'd be worth the pain of
another round of incompatibility in order to make small adjustments.
So we'd better get it right the first time.  I do apologize for not
having reviewed this patch more closely earlier, but I've been a tad
busy...

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] pg_dump exclusion switches and functions/types

2006-10-09 Thread Csaba Nagy
 We are two months past feature freeze ... adding entirely new features
 to pg_dump is *not* on the table for 8.2. 

Ok, clear.

  The scenario I most care about is to be able to make a complete data
  base dump (including non-schema objects) while excluding only a few
  tables.
 
 Isn't this the same as Kris' complaint?  Why do you need additional
 dependency analysis to do the above?

Well, I obviously didn't understand well the complete feature as it is
implemented. Now, is what I want (see above) possible with the new
feature, or if I exclude some tables I implicitly loose some other
things too from the dump which normally would be there ? This is my only
concern...

Cheers,
Csaba.



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] pg_dump exclusion switches and functions/types

2006-10-09 Thread Tom Lane
Csaba Nagy [EMAIL PROTECTED] writes:
 Isn't this the same as Kris' complaint?  Why do you need additional
 dependency analysis to do the above?

 Well, I obviously didn't understand well the complete feature as it is
 implemented. Now, is what I want (see above) possible with the new
 feature, or if I exclude some tables I implicitly loose some other
 things too from the dump which normally would be there ? This is my only
 concern...

I think we've agreed that if you use some exclusion switches, but not
any inclusion switches, then only the specific objects matching your
switches are excluded.  CVS HEAD gets this wrong, but I'm going to work
on it today.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] pg_dump exclusion switches and functions/types

2006-10-09 Thread Csaba Nagy
On Mon, 2006-10-09 at 16:24, Tom Lane wrote:
 I think we've agreed that if you use some exclusion switches, but not
 any inclusion switches, then only the specific objects matching your
 switches are excluded.  CVS HEAD gets this wrong, but I'm going to work
 on it today.

Cool, that makes it cover my use case and some more.

Thanks,
Csaba.



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] pg_dump exclusion switches and functions/types

2006-10-09 Thread Jim C. Nasby
On Sat, Oct 07, 2006 at 05:29:03PM -0400, Tom Lane wrote:
 So the only regex patterns you can't write directly are dot, R* and R?
 for which you can use these locutions:
 
   .   =  ?
   R*  =  (R+|)
   R?  =  (R|)
 
 (Perhaps this should be documented somewhere...)

Yes, please.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] pg_dump exclusion switches and functions/types

2006-10-09 Thread Jim C. Nasby
On Sat, Oct 07, 2006 at 06:22:19PM -0700, David Fetter wrote:
 On Fri, Oct 06, 2006 at 10:28:21PM -0400, Gregory Stark wrote:
  Tom Lane [EMAIL PROTECTED] writes:
  
   The existing patch's behavior is that the rightmost switch wins,
   ie, if an object's name matches more than one pattern then it is
   included or excluded according to the rightmost switch it matches.
   This is, erm, poorly documented, but it seems like useful behavior
   so I don't have an objection myself.
  
  I don't know, it sounds like it's the source of the confusion you
  identify later.
  
  My first thought is that the rule should be to apply all the
  inclusion switches (implicitly including everything if there are
  none), then apply all the exclusion switches.
 
 +1 :)
 
 Order-dependent switches are a giant foot gun.

They're also very powerful, as anyone who's ever used them in a
non-trivial rsync (or rdiff-backup) scenareo can tell you. What if you
want to exclude all of a schema except for a few objects (granted, right
now we're limited to just tables...)?

Here's a real example, from my rdiff-backup exclude files list:

+ /opt/local/var/db/dports
+ /opt/local/var/log
+ /opt/local/etc
/opt

Note that rdiffbackup applies the first option that matches a
file/directory, not the last. So this says to exclude all of /opt,
except for /opt/local/var/db/dports, etc. If this was done as 'last
operator takes priority', you'd just reverse the order of the list:

pg_dump --exclude-schema /opt --include-table opt.local_var_db_dports
--include-table opt.local_var_log --include-table opt.local_etc

If we processed all includes first, you lose this ability. For simple
cases, it might not matter, but think about wanting to back up an entire
database, except for schema opt, but you do want a few objects from
within opt. Without following some kind of ordering on includes and
excludes, that's next to impossible.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] pg_dump exclusion switches and functions/types

2006-10-09 Thread David Fetter
On Mon, Oct 09, 2006 at 12:07:29PM -0500, Jim C. Nasby wrote:
 On Sat, Oct 07, 2006 at 06:22:19PM -0700, David Fetter wrote:
  On Fri, Oct 06, 2006 at 10:28:21PM -0400, Gregory Stark wrote:
   Tom Lane [EMAIL PROTECTED] writes:
   
The existing patch's behavior is that the rightmost switch wins,
ie, if an object's name matches more than one pattern then it is
included or excluded according to the rightmost switch it matches.
This is, erm, poorly documented, but it seems like useful behavior
so I don't have an objection myself.
   
   I don't know, it sounds like it's the source of the confusion you
   identify later.
   
   My first thought is that the rule should be to apply all the
   inclusion switches (implicitly including everything if there are
   none), then apply all the exclusion switches.
  
  +1 :)
  
  Order-dependent switches are a giant foot gun.
 
 They're also very powerful, as anyone who's ever used them in a
 non-trivial rsync (or rdiff-backup) scenareo can tell you. What if
 you want to exclude all of a schema except for a few objects
 (granted, right now we're limited to just tables...)?

You make an important distinction here, and thanks for doing that. :)

IMHO, order-dependent switches are appropriate for a configuration
file and inappropriate for the command line.  The pg_hba.conf file is
a great example of a place where order dependence is a good idea.

Cheers,
D
-- 
David Fetter [EMAIL PROTECTED] http://fetter.org/
phone: +1 415 235 3778AIM: dfetter666
  Skype: davidfetter

Remember to vote!

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] pg_dump exclusion switches and functions/types

2006-10-09 Thread Tom Lane
Jim C. Nasby [EMAIL PROTECTED] writes:
 On Sat, Oct 07, 2006 at 06:22:19PM -0700, David Fetter wrote:
 On Fri, Oct 06, 2006 at 10:28:21PM -0400, Gregory Stark wrote:
 My first thought is that the rule should be to apply all the
 inclusion switches (implicitly including everything if there are
 none), then apply all the exclusion switches.
 
 +1 :)
 Order-dependent switches are a giant foot gun.

 They're also very powerful, as anyone who's ever used them in a
 non-trivial rsync (or rdiff-backup) scenareo can tell you.

Sure, but the question is whether that incremental gain in capability
is worth the extra logical complexity.  I'm inclined to think that many
more users would get burned by the complexity than would have use for it.
Considering that we've gotten along this long with only the most
primitive selection capabilities in pg_dump, it doesn't seem like
there's an enormous demand for highly refined capabilities.

(And I agree with David's comment that it might be better to reserve
such behavior for a configuration file than to put it on the command
line.)

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] pg_dump exclusion switches and functions/types

2006-10-09 Thread Jim C. Nasby
On Mon, Oct 09, 2006 at 01:59:18PM -0400, Tom Lane wrote:
 Jim C. Nasby [EMAIL PROTECTED] writes:
  On Sat, Oct 07, 2006 at 06:22:19PM -0700, David Fetter wrote:
  On Fri, Oct 06, 2006 at 10:28:21PM -0400, Gregory Stark wrote:
  My first thought is that the rule should be to apply all the
  inclusion switches (implicitly including everything if there are
  none), then apply all the exclusion switches.
  
  +1 :)
  Order-dependent switches are a giant foot gun.
 
  They're also very powerful, as anyone who's ever used them in a
  non-trivial rsync (or rdiff-backup) scenareo can tell you.
 
 Sure, but the question is whether that incremental gain in capability
 is worth the extra logical complexity.  I'm inclined to think that many
 more users would get burned by the complexity than would have use for it.
 Considering that we've gotten along this long with only the most
 primitive selection capabilities in pg_dump, it doesn't seem like
 there's an enormous demand for highly refined capabilities.
 
 (And I agree with David's comment that it might be better to reserve
 such behavior for a configuration file than to put it on the command
 line.)

I can certainly see the logic in putting the more advanced capability in
a config file of some kind (though, I think a simple include/exclude
file is best for this...)

The question becomes: do we want incompatible behavior between the
config file and the command line? And which over-rides what?
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] pg_dump exclusion switches and functions/types

2006-10-09 Thread David Fetter
On Mon, Oct 09, 2006 at 02:34:09PM -0500, Jim C. Nasby wrote:
 On Mon, Oct 09, 2006 at 01:59:18PM -0400, Tom Lane wrote:
  Jim C. Nasby [EMAIL PROTECTED] writes:
   On Sat, Oct 07, 2006 at 06:22:19PM -0700, David Fetter wrote:
   On Fri, Oct 06, 2006 at 10:28:21PM -0400, Gregory Stark wrote:
   My first thought is that the rule should be to apply all the
   inclusion switches (implicitly including everything if there
   are none), then apply all the exclusion switches.
   
   +1 :) Order-dependent switches are a giant foot gun.
  
   They're also very powerful, as anyone who's ever used them in a
   non-trivial rsync (or rdiff-backup) scenareo can tell you.
  
  Sure, but the question is whether that incremental gain in
  capability is worth the extra logical complexity.  I'm inclined to
  think that many more users would get burned by the complexity than
  would have use for it.  Considering that we've gotten along this
  long with only the most primitive selection capabilities in
  pg_dump, it doesn't seem like there's an enormous demand for
  highly refined capabilities.
  
  (And I agree with David's comment that it might be better to
  reserve such behavior for a configuration file than to put it on
  the command line.)
 
 I can certainly see the logic in putting the more advanced
 capability in a config file of some kind (though, I think a simple
 include/exclude file is best for this...)
 
 The question becomes: do we want incompatible behavior between the
 config file and the command line? And which over-rides what?

The way I've cut this Gordian knot in the past is simply to make
command-line and file-based options for a given thing (e.g.
exclusion/inclusion) mutually exclusive and throw an error if somebody
attempts to mix them.

Cheers,
D
-- 
David Fetter [EMAIL PROTECTED] http://fetter.org/
phone: +1 415 235 3778AIM: dfetter666
  Skype: davidfetter

Remember to vote!

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] pg_dump exclusion switches and functions/types

2006-10-09 Thread Greg Sabino Mullane

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1


 Sure, but the question is whether that incremental gain in capability
 is worth the extra logical complexity.  I'm inclined to think that many
 more users would get burned by the complexity than would have use for it.
 Considering that we've gotten along this long with only the most
 primitive selection capabilities in pg_dump, it doesn't seem like
 there's an enormous demand for highly refined capabilities.

I disagree - we lose a lot of flexibility by taking out the ordering, and,
as was pointed out to me when I first started this patch a while ago,
we might as well front-load all the complexity and changes now rather
than adding them in release by release. I'm also not sure why the regex
should be changed to something even more non-standard than the current
POSIX ones. Finally, I'm surprised at the apparent willingness at this
point to shatter backwards-compatibility with previous -t scripts, as
this was an option I raised early on but met strong resistance, thus
the current compromise of allowing existing scripts to run unaltered,
while adding in the ability to do some regular expressions.

The regex stuff was discussed in January, and the patch submitted in
July, so it seems a little rushed to be changing the underlying behavior
so quickly right now (that behavior being the ability to control which
tables and schemas to dump). I think the original post about the request
to exclude a single table and still dump other objects is a fair one,
but I think we've morphed far beyond solving that problem.

- --
Greg Sabino Mullane [EMAIL PROTECTED]
PGP Key: 0x14964AC8 200610092003
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-BEGIN PGP SIGNATURE-

iD8DBQFFKuPFvJuQZxSWSsgRAjAxAJ9oY5HCM4KxmpLEU56eCMJauHBhFgCfcyDt
R5yf5SKKBeBHJ2gdRlE1Pqs=
=rIxZ
-END PGP SIGNATURE-



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] pg_dump exclusion switches and functions/types

2006-10-07 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] writes:
 The existing patch's behavior is that the rightmost switch wins, ie, if an
 object's name matches more than one pattern then it is included or excluded
 according to the rightmost switch it matches.

 My first thought is that the rule should be to apply all the inclusion
 switches (implicitly including everything if there are none), then apply all
 the exclusion switches.

I kinda like that, because it makes the behavior completely independent
of switch ordering, which seems like a good property to preserve.
Anyone else have an opinion pro or con?

 That leads to including non-schema objects only if there are no schema
 inclusion switches. Which seems pretty logical since if you're explicitly
 including objects then you'll only expect objects explicitly included to be
 dumped and you'll quickly realize there's no switch to bring in those
 non-schema objects. Maybe there should be a switch to include them just for
 completeness.

Well, pg_dump already has a --blobs switch, which has been a no-op
(because now the default) since 8.1, but it's still in the switch
parser.  It wouldn't take much to revive it for the purpose of causing
blobs to be dumped even when there's an inclusion switch.  As for PLs,
I'm not really too worried about dumping them per se (since it's usually
easy enough to create the ones you're using).  The functionality we're
really lacking there is the --include-dependencies switch that was
discussed upthread ... which I think is a fine idea but should wait for
8.3.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] pg_dump exclusion switches and functions/types

2006-10-07 Thread Josh Berkus
Tom,

 I kinda like that, because it makes the behavior completely independent
 of switch ordering, which seems like a good property to preserve.
 Anyone else have an opinion pro or con?

The only con argument I can think of is that tar and rsync, whose syntax 
is familiar to a lot of sysadmins, apply switches left-to-right.  

However, I don't feel that that is a compelling argument.  The include/exclude 
switch order processing is something I've always *hated* about tar and has 
messed me up more times than I can count.  Also, Windows users could care 
less if we behave like tar.

So +1 to go with orderless switching.

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] pg_dump exclusion switches and functions/types

2006-10-07 Thread Tom Lane
I wrote:
 One argument that occurs to me for importing the psql code is that it's
 solved the problem of including a schema name in the pattern.  It would
 be a lot nicer to say -t schema.table than to have to say -t table -n
 schema.

The more I think about this, the more I think the above is a killer
argument.  We really should have had the ability to say -t schema.table
ever since schemas were added in 7.3, but no one got around to making it
happen.  If we go over to interpreting the arguments as standard regexes
then we'll never be able to do that, because we'll have foreclosed the
meaning of dot.  The psql pattern code was specifically designed as a
compromise notation adapted to SQL needs, and IMHO it's served pretty
well --- so I think we should adopt that into pg_dump rather than pure
regex notation.

 The psql code does allow you to get at most of the functionality of
 regexes...

Actually, it lets you get at all of it, though perhaps a bit awkwardly.
The transformations it makes are

.   =  schema vs name separator
*   =  .*
?   =  .

So the only regex patterns you can't write directly are dot, R* and R?
for which you can use these locutions:

.   =  ?
R*  =  (R+|)
R?  =  (R|)

(Perhaps this should be documented somewhere...)


So I propose that we should revise the patch to use psql's \d code to
determine which objects match a pattern.  I think that together with
Greg's idea of processing all inclusions before all exclusions should
answer the concerns I've got about the patch.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] pg_dump exclusion switches and functions/types

2006-10-07 Thread David Fetter
On Fri, Oct 06, 2006 at 10:28:21PM -0400, Gregory Stark wrote:
 Tom Lane [EMAIL PROTECTED] writes:
 
  The existing patch's behavior is that the rightmost switch wins,
  ie, if an object's name matches more than one pattern then it is
  included or excluded according to the rightmost switch it matches.
  This is, erm, poorly documented, but it seems like useful behavior
  so I don't have an objection myself.
 
 I don't know, it sounds like it's the source of the confusion you
 identify later.
 
 My first thought is that the rule should be to apply all the
 inclusion switches (implicitly including everything if there are
 none), then apply all the exclusion switches.

+1 :)

Order-dependent switches are a giant foot gun.

Cheers,
D
-- 
David Fetter [EMAIL PROTECTED] http://fetter.org/
phone: +1 415 235 3778AIM: dfetter666
  Skype: davidfetter

Remember to vote!

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] pg_dump exclusion switches and functions/types

2006-10-06 Thread Markus Schaber
Hi, Tom,

Tom Lane wrote:

 One issue is what to do with procedural languages and large objects,
 which don't have any associated schema.  If we treat them as being
 outside all schemas, we'd have semantics like this: dump the PLs and
 blobs unless one or more --schema switches appeared.  Is that OK?

Sounds fine.

Is there a possibility to dump only those objects? Maybe --large-objects
and --languages?

Thanks,
Markus
-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org



signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] pg_dump exclusion switches and functions/types

2006-10-06 Thread Richard Huxton

Tom Lane wrote:

Kris Jurka [EMAIL PROTECTED] writes:
Testing out the new pg_dump exclusion switches I've found that excluding a 
table means that no functions or types will be dumped.  Excluding one 
table shouldn't exclude these objects.


I tend to agree ... will see if I can make it happen.  (I never did get
around to reviewing that patch, anyway ...)

One issue is what to do with procedural languages and large objects,
which don't have any associated schema.  If we treat them as being
outside all schemas, we'd have semantics like this: dump the PLs and
blobs unless one or more --schema switches appeared.  Is that OK?


Is there a reason why pg_dump can't do the --list/--use-list flags like 
pg_restore, or is it just a matter of round tuits?


--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] pg_dump exclusion switches and functions/types

2006-10-06 Thread Andrew Dunstan

Richard Huxton wrote:


Is there a reason why pg_dump can't do the --list/--use-list flags 
like pg_restore, or is it just a matter of round tuits?




The major reason for having those features as I understand it was to 
help overcome dependency difficulties in dumps, which are now largely a 
thing of the past.


However, ISTM that a similar facility for fine grained control could 
fairly easily be built into pg_dump.


cheers

andrew

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] pg_dump exclusion switches and functions/types

2006-10-06 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
 However, ISTM that a similar facility for fine grained control could 
 fairly easily be built into pg_dump.

Yeah ... later.

The way I envision it is that the schema-related switches are fine for
selecting things at the level of whole schemas, and the table-related
switches are fine for selecting individual tables, and what we lack are
inclusion/exclusion switches that operate on other kinds of individual
objects.  Somebody can design and implement those later, if the itch
strikes.  What we have to do today is make sure that the interaction of
schema and table switches is such that an extension in that direction
will fit in naturally.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] pg_dump exclusion switches and functions/types

2006-10-06 Thread Andrew Dunstan

Tom Lane wrote:

Andrew Dunstan [EMAIL PROTECTED] writes:
  
However, ISTM that a similar facility for fine grained control could 
fairly easily be built into pg_dump.



Yeah ... later.

The way I envision it is that the schema-related switches are fine for
selecting things at the level of whole schemas, and the table-related
switches are fine for selecting individual tables, and what we lack are
inclusion/exclusion switches that operate on other kinds of individual
objects.  Somebody can design and implement those later, if the itch
strikes.  What we have to do today is make sure that the interaction of
schema and table switches is such that an extension in that direction
will fit in naturally.


  


totally agree.

cheers

andrew


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] pg_dump exclusion switches and functions/types

2006-10-06 Thread Tom Lane
Kris Jurka [EMAIL PROTECTED] writes:
 Testing out the new pg_dump exclusion switches I've found that excluding a 
 table means that no functions or types will be dumped.  Excluding one 
 table shouldn't exclude these objects.

I've been chewing on this a bit and find that the existing patch has
several behaviors that seem surprising.  Considering just one type of
inclusion/exclusion switches at a time (we'll get to the interaction of
schema and table switches below), I think we can all agree without too
much argument on these statements:

* With no inclusion/exclusion switches, all objects except system
  objects should be dumped.
* With only exclusion switches given, all objects except system
  objects and those matching at least one pattern should be dumped.
* With only inclusion switches given, only those objects matching
  at least one pattern should be dumped (whether they are system
  objects or not).

That last proviso might be debatable but on balance I think it's OK
(for instance, pg_dump --schema=information_schema could be useful
for debugging or documentation purposes).  Where things get interesting
is when you have both inclusion and exclusion switches given.  The
existing patch's behavior is that the rightmost switch wins, ie,
if an object's name matches more than one pattern then it is included or
excluded according to the rightmost switch it matches.  This is, erm,
poorly documented, but it seems like useful behavior so I don't have
an objection myself.  The real question is what should happen to objects
that don't match any of the switch patterns?  (This is relevant to Kris'
complaint because non-table objects should be treated the same as tables
that don't match any table name inclusion/exclusion switches.)

What I find in the existing code is that if an inclusion switch appears
first:

pg_dump -n 's.*' -N 'ss.*' ...

then only schemas matching an inclusion switch (and not matching any
later exclusion switch) are dumped.  While if an exclusion switch
appears first:

pg_dump -N 's.*' -n 'ss.*' ...

then all schemas are dumped except system schemas and those matching an
exclusion switch (and not matching any later inclusion switch).  So the
default behavior for unmatched objects flips depending on switch
order.  This doesn't seem to satisfy the principle of least surprise,
and it's certainly not adequately documented.  It might be the most
useful behavior though.  I thought about the alternative rule that
if any inclusion switches appear at all, the default is not to dump
--- that is, an object must match at least one inclusion switch (and not
match any later exclusion switch) to be dumped.  But with that rule,
exclusion switches before the first inclusion switch are actually
useless.  Has anyone got a better idea?

Returning to the point about schema versus table selection switches,
what we've got is that they are independent filters: to be dumped,
a table must be in a schema selected by the schema inclusion/exclusion
switches (if any), and it must have a name selected by the table
inclusion/exclusion switches (if any).  I think this is OK but it leads
to the property that the order of -n/-N switches is relevant, and the
order of -t/-T switches is relevant, but their order relative to each
other is not relevant.  This could be surprising.

If you're still with me, the payoff is here: what are the rules for
dumping non-table objects, given that there are no inclusion/exclusion
switches for them (but we might want to add such later)?  If only schema
inclusion/exclusion switches are present, then it's relatively easy to
say dump objects that are in selected schemas --- but what about
objects that don't have a schema, such as PLs?  And what about the case
where table inclusion/exclusion switches are present?  I said above that
non-table objects should be treated the same way as unmatched tables,
which I think is a necessary rule if we want to extend the set of switch
types later.  But that leads to the conclusion that a non-table object
is dumped unless a -t switch appears before any -T switches.  Which
strikes me as a mighty surprising behavior.  I'm not sure what to do
differently though.

Lastly, as long as we're questioning the premises of this patch,
I wonder about the choice to use regex pattern matching rules.
The problem with regex is that to be upward-compatible with the old
exact-match switch definitions, a switch value that doesn't contain
any regex special characters is treated as an equality condition not
a pattern, which makes for a discontinuity.  For instance, -t x is
treated like -t '^x$' while -t 'x.*y' doesn't get the anchors added.
That's going to burn people.  An alternative we could consider is to
use LIKE patterns instead, but since underscore is a wildcard in LIKE,
it's easy to imagine people getting burnt by that too.  Or we could
import the rather ad-hoc shell-wildcard-like rules used by psql's \d
stuff.  None of these are especially attractive :-(


Re: [HACKERS] pg_dump exclusion switches and functions/types

2006-10-06 Thread Csaba Nagy
[Snip explanations]
 Comments?

Would it be reasonable to include one more switch: 'include
dependencies' ?

That would work like this:

* first consider all to be included objects (possibly limited by the
include switches);
* if dependencies are included, add all dependent objects, plus
non-schema objects (which arguably can be considered as dependencies for
the whole data base);
* remove all objects targeted by exclude switches;

This way you won't have any dependency on the ordering, and you could
consider all non-schema objects as dependencies, so they will only be
included if dependencies are to be included. 

Excluding dependencies would be the default if any switches are
specified, including otherwise (not sure how much is this of the
principle of least surprise, but it would be backwards compatible).

The scenario I most care about is to be able to make a complete data
base dump (including non-schema objects) while excluding only a few
tables. If I understood your explanations correctly, this would not be
easily possible with the current implementation. Note that I have a
patch (kludge ?) on the 8.1 pg_dump which does exactly this, it would be
a pity if I would need to patch the 8.2 one again to do that...

Cheers,
Csaba.



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] pg_dump exclusion switches and functions/types

2006-10-06 Thread Andrew Dunstan

Tom Lane wrote:

Lastly, as long as we're questioning the premises of this patch,
I wonder about the choice to use regex pattern matching rules.
The problem with regex is that to be upward-compatible with the old
exact-match switch definitions, a switch value that doesn't contain
any regex special characters is treated as an equality condition not
a pattern, which makes for a discontinuity.  For instance, -t x is
treated like -t '^x$' while -t 'x.*y' doesn't get the anchors added.
That's going to burn people.  An alternative we could consider is to
use LIKE patterns instead, but since underscore is a wildcard in LIKE,
it's easy to imagine people getting burnt by that too.  Or we could
import the rather ad-hoc shell-wildcard-like rules used by psql's \d
stuff.  None of these are especially attractive :-(

Comments?





1. regexes, please.
2. I'd rather remove backwards compatibility than have the 
discontinuity. After all, users can anchor the expression pretty easily. 
If not, then let's use an alternate switch for the regexes, (I know we 
are running out of option space).


cheers

andrew

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] pg_dump exclusion switches and functions/types

2006-10-06 Thread Tom Lane
Csaba Nagy [EMAIL PROTECTED] writes:
 Would it be reasonable to include one more switch: 'include
 dependencies' ?

We are two months past feature freeze ... adding entirely new features
to pg_dump is *not* on the table for 8.2.  What we need to do at the
moment is make sure that the features we've got work sanely and won't
create headaches for likely future extensions; but not actually
implement those extensions.

 The scenario I most care about is to be able to make a complete data
 base dump (including non-schema objects) while excluding only a few
 tables.

Isn't this the same as Kris' complaint?  Why do you need additional
dependency analysis to do the above?

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] pg_dump exclusion switches and functions/types

2006-10-06 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 ... Or we could
 import the rather ad-hoc shell-wildcard-like rules used by psql's \d
 stuff.  None of these are especially attractive :-(

 1. regexes, please.

One argument that occurs to me for importing the psql code is that it's
solved the problem of including a schema name in the pattern.  It would
be a lot nicer to say -t schema.table than to have to say -t table -n
schema.  In particular this allows one to dump s1.foo and s2.bar
without also getting s1.bar and s2.foo; a problem that is insoluble if
we have only independent schema and table filters.  I think that ideally
one would only use the schema switches if one actually wanted a
schema-by-schema dump, not as a wart on the side of the
specific-object-selection switches.

The psql code does allow you to get at most of the functionality of
regexes...

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] pg_dump exclusion switches and functions/types

2006-10-06 Thread Bruno Wolff III
On Fri, Oct 06, 2006 at 11:54:51 -0400,
  Tom Lane [EMAIL PROTECTED] wrote:
 
 The problem with regex is that to be upward-compatible with the old
 exact-match switch definitions, a switch value that doesn't contain
 any regex special characters is treated as an equality condition not
 a pattern, which makes for a discontinuity.  For instance, -t x is
 treated like -t '^x$' while -t 'x.*y' doesn't get the anchors added.
 That's going to burn people.  An alternative we could consider is to
 use LIKE patterns instead, but since underscore is a wildcard in LIKE,
 it's easy to imagine people getting burnt by that too.  Or we could
 import the rather ad-hoc shell-wildcard-like rules used by psql's \d
 stuff.  None of these are especially attractive :-(
 
 Comments?

How about making the regex's anchored by default? People who want unanchored
ones can add .* at the beginning and/or end. Since only whether or not
the pattern matches is important (not the string it matched), this keeps
all of the same power, but matches the old behavior in simple cases.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] pg_dump exclusion switches and functions/types

2006-10-06 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 The existing patch's behavior is that the rightmost switch wins, ie, if an
 object's name matches more than one pattern then it is included or excluded
 according to the rightmost switch it matches. This is, erm, poorly
 documented, but it seems like useful behavior so I don't have an objection
 myself.

I don't know, it sounds like it's the source of the confusion you identify
later.

My first thought is that the rule should be to apply all the inclusion
switches (implicitly including everything if there are none), then apply all
the exclusion switches.

That leads to including non-schema objects only if there are no schema
inclusion switches. Which seems pretty logical since if you're explicitly
including objects then you'll only expect objects explicitly included to be
dumped and you'll quickly realize there's no switch to bring in those
non-schema objects. Maybe there should be a switch to include them just for
completeness.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[HACKERS] pg_dump exclusion switches and functions/types

2006-10-05 Thread Kris Jurka


Testing out the new pg_dump exclusion switches I've found that excluding a 
table means that no functions or types will be dumped.  Excluding one 
table shouldn't exclude these objects.  My real use case for this 
functionality is that I have a database that has 99% of its space used by 
one big table and I'd like to be able to dump the rest of the database 
quickly.  If I lose function/type information it's useless.


Kris Jurka

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] pg_dump exclusion switches and functions/types

2006-10-05 Thread Tom Lane
Kris Jurka [EMAIL PROTECTED] writes:
 Testing out the new pg_dump exclusion switches I've found that excluding a 
 table means that no functions or types will be dumped.  Excluding one 
 table shouldn't exclude these objects.

I tend to agree ... will see if I can make it happen.  (I never did get
around to reviewing that patch, anyway ...)

One issue is what to do with procedural languages and large objects,
which don't have any associated schema.  If we treat them as being
outside all schemas, we'd have semantics like this: dump the PLs and
blobs unless one or more --schema switches appeared.  Is that OK?

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings