Re: [HACKERS] [GSOC] questions about idea "rewrite pg_dump as library"

Hannu Krosing Fri, 12 Apr 2013 05:08:12 -0700

On 04/11/2013 12:17 AM, Tom Lane wrote:

Alvaro Herrera <alvhe...@2ndquadrant.com> writes:

Hannu Krosing wrote:

The natural solution to this seems to be to move most of pg_dump's
functionality into the backend as functions, so we have pg_dump_xxx()
for everything we want to dump, plus a topological sort function for
getting the objects in the right order.

This idea doesn't work because of back-patch considerations (i.e. we
would not be able to create the functions in back branches, and so this
new style of pg_dump would only work with future server versions).  So
pg_dump itself would have to retain capability to dump stuff from old
servers.  This seems unlikely to fly at all, because we'd be then
effectively maintaining pg_dump in two places, both backend and the
pg_dump source code.

There are other issues too, in particular that most of the backend's
code tends to work on SnapshotNow time whereas pg_dump would really
prefer it was all done according to the transaction snapshot.

I was just thinking of moving the queries that pg_dump currently
uses into UDFs, which do _not_ use the catalog cache but run
the same SQL against the catalogs as pg_dump currently does,
using whatever snapshot mode is currently set.


pg_dump will still need to keep the same queries for older
versions of PostgreSQL, but for new versions pg_dump can become
catalog-agnostic.
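
A minimal sketch of that split, in Python for brevity. The server-side
function pg_dump_list_tables() and the version cutoff are assumptions
made up for illustration; neither exists in PostgreSQL, and the legacy
query is heavily simplified compared to what pg_dump really runs.

```python
# Hypothetical sketch, not real pg_dump code.

# Placeholder: first server_version_num assumed to ship the dump UDFs.
FIRST_VERSION_WITH_DUMP_UDFS = 90400


def table_list_query(server_version_num: int) -> str:
    """Pick the query used to list tables for a given server version."""
    if server_version_num >= FIRST_VERSION_WITH_DUMP_UDFS:
        # New servers: the client stays catalog-agnostic and asks the
        # backend, which knows its own catalog layout.
        return "SELECT * FROM pg_dump_list_tables()"  # hypothetical UDF
    # Old servers: the client must keep its own catalog query.
    return "SELECT c.oid, c.relname FROM pg_class c WHERE c.relkind = 'r'"
```

The legacy branch is the part that could be retired as old server
versions fall out of support.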

And I think that we can retire pg_dump support for older
PostgreSQL versions the same way we drop support for
older versions of PostgreSQL itself.

Hannu

We have
got bugs of that ilk already in pg_dump, but we shouldn't introduce a
bunch more.  Doing this right would therefore mean that we'd have to
write a lot of duplicative code in the backend, ie, it's not clear that
we gain any synergy by pushing the functionality over.  It might
simplify cross-backend-version issues (at least for backend versions
released after we'd rewritten all that code) but otherwise I'm afraid
it'd just be pushing the problems somewhere else.

In any case, "push it to the backend" offers no detectable help with the
core design issue here, which is figuring out what functionality needs
to be exposed with what API.

The main things I see would be:

 * get_list_of_objects(object_type, pattern or namelist)
 * get_sql_def_for_object(object_type, object_name)
 * sort_by_dependency(list of [obj_type, obj_name])

From this you could easily construct most uses, especially if
sort_by_dependency(list of [obj_type, obj_name])
were smart enough to break circular dependencies, such as
turning two tables with mutual FKs into table definitions without
FKs plus separate constraints.

Or we could always emit constraints separately, so that
the ones depending on non-exported objects would be easy
to leave out.
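
With constraints kept as separate objects, leaving them out is just a
filter. A small sketch, where the ForeignKey record and the function
name are hypothetical and only illustrate the idea:

```python
from typing import Iterable, List, NamedTuple


class ForeignKey(NamedTuple):
    name: str
    table: str             # table that owns the constraint
    referenced_table: str  # table the constraint points at


def exportable_constraints(
    fks: Iterable[ForeignKey], exported_tables: Iterable[str]
) -> List[ForeignKey]:
    """Keep only FKs whose two endpoints are both part of the export."""
    exported = set(exported_tables)
    return [
        fk for fk in fks
        if fk.table in exported and fk.referenced_table in exported
    ]
```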

Maybe the dependency API analysis itself is something
worth a GSoC effort?

Hannu


                        regards, tom lane




