On Sun, Jan 29, 2012 at 12:00 PM, Tom Lane <t...@sss.pgh.pa.us> wrote: > Joachim Wieland <j...@mcknight.de> writes: >> I know that you took back some of your comments, but I'm with you >> here. Archive is allocated as an ArchiveHandle and then casted back to >> Archive*, so you always know that an Archive is an ArchiveHandle. I'm >> all for getting rid of Archive and just using ArchiveHandle throughout >> pg_dump which would get rid of these useless casts. > > I'd like to see a more thoroughgoing look at the basic structure of > pg_dump. Everybody who's ever looked at that code has found it > confusing, with the possible exception of the original author who is > long gone from the project anyway. I don't know exactly what would make > it better, but the useless distinction between Archive and ArchiveHandle > seems like a minor annoyance, not the core disease. > > Not that there'd be anything wrong with starting with that.
After some study, I'm reluctant to completely abandon the Archive/ArchiveHandle distinction because it seems to me that it does serve some purpose: right now, nothing in pg_dump.c - where all the code to actually dump stuff lives - knows anything about what's inside the ArchiveHandle, just the Archive. So having two data structures really does serve a purpose: if it's part of the Archive, you need it in order to query the system catalogs and generate SQL. If it isn't, you don't. Considering how much more crap there is inside an ArchiveHandle than an Archive, I don't think we should lightly abandon that distinction. Now, that having been said, there are probably better ways of making that distinction than what we have here; Archive and ArchiveHandle could be better named, perhaps, and we could have pointers from one structure to the other instead of magically embedding them inside each other. All the function pointers that are part of the ArchiveHandle could be separated out into a static, constant structure with a name like ArchiveMethod, and we could keep a pointer to the appropriate ArchiveMethod inside each ArchiveHandle instead of copying all the pointers into it. I think that'd be a lot less confusing. But the immediate problem is that pg_dump.c is heavily reliant on global variables, which isn't going to fly if we want this code to use threads on Windows (or anywhere else). It's also bad style. So I suggest that we refactor pg_dump.c to get rid of g_conn and g_fout. Getting rid of g_fout should be fairly easy: in many places, we're already passing Archive *fout as a parameter. If we pass it as a parameter to the functions that need it but aren't yet getting it as a parameter, then it can cease to exist as a global variable and become local to main(). Getting rid of g_conn is a little harder. Joachim's patch takes the view that we can safely overload the existing ArchiveHandle.connection member. Currently, that member is the connection to which we are doing a direct to database restore; what he's proposing to do is also use it to store the connection from which are doing the dump. But it seems possible that we might someday want to dump from one database and restore into another database at the same time, so maybe we ought to play it safe and use different variables for those things. So I'm inclined to think we could just add a "PGconn *remote_connection" argument to the Archive structure (to go with all of the similarly named "remote" fields, all of which describe the DB to be dumped) and then that would be available everywhere that the Archive itself is. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers