[HACKERS] Add switches for DELIMITER and NULL in pg_dump COPY
Folks, From the earlier discussion, it appears that there is a variety of opinions on what the COPY delimiter should be in pg_dump. This patch allows people to set it and the NULL string. Thanks to Gavin Sherry for help with the pointers :) I didn't patch pg_dumpall, but it would be trivial if there's a use case. Cheers, D -- David Fetter [EMAIL PROTECTED] http://fetter.org/ phone: +1 415 235 3778 Remember to vote! Index: doc/src/sgml/ref/pg_dump.sgml === RCS file: /projects/cvsroot/pgsql/doc/src/sgml/ref/pg_dump.sgml,v retrieving revision 1.81 diff -c -r1.81 pg_dump.sgml *** doc/src/sgml/ref/pg_dump.sgml 1 Nov 2005 21:09:50 - 1.81 --- doc/src/sgml/ref/pg_dump.sgml 6 Mar 2006 07:32:05 - *** *** 163,168 --- 163,208 /varlistentry varlistentry + termoption--copy-delimiter=replaceable class=parameterdelimiter/replaceable/option/term + listitem +para + Use replaceable class=parameterdelimiter/replaceable + instead of the default tab character in commandCOPY/command statements. +/para + /listitem + /varlistentry + + varlistentry + termoption--copy-null=replaceable class=parameterstring_for_nulls/replaceable/option/term + listitem +para + Use replaceable class=parameterstring_for_nulls/replaceable instead of the + default \N in commandCOPY/command statements. +/para + /listitem + /varlistentry + + varlistentry + termoption--copy-delimiter=replaceable class=parameterdelimiter/replaceable/option/term + listitem +para + Use replaceable class=parameterdelimiter/replaceable + instead of the default tab character in commandCOPY/command statements. +/para + /listitem + /varlistentry + + varlistentry + termoption--copy-null=replaceable class=parameterstring_for_nulls/replaceable/option/term + listitem +para + Use replaceable class=parameterstring_for_nulls/replaceable instead of the + default \N in commandCOPY/command statements. +/para + /listitem + /varlistentry + + varlistentry termoption-d/option/term termoption--inserts/option/term listitem Index: src/bin/pg_dump/pg_dump.c === RCS file: /projects/cvsroot/pgsql/src/bin/pg_dump/pg_dump.c,v retrieving revision 1.433 diff -c -r1.433 pg_dump.c *** src/bin/pg_dump/pg_dump.c 5 Mar 2006 15:58:50 - 1.433 --- src/bin/pg_dump/pg_dump.c 6 Mar 2006 07:32:12 - *** *** 114,119 --- 114,125 /* flag to turn on/off dollar quoting */ static intdisable_dollar_quoting = 0; + /* Things used when caller invokes COPY options. */ + #define ARG_COPY_DELIMITER 2 + #define ARG_COPY_NULL 3 + char *copy_delimiter = \t; + char *copy_null; + static void help(const char *progname); static NamespaceInfo *findNamespace(Oid nsoid, Oid objoid); *** *** 181,186 --- 187,193 ExecStatusType expected); + int main(int argc, char **argv) { *** *** 211,217 char *outputSuperuser = NULL; RestoreOptions *ropt; ! static struct option long_options[] = { {data-only, no_argument, NULL, 'a'}, {blobs, no_argument, NULL, 'b'}, --- 218,224 char *outputSuperuser = NULL; RestoreOptions *ropt; ! static struct option long_options[] = { {data-only, no_argument, NULL, 'a'}, {blobs, no_argument, NULL, 'b'}, *** *** 249,254 --- 256,269 {disable-dollar-quoting, no_argument, disable_dollar_quoting, 1}, {disable-triggers, no_argument, disable_triggers, 1}, {use-set-session-authorization, no_argument, use_setsessauth, 1}, + + /* +* The following options don't have an equivalent short option +* letter, and are not available as -X long-name. Just use +* the long form. +*/ + {copy-delimiter, required_argument, NULL, ARG_COPY_DELIMITER}, + {copy-null, required_argument, NULL, ARG_COPY_NULL}, {NULL, 0, NULL, 0} }; *** *** 418,423 --- 433,460 break; /* This covers the long options equivalent to -X xxx. */ + case ARG_COPY_DELIMITER: + if ( strlen(optarg) != 1) + { + fprintf(stderr, _(In %s, copy-delimiter must be exactly one byte long, not %d.\n), +
Re: [HACKERS] Add switches for DELIMITER and NULL in pg_dump COPY
David Fetter [EMAIL PROTECTED] writes: From the earlier discussion, it appears that there is a variety of opinions on what the COPY delimiter should be in pg_dump. This patch allows people to set it and the NULL string. Did anyone provide a convincing use case for this? It's of zero value from the perspective of pg_dump itself; the only possible argument is that it makes it easier for program-foo to parse the output of pg_dump. But I don't see any programs around to parse arbitrary SQL scripts, especially not the pretty-PG-specific scripts that pg_dump emits. I think it much more likely that people needing this sort of thing would be using something like psql -c 'copy foo to stdout', so as to get the data without any added overhead. So this seems like mere creeping featurism to me. pg_dump has too many switches already. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Add switches for DELIMITER and NULL in pg_dump COPY
On Wed, Mar 08, 2006 at 11:03:00AM -0500, Tom Lane wrote: David Fetter [EMAIL PROTECTED] writes: From the earlier discussion, it appears that there is a variety of opinions on what the COPY delimiter should be in pg_dump. This patch allows people to set it and the NULL string. Did anyone provide a convincing use case for this? I've had one so far, and it was enough to cause me to make a special patched version of pg_dump. To get some idea of how drastic that was, consider that I think it's generally bad practice to compile from source because it can take you too far off the generally supported software map. The case I had was making a database with a schema and initial data whose dump output gets checked into a source code management system. Those initial data sets, which can change--for example when the corresponding ISO codes do--may be in many different tables, so the easiest way to do this is to make the dump file as easy as possible to edit. It's of zero value from the perspective of pg_dump itself; the only possible argument is that it makes it easier for program-foo to parse the output of pg_dump. But I don't see any programs around to parse arbitrary SQL scripts, especially not the pretty-PG-specific scripts that pg_dump emits. It's less about program-foo parsing than about multi-table data management, as above. However, I'm sure that there are people who will find other uses for it. I think it much more likely that people needing this sort of thing would be using something like psql -c 'copy foo to stdout', so as to get the data without any added overhead. The one-table-at-a-time approach is quite error-prone for large numbers of tables and/or large data sets. So this seems like mere creeping featurism to me. pg_dump has too many switches already. I've been careful to see to it that only people who use the switches are affected by it. I am also volunteering to do ongoing maintenance of this feature. :) Cheers, D -- David Fetter [EMAIL PROTECTED] http://fetter.org/ phone: +1 415 235 3778 Remember to vote! ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Add switches for DELIMITER and NULL in pg_dump COPY
David Fetter [EMAIL PROTECTED] writes: On Wed, Mar 08, 2006 at 11:03:00AM -0500, Tom Lane wrote: Did anyone provide a convincing use case for this? I've had one so far, and it was enough to cause me to make a special patched version of pg_dump. To get some idea of how drastic that was, consider that I think it's generally bad practice to compile from source because it can take you too far off the generally supported software map. The case I had was making a database with a schema and initial data whose dump output gets checked into a source code management system. So? Don't tell me your SCMS can't handle tabs. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Add switches for DELIMITER and NULL in pg_dump COPY
On Wed, Mar 08, 2006 at 11:26:00AM -0500, Tom Lane wrote: David Fetter [EMAIL PROTECTED] writes: On Wed, Mar 08, 2006 at 11:03:00AM -0500, Tom Lane wrote: Did anyone provide a convincing use case for this? I've had one so far, and it was enough to cause me to make a special patched version of pg_dump. To get some idea of how drastic that was, consider that I think it's generally bad practice to compile from source because it can take you too far off the generally supported software map. The case I had was making a database with a schema and initial data whose dump output gets checked into a source code management system. So? Don't tell me your SCMS can't handle tabs. Not everybody's editor/mailer/whatever does this right, and it makes things fragile. Another way to do this is to change the delimter to a printable character like '|', but that raises hackles, too. Cheers, D -- David Fetter [EMAIL PROTECTED] http://fetter.org/ phone: +1 415 235 3778 Remember to vote! ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Add switches for DELIMITER and NULL in pg_dump COPY
David Fetter [EMAIL PROTECTED] writes: Not everybody's editor/mailer/whatever does this right, and it makes things fragile. Another way to do this is to change the delimter to a printable character like '|', but that raises hackles, too. Frankly if you're passing you data through an editor/mailer/whatever you don't trust then your setup is already fragile. At least if you're using tabs then you find out about these problems. Tiptoeing around the untrustworthy process just means that it'll fail randomly (and unpredictably) when other characters appear in the data that the software doesn't handle. There are certainly cases where you'll need to do this to interface with other (amateurish) software. But pg_dump isn't for that at all. Even COPY isn't a general purpose data formatter. To interface with other software not using a standard format you're going to have to pass the data through Perl or something like that anyways. -- greg ---(end of broadcast)--- TIP 6: explain analyze is your friend