Andrew Dunstan wrote:
> If the null marker is not an empty string, it gets an error, of
> course - if it is it gets a null:
> 
> [EMAIL PROTECTED] pginst]$ echo ',,' | bin/psql -c "create temp
> table foo (a int, b text, c text); copy foo from stdin delimiter
> ',\"' null '\\\\N';" ERROR:  invalid input syntax for integer:
> "" CONTEXT:  COPY foo, line 1, column a: "" [EMAIL PROTECTED]
> pginst]$ echo ',,' | bin/psql -c "create temp table foo (a int,
> b text, c text); copy foo from stdin delimiter ',\"' ;"
> [EMAIL PROTECTED] pginst]$
> 
> 
> I hope that is expected behaviour - it's what *I* expect, at
> least.
> >
> 
> Attached patch has these additions to previously posted patch:
> . quote character may not appear in NULL marker
> . any non-null value that matches the NULL marker is forced to be quoted
> when written.

OK, here is a new version of the patch that includes the grammar
changes we agreed upon, SGML changes, and \copy support.  I will not
make any more changes without contacting you so feel free to make
adjustments and repost.

I have two open issues.  First, CSV should support WITH OIDS, no?

Second, I found a problem with NULLs.  If I do:
.
        test=> create table test (x text, y text);
        CREATE TABLE
        test=> insert into test values ('', NULL);
        INSERT 17221 1
        test=>

then this:

        test=> copy test to '/tmp/b' with csv;

creates:

        "",

and this:

        test=> copy test to '/tmp/b' with csv NULL 'fred';

creates:

        ,fred

Is that logical?  A non-null field went from "" to nothing.

I think it is caused by this code:

         bool force_quote = (strcmp(string, null_print) == 0);
         CopyAttributeOutCSV(string, delim, quote, escape,
                             force_quote);

The reason it happens is that when the null string is '', it matches a
zero-length string, so the value is quoted.  When the null stirng isn't
blank, a zero-length string doesn't match the null string so it isn't
quoted.    I think we need to add special logic for zero-length strings
so they are always quoted, even if there is a special null string.  This
will make our dumps more consistent, I think, or maybe the current
behavior is OK.  It just struck me as strange.

I did a dump/reload test with a null string and null, and it worked
fine.

Is there any data that can not be dumped/reloaded via CSV?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Index: doc/src/sgml/ref/copy.sgml
===================================================================
RCS file: /cvsroot/pgsql-server/doc/src/sgml/ref/copy.sgml,v
retrieving revision 1.55
diff -c -c -r1.55 copy.sgml
*** doc/src/sgml/ref/copy.sgml  13 Dec 2003 23:59:07 -0000      1.55
--- doc/src/sgml/ref/copy.sgml  13 Apr 2004 04:18:22 -0000
***************
*** 26,32 ****
            [ BINARY ] 
            [ OIDS ]
            [ DELIMITER [ AS ] '<replaceable 
class="parameter">delimiter</replaceable>' ]
!           [ NULL [ AS ] '<replaceable class="parameter">null string</replaceable>' ] 
]
  
  COPY <replaceable class="parameter">tablename</replaceable> [ ( <replaceable 
class="parameter">column</replaceable> [, ...] ) ]
      TO { '<replaceable class="parameter">filename</replaceable>' | STDOUT }
--- 26,34 ----
            [ BINARY ] 
            [ OIDS ]
            [ DELIMITER [ AS ] '<replaceable 
class="parameter">delimiter</replaceable>' ]
!           [ NULL [ AS ] '<replaceable class="parameter">null string</replaceable>' ]
!           [ CSV [ QUOTE [ AS ] '<replaceable class="parameter">quote</replaceable>' 
] 
!                 [ ESCAPE [ AS ] '<replaceable 
class="parameter">escape</replaceable>' ] ]
  
  COPY <replaceable class="parameter">tablename</replaceable> [ ( <replaceable 
class="parameter">column</replaceable> [, ...] ) ]
      TO { '<replaceable class="parameter">filename</replaceable>' | STDOUT }
***************
*** 34,40 ****
            [ BINARY ]
            [ OIDS ]
            [ DELIMITER [ AS ] '<replaceable 
class="parameter">delimiter</replaceable>' ]
!           [ NULL [ AS ] '<replaceable class="parameter">null string</replaceable>' ] 
]
  </synopsis>
   </refsynopsisdiv>
   
--- 36,44 ----
            [ BINARY ]
            [ OIDS ]
            [ DELIMITER [ AS ] '<replaceable 
class="parameter">delimiter</replaceable>' ]
!           [ NULL [ AS ] '<replaceable class="parameter">null string</replaceable>' ]
!           [ CSV [ QUOTE [ AS ] '<replaceable class="parameter">quote</replaceable>' 
] 
!                 [ ESCAPE [ AS ] '<replaceable 
class="parameter">escape</replaceable>' ] ]
  </synopsis>
   </refsynopsisdiv>
   
***************
*** 136,142 ****
       <para>
        Specifies copying the OID for each row.  (An error is raised if
        <literal>OIDS</literal> is specified for a table that does not
!       have OIDs.)
       </para>
      </listitem>
     </varlistentry>
--- 140,146 ----
       <para>
        Specifies copying the OID for each row.  (An error is raised if
        <literal>OIDS</literal> is specified for a table that does not
!       have OIDs.)  FIX CSV FOR OIDS!
       </para>
      </listitem>
     </varlistentry>
***************
*** 146,152 ****
      <listitem>
       <para>
        The single character that separates columns within each row
!       (line) of the file.  The default is a tab character.
       </para>
      </listitem>
     </varlistentry>
--- 150,157 ----
      <listitem>
       <para>
        The single character that separates columns within each row
!       (line) of the file.  The default is a tab character in normal mode,
!       a comma in <literal>CSV</> mode.
       </para>
      </listitem>
     </varlistentry>
***************
*** 156,175 ****
      <listitem>
       <para>
        The string that represents a null value. The default is
!       <literal>\N</literal> (backslash-N). You might prefer an empty
!       string, for example.
       </para>
  
       <note>
        <para>
!        On a <command>COPY FROM</command>, any data item that matches
         this string will be stored as a null value, so you should make
         sure that you use the same string as you used with
         <command>COPY TO</command>.
        </para>
       </note>
      </listitem>
     </varlistentry>
    </variablelist>
   </refsect1>
  
--- 161,225 ----
      <listitem>
       <para>
        The string that represents a null value. The default is
!       <literal>\N</literal> (backslash-N) in normal mode, and a missing
!       value (no quotes) in <literal>CSV</> mode. You might prefer an empty
!       string in cases where you don't want to distinguish nulls from
!       empty strings.
       </para>
  
       <note>
        <para>
!        When using <command>COPY FROM</command>, any data item that matches
         this string will be stored as a null value, so you should make
         sure that you use the same string as you used with
         <command>COPY TO</command>.
        </para>
+ 
+       <para>
+        If you do not want anything used as null when using
+        <command>COPY FROM</command>, you can specify some value that is very
+        unlikely to appear in the file, such as <literal>frobnitz</literal> or
+        <literal>d5f4074b254c76cd8ae37bf1731f4aed</literal> (which is
+        <literal>md5('frobnitz')</literal>). This could be especially useful
+        when importing a <literal>CSV</> file into a table with <literal>NOT NULL</>
+        columns.
+       </para>
       </note>
+ 
      </listitem>
     </varlistentry>
+ 
+    <varlistentry>
+     <term><literal>CSV</literal></term>
+     <listitem>
+      <para>
+       Enables Comma Separated Variable (<literal>CSV</>) mode.  It sets the
+       default <literal>DELIMITER</> to comma, and <literal>QUOTE</> and
+       <literal>ESCAPE</> values to double-quote.
+      </para>
+     </listitem>
+    </varlistentry>
+ 
+    <varlistentry>
+     <term><replaceable class="parameter">quote string</replaceable></term>
+     <listitem>
+      <para>
+       Specifies the quotation character in <literal>CSV</> mode.
+       The default is double-quote.
+      </para>
+     </listitem>
+    </varlistentry>
+ 
+    <varlistentry>
+     <term><replaceable class="parameter">escape string</replaceable></term>
+     <listitem>
+      <para>
+       Specifies the character that should appear before a <literal>QUOTE</>
+       data character value in <literal>CSV</> mode.  The default is double-quote.
+      </para>
+     </listitem>
+    </varlistentry>
+ 
    </variablelist>
   </refsect1>
  
***************
*** 253,259 ****
  
     <para>
      When <command>COPY</command> is used without the <literal>BINARY</literal> 
option,
!     the data read or written is a text file with one line per table row.
      Columns in a row are separated by the delimiter character.
      The column values themselves are strings generated by the
      output function, or acceptable to the input function, of each
--- 303,310 ----
  
     <para>
      When <command>COPY</command> is used without the <literal>BINARY</literal> 
option,
!     the data read or written is a text file with one line per table row,
!     unless <literal>CSV</> mode is used.
      Columns in a row are separated by the delimiter character.
      The column values themselves are strings generated by the
      output function, or acceptable to the input function, of each
***************
*** 377,382 ****
--- 428,473 ----
      meant as data, <command>COPY FROM</command> will complain if the line
      endings in the input are not all alike.
     </para>
+   </refsect2>
+ 
+   <refsect2>
+    <title>CSV Format</title>
+ 
+    <para>
+     This format is used for importing from and exporting to the Comma
+     Separated Variable (<literal>CSV</>) file format used by many other programs,
+     such as spreadsheets. Instead of the escaping used by
+     <productname>PostgreSQL</productname>'s standard text mode, it produces
+     and recognises the common CSV escaping mechanism.
+    </para>
+ 
+    <para>
+     The values in each record are separated by the delimiter character.
+     If the value contains the delimiter character, the <literal>QUOTE</> character
+     or a carriage return or line feed character, then the whole value is prefixed
+     and suffixed by the quote character, and any occurrence within the value
+     of a quote character or the <literal>ESCAPE</> character is preceded 
+     by the escape character.
+    </para>
+ 
+    <note>
+     <para>
+      CSV mode will both recognise and produce CSV files with quoted values
+      containing embedded carriage returns and line feeds. Thus the files are
+      not strictly one line per table row like non-CSV files.
+     </para>
+    </note>
+ 
+    <note>
+     <para>
+      Many programs produce strange and occasionally perverse CSV files, so
+      the file format is more a convention than a standard. Thus you might
+      encounter some files that cannot be imported using this mechanism, and
+      <command>COPY</> might produce files that other programs can not 
+      process.
+     </para>
+    </note>
+     
    </refsect2>
  
    <refsect2>
Index: doc/src/sgml/ref/psql-ref.sgml
===================================================================
RCS file: /cvsroot/pgsql-server/doc/src/sgml/ref/psql-ref.sgml,v
retrieving revision 1.110
diff -c -c -r1.110 psql-ref.sgml
*** doc/src/sgml/ref/psql-ref.sgml      12 Apr 2004 15:58:52 -0000      1.110
--- doc/src/sgml/ref/psql-ref.sgml      13 Apr 2004 04:18:25 -0000
***************
*** 711,716 ****
--- 711,718 ----
              [ <literal>oids</literal> ] 
              [ <literal>delimiter [as] </literal> '<replaceable 
class="parameter">character</replaceable>' ]
              [ <literal>null [as] </literal> '<replaceable 
class="parameter">string</replaceable>' ]</literal>
+             [ <literal>csv [ quote [as] </literal> '<replaceable 
class="parameter">string</replaceable>' ]
+                            [ <literal>escape [as] </literal> '<replaceable 
class="parameter">string</replaceable>' ] ]
          </term>
  
          <listitem>
Index: src/backend/commands/copy.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/backend/commands/copy.c,v
retrieving revision 1.219
diff -c -c -r1.219 copy.c
*** src/backend/commands/copy.c 6 Apr 2004 13:21:33 -0000       1.219
--- src/backend/commands/copy.c 13 Apr 2004 04:18:27 -0000
***************
*** 70,76 ****
  typedef enum CopyReadResult
  {
        NORMAL_ATTR,
!       END_OF_LINE
  } CopyReadResult;
  
  /*
--- 70,77 ----
  typedef enum CopyReadResult
  {
        NORMAL_ATTR,
!       END_OF_LINE,
!       UNTERMINATED_FIELD
  } CopyReadResult;
  
  /*
***************
*** 130,144 ****
  
  /* non-export function prototypes */
  static void CopyTo(Relation rel, List *attnumlist, bool binary, bool oids,
!          char *delim, char *null_print);
  static void CopyFrom(Relation rel, List *attnumlist, bool binary, bool oids,
!                char *delim, char *null_print);
  static bool CopyReadLine(void);
  static char *CopyReadAttribute(const char *delim, const char *null_print,
                                                           CopyReadResult *result, 
bool *isnull);
  static Datum CopyReadBinaryAttribute(int column_no, FmgrInfo *flinfo,
                                                Oid typelem, bool *isnull);
  static void CopyAttributeOut(char *string, char *delim);
  static List *CopyGetAttnums(Relation rel, List *attnamelist);
  static void limit_printout_length(StringInfo buf);
  
--- 131,150 ----
  
  /* non-export function prototypes */
  static void CopyTo(Relation rel, List *attnumlist, bool binary, bool oids,
!          char *delim, char *null_print, bool csv_mode, char *quote, char *escape);
  static void CopyFrom(Relation rel, List *attnumlist, bool binary, bool oids,
!                char *delim, char *null_print, bool csv_mode, char *quote, char 
*escape);
  static bool CopyReadLine(void);
  static char *CopyReadAttribute(const char *delim, const char *null_print,
                                                           CopyReadResult *result, 
bool *isnull);
+ static char *CopyReadAttributeCSV(const char *delim, const char *null_print,
+                                                          char *quote, char *escape,
+                                                          CopyReadResult *result, 
bool *isnull);
  static Datum CopyReadBinaryAttribute(int column_no, FmgrInfo *flinfo,
                                                Oid typelem, bool *isnull);
  static void CopyAttributeOut(char *string, char *delim);
+ static void CopyAttributeOutCSV(char *string, char *delim, char *quote,
+                                                               char *escape, bool 
force_quote);
  static List *CopyGetAttnums(Relation rel, List *attnamelist);
  static void limit_printout_length(StringInfo buf);
  
***************
*** 682,688 ****
--- 688,697 ----
        List       *attnumlist;
        bool            binary = false;
        bool            oids = false;
+       bool        csv_mode = false;
        char       *delim = NULL;
+       char       *quote = NULL;
+       char       *escape = NULL;
        char       *null_print = NULL;
        Relation        rel;
        AclMode         required_access = (is_from ? ACL_INSERT : ACL_SELECT);
***************
*** 725,730 ****
--- 734,763 ----
                                                 errmsg("conflicting or redundant 
options")));
                        null_print = strVal(defel->arg);
                }
+               else if (strcmp(defel->defname, "csv") == 0)
+               {
+                       if (csv_mode)
+                               ereport(ERROR,
+                                               (errcode(ERRCODE_SYNTAX_ERROR),
+                                                errmsg("conflicting or redundant 
options")));
+                       csv_mode = intVal(defel->arg);
+               }
+               else if (strcmp(defel->defname, "quote") == 0)
+               {
+                       if (quote)
+                               ereport(ERROR,
+                                               (errcode(ERRCODE_SYNTAX_ERROR),
+                                                errmsg("conflicting or redundant 
options")));
+                       quote = strVal(defel->arg);
+               }
+               else if (strcmp(defel->defname, "escape") == 0)
+               {
+                       if (escape)
+                               ereport(ERROR,
+                                               (errcode(ERRCODE_SYNTAX_ERROR),
+                                                errmsg("conflicting or redundant 
options")));
+                       escape = strVal(defel->arg);
+               }
                else
                        elog(ERROR, "option \"%s\" not recognized",
                                 defel->defname);
***************
*** 735,740 ****
--- 768,778 ----
                                (errcode(ERRCODE_SYNTAX_ERROR),
                                 errmsg("cannot specify DELIMITER in BINARY mode")));
  
+       if (binary && csv_mode)
+               ereport(ERROR,
+                               (errcode(ERRCODE_SYNTAX_ERROR),
+                                errmsg("cannot specify CSV in BINARY mode")));
+ 
        if (binary && null_print)
                ereport(ERROR,
                                (errcode(ERRCODE_SYNTAX_ERROR),
***************
*** 742,751 ****
  
        /* Set defaults */
        if (!delim)
!               delim = "\t";
! 
        if (!null_print)
!               null_print = "\\N";
  
        /*
         * Open and lock the relation, using the appropriate lock type.
--- 780,847 ----
  
        /* Set defaults */
        if (!delim)
!               delim = csv_mode ? "," : "\t";
!       
        if (!null_print)
!               null_print = csv_mode ? "" : "\\N";
! 
!       if (csv_mode)
!       {
!               if (!quote)
!                       quote = "\"";
!               if (!escape)
!                       escape = "\"";
!       }
!               
!       /*
!        * Only single-character delimiter strings are supported.
!        */
!       if (strlen(delim) != 1)
!               ereport(ERROR,
!                               (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
!                                errmsg("COPY delimiter must be a single character")));
! 
!       /*
!        * Check quote
!        */
!       if (!csv_mode && quote != NULL)
!               ereport(ERROR,
!                               (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
!                                errmsg("COPY quote available only in CSV mode")));
! 
!       if (csv_mode && strlen(quote) != 1)
!               ereport(ERROR,
!                               (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
!                                errmsg("COPY quote must be a single character")));
! 
!       /*
!        * Check escape
!        */
!       if (!csv_mode && escape != NULL)
!               ereport(ERROR,
!                               (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
!                                errmsg("COPY escape available only in CSV mode")));
! 
!       if (csv_mode && strlen(escape) != 1)
!               ereport(ERROR,
!                               (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
!                                errmsg("COPY escape must be a single character")));
! 
!       /*
!        * Don't allow the delimiter to appear in the null string.
!        */
!       if (strchr(null_print, delim[0]) != NULL)
!               ereport(ERROR,
!                               (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
!                                errmsg("COPY delimiter must not appear in the NULL 
specification")));
! 
!       /*
!        * Don't allow the csv quote char to appear in the null string.
!        */
!       if (csv_mode && strchr(null_print, quote[0]) != NULL)
!               ereport(ERROR,
!                               (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
!                                errmsg("CSV quote character must not appear in the 
NULL specification")));
  
        /*
         * Open and lock the relation, using the appropriate lock type.
***************
*** 772,791 ****
                                           "psql's \\copy command also works for 
anyone.")));
  
        /*
!        * Presently, only single-character delimiter strings are supported.
         */
-       if (strlen(delim) != 1)
-               ereport(ERROR,
-                               (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-                                errmsg("COPY delimiter must be a single character")));
  
!       /*
!        * Don't allow the delimiter to appear in the null string.
!        */
!       if (strchr(null_print, delim[0]) != NULL)
                ereport(ERROR,
!                               (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
!                                errmsg("COPY delimiter must not appear in the NULL 
specification")));
  
        /*
         * Don't allow COPY w/ OIDs to or from a table without them
--- 868,880 ----
                                           "psql's \\copy command also works for 
anyone.")));
  
        /*
!        * Don't allow OIDs in CSV mode
         */
  
!       if (csv_mode && oids)  // FIX ME bjm
                ereport(ERROR,
!                               (errcode(ERRCODE_SYNTAX_ERROR),
!                                errmsg("Cannot specify OIDS in CSV mode ")));
  
        /*
         * Don't allow COPY w/ OIDs to or from a table without them
***************
*** 864,870 ****
                                                 errmsg("\"%s\" is a directory", 
filename)));
                        }
                }
!               CopyFrom(rel, attnumlist, binary, oids, delim, null_print);
        }
        else
        {                                                       /* copy from database 
to file */
--- 953,960 ----
                                                 errmsg("\"%s\" is a directory", 
filename)));
                        }
                }
!               CopyFrom(rel, attnumlist, binary, oids, delim, null_print, csv_mode,
!                                quote, escape);
        }
        else
        {                                                       /* copy from database 
to file */
***************
*** 926,932 ****
                                                 errmsg("\"%s\" is a directory", 
filename)));
                        }
                }
!               CopyTo(rel, attnumlist, binary, oids, delim, null_print);
        }
  
        if (!pipe)
--- 1016,1023 ----
                                                 errmsg("\"%s\" is a directory", 
filename)));
                        }
                }
!               CopyTo(rel, attnumlist, binary, oids, delim, null_print, csv_mode,
!                               quote, escape);
        }
  
        if (!pipe)
***************
*** 958,964 ****
   */
  static void
  CopyTo(Relation rel, List *attnumlist, bool binary, bool oids,
!          char *delim, char *null_print)
  {
        HeapTuple       tuple;
        TupleDesc       tupDesc;
--- 1049,1056 ----
   */
  static void
  CopyTo(Relation rel, List *attnumlist, bool binary, bool oids,
!          char *delim, char *null_print, bool csv_mode, char *quote,
!          char *escape)
  {
        HeapTuple       tuple;
        TupleDesc       tupDesc;
***************
*** 1051,1057 ****
        while ((tuple = heap_getnext(scandesc, ForwardScanDirection)) != NULL)
        {
                bool            need_delim = false;
- 
                CHECK_FOR_INTERRUPTS();
  
                MemoryContextReset(mycontext);
--- 1143,1148 ----
***************
*** 1113,1119 ****
                                                                                       
                            value,
                                                                  
ObjectIdGetDatum(elements[attnum - 1]),
                                                        Int32GetDatum(attr[attnum - 
1]->atttypmod)));
!                                       CopyAttributeOut(string, delim);
                                }
                                else
                                {
--- 1204,1218 ----
                                                                                       
                            value,
                                                                  
ObjectIdGetDatum(elements[attnum - 1]),
                                                        Int32GetDatum(attr[attnum - 
1]->atttypmod)));
!                                       if (csv_mode)
!                                       {
!                                               bool force_quote = (strcmp(string, 
null_print) == 0);
!                                               CopyAttributeOutCSV(string, delim, 
quote, escape,
!                                                                                      
 force_quote);
!                                       }
!                                       else
!                                               CopyAttributeOut(string, delim);
! 
                                }
                                else
                                {
***************
*** 1243,1249 ****
   */
  static void
  CopyFrom(Relation rel, List *attnumlist, bool binary, bool oids,
!                char *delim, char *null_print)
  {
        HeapTuple       tuple;
        TupleDesc       tupDesc;
--- 1342,1349 ----
   */
  static void
  CopyFrom(Relation rel, List *attnumlist, bool binary, bool oids,
!                char *delim, char *null_print, bool csv_mode, char *quote,
!                char *escape)
  {
        HeapTuple       tuple;
        TupleDesc       tupDesc;
***************
*** 1388,1396 ****
        ExecBSInsertTriggers(estate, resultRelInfo);
  
        if (!binary)
-       {
                file_has_oids = oids;   /* must rely on user to tell us this... */
-       }
        else
        {
                /* Read and verify binary header */
--- 1488,1494 ----
***************
*** 1499,1504 ****
--- 1597,1603 ----
  
                        if (file_has_oids)
                        {
+                               /* can't be in CSV mode here */
                                string = CopyReadAttribute(delim, null_print,
                                                                                   
&result, &isnull);
  
***************
*** 1537,1550 ****
                                                         errmsg("missing data for 
column \"%s\"",
                                                                        
NameStr(attr[m]->attname))));
  
!                               string = CopyReadAttribute(delim, null_print,
!                                                                                  
&result, &isnull);
! 
!                               if (isnull)
                                {
!                                       /* we read an SQL NULL, no need to do anything 
*/
                                }
                                else
                                {
                                        copy_attname = NameStr(attr[m]->attname);
                                        values[m] = FunctionCall3(&in_functions[m],
--- 1636,1658 ----
                                                         errmsg("missing data for 
column \"%s\"",
                                                                        
NameStr(attr[m]->attname))));
  
!                               if (csv_mode)
                                {
!                                       string = CopyReadAttributeCSV(delim, 
null_print, quote,
!                                                                                      
           escape, &result, &isnull);
!                                       if (result == UNTERMINATED_FIELD)
!                                               ereport(ERROR,
!                                                               
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
!                                                                errmsg("unterminated 
CSV quoted field")));
                                }
                                else
+                                       string = CopyReadAttribute(delim, null_print, 
+                                                                                      
    &result, &isnull);
+                                       
+ 
+                               
+                               /* we read an SQL NULL, no need to do anything */
+                               if (!isnull)
                                {
                                        copy_attname = NameStr(attr[m]->attname);
                                        values[m] = FunctionCall3(&in_functions[m],
***************
*** 2069,2074 ****
--- 2177,2328 ----
        return attribute_buf.data;
  }
  
+ 
+ /*
+  * Read the value of a single attribute in CSV mode, 
+  * performing de-escaping as needed. Escaping does not follow the normal
+  * PostgreSQL text mode, but instead "standard" (i.e. common) CSV usage.
+  *
+  * Quoted fields can span lines, in which case the line end is embedded
+  * in the returned string.
+  *
+  * null_print is the null marker string.  Note that this is compared to
+  * the pre-de-escaped input string (thus if it is quoted it is not a NULL).
+  *
+  * *result is set to indicate what terminated the read:
+  *            NORMAL_ATTR:    column delimiter
+  *            END_OF_LINE:    end of line
+  *      UNTERMINATED_FIELD no quote detected at end of a quoted field
+  *
+  * In any case, the string read up to the terminator (or end of file)
+  * is returned.
+  *
+  * *isnull is set true or false depending on whether the input matched
+  * the null marker.  Note that the caller cannot check this since the
+  * returned string will be the post-de-escaping equivalent, which may
+  * look the same as some valid data string.
+  *----------
+  */
+ 
+ static char *
+ CopyReadAttributeCSV(const char *delim, const char *null_print, char *quote,
+                                        char *escape, CopyReadResult *result, bool 
*isnull)
+ {
+       char        delimc = delim[0];
+       char        quotec = quote[0];
+       char        escapec = escape[0];
+       char            c;
+       int                     start_cursor = line_buf.cursor;
+       int                     end_cursor = start_cursor;
+       int                     input_len;
+       bool        in_quote = false;
+       bool        saw_quote = false;
+ 
+       /* reset attribute_buf to empty */
+       attribute_buf.len = 0;
+       attribute_buf.data[0] = '\0';
+ 
+       /* set default status */
+       *result = END_OF_LINE;
+ 
+       for (;;)
+       {
+               /* handle multiline quoted fields */
+               if (in_quote && line_buf.cursor >= line_buf.len)
+               {
+                       bool done;
+ 
+                       switch(eol_type)
+                       {
+                               case EOL_NL:
+                                       appendStringInfoString(&attribute_buf,"\n");
+                                       break;
+                               case EOL_CR:
+                                       appendStringInfoString(&attribute_buf,"\r");
+                                       break;
+                               case EOL_CRNL:
+                                       appendStringInfoString(&attribute_buf,"\r\n");
+                                       break;
+                               case EOL_UNKNOWN:
+                                       /* shouldn't happen - just keep going */
+                                       break;
+                       }
+ 
+                       copy_lineno++;
+                       done = CopyReadLine();
+                       if (done && line_buf.len == 0)
+                               break;
+                       start_cursor = line_buf.cursor;
+               }
+ 
+               end_cursor = line_buf.cursor;
+               if (line_buf.cursor >= line_buf.len)
+                       break;
+               c = line_buf.data[line_buf.cursor++];
+               /* 
+                * unquoted field delimiter 
+                */
+               if (!in_quote && c == delimc)
+               {
+                       *result = NORMAL_ATTR;
+                       break;
+               }
+               /* 
+                * start of quoted field (or part of field) 
+                */
+               if (!in_quote && c == quotec)
+               {
+                       saw_quote = true;
+                       in_quote = true;
+                       continue;
+               }
+               /* 
+                * escape within a quoted field
+                */
+               if (in_quote && c == escapec)
+               {
+                       /* 
+                        * peek at the next char if available, and escape it if it
+                        * is an escape char or a quote char
+                        */
+                       if (line_buf.cursor <= line_buf.len)
+                       {
+                               char nextc = line_buf.data[line_buf.cursor];
+                               if (nextc == escapec || nextc == quotec)
+                               {
+                                       appendStringInfoCharMacro(&attribute_buf, 
nextc);
+                                       line_buf.cursor++;
+                                       continue;
+                               }
+                       }
+               }
+               /*
+                * end of quoted field. 
+                * Must do this test after testing for escape in case quote char
+                * and escape char are the same (which is the common case).
+                */
+               if (in_quote && c == quotec)
+               {
+                       in_quote = false;
+                       continue;
+               }
+               appendStringInfoCharMacro(&attribute_buf, c);
+       }
+ 
+       if (in_quote)
+               *result = UNTERMINATED_FIELD;
+ 
+       /* check whether raw input matched null marker */
+       input_len = end_cursor - start_cursor;
+       if (!saw_quote && input_len == strlen(null_print) &&
+               strncmp(&line_buf.data[start_cursor], null_print, input_len) == 0)
+               *isnull = true;
+       else
+               *isnull = false;
+ 
+       return attribute_buf.data;
+ }
+ 
  /*
   * Read a binary attribute
   */
***************
*** 2192,2197 ****
--- 2446,2518 ----
                                break;
                }
        }
+ }
+ 
+ /*
+  * Send CSV representation of one attribute, with conversion and 
+  * CSV type escaping
+  */
+ static void
+ CopyAttributeOutCSV(char *server_string, char *delim, char *quote,
+                                       char *escape, bool force_quote)
+ {
+       char       *string;
+       char            c;
+       char            delimc = delim[0];
+       char        quotec = quote[0];
+       char        escapec = escape[0];
+       bool        need_quote = force_quote;
+       char        *test_string;
+       bool            same_encoding;
+       int                     mblen;
+       int                     i;
+ 
+       same_encoding = (server_encoding == client_encoding);
+       if (!same_encoding)
+               string = (char *) pg_server_to_client((unsigned char *) server_string,
+                                                                                      
   strlen(server_string));
+       else
+               string = server_string;
+ 
+       /* have to run through the string twice,
+        * first time to see if it needs quoting, second to actually send it
+        */
+ 
+       for(test_string = string; 
+               !need_quote && (c = *test_string) != '\0'; 
+               test_string += mblen)
+       {
+               if (c == delimc || c == quotec || c == '\n' || c == '\r')
+                       need_quote = true;
+               if (!same_encoding)
+                       mblen = pg_encoding_mblen(client_encoding, test_string);
+               else
+                       mblen = 1;
+       }
+ 
+       if (need_quote)
+               CopySendChar(quotec);
+ 
+       for (; (c = *string) != '\0'; string += mblen)
+       {
+               if (c == quotec || c == escapec)
+                       CopySendChar(escapec);
+ 
+               CopySendChar(c);
+ 
+               if (!same_encoding)
+               {
+                       /* send additional bytes of the char, if any */
+                       mblen = pg_encoding_mblen(client_encoding, string);
+                       for (i = 1; i < mblen; i++)
+                               CopySendChar(string[i]);
+               }
+               else
+                       mblen = 1;
+       }
+ 
+       if (need_quote)
+               CopySendChar(quotec);
  }
  
  /*
Index: src/backend/parser/gram.y
===================================================================
RCS file: /cvsroot/pgsql-server/src/backend/parser/gram.y,v
retrieving revision 2.450
diff -c -c -r2.450 gram.y
*** src/backend/parser/gram.y   5 Apr 2004 03:07:26 -0000       2.450
--- src/backend/parser/gram.y   13 Apr 2004 04:18:31 -0000
***************
*** 343,349 ****
        CHARACTER CHARACTERISTICS CHECK CHECKPOINT CLASS CLOSE
        CLUSTER COALESCE COLLATE COLUMN COMMENT COMMIT
        COMMITTED CONSTRAINT CONSTRAINTS CONVERSION_P CONVERT COPY CREATE CREATEDB
!       CREATEUSER CROSS CURRENT_DATE CURRENT_TIME
        CURRENT_TIMESTAMP CURRENT_USER CURSOR CYCLE
  
        DATABASE DAY_P DEALLOCATE DEC DECIMAL_P DECLARE DEFAULT DEFAULTS
--- 343,349 ----
        CHARACTER CHARACTERISTICS CHECK CHECKPOINT CLASS CLOSE
        CLUSTER COALESCE COLLATE COLUMN COMMENT COMMIT
        COMMITTED CONSTRAINT CONSTRAINTS CONVERSION_P CONVERT COPY CREATE CREATEDB
!       CREATEUSER CROSS CSV CURRENT_DATE CURRENT_TIME
        CURRENT_TIMESTAMP CURRENT_USER CURSOR CYCLE
  
        DATABASE DAY_P DEALLOCATE DEC DECIMAL_P DECLARE DEFAULT DEFAULTS
***************
*** 386,391 ****
--- 386,393 ----
        PRECISION PRESERVE PREPARE PRIMARY 
        PRIOR PRIVILEGES PROCEDURAL PROCEDURE
  
+       QUOTE
+ 
        READ REAL RECHECK REFERENCES REINDEX RELATIVE_P RENAME REPEATABLE REPLACE
        RESET RESTART RESTRICT RETURNS REVOKE RIGHT ROLLBACK ROW ROWS
        RULE
***************
*** 1356,1361 ****
--- 1358,1375 ----
                                {
                                        $$ = makeDefElem("delimiter", (Node 
*)makeString($3));
                                }
+                       | CSV
+                               {
+                                       $$ = makeDefElem("csv", (Node 
*)makeInteger(TRUE));
+                               }
+                       | QUOTE opt_as Sconst
+                               {
+                                       $$ = makeDefElem("quote", (Node 
*)makeString($3));
+                               }
+                       | ESCAPE opt_as Sconst
+                               {
+                                       $$ = makeDefElem("escape", (Node 
*)makeString($3));
+                               }
                        | NULL_P opt_as Sconst
                                {
                                        $$ = makeDefElem("null", (Node 
*)makeString($3));
***************
*** 7420,7425 ****
--- 7434,7440 ----
                        | COPY
                        | CREATEDB
                        | CREATEUSER
+                       | CSV
                        | CURSOR
                        | CYCLE
                        | DATABASE
***************
*** 7507,7512 ****
--- 7522,7528 ----
                        | PRIVILEGES
                        | PROCEDURAL
                        | PROCEDURE
+                       | QUOTE
                        | READ
                        | RECHECK
                        | REINDEX
Index: src/backend/parser/keywords.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/backend/parser/keywords.c,v
retrieving revision 1.147
diff -c -c -r1.147 keywords.c
*** src/backend/parser/keywords.c       11 Mar 2004 01:47:40 -0000      1.147
--- src/backend/parser/keywords.c       13 Apr 2004 04:18:31 -0000
***************
*** 90,95 ****
--- 90,96 ----
        {"createdb", CREATEDB},
        {"createuser", CREATEUSER},
        {"cross", CROSS},
+       {"csv", CSV},
        {"current_date", CURRENT_DATE},
        {"current_time", CURRENT_TIME},
        {"current_timestamp", CURRENT_TIMESTAMP},
***************
*** 248,253 ****
--- 249,255 ----
        {"privileges", PRIVILEGES},
        {"procedural", PROCEDURAL},
        {"procedure", PROCEDURE},
+       {"quote", QUOTE},
        {"read", READ},
        {"real", REAL},
        {"recheck", RECHECK},
Index: src/bin/psql/copy.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/bin/psql/copy.c,v
retrieving revision 1.43
diff -c -c -r1.43 copy.c
*** src/bin/psql/copy.c 12 Apr 2004 15:58:52 -0000      1.43
--- src/bin/psql/copy.c 13 Apr 2004 04:18:32 -0000
***************
*** 66,73 ****
--- 66,76 ----
        bool            from;
        bool            binary;
        bool            oids;
+       bool            csv_mode;
        char       *delim;
        char       *null;
+       char       *quote;
+       char       *escape;
  };
  
  
***************
*** 81,86 ****
--- 84,91 ----
        free(ptr->file);
        free(ptr->delim);
        free(ptr->null);
+       free(ptr->quote);
+       free(ptr->escape);
        free(ptr);
  }
  
***************
*** 277,282 ****
--- 282,291 ----
                        {
                                result->oids = true;
                        }
+                       else if (strcasecmp(token, "csv") == 0)
+                       {
+                               result->csv_mode = true;
+                       }
                        else if (strcasecmp(token, "delimiter") == 0)
                        {
                                token = strtokx(NULL, whitespace, NULL, "'",
***************
*** 301,306 ****
--- 310,339 ----
                                else
                                        goto error;
                        }
+                       else if (strcasecmp(token, "quote") == 0)
+                       {
+                               token = strtokx(NULL, whitespace, NULL, "'",
+                                                               '\\', false, 
pset.encoding);
+                               if (token && strcasecmp(token, "as") == 0)
+                                       token = strtokx(NULL, whitespace, NULL, "'",
+                                                                       '\\', false, 
pset.encoding);
+                               if (token)
+                                       result->quote = pg_strdup(token);
+                               else
+                                       goto error;
+                       }
+                       else if (strcasecmp(token, "escape") == 0)
+                       {
+                               token = strtokx(NULL, whitespace, NULL, "'",
+                                                               '\\', false, 
pset.encoding);
+                               if (token && strcasecmp(token, "as") == 0)
+                                       token = strtokx(NULL, whitespace, NULL, "'",
+                                                                       '\\', false, 
pset.encoding);
+                               if (token)
+                                       result->escape = pg_strdup(token);
+                               else
+                                       goto error;
+                       }
                        else
                                goto error;
  
***************
*** 340,346 ****
        PGresult   *result;
        bool            success;
        struct stat st;
! 
        /* parse options */
        options = parse_slash_copy(args);
  
--- 373,380 ----
        PGresult   *result;
        bool            success;
        struct stat st;
!       bool with_output = false;
!       
        /* parse options */
        options = parse_slash_copy(args);
  
***************
*** 379,390 ****
--- 413,454 ----
                                                          options->delim);
        }
  
+       /* There is no backward-compatible CSV syntax */
        if (options->null)
        {
                if (options->null[0] == '\'')
                        appendPQExpBuffer(&query, " WITH NULL AS %s", options->null);
                else
                        appendPQExpBuffer(&query, " WITH NULL AS '%s'", options->null);
+               with_output = true;
+       }
+ 
+       if (options->csv_mode)
+       {
+               appendPQExpBuffer(&query, " %sCSV ", with_output ? "" : "WITH ");
+               with_output = true;
+       }
+       
+       if (options->quote)
+       {
+               if (options->quote[0] == '\'')
+                       appendPQExpBuffer(&query, " %sQUOTE AS %s",
+                               with_output ? "" : "WITH ", options->quote);
+               else
+                       appendPQExpBuffer(&query, " %sQUOTE AS '%s'",
+                               with_output ? "" : "WITH ", options->quote);
+               with_output = true;
+       }
+ 
+       if (options->escape)
+       {
+               if (options->escape[0] == '\'')
+                       appendPQExpBuffer(&query, " %sESCAPE AS %s",
+                               with_output ? "" : "WITH ", options->escape);
+               else
+                       appendPQExpBuffer(&query, " %sESCAPE AS '%s'",
+                               with_output ? "" : "WITH ", options->escape);
+               with_output = true;
        }
  
        if (options->from)
---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
    (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

Reply via email to