On Tue, Jan 19, 2010 at 4:40 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Robert Haas <robertmh...@gmail.com> writes:
>> I'd like to proceed by committing an initial patch which changes the
>> "Escaping Strings for Inclusion in SQL Commands" to use a
>> <variablelist> with one <varlistentry> per function (as we do in
>> surrounding functions) and consolidates it with the following section,
>>  "Escaping Binary Strings for Inclusion in SQL Commands".  Then I'll
>> submit a patch implementing pqEscapeLiteral() and pqEscapeIdentifier()
>> as discussed here, and the doc diff hunks will actually be readable.
>
> Sounds like a plan.

Initial commit done, and follow-on patch attached.  The docs took
longer to write than the code.  I spent a fair amount of time trying
to make it all make sense, but suggestions are welcome.

...Robert
*** a/doc/src/sgml/libpq.sgml
--- b/doc/src/sgml/libpq.sgml
***************
*** 2933,2952 **** typedef struct {
     <variablelist>
      <varlistentry>
       <term>
!       <function>PQescapeStringConn</function>
        <indexterm>
!        <primary>PQescapeStringConn</primary>
        </indexterm>
       </term>
  
       <listitem>
       <para>
!       <function>PQescapeStringConn</function> escapes a string for
        use within an SQL command.  This is useful when inserting data
        values as literal constants in SQL commands.  Certain characters
        (such as quotes and backslashes) must be escaped to prevent them
        from being interpreted specially by the SQL parser.
!       <function>PQescapeStringConn</> performs this operation.
       </para>
  
       <tip>
--- 2933,2975 ----
     <variablelist>
      <varlistentry>
       <term>
!       <function>PQescapeLiteral</function>
        <indexterm>
!        <primary>PQescapeLiteral</primary>
        </indexterm>
       </term>
  
       <listitem>
       <para>
!       <synopsis>
!        size_t PQescapeLiteral(PGconn *conn, char *str, size_t len)
!       </synopsis>
!      </para>
! 
!      <para>
!       <function>PQescapeLiteral</function> escapes a string for
        use within an SQL command.  This is useful when inserting data
        values as literal constants in SQL commands.  Certain characters
        (such as quotes and backslashes) must be escaped to prevent them
        from being interpreted specially by the SQL parser.
!       <function>PQescapeLiteral</> performs this operation.
!      </para>
! 
!      <para>
!       <function>PQescapeLiteral</> returns an escaped version of the
!       <parameter>str</parameter> parameter in memory allocated with
!       <function>malloc()</>.  This memory should be freed using
!       <function>PQfreemem()</> when the result is no longer needed.
!       A terminating zero byte is not required, and should not be
!       counted in <parameter>length</>.  (If a terminating zero byte is found
!       before <parameter>length</> bytes are processed,
!       <function>PQescapeStringConn</> stops at the zero; the behavior is
!       thus rather like <function>strncpy</>.) The
!       return string has all special characters replaced so that they can
!       be properly processed by the <productname>PostgreSQL</productname>
!       string literal parser.  A terminating zero byte is also added.  The
!       single quotes that must surround <productname>PostgreSQL</productname>
!       string literals are included in the result string.
       </para>
  
       <tip>
***************
*** 2963,2969 **** typedef struct {
--- 2986,3055 ----
        Note that it is not necessary nor correct to do escaping when a data
        value is passed as a separate parameter in <function>PQexecParams</> or
        its sibling routines.
+      </para>
+      </listitem>
+     </varlistentry>
+ 
+     <varlistentry>
+      <term>
+       <function>PQescapeIdentifier</function>
+       <indexterm>
+        <primary>PQescapeIdentifier</primary>
+       </indexterm>
+      </term>
+ 
+      <listitem>
+      <para>
+       <synopsis>
+        size_t PQescapeIdentifier(PGconn *conn, char *str, size_t len)
+       </synopsis>
+      </para>
+ 
+      <para>
+       <function>PQescapeIndentifier</function> escapes a string for
+       use as an SQL identifier, such as a table, column, or function name.
+       This is useful when a user-supplied identifier might contain
+       special characters that would otherwise not be interpreted as part
+       of the identifier by the SQL parser, or when the identifier might
+       contain uppercase characters whose case should be preserved.
+      </para>
+ 
+      <para>
+       <function>PQescapeIdentifier</> returns a version of the
+       <parameter>str</parameter> parameter escaped as an SQL identifier
+       in memory allocated with <function>malloc()</>.  This memory must be
+       freed using <function>PQfreemem()</> when the result is no longer
+       needed.  A terminating zero byte is not required, and should not be
+       counted in <parameter>length</>.  (If a terminating zero byte is found
+       before <parameter>length</> bytes are processed,
+       <function>PQescapeStringConn</> stops at the zero; the behavior is
+       thus rather like <function>strncpy</>.) The
+       return string has all special characters replaced so that it
+       will be properly processed as an SQL identifier.  A terminating zero byte
+       is also added.  The return string will also be surrounded by double
+       quotes.
+      </para>
  
+      <tip>
+       <para>
+        As with string literals, to prevent SQL injection attacks,
+        SQL identifiers must be escaped when they are received from an
+        untrustworthy source.
+       </para>
+      </tip>
+      </listitem>
+     </varlistentry>
+ 
+     <varlistentry>
+      <term>
+       <function>PQescapeStringConn</function>
+       <indexterm>
+        <primary>PQescapeStringConn</primary>
+       </indexterm>
+      </term>
+ 
+      <listitem>
+      <para>
        <synopsis>
         size_t PQescapeStringConn (PGconn *conn,
                                    char *to, const char *from, size_t length,
***************
*** 2972,2983 **** typedef struct {
       </para>
  
       <para>
!       <function>PQescapeStringConn</> writes an escaped version of the
!       <parameter>from</> string to the <parameter>to</> buffer, escaping
!       special characters so that they cannot cause any harm, and adding a
!       terminating zero byte.  The single quotes that must surround
!       <productname>PostgreSQL</> string literals are not included in the
!       result string; they should be provided in the SQL command that the
        result is inserted into.  The parameter <parameter>from</> points to
        the first character of the string that is to be escaped, and the
        <parameter>length</> parameter gives the number of bytes in this
--- 3058,3069 ----
       </para>
  
       <para>
!       <function>PQescapeStringConn</> escapes string literals, much like
!       <function>PQescapeLiteral</>.  Unlike <function>PQescapeLiteral</>,
!       the caller is responsible for providing an appropriately sized buffer.
!       Furthermore, <function>PQescapeStringConn</> does not generate the
!       single quotes that must surround <productname>PostgreSQL</> string
!       literals; they should be provided in the SQL command that the
        result is inserted into.  The parameter <parameter>from</> points to
        the first character of the string that is to be escaped, and the
        <parameter>length</> parameter gives the number of bytes in this
***************
*** 3093,3099 **** typedef struct {
        <para>
         <function>PQescapeByteaConn</> returns an escaped version of the
         <parameter>from</parameter> parameter binary string in memory
!        allocated with <function>malloc()</>.  This memory must be freed using
         <function>PQfreemem()</> when the result is no longer needed.  The
         return string has all special characters replaced so that they can
         be properly processed by the <productname>PostgreSQL</productname>
--- 3179,3185 ----
        <para>
         <function>PQescapeByteaConn</> returns an escaped version of the
         <parameter>from</parameter> parameter binary string in memory
!        allocated with <function>malloc()</>.  This memory should be freed using
         <function>PQfreemem()</> when the result is no longer needed.  The
         return string has all special characters replaced so that they can
         be properly processed by the <productname>PostgreSQL</productname>
***************
*** 4285,4291 **** typedef struct {
         be non-<symbol>NULL</symbol>.  <parameter>*buffer</> is set to
         point to the allocated memory, or to <symbol>NULL</symbol> in cases
         where no buffer is returned.  A non-<symbol>NULL</symbol> result
!        buffer must be freed using <function>PQfreemem</> when no longer
         needed.
        </para>
  
--- 4371,4377 ----
         be non-<symbol>NULL</symbol>.  <parameter>*buffer</> is set to
         point to the allocated memory, or to <symbol>NULL</symbol> in cases
         where no buffer is returned.  A non-<symbol>NULL</symbol> result
!        buffer should be freed using <function>PQfreemem</> when no longer
         needed.
        </para>
  
*** a/src/interfaces/libpq/exports.txt
--- b/src/interfaces/libpq/exports.txt
***************
*** 153,155 **** PQresultSetInstanceData   150
--- 153,157 ----
  PQfireResultCreateEvents  151
  PQconninfoParse           152
  PQinitOpenSSL             153
+ PQescapeLiteral           154
+ PQescapeIdentifier        155
*** a/src/interfaces/libpq/fe-exec.c
--- b/src/interfaces/libpq/fe-exec.c
***************
*** 3059,3064 **** PQescapeString(char *to, const char *from, size_t length)
--- 3059,3202 ----
  }
  
  
+ /*
+  * Escape arbitrary strings.  If as_ident is true, we escape the result
+  * as an identifier; if false, as a literal.  The result is returned in
+  * a newly allocated buffer.  If we fail due to an encoding violation or out
+  * of memory condition, we return NULL, storing an error message into conn.
+  */
+ static char *
+ PQescapeInternal(PGconn *conn, const char *str, size_t len, int as_ident)
+ {
+ 	const char *s;
+ 	char   *result;
+ 	char   *rp;
+ 	int		num_quotes = 0;		/* single or double, depending on as_ident */
+ 	int		num_backslashes = 0;
+ 	int		input_len;
+ 	int		result_size;
+ 	char	quote_char = as_ident ? '"' : '\'';
+ 
+ 	/* We must have a connection, else fail immediately. */
+ 	if (!conn)
+ 		return NULL;
+ 
+ 	/* Scan the string for characters that must be escaped. */
+ 	for (s = str; *s != '\0' && (s - str) < len; ++s)
+ 	{
+ 		if (*s == quote_char)
+ 			++num_quotes;
+ 		else if (*s == '\\')
+ 			++num_backslashes;
+ 		else if (IS_HIGHBIT_SET(*s))
+ 		{
+ 			int charlen;
+ 
+ 			/* Slow path for possible multibyte characters */
+ 			charlen = pg_encoding_mblen(conn->client_encoding, s);
+ 
+ 			/* Multibyte character overruns allowable length. */
+ 			if ((s - str) + charlen > len || memchr(s, 0, charlen) != NULL)
+ 			{
+ 				printfPQExpBuffer(&conn->errorMessage,
+ 						  libpq_gettext("incomplete multibyte character\n"));
+ 				return NULL;
+ 			}
+ 
+ 			/* Adjust s, bearing in mind that for loop will increment it. */
+ 			s += charlen - 1;
+ 		}
+ 	}
+ 
+ 	/* Allocate output buffer. */
+ 	input_len = s - str;
+ 	result_size = input_len + num_quotes + 3;  /* two quotes, plus a NUL */
+ 	if (!as_ident && num_backslashes > 0)
+ 		result_size += num_backslashes + 2;
+ 	result = rp = (char *) malloc(result_size);
+ 	if (rp == NULL)
+ 	{
+ 		printfPQExpBuffer(&conn->errorMessage,
+ 						  libpq_gettext("out of memory\n"));
+ 		return NULL;
+ 	}
+ 
+ 	/*
+ 	 * If we are escaping a literal that contains backslashes, we use the
+ 	 * escape string syntax so that the result is correct under either value
+ 	 * of standard_conforming_strings.  We also emit a leading space in this
+ 	 * case, to guard against the possibility that the result might be
+ 	 * interpolated immediately following an identifier.
+ 	 */
+ 	if (!as_ident && num_backslashes > 0)
+ 	{
+ 		*rp++ = ' ';
+ 		*rp++ = 'E';
+ 	}
+ 
+ 	/* Opening quote. */
+ 	*rp++ = quote_char;
+ 
+ 	/*
+ 	 * Use fast path if possible.
+ 	 *
+ 	 * We've already verified that the input string is well-formed in the
+ 	 * current encoding.  If it contains no quotes and, in the case of
+ 	 * literal-escaping, no backslashes, then we can just copy it directly
+ 	 * to the output buffer, adding the necessary quotes.
+ 	 *
+ 	 * If not, we must rescan the input and process each character
+ 	 * individually.
+ 	 */
+ 	if (num_quotes == 0 && (num_backslashes == 0 || as_ident))
+ 	{
+ 		memcpy(rp, str, input_len);
+ 		rp += input_len;
+ 	}
+ 	else
+ 	{
+ 		for (s = str; s - str < input_len; ++s)
+ 		{
+ 			if (*s == quote_char || (!as_ident && *s == '\\'))
+ 			{
+ 				*rp++ = *s;
+ 				*rp++ = *s;
+ 			}
+ 			else if (!IS_HIGHBIT_SET(*s))
+ 				*rp++ = *s;
+ 			else
+ 			{
+ 				int i = pg_encoding_mblen(conn->client_encoding, s);
+ 				while (1)
+ 				{
+ 					*rp++ = *s;
+ 					if (--i == 0)
+ 						break;
+ 					++s;	/* for loop will provide the final increment */
+ 				}
+ 			}
+ 		}
+ 	}
+ 
+ 	/* Closing quote and terminating NUL. */
+ 	*rp++ = quote_char;
+ 	*rp = '\0';
+ 
+ 	return result;
+ }
+ 
+ char *
+ PQescapeLiteral(PGconn *conn, const char *str, size_t len)
+ {
+ 	return PQescapeInternal(conn, str, len, 0);
+ }
+ 
+ char *
+ PQescapeIdentifier(PGconn *conn, const char *str, size_t len)
+ {
+ 	return PQescapeInternal(conn, str, len, 1);
+ }
+ 
  /* HEX encoding support for bytea */
  static const char hextbl[] = "0123456789abcdef";
  
*** a/src/interfaces/libpq/libpq-fe.h
--- b/src/interfaces/libpq/libpq-fe.h
***************
*** 471,476 **** extern int	PQsetvalue(PGresult *res, int tup_num, int field_num, char *value, in
--- 471,478 ----
  extern size_t PQescapeStringConn(PGconn *conn,
  				   char *to, const char *from, size_t length,
  				   int *error);
+ extern char *PQescapeLiteral(PGconn *conn, const char *str, size_t len);
+ extern char *PQescapeIdentifier(PGconn *conn, const char *str, size_t len);
  extern unsigned char *PQescapeByteaConn(PGconn *conn,
  				  const unsigned char *from, size_t from_length,
  				  size_t *to_length);
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to