On Mon, Jan 18, 2010 at 2:20 PM, Pavel Stehule <pavel.steh...@gmail.com> wrote:
> 2010/1/18 Robert Haas <robertmh...@gmail.com>:
>> On Mon, Jan 18, 2010 at 1:52 PM, Pavel Stehule <pavel.steh...@gmail.com> 
>> wrote:
>>> 2010/1/18 Robert Haas <robertmh...@gmail.com>:
>>>> On Sun, Jan 17, 2010 at 2:04 PM, Pavel Stehule <pavel.steh...@gmail.com> 
>>>> wrote:
>>>>> I rewrote the patch, so the interface for PQescapeIdentConn is now the
>>>>> same as for PQescapeStringConn.
>>>>>
>>>>> @3. I thought the protection against incomplete multibyte chars was
>>>>> enough - missing bytes are replaced by spaces - like
>>>>> PQescapeStringConn does.
>>>>
>>>> That much is fine, but the output buffer is only guaranteed to be of
>>>> size 2n+1.  Imagine the input is two double-quotes followed by a byte
>>>> for which pg_encoding_mblen() returns 4.  The input is 3 bytes long,
>>>> so the user was responsible for providing 7 bytes of output space,
>>>> but you'll try to write 9 bytes to it (including the terminating NUL).
>>>>
>>> I don't understand. The "length" is the number of bytes, not the number
>>> of chars. Maybe it is just badly documented. If your input string has 6
>>> bytes, then the buffer has to be allocated with 13 bytes. Nobody knows
>>> how many chars are in there.
>>
>> Right, but the point is we can't assume that the input is validly
>> encoded.  If the input ends with a garbage character that looks like
>> the start of a multi-byte character, we can't assume that there's
>> enough space in the output buffer to store the required number of
>> padding spaces.
>>
>> To take an extreme example, suppose there were an encoding where any
>> time the first byte of a multi-byte character has the high-bit set,
>> the character is 100 bytes long.  Then suppose someone calls
>> PQescapeStringConn(), or this new function we're adding, with a length
>> argument of 1, and the first byte of the input buffer has the high-bit
>> set.  The caller is only required to provide a 3-byte output buffer,
>> and the third byte is needed for the terminating NUL.  That means that
>> after we copy that first character we only have room to insert one
>> padding space.  The way you had it coded, since we were expecting a
>> character 100 bytes long, we'd always try to insert 99 padding spaces.
>>
>
> Are you talking about the previous version?

Yes.
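
To make the arithmetic concrete, here is a minimal sketch (not part of the
patch) of the two cases above.  It models the input as some number of
double-quote bytes followed by one lead byte whose declared multibyte
length is mblen.  The helper names are made up for illustration, and the
"bounded" variant only approximates what the corrected code does.

```c
#include <assert.h>
#include <stddef.h>

/* Output bytes the caller must provide for an n-byte input (2n+1 rule). */
static size_t
required_buffer(size_t n)
{
    return 2 * n + 1;
}

/*
 * Bytes written if a truncated multibyte character is always padded out
 * to its full declared length, regardless of the space left.
 */
static size_t
unbounded_write(size_t nquotes, size_t mblen)
{
    assert(mblen >= 1);
    /* doubled quotes + lead byte + padding spaces + terminating NUL */
    return 2 * nquotes + 1 + (mblen - 1) + 1;
}

/*
 * Bytes written if padding stops once 2 output bytes per input byte
 * have been used.
 */
static size_t
bounded_write(size_t nquotes, size_t mblen)
{
    size_t      n = nquotes + 1;            /* input length in bytes */
    size_t      content = 2 * nquotes + 1;  /* doubled quotes + lead byte */
    size_t      pad = mblen - 1;
    size_t      room = 2 * n - content;     /* always 1 in this model */

    assert(mblen >= 1);
    if (pad > room)
        pad = room;
    return content + pad + 1;               /* + terminating NUL */
}
```

With two quotes and mblen = 4, the unbounded count is 9 bytes against a
7-byte buffer; with no quotes and mblen = 100, it is 101 bytes against 3.
The bounded count matches the required buffer size in both cases.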

> In the current version it is guaranteed that the new length is <= 2x the original length.

Actually, strictly less than, but the code gets it correct.  However,
your latest version has some other problems.  For example, you didn't
update the docs to match your source-code changes.  Also, I prefer an
API where the escaping function does include the quotes, so I've done
it that way in the attached patch.  This is just the libpq changes; I
figure if we can agree on this, then we can move on to the psql stuff.
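
For anyone who wants to see the quotes-included behavior without reading
the whole diff, here is a rough, ASCII-only sketch.  It is not the patch
code: escape_ident_ascii is a hypothetical stand-in that skips the PGconn,
multibyte handling, and error reporting, and only shows why the caller
needs 2*length+3 bytes of output space.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical ASCII-only illustration: wrap the input in double quotes
 * and double any embedded double quote.  "to" must hold 2*length+3 bytes.
 * Returns the output length, not counting the terminating NUL.
 */
static size_t
escape_ident_ascii(char *to, const char *from, size_t length)
{
    char       *target = to;
    size_t      i;

    assert(to != NULL && from != NULL);

    *target++ = '"';                /* opening quote: 1 byte */
    for (i = 0; i < length && from[i] != '\0'; i++)
    {
        if (from[i] == '"')
            *target++ = '"';        /* worst case: 2 bytes per input byte */
        *target++ = from[i];
    }
    *target++ = '"';                /* closing quote: 1 byte */
    *target = '\0';                 /* terminating NUL: 1 byte */

    return (size_t) (target - to);
}
```

For the 6-byte input my"tab this writes the 9-byte result "my""tab", and
an input consisting only of double quotes uses 2*length+2 bytes plus the
NUL, which is where the 2*length+3 requirement comes from.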

Comments?

...Robert
*** a/doc/src/sgml/libpq.sgml
--- b/doc/src/sgml/libpq.sgml
***************
*** 2923,3042 **** typedef struct {
    </sect2>
  
    <sect2 id="libpq-exec-escape-string">
!    <title>Escaping Strings for Inclusion in SQL Commands</title>
  
     <indexterm zone="libpq-exec-escape-string">
-     <primary>PQescapeStringConn</primary>
-    </indexterm>
-    <indexterm zone="libpq-exec-escape-string">
-     <primary>PQescapeString</primary>
-    </indexterm>
-    <indexterm zone="libpq-exec-escape-string">
      <primary>escaping strings</primary>
      <secondary>in libpq</secondary>
     </indexterm>
  
!    <para>
!     <function>PQescapeStringConn</function> escapes a string for use within an SQL
!     command.  This is useful when inserting data values as literal constants
!     in SQL commands.  Certain characters (such as quotes and backslashes) must
!     be escaped to prevent them from being interpreted specially by the SQL parser.
!     <function>PQescapeStringConn</> performs this operation.
!    </para>
  
!    <tip>
!     <para>
!      It is especially important to do proper escaping when handling strings that
!      were received from an untrustworthy source.  Otherwise there is a security
!      risk: you are vulnerable to <quote>SQL injection</> attacks wherein unwanted
!      SQL commands are fed to your database.
!     </para>
!    </tip>
  
!    <para>
!     Note that it is not necessary nor correct to do escaping when a data
!     value is passed as a separate parameter in <function>PQexecParams</> or
!     its sibling routines.
! 
!     <synopsis>
!      size_t PQescapeStringConn (PGconn *conn,
!                                 char *to, const char *from, size_t length,
!                                 int *error);
!     </synopsis>
!    </para>
  
!    <para>
!     <function>PQescapeStringConn</> writes an escaped version of the
!     <parameter>from</> string to the <parameter>to</> buffer, escaping
!     special characters so that they cannot cause any harm, and adding a
!     terminating zero byte.  The single quotes that must surround
!     <productname>PostgreSQL</> string literals are not included in the
!     result string; they should be provided in the SQL command that the
!     result is inserted into.  The parameter <parameter>from</> points to
!     the first character of the string that is to be escaped, and the
!     <parameter>length</> parameter gives the number of bytes in this
!     string.  A terminating zero byte is not required, and should not be
!     counted in <parameter>length</>.  (If a terminating zero byte is found
!     before <parameter>length</> bytes are processed,
!     <function>PQescapeStringConn</> stops at the zero; the behavior is
!     thus rather like <function>strncpy</>.) <parameter>to</> shall point
!     to a buffer that is able to hold at least one more byte than twice
!     the value of <parameter>length</>, otherwise the behavior is undefined.
!     Behavior is likewise undefined if the <parameter>to</> and
!     <parameter>from</> strings overlap.
!    </para>
  
!    <para>
!     If the <parameter>error</> parameter is not NULL, then
!     <literal>*error</> is set to zero on success, nonzero on error.
!     Presently the only possible error conditions involve invalid multibyte
!     encoding in the source string.  The output string is still generated
!     on error, but it can be expected that the server will reject it as
!     malformed.  On error, a suitable message is stored in the
!     <parameter>conn</> object, whether or not <parameter>error</> is NULL.
!    </para>
  
!    <para>
!     <function>PQescapeStringConn</> returns the number of bytes written
!     to <parameter>to</>, not including the terminating zero byte.
!    </para>
  
!    <para>
!     <synopsis>
!      size_t PQescapeString (char *to, const char *from, size_t length);
!     </synopsis>
!    </para>
  
!    <para>
!     <function>PQescapeString</> is an older, deprecated version of
!     <function>PQescapeStringConn</>; the difference is that it does
!     not take <parameter>conn</> or <parameter>error</> parameters.
!     Because of this, it cannot adjust its behavior depending on the
!     connection properties (such as character encoding) and therefore
!     <emphasis>it might give the wrong results</>.  Also, it has no way
!     to report error conditions.
!    </para>
  
!    <para>
!     <function>PQescapeString</> can be used safely in single-threaded
!     client programs that work with only one <productname>PostgreSQL</>
!     connection at a time (in this case it can find out what it needs to
!     know <quote>behind the scenes</>).  In other contexts it is a security
!     hazard and should be avoided in favor of
!     <function>PQescapeStringConn</>.
!    </para>
!   </sect2>
  
  
!   <sect2 id="libpq-exec-escape-bytea">
!    <title>Escaping Binary Strings for Inclusion in SQL Commands</title>
  
!    <indexterm zone="libpq-exec-escape-bytea">
!     <primary>bytea</primary>
!     <secondary sortas="libpq">in libpq</secondary>
!    </indexterm>
  
-    <variablelist>
      <varlistentry>
       <term>
        <function>PQescapeByteaConn</function>
--- 2923,3117 ----
    </sect2>
  
    <sect2 id="libpq-exec-escape-string">
!    <title>Escaping Strings and Identifiers for Inclusion in SQL Commands</title>
  
     <indexterm zone="libpq-exec-escape-string">
      <primary>escaping strings</primary>
      <secondary>in libpq</secondary>
     </indexterm>
  
!    <variablelist>
!     <varlistentry>
!      <term>
!       <function>PQescapeStringConn</function>
!       <indexterm>
!        <primary>PQescapeStringConn</primary>
!       </indexterm>
!      </term>
  
!      <listitem>
!      <para>
!       <function>PQescapeStringConn</function> escapes a string for use within an SQL
!       command.  This is useful when inserting data values as literal constants
!       in SQL commands.  Certain characters (such as quotes and backslashes) must
!       be escaped to prevent them from being interpreted specially by the SQL parser.
!       <function>PQescapeStringConn</> performs this operation.
!      </para>
  
!      <tip>
!       <para>
!        It is especially important to do proper escaping when handling strings that
!        were received from an untrustworthy source.  Otherwise there is a security
!        risk: you are vulnerable to <quote>SQL injection</> attacks wherein unwanted
!        SQL commands are fed to your database.
!       </para>
!      </tip>
  
!      <para>
!       Note that it is not necessary nor correct to do escaping when a data
!       value is passed as a separate parameter in <function>PQexecParams</> or
!       its sibling routines.
  
!       <synopsis>
!        size_t PQescapeStringConn (PGconn *conn,
!                                   char *to, const char *from, size_t length,
!                                   int *error);
!       </synopsis>
!      </para>
  
!      <para>
!       <function>PQescapeStringConn</> writes an escaped version of the
!       <parameter>from</> string to the <parameter>to</> buffer, escaping
!       special characters so that they cannot cause any harm, and adding a
!       terminating zero byte.  The single quotes that must surround
!       <productname>PostgreSQL</> string literals are not included in the
!       result string; they should be provided in the SQL command that the
!       result is inserted into.  The parameter <parameter>from</> points to
!       the first character of the string that is to be escaped, and the
!       <parameter>length</> parameter gives the number of bytes in this
!       string.  A terminating zero byte is not required, and should not be
!       counted in <parameter>length</>.  (If a terminating zero byte is found
!       before <parameter>length</> bytes are processed,
!       <function>PQescapeStringConn</> stops at the zero; the behavior is
!       thus rather like <function>strncpy</>.) <parameter>to</> shall point
!       to a buffer that is able to hold at least one more byte than twice
!       the value of <parameter>length</>, otherwise the behavior is undefined.
!       Behavior is likewise undefined if the <parameter>to</> and
!       <parameter>from</> strings overlap.
!      </para>
  
!      <para>
!       If the <parameter>error</> parameter is not NULL, then
!       <literal>*error</> is set to zero on success, nonzero on error.
!       Presently the only possible error conditions involve invalid multibyte
!       encoding in the source string.  The output string is still generated
!       on error, but it can be expected that the server will reject it as
!       malformed.  On error, a suitable message is stored in the
!       <parameter>conn</> object, whether or not <parameter>error</> is NULL.
!      </para>
  
!      <para>
!       <function>PQescapeStringConn</> returns the number of bytes written
!       to <parameter>to</>, not including the terminating zero byte.
!      </para>
!      </listitem>
!     </varlistentry>
  
!     <varlistentry>
!      <term>
!       <function>PQescapeString</function>
!       <indexterm>
!        <primary>PQescapeString</primary>
!       </indexterm>
!      </term>
! 
!      <listitem>
!      <para>
!       <synopsis>
!        size_t PQescapeString (char *to, const char *from, size_t length);
!       </synopsis>
!      </para>
  
+      <para>
+       <function>PQescapeString</> is an older, deprecated version of
+       <function>PQescapeStringConn</>; the difference is that it does
+       not take <parameter>conn</> or <parameter>error</> parameters.
+       Because of this, it cannot adjust its behavior depending on the
+       connection properties (such as character encoding) and therefore
+       <emphasis>it might give the wrong results</>.  Also, it has no way
+       to report error conditions.
+      </para>
  
!      <para>
!       <function>PQescapeString</> can be used safely in single-threaded
!       client programs that work with only one <productname>PostgreSQL</>
!       connection at a time (in this case it can find out what it needs to
!       know <quote>behind the scenes</>).  In other contexts it is a security
!       hazard and should be avoided in favor of
!       <function>PQescapeStringConn</>.
!      </para>
!      </listitem>
!     </varlistentry>
  
!     <varlistentry>
!      <term>
!       <function>PQescapeIdentifierConn</function>
!       <indexterm>
!        <primary>PQescapeIdentifierConn</primary>
!       </indexterm>
!      </term>
! 
!      <listitem>
!      <para>
!       <function>PQescapeIdentifierConn</function> escapes a string for use
!       as an SQL identifier, such as a table or column name.  Note that the
!       escaping required for identifiers is different than what is required
!       for string literals, so you must be careful to use the correct function.
!       Strings are surrounded by single quotes, while identifiers are
!       surrounded by double quotes.
!      </para>
! 
!      <tip>
!       <para>
!        As with strings, you must escape identifiers that were received from
!        an untrustworthy source to prevent <quote>SQL injection</> attacks
!        wherein unwanted SQL commands are fed to your database.
!       </para>
!      </tip>
! 
!      <para>
!       <synopsis>
!        size_t PQescapeIdentifierConn (PGconn *conn,
!                                       char *to, const char *from, size_t length,
!                                       int *error);
!       </synopsis>
!      </para>
! 
!      <para>
!       <function>PQescapeIdentifierConn</> writes an escaped version of the
!       <parameter>from</> string to the <parameter>to</> buffer, surrounding
!       the identifier with double quotes, escaping special characters so that
!       they cannot cause any harm, and adding a terminating zero byte.  
!       The parameter <parameter>from</> points to the first character of the
!       string that is to be escaped, and the <parameter>length</> parameter
!       gives the number of bytes in this string.  A terminating zero byte is not
!       required, and should not be counted in <parameter>length</>.  (If a
!       terminating zero byte is found before <parameter>length</> bytes are
!       processed, <function>PQescapeIdentifierConn</> stops at the zero; the
!       behavior is thus rather like <function>strncpy</>.) <parameter>to</>
!       shall point to a buffer that is able to hold at least three more bytes
!       than twice the value of <parameter>length</>, otherwise the behavior is
!       undefined.  Behavior is likewise undefined if the <parameter>to</> and
!       <parameter>from</> strings overlap.
!      </para>
! 
!      <para>
!       If the <parameter>error</> parameter is not NULL, then
!       <literal>*error</> is set to zero on success, nonzero on error.
!       Presently the only possible error conditions involve invalid multibyte
!       encoding in the source string.  The output string is still generated
!       on error, but it can be expected that the server will reject it as
!       malformed.  On error, a suitable message is stored in the
!       <parameter>conn</> object, whether or not <parameter>error</> is NULL.
!      </para>
! 
!      <para>
!       <function>PQescapeIdentifierConn</> returns the number of bytes written
!       to <parameter>to</>, not including the terminating zero byte.
!      </para>
!      </listitem>
!     </varlistentry>
  
      <varlistentry>
       <term>
        <function>PQescapeByteaConn</function>
*** a/src/interfaces/libpq/exports.txt
--- b/src/interfaces/libpq/exports.txt
***************
*** 153,155 **** PQresultSetInstanceData   150
--- 153,156 ----
  PQfireResultCreateEvents  151
  PQconninfoParse           152
  PQinitOpenSSL             153
+ PQescapeIdentifierConn    154
*** a/src/interfaces/libpq/fe-exec.c
--- b/src/interfaces/libpq/fe-exec.c
***************
*** 3058,3063 **** PQescapeString(char *to, const char *from, size_t length)
--- 3058,3168 ----
  								  static_std_strings);
  }
  
+ /*
+  * Escape an arbitrary string as an SQL identifier.
+  *
+  * This is similar to the backend function quote_identifier(), but we don't
+  * attempt to assess whether quoting is actually needed.  To do that, we'd
+  * need the server's list of keywords, which is not available here (and might
+  * differ depending on the server version).  So we just quote unconditionally.
+  *
+  * This function writes up to, but not more than, 2*length+3 bytes to the output
+  * buffer. A terminating NUL character is added to the output string, whether
+  * the input is NUL-terminated or not.
+  *
+  * Returns the actual length of the output (not counting the terminating NUL).
+  */
+ size_t
+ PQescapeIdentifierConn(PGconn *conn, char *to, const char *from,
+ 							  size_t length, int *error)
+ {
+ 	const char *source = from;
+ 	char	   *target = to;
+ 	size_t		remaining = length;
+ 
+ 	if (!conn)
+ 	{
+ 		/* force empty-string result */
+ 		*to = '\0';
+ 		if (error)
+ 			*error = 1;
+ 		return 0;
+ 	}
+ 
+ 	if (error)
+ 		*error = 0;
+ 
+ 	/* Write opening double-quote. */
+ 	*target++ = '"';
+ 
+ 	while (remaining > 0 && *source != '\0')
+ 	{
+ 		char		c = *source;
+ 		int			len;
+ 		int			i;
+ 
+ 		/* Fast path for plain ASCII */
+ 		if (!IS_HIGHBIT_SET(c))
+ 		{
+ 			/* Apply quoting if needed */
+ 			if (c == '"')
+ 				*target++ = c;
+ 			/* Copy the character */
+ 			*target++ = c;
+ 			source++;
+ 			remaining--;
+ 			continue;
+ 		}
+ 
+ 		/* Slow path for possible multibyte characters */
+ 		len = pg_encoding_mblen(conn->client_encoding, source);
+ 
+ 		/* Copy the character */
+ 		for (i = 0; i < len; i++)
+ 		{
+ 			if (remaining == 0 || *source == '\0')
+ 				break;
+ 			*target++ = *source++;
+ 			remaining--;
+ 		}
+ 
+ 		/*
+ 		 * If we hit premature end of string (ie, incomplete multibyte
+ 		 * character), try to pad out to the correct length with spaces. We
+ 		 * may not be able to pad completely, but we will always be able to
+ 		 * insert at least one pad space (since we'd not have quoted a
+ 		 * multibyte character).  This should be enough to make a string that
+ 		 * the server will error out on.
+ 		 */
+ 		if (i < len)
+ 		{
+ 			if (error)
+ 				*error = 1;
+ 			if (conn)
+ 				printfPQExpBuffer(&conn->errorMessage,
+ 						  libpq_gettext("incomplete multibyte character\n"));
+ 			for (; i < len; i++)
+ 			{
+ 				/*
+ 				 * The output buffer must be 2n+3 bytes, but the extra 3 are
+ 				 * reserved for the leading and trailing quotes and the
+ 				 * terminating NUL, so we have room for exactly 2 output bytes
+ 				 * per input byte.
+ 				 */
+ 				if (((size_t) (target - (to + 1))) / 2 >= length)
+ 					break;
+ 				*target++ = ' ';
+ 			}
+ 			break;
+ 		}
+ 	}
+ 
+ 	/* Write closing double quote and terminating NUL. */
+ 	*target++ = '"';
+ 	*target = '\0';
+ 
+ 	return target - to;
+ }
  
  /* HEX encoding support for bytea */
  static const char hextbl[] = "0123456789abcdef";
*** a/src/interfaces/libpq/libpq-fe.h
--- b/src/interfaces/libpq/libpq-fe.h
***************
*** 471,476 **** extern int	PQsetvalue(PGresult *res, int tup_num, int field_num, char *value, in
--- 471,478 ----
  extern size_t PQescapeStringConn(PGconn *conn,
  				   char *to, const char *from, size_t length,
  				   int *error);
+ extern size_t PQescapeIdentifierConn(PGconn *conn, char *to, const char *from,
+ 							  size_t length, int *error);
  extern unsigned char *PQescapeByteaConn(PGconn *conn,
  				  const unsigned char *from, size_t from_length,
  				  size_t *to_length);
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)