I wrote:
The minimum-refactoring solution to this would be to tweak
pg_do_encoding_conversion() so that if the src_encoding is SQL_ASCII but
the dest_encoding isn't, it does pg_verify_mbstr() rather than nothing.
I'm not sure if this would break anything we need to have work,
though. Thoughts? Do we want to back-patch such a change?
I looked through all the callers of pg_do_encoding_conversion(), and
AFAICS this change is a good idea. There are a whole bunch of places
that use pg_do_encoding_conversion() to convert from the database encoding
to encoding X (most usually UTF8), and right now if you do that in a
SQL_ASCII database you have no assurance whatever that what is produced
is actually valid in encoding X. I think we need to close that loophole.
The more I looked into mbutils.c, the less happy I got. The attached
proposed patch takes care of the missing-verification hole in
pg_do_encoding_conversion() and pg_server_to_any(), and also gets rid
of what I believe to be obsolete provisions in pg_do_encoding_conversion
to work if called outside a transaction --- if you consider it working
to completely fail to honor its API contract. That should no longer be
necessary now that we perform client-server encoding conversions via
perform_default_encoding_conversion rather than here.
For testing I inserted an Assert(IsTransactionState()); at the top of
pg_do_encoding_conversion(), and couldn't trigger it, so I'm fairly sure
this change is OK. Still, back-patching it might not be a good thing.
On the other hand, if there is still any code path that can get here
outside a transaction, we probably ought to find out rather than allow
completely bogus data to be returned.
I also made some more-cosmetic improvements, notably removing a bunch
of Asserts that are certainly dead code because the relevant variables
are never NULL.
I've not done anything yet about simplifying unnecessary calls of
pg_do_encoding_conversion into pg_server_to_any/pg_any_to_server.
How much of this is back-patch material, do you think?
regards, tom lane
diff --git a/src/backend/utils/mb/mbutils.c b/src/backend/utils/mb/mbutils.c
index 08440e9..7f43cae 100644
*** a/src/backend/utils/mb/mbutils.c
--- b/src/backend/utils/mb/mbutils.c
***
*** 1,10
! /*
! * This file contains public functions for conversion between
! * client encoding and server (database) encoding.
*
! * Tatsuo Ishii
*
! * src/backend/utils/mb/mbutils.c
*/
#include postgres.h
--- 1,36
! /*-
*
! * mbutils.c
! * This file contains functions for encoding conversion.
*
! * The string-conversion functions in this file share some API quirks.
! * Note the following:
! *
! * The functions return a palloc'd, null-terminated string if conversion
! * is required. However, if no conversion is performed, the given source
! * string pointer is returned as-is.
! *
! * Although the presence of a length argument means that callers can pass
! * non-null-terminated strings, care is required because the same string
! * will be passed back if no conversion occurs. Such callers *must* check
! * whether result == src and handle that case differently.
! *
! * If the source and destination encodings are the same, the source string
! * is returned without any verification; it's assumed to be valid data.
! * If that might not be the case, the caller is responsible for validating
! * the string using a separate call to pg_verify_mbstr(). Whenever the
! * source and destination encodings are different, the functions ensure that
! * the result is validly encoded according to the destination encoding.
! *
! *
! * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
! * Portions Copyright (c) 1994, Regents of the University of California
! *
! *
! * IDENTIFICATION
! * src/backend/utils/mb/mbutils.c
! *
! *-
*/
#include postgres.h
*** InitializeClientEncoding(void)
*** 290,296
int
pg_get_client_encoding(void)
{
- Assert(ClientEncoding);
return ClientEncoding-encoding;
}
--- 316,321
*** pg_get_client_encoding(void)
*** 300,328
const char *
pg_get_client_encoding_name(void)
{
- Assert(ClientEncoding);
return ClientEncoding-name;
}
/*
! * Apply encoding conversion on src and return it. The encoding
! * conversion function is chosen from the pg_conversion system catalog
! * marked as default. If it is not found in the schema search path,
! * it's taken from pg_catalog schema. If it even is not in the schema,
! * warn and return src.
! *
! * If conversion occurs, a palloc'd null-terminated string is returned.
! * In the case of no conversion, src is returned.
! *
! * CAUTION: although the presence of a length argument means that callers
!