On Mon, Apr 16, 2018 at 12:09 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> Keith Fiske <keith.fi...@crunchydata.com> writes:
> > Running into an issue with helping a client upgrade from 8.3 to 10 (yes,
> I
> > know, please keep the out of support comments to a minimum, thanks :).
>
> > The old database was in SQL_ASCII and it needs to stay that way for now
> > unfortunately. The dump and restore itself works fine, but we're now
> > running into issues with some data returning encoding errors unless we
> > specifically set the client_encoding value to SQL_ASCII.
>
> I'm guessing you might be hitting this 9.1 change:
>
>     * Have psql set the client encoding from the operating system locale
>       by default (Heikki Linnakangas)
>
>       This only happens if the PGCLIENTENCODING environment variable is
>       not set.
>
> I think the previous default was to set client encoding equal to the
> server encoding.
>
> > Looking at the 8.3 database, it has the client_encoding value set to UTF8
> > and queries seem to work fine. Is this just a bug in the old 8.3 not
> > enforcing encoding properly?
>
> Somewhere along the line we made SQL_ASCII -> something else conversions
> check that the data was valid per the other encoding, even though no
> actual data change happens.
>
> > The other thing I noticed on the 10 instance was that, while the LOCALE
> was
> > set to SQL_ASCII,
>
> You mean encoding, I assume.
>
> > the COLLATE and CTYPE values for the restored databases
> > were en_US.UTF-8. Could this be having an affect?
>
> This is not a great idea, no.  You could be getting strange misbehaviors
> in e.g. string comparison, because strcoll() will expect UTF8 data and
> will likely not cope well with data that isn't valid in that encoding.
>
> If you can't sanitize the encoding of your data, I'd suggest running
> with lc_collate and lc_ctype set to "C".
>
>                         regards, tom lane
>


Thanks to both of you Adrian & Tom.

It is the 9.1 change to the psql client that seems to be causing this.

And pg_controldata was able to show that the CTYPE and COLLATE were UTF8 on
the old system. If that's the case, do you still think it's a good idea to
set the COLLATE and CTYPE to "C"?


-- 
Keith Fiske
Senior Database Engineer
Crunchy Data - http://crunchydata.com

Reply via email to