The SQL_ASCII-breaks-JDBC issue just came up yet again on the JDBC list, and I'm wondering if we can do something better on the server side to help solve it.
The problem is that people have SQL_ASCII databases containing non-7-bit data in some encoding known only to a (non-JDBC) application. Changing client_encoding has no effect on a SQL_ASCII database; it is always passthrough. So when a JDBC client is later written, and the JDBC driver sets client_encoding=UNICODE, we get data corruption and/or complaints from the driver that the server is sending it invalid Unicode (because it's really LATIN1, or whatever the original inserter happened to use).

At this point the user has a real problem: there is existing data in the database in one or more encodings, but the encoding information associated with that data has been lost. Converting such a database to a single database-wide encoding is painful at best.

I suppose we can't change the semantics of SQL_ASCII without backwards-compatibility problems. I wonder if the way to go is to introduce a new encoding that allows only 7-bit ASCII, and make that the default. This new encoding would be treated like any other normal encoding: setting client_encoding does real transcoding (I expect that would be a 1:1 mapping in most or all cases) and rejects unmappable characters as soon as they are encountered.

Then the problem becomes visible as soon as problematic strings are given to the server, rather than only when a client that depends on having proper encoding information (such as JDBC) happens to be used. If the DB only uses simple 7-bit ASCII, there is no change in behaviour. If the DB does need to store additional characters, the user is forced to choose an appropriate encoding before any encoding information is lost.

Any thoughts on this?

-O
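To make both behaviours concrete, here is a minimal psql-style sketch. The table is illustrative, the database is assumed to have been created with ENCODING 'SQL_ASCII', and the second half shows the *proposed* strict 7-bit behaviour, so its error message is hypothetical rather than existing server output:

    -- Existing behaviour in a SQL_ASCII database:
    CREATE TABLE t (name text);

    SET client_encoding = 'LATIN1';   -- no-op: SQL_ASCII never transcodes
    INSERT INTO t VALUES ('café');    -- the client's LATIN1 byte 0xE9 is
                                      -- stored verbatim, unvalidated

    SET client_encoding = 'UNICODE';  -- what the JDBC driver does
    SELECT name FROM t;               -- the server sends the raw 0xE9 byte,
                                      -- which is not valid UTF-8: corruption
                                      -- or a driver error, client-dependent

    -- Proposed strict 7-bit default (error text hypothetical):
    SET client_encoding = 'LATIN1';   -- now performs real transcoding
    INSERT INTO t VALUES ('café');
    -- ERROR: character 0xE9 has no equivalent in the server encoding

The point of the sketch is the timing: today the failure surfaces at SELECT time in some later client, while under the proposal it surfaces at the first INSERT, before any encoding information has been lost.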