Thanks again, Alvaro!

Alvaro Herrera wrote:

So, our system (CGI's written in C++ running on a Linux server)
simply takes whatever the user gives (properly validated and
escaped) and throws it in the database.  We've never encountered
any problem  (well, or perhaps it's the opposite?  Perhaps we've
always been living with the problem without realizing it?)

The latter, I think. The problem is character recoding. If your old system has been running with encoding SQL_ASCII, then no recoding ever takes place. If you are now using UTF8 or latin1 (say) as server encoding, then as soon as the client is using a different encoding, there should be conversion in order to make the new data correct w.r.t. the server encoding. If the wrong conversion takes place, or if no conversion takes place, you may either end up with invalid data, or have the server reject your input (as was this case.)

This makes sense to me, yes. The reason why I'm a bit lost is that we never did anything whatsoever with respect to encoding. Oddly enough, I couldn't find much about this in the docs.

I see references to it in the runtime configuration docs (the part
where they describe the postgres.conf file).  There's one line,
commented out, where they set (as an example), the client_encoding
to sql_ascii, and a comment to the end of that line says "actually,
it defaults to the server encoding").

I just found out that in the create database statement, one of the
options specifies the encoding using for that database.

I guess what changed from version 7.4.x to 8.0 is that the default
server_encoding changed?

This means that a temporary solution (or rather, a temporary patch)
would be to create the database specifying the right server_encoding
to match what I currently have on my system?  (I wouldn't want to do
that if it is nothing more than a patch to keep hiding the problem)

So the moral of the story seems to be that yes, you need to make each
application issue the correct client_encoding before entering any data.
You can attach it to the user or database, by issuing ALTER USER (resp.
DATABASE). But if you are using a web interface, where the user can
enter data in either win1252 or latin1 encoding (or whatever) depending
on the environment, then I'm not sure what you should do.

This is indeed the case; and "do nothing" is what we have always done with respect to this issue... Why has it been so long without us realizing that there was a hidden problem, I really don't know. (and we do have users with plenty of "weird" characters -- accent aigue, grave, circumflex, tilde, dieresis, etc. -- and they have always worked).

I'm so lost!  :-(

BTW, the correct e-mail to pass through the anti-spam filter is
my first name, followed by a dot, followed by my last name (the
rest after the @ is the same)

Thanks again for your message!

Carlos
--

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Reply via email to