Thanks for the detailed reply, you've confirmed what I suspected. :)

I guess I have some work to do!


On Fri, 2004-10-29 at 10:19, J. Michael Crawford wrote:
>    In my experience, there are just some characters that don't want to be 
> converted, even if they appear to be part of the normal 8-bit character 
> system.  We went to Unicode databases to hold our Latin1 characters because 
> of this.  There was even a case where the client was cutting and pasting 
> ascii text into our database, and it just wouldn't take some of the 
> letters, giving the same error you reported.
> 
>    I'm going to send a more detailed post on the topic, but in general, 
> we've found that there are four things that need to be done (four, if 
> you're not serving up web pages) for Latin1 characters to work on multiple 
> platforms.
> 
>    1.  Create the database in Unicode so that it will hold anything you 
> throw at it.
> 
>    2.  When importing data, set the encoding in the script that loads the 
> data, or if there's no script, use the "SET CLIENT_ENCODING TO (encoding)" 
> command.  Setting the encoding in a tool like pgManager is not always 
> enough.  Use this to be sure.
> 
>    3.  When retrieving data in a java application, the JVM encoding will 
> vary from JVM to JVM, and no attempt on our part to change the JVM encoding 
> or translate the encoding of the database strings has worked, either to or 
> from the database.  We spent weeks going through every permutation 
> getBytes("ISO-8859-1") and related calls we could find, but to no 
> avail.  The JVM will tell you it has a new encoding, but postgres will 
> return gibberish.  You can translate the bytes, or get a translated string, 
> but it's all the same garbage.  The solution: set the client encoding 
> manually through a jdbc prepared statement.  Once you set the client 
> encoding properly, all seems to be fine:
> 
> String DBEncoding = "anEncoding"  //use a real encoding, either returned 
> from the jvm or explicitly stated
> PreparedStatement statement = dbCon.prepareStatement("SET CLIENT_ENCODING 
> TO '" + DBEncoding + "'");
> statement.execute();
> 
>    4.  If writing html for a web page, make sure the encoding of the web 
> page matches the encoding of the strings you're throwing at it.  So if you 
> have a Linux JVM that has a "UTF-8" encoding, the web page will need the 
> html equivalent:
> 
> <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
> 
> ---
> 
>    This is likely far more information than you require, but I thought I'd 
> add it anyway so that the information is in the archives.  It took us 
> months to solve our problem, even with help from the postgres community, so 
> I at least want the basics to be posted while I get my act together and 
> write something with more detail.
> 
>       - Mike
> 
> 
> At 12:12 PM 10/29/2004, Cott Lang wrote:
>  >ERROR: could not convert UTF-8 character 0x00ef to ISO8859-1
>  >
>  >Running 7.4.5, I frequently get this error, and ONLY on this particular
>  >character despite seeing quite a bit of 8 bit. I don't really follow why
>  >it can't be converted, it's the same character (239) in both character
>  >sets. Databases are in ISO8859-1, JDBC driver is defaulting to UTF-8.
>  >
>  >Am I flubbing something up?  I'm probably going to (reluctantly) convert
>  >to UTF-8 in the database at some point, but it'd sure be nice if this
>  >worked without that. :)
>  >
>  >thanks!
>  >
>  >
>  >
>  >
>  >
>  >
>  >
>  >---------------------------(end of broadcast)---------------------------
>  >TIP 8: explain analyze is your friend
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
> 
>                http://www.postgresql.org/docs/faqs/FAQ.html


---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
    (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

Reply via email to