[GENERAL] Storing double-byte strings in text fields.

2001-02-16 Thread edmund

Hello,

I am putting together a web site to display a collection of Chinese
woodblock prints. I want to be able to store double-byte strings (Big5,
Unicode, etc.) in a text field for things such as the artist's name and
the title of the print. I have the following questions:

Is this possible using a plain vanilla version of Postgres, i.e. without the
multi-lingual support enabled? As I understand it, multi-lingual support
allows me to store table and field names etc. in non-ASCII, but doesn't
really affect what goes into the fields.

Are programs such as pg_dump and the COPY command 8-bit clean, or will they
mess up the text? I have done some quick trials and it all seems OK, but I
want to be sure before committing.

If the above is not the case, will the multi-lingual support fix my
problems? I tried it out but had problems with the backend crashing on
certain queries. I'd also rather not use it, as it will be easier to port
my system to other servers if it just needs a plain vanilla install.

I am currently using PostgreSQL 7.0.3 on RedHat 6.2 (x86) and also on
YellowDog 1.2 (PPC). The web server is Apache 1.3.12 with PHP 4.0.x.


Thanks,

Edmund.


--   
Edmund von der Burg



Re: [GENERAL] Storing double-byte strings in text fields.

2001-02-16 Thread Tatsuo Ishii

> I am putting together a web site to display a collection of Chinese
> woodblock prints. I want to be able to store double-byte strings (Big5,
> Unicode, etc.) in a text field for things such as the artist's name and
> the title of the print. I have the following questions:
>
> Is this possible using a plain vanilla version of Postgres, i.e. without the
> multi-lingual support enabled? As I understand it, multi-lingual support
> allows me to store table and field names etc. in non-ASCII, but doesn't
> really affect what goes into the fields.

As Tom already mentioned, your RPM-based Linux boxes already have
PostgreSQL multi-byte capability enabled.

> Are programs such as pg_dump and the COPY command 8-bit clean, or will they
> mess up the text? I have done some quick trials and it all seems OK, but I
> want to be sure before committing.

I don't see any reason why COPY or pg_dump would not be 8-bit clean.
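A small illustration of what "8-bit clean" means for a text dump: the format only needs to escape its own escape and control characters and can copy every other byte, including 0x80 to 0xFF, unchanged. The following is a hypothetical Python model written for this note, not PostgreSQL's COPY implementation, and the escape set is an assumption. A round trip over all 256 byte values is the kind of quick trial Edmund mentions.

```python
# Hypothetical model of a COPY-like text format: the escape character and a
# few control characters are escaped, every other byte is copied as-is.
# The escape set is an assumption for illustration, not PostgreSQL source.
ESCAPES = {0x5C: b"\\\\", 0x09: b"\\t", 0x0A: b"\\n", 0x0D: b"\\r"}
UNESCAPES = {0x5C: 0x5C, ord("t"): 0x09, ord("n"): 0x0A, ord("r"): 0x0D}


def copy_escape(field: bytes) -> bytes:
    """Escape one field byte by byte; bytes >= 0x80 pass through untouched."""
    out = bytearray()
    for b in field:
        out += ESCAPES.get(b, bytes([b]))
    return bytes(out)


def copy_unescape(text: bytes) -> bytes:
    """Reverse copy_escape, again working purely on raw bytes."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == 0x5C and i + 1 < len(text):
            # A backslash always starts a two-byte escape pair.
            out.append(UNESCAPES[text[i + 1]])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return bytes(out)


if __name__ == "__main__":
    every_byte = bytes(range(256))
    assert copy_unescape(copy_escape(every_byte)) == every_byte
    print("round trip OK for all 256 byte values")
```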

> If the above is not the case, will the multi-lingual support fix my
> problems? I tried it out but had problems with the backend crashing on
> certain queries. I'd also rather not use it, as it will be easier to port
> my system to other servers if it just needs a plain vanilla install.

You said you use Big5. That might be the problem. PostgreSQL does not
accept any encoding that conflicts with ASCII, and some Big5 characters
have a second byte in the ASCII range. In this case you need to create
a database with EUC_TW encoding and set the environment variable
PGCLIENTENCODING to BIG5 in your frontend. The backend will then
convert between Big5 and EUC_TW automatically.

Oh, you use PHP4? Then you need to set the environment variable before
starting Apache if you run PHP4 as a module. I also suspect you might
have trouble with PHP4 itself. It has a feature called "magic quotes"
that inserts an escape character (\) before the second byte of a Big5
character if that byte happens to be a metacharacter. You need to
disable it, otherwise PostgreSQL will be confused. In summary, you must
be very careful when using Big5, especially with PHP.
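To see the Big5 problem concretely, here is a minimal Python sketch using the standard library's big5 codec. It searches for characters whose second byte is 0x5C (the ASCII backslash) and then applies a byte-wise imitation of PHP's addslashes(), which is the escaping magic quotes performs. The function names are hypothetical, and Python's Big5 table may differ in detail from PostgreSQL's or PHP's.

```python
from __future__ import annotations


def big5_chars_with_trail(trail: int, limit: int = 3) -> list[str]:
    """Scan the CJK Unified Ideographs block for characters whose Big5
    encoding has the given second (trail) byte."""
    found: list[str] = []
    for cp in range(0x4E00, 0xA000):
        ch = chr(cp)
        try:
            encoded = ch.encode("big5")
        except UnicodeEncodeError:
            continue  # character is not in the Big5 repertoire
        if len(encoded) == 2 and encoded[1] == trail:
            found.append(ch)
            if len(found) == limit:
                break
    return found


def addslashes(data: bytes) -> bytes:
    """Byte-wise imitation of PHP's addslashes(), the escaping that magic
    quotes applies: prefix single quote, double quote, backslash and NUL
    with a backslash. It knows nothing about multi-byte characters."""
    out = bytearray()
    for b in data:
        if b in (0x27, 0x22, 0x5C, 0x00):
            out.append(0x5C)
        out.append(b)
    return bytes(out)


if __name__ == "__main__":
    ch = big5_chars_with_trail(0x5C, limit=1)[0]
    raw = ch.encode("big5")    # lead byte followed by 0x5C
    escaped = addslashes(raw)  # lead byte, 0x5C, 0x5C
    # A Big5-aware reader pairs the lead byte with the first 0x5C and is
    # left with a stray backslash, so the stored text is no longer ch.
    print(ch, raw.hex(), escaped.hex(), repr(escaped.decode("big5")))
```

Because the escaping step cannot tell a backslash from the second half of a Chinese character, the extra 0x5C shifts the character boundaries for whatever reads the string next.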

As for Unicode, it is safe as long as you use the UTF-8 encoding.
UCS-2/UCS-4 cannot be used with PostgreSQL. PostgreSQL 7.1 will be able
to convert automatically between UTF-8 and other encodings, including
Big5. This might be good news for you.
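The UTF-8 versus UCS-2/4 point can be checked in a few lines of Python. In UTF-8 every byte of a multi-byte character is 0x80 or above, so code that only understands ASCII never mistakes part of a Chinese character for a quote, backslash or NUL; UCS-2 and UCS-4 offer no such guarantee. The sketch treats Python's utf-16-be and utf-32-be codecs as stand-ins for UCS-2 and UCS-4, an assumption that holds for characters in the Basic Multilingual Plane. The helper name is hypothetical.

```python
from __future__ import annotations

# UCS-2 and UCS-4 are approximated by "utf-16-be" and "utf-32-be", which
# share their byte layout for characters in the Basic Multilingual Plane.


def ascii_conflicts(text: str, codec: str) -> list[str]:
    """Return the non-ASCII characters of text whose encoding in codec
    contains a byte below 0x80, i.e. a byte that ASCII-based code would
    mistake for an ASCII character (or for NUL, the C string terminator)."""
    return [
        ch
        for ch in text
        if ord(ch) >= 0x80 and any(b < 0x80 for b in ch.encode(codec))
    ]


if __name__ == "__main__":
    sample = "許"
    for codec in ("utf-8", "utf-16-be", "utf-32-be"):
        data = sample.encode(codec)
        print(f"{codec:10} {data.hex(' ')} conflicts={ascii_conflicts(sample, codec)}")
    # Plain ASCII text is affected too: UTF-16 puts a NUL byte before "A".
    print("A".encode("utf-16-be"))
```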

Another problem I have seen with Chinese character sets is that data
produced by Chinese applications is sometimes badly broken, and
PostgreSQL is not very robust against such broken multi-byte strings.
If none of the above applies, I suspect this may be the cause of the
backend crash you had, but I don't know for sure.
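One defensive option, which is a suggestion of this note rather than something proposed in the thread, is to reject malformed multi-byte strings in the application before they reach PostgreSQL. A minimal Python sketch, assuming the data's encoding is known and that Python's strict decoders are an adequate proxy for well-formedness (the function name is hypothetical):

```python
def is_well_formed(data: bytes, encoding: str) -> bool:
    """True if data decodes cleanly with the given Python codec. Strict
    decoding rejects truncated and illegal multi-byte sequences."""
    try:
        data.decode(encoding)  # errors="strict" is the default
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    good = "功".encode("big5")
    broken = good[:1]  # lead byte whose trail byte was cut off
    print(is_well_formed(good, "big5"), is_well_formed(broken, "big5"))
```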

> I am currently using PostgreSQL 7.0.3 on RedHat 6.2 (x86) and also on
> YellowDog 1.2 (PPC). The web server is Apache 1.3.12 with PHP 4.0.x.
--
Tatsuo Ishii