[SQL] Significance of Database Encoding
Hi , I would want to know what is the difference between databases that are created using UNICODE encoding and SQL_ASCII encoding. I have an existing database that has SQL_ASCII encoding but still i am able to store multibyte characters that are not in ASCII character set. for example: tradein_clients=# \l List of databases +-+--+---+ | Name | Owner | Encoding | +-+--+---+ | template0 | postgres | SQL_ASCII | | template1 | postgres | SQL_ASCII | | tradein_clients | tradein | SQL_ASCII | +-+--+---+ tradein_clients=# SELECT * from t_A; +--+ |a | +--+ | 私はガラス | +--+ Above is some japanese character. I have seen some posting regarding migrating databases from SQL_ASCII to UNICODE, given the above observation what significance does a migration have. Regards Rajesh Kumar Mallah. __ Yahoo! Mail Mobile Take Yahoo! Mail with you! Check email on your mobile phone. http://mobile.yahoo.com/learn/mail ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [SQL] Significance of Database Encoding [ update ]
I am not sure why the characters did not display properly in the mailling list archives. http://archives.postgresql.org/pgsql-sql/2005-05/msg00102.php but when i do the select in my screen (xterm -u8) i do see the japanese glyphs properly. Regds Mallah. --- Rajesh Mallah <[EMAIL PROTECTED]> wrote: > Hi , > > I would want to know what is the difference between databases > that are created using UNICODE encoding and SQL_ASCII encoding. > > I have an existing database that has SQL_ASCII encoding but > still i am able to store multibyte characters that are not > in ASCII character set. for example: > > tradein_clients=# \l > > List of databases > +-+--+---+ > | Name | Owner | Encoding | > +-+--+---+ > | template0 | postgres | SQL_ASCII | > | template1 | postgres | SQL_ASCII | > | tradein_clients | tradein | SQL_ASCII | > +-+--+---+ > > tradein_clients=# SELECT * from t_A; > +--+ > |a > | > +--+ > | 私はガラス > > > | > +--+ > > Above is some japanese character. > > I have seen some posting regarding migrating databases from > SQL_ASCII to UNICODE, given the above observation what > significance does a migration have. > > Regards > > Rajesh Kumar Mallah. > > > > > > > > __ > Yahoo! Mail Mobile > Take Yahoo! Mail with you! Check email on your mobile phone. > http://mobile.yahoo.com/learn/mail > > ---(end of broadcast)--- > TIP 7: don't forget to increase your free space map settings > Discover Yahoo! Find restaurants, movies, travel and more fun for the weekend. Check it out! http://discover.yahoo.com/weekend.html ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [SQL] Significance of Database Encoding
--- PFC <[EMAIL PROTECTED]> wrote: > > > +--+ > > | 私はガラス > > +--+ > > You say it displays correctly in xterm (ie. you didn't see these in > your > xterm). > There are HTML/XML unicode character entities, probably generated by > your > mailer from your Unicode cut'n'paste. That is correct. Now the question is how to convert from SQL_ASCII to UNICODE. Mailing lists suggests to run recode or iconv on the dump file and restore. The problem is on running iconv with -f US-ASCII the program aborted: $ iconv -f US-ASCII -t UTF-8 < test.sql > out.sql iconv: illegal input sequence at position 114500 Any ideas how the job can be accomplised reliably. Also my database may contain data in multiple encodings like WINDOWS-1251 and WINDOWS-1256 in various places as data has been inserted by different peoples using different sources and client software. Regds Rajesh Kumar Mallah. > Using SQL ASCII to store UTF8 encoded data will work, but postgres > won't > know that it's manipulating multibyte characters, so for instance the > length of a string will be its Byte length instead of correctly counting > the characters, collation rules will be funky, etc. And substring() may > well cut in the middle of an UTF8 multibyte char which will then screw > your application side processing... > Apart from that, it'll work ;) > __ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [SQL] Significance of Database Encoding
--- PFC <[EMAIL PROTECTED]> wrote: > > > $ iconv -f US-ASCII -t UTF-8 < test.sql > out.sql > > iconv: illegal input sequence at position 114500 > > > > Any ideas how the job can be accomplised reliably. > > > > Also my database may contain data in multiple encodings > > like WINDOWS-1251 and WINDOWS-1256 in various places > > as data has been inserted by different peoples using > > different sources and client software. > > You could use a simple program like that (in Python): > > output = open( "unidump", "w" ) > for line in open( "your dump" ): > for encoding in "utf-8", "iso-8859-15", "whatever": > try: > output.write( unicode( line, encoding ).encode( "utf-8" > )) > break > except UnicodeError: > pass > else: > print "No suitable encoding for line..." This may not work . Becuase ,conversion to utf-8 can be successfull (no runtime error) even for an incorrect guess of the original encoding but the result will be an incorrect utf8. Regds Rajesh Kumar Mallah > > I'd say this might work, if UTF-8 cannot absorb an apostrophe inside a > multibit character. Can it ? > > Or you could do that to all your table using SELECTs but it's going to > be > painful... > > ---(end of broadcast)--- > TIP 7: don't forget to increase your free space map settings > __ Do you Yahoo!? Read only the mail you want - Yahoo! Mail SpamGuard. http://promotions.yahoo.com/new_mail ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq