[SQL] Significance of Database Encoding

2005-05-15 Thread Rajesh Mallah
Hi ,

I would want to know what is the difference between databases
that are created using UNICODE encoding and SQL_ASCII encoding.

I have an existing database that has SQL_ASCII encoding but
still i am able to store multibyte characters that are not
in ASCII character set. for example:

tradein_clients=# \l
  
  List of databases
+-+--+---+
|  Name   |  Owner   | Encoding  |
+-+--+---+
| template0   | postgres | SQL_ASCII |
| template1   | postgres | SQL_ASCII |
| tradein_clients | tradein  | SQL_ASCII |
+-+--+---+

tradein_clients=# SELECT  * from t_A;
+--+
|a  
   |
+--+
| 私はガラス  
  
|
+--+

Above is some japanese character.

I have seen some posting regarding migrating databases from
SQL_ASCII to UNICODE, given the above observation what 
significance does a migration have.

Regards

Rajesh Kumar Mallah.







__ 
Yahoo! Mail Mobile 
Take Yahoo! Mail with you! Check email on your mobile phone. 
http://mobile.yahoo.com/learn/mail 

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [SQL] Significance of Database Encoding [ update ]

2005-05-15 Thread Rajesh Mallah


I am not sure why the characters did not display properly
in the mailling list archives.

http://archives.postgresql.org/pgsql-sql/2005-05/msg00102.php

but when i do the select in my screen (xterm -u8) i do 
see the japanese glyphs properly.


Regds
Mallah.




--- Rajesh Mallah <[EMAIL PROTECTED]> wrote:
> Hi ,
> 
> I would want to know what is the difference between databases
> that are created using UNICODE encoding and SQL_ASCII encoding.
> 
> I have an existing database that has SQL_ASCII encoding but
> still i am able to store multibyte characters that are not
> in ASCII character set. for example:
> 
> tradein_clients=# \l
>   
>   List of databases
> +-+--+---+
> |  Name   |  Owner   | Encoding  |
> +-+--+---+
> | template0   | postgres | SQL_ASCII |
> | template1   | postgres | SQL_ASCII |
> | tradein_clients | tradein  | SQL_ASCII |
> +-+--+---+
> 
> tradein_clients=# SELECT  * from t_A;
> +--+
> |a
>  |
> +--+
> | 私はガラス
>   
>  
> |
> +--+
> 
> Above is some japanese character.
> 
> I have seen some posting regarding migrating databases from
> SQL_ASCII to UNICODE, given the above observation what 
> significance does a migration have.
> 
> Regards
> 
> Rajesh Kumar Mallah.
> 
> 
> 
> 
> 
> 
>   
> __ 
> Yahoo! Mail Mobile 
> Take Yahoo! Mail with you! Check email on your mobile phone. 
> http://mobile.yahoo.com/learn/mail 
> 
> ---(end of broadcast)---
> TIP 7: don't forget to increase your free space map settings
> 



Discover Yahoo! 
Find restaurants, movies, travel and more fun for the weekend. Check it out! 
http://discover.yahoo.com/weekend.html 


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [SQL] Significance of Database Encoding

2005-05-15 Thread Rajesh Mallah

--- PFC <[EMAIL PROTECTED]> wrote:
> 
> > +--+
> > | 私はガラス
> > +--+
> 
>   You say it displays correctly in xterm (ie. you didn't see these in 
> your  
> xterm).
>   There are HTML/XML unicode character entities, probably generated by 
> your  
> mailer from your Unicode cut'n'paste.

That is correct.

Now the question is how to convert from SQL_ASCII to UNICODE. 
Mailing lists suggests to run recode or iconv on the dump file
and restore. The problem is on running iconv with -f US-ASCII
the program aborted:

$ iconv -f US-ASCII -t UTF-8  < test.sql > out.sql
iconv: illegal input sequence at position 114500

Any ideas how the job can be accomplised reliably.

Also my database may contain data in multiple encodings
like WINDOWS-1251 and WINDOWS-1256 in various places
as data has been inserted by different peoples using
different sources and client software.


Regds
Rajesh Kumar Mallah.










>   Using SQL ASCII to store UTF8 encoded data will work, but postgres 
> won't  
> know that it's manipulating multibyte characters, so for instance the  
> length of a string will be its Byte length instead of correctly counting  
> the characters, collation rules will be funky, etc. And substring() may  
> well cut in the middle of an UTF8 multibyte char which will then screw  
> your application side processing...
>   Apart from that, it'll work ;)
> 

__
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [SQL] Significance of Database Encoding

2005-05-15 Thread Rajesh Mallah

--- PFC <[EMAIL PROTECTED]> wrote:
> 
> > $ iconv -f US-ASCII -t UTF-8  < test.sql > out.sql
> > iconv: illegal input sequence at position 114500
> >
> > Any ideas how the job can be accomplised reliably.
> >
> > Also my database may contain data in multiple encodings
> > like WINDOWS-1251 and WINDOWS-1256 in various places
> > as data has been inserted by different peoples using
> > different sources and client software.
> 
>   You could use a simple program like that (in Python):
> 
> output = open( "unidump", "w" )
> for line in open( "your dump" ):
>   for encoding in "utf-8", "iso-8859-15", "whatever":
>   try:
>   output.write( unicode( line, encoding ).encode( "utf-8" 
> ))
>   break
>   except UnicodeError:
>   pass
>   else:
>   print "No suitable encoding for line..."


This may not work . Becuase ,conversion to utf-8 can be successfull (no runtime 
error)
even for an incorrect guess of the original encoding but the  result will be  
an 
incorrect utf8. 

Regds
Rajesh Kumar Mallah


> 
>   I'd say this might work, if UTF-8 cannot absorb an apostrophe inside a  
> multibit character. Can it ?
> 
>   Or you could do that to all your table using SELECTs but it's going to 
> be  
> painful...
> 
> ---(end of broadcast)---
> TIP 7: don't forget to increase your free space map settings
> 



__ 
Do you Yahoo!? 
Read only the mail you want - Yahoo! Mail SpamGuard. 
http://promotions.yahoo.com/new_mail 

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq