Re: [postgis-users] character encoding problems

2011-11-30 Mark Cave-Ayland

On 30/11/11 02:24, Clay, Bruce wrote:


I am trying to learn more about natural language processing and language
translation.
I have installed the English version of WordNet in Postgres without any
problems. I downloaded dictionaries from a variety of sites, such as those
used in OpenOffice / WinEdt.
When I try to build a table from several of the different languages, I
get the following error:
ERROR: invalid byte sequence for encoding "UTF8": 0x82
I checked the encoding and it is indeed set up for UTF-8. I also tried
creating databases with a variety of other encodings, such as WIN1252,
and got the same error message from all of them except SQL_ASCII.
When I created the database using SQL_ASCII, I received a warning that
the database could only store 7-bit data. When I loaded the data into
this database I did not get any errors, and when I look at the data it
seems to be the same as in the original text file.
Is there a "proper" encoding type that I should use to load the word
lists so they can interoperate with the WordNet dataset that happily
uses the UTF8 encoding?
Bruce


Hi Bruce,

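This isn't strictly a PostGIS question, so you'd be better off
re-posting to the pgsql-general mailing list to get some answers.
However, from what you mention above, it seems that the extra
dictionaries you are downloading are not UTF-8 encoded and so may
require conversion on import. The byte 0x82 in your error is a UTF-8
continuation byte that can never begin a character, which suggests the
files use a legacy 8-bit encoding such as a Windows or DOS code page.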
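As a minimal sketch of that conversion, assuming each dictionary is a plain-text file in one known legacy encoding (cp850 and cp1252 below are only examples; the real encoding of each file still has to be determined), the word list can be re-encoded to UTF-8 in Python before it is loaded. The helper names `to_utf8` and `convert_file` are hypothetical. PostgreSQL can also convert on the server side if told the true encoding of the incoming data, for example via `SET client_encoding` or the `ENCODING` option of `COPY` in newer releases.

```python
from pathlib import Path


def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Return raw re-encoded as UTF-8 (hypothetical helper).

    If raw already decodes cleanly as UTF-8 it is returned unchanged, so a
    file is never converted twice. Otherwise it is decoded with the given
    legacy encoding (an assumption the caller must verify) in strict mode,
    so bytes undefined in that encoding raise instead of being dropped.
    """
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        pass
    return raw.decode(source_encoding).encode("utf-8")


def convert_file(src: Path, dst: Path, source_encoding: str) -> None:
    """Write a UTF-8 copy of the word list at src to dst (hypothetical)."""
    dst.write_bytes(to_utf8(src.read_bytes(), source_encoding))


if __name__ == "__main__":
    # 0x82 is "é" in the DOS code page cp850 but U+201A ("‚") in cp1252,
    # so the source encoding must be known, not guessed.
    sample = b"caf\x82"
    print(ascii(to_utf8(sample, "cp850").decode("utf-8")))   # 'caf\xe9'
    print(ascii(to_utf8(sample, "cp1252").decode("utf-8")))  # 'caf\u201a'
```

The UTF-8 check is only a heuristic: pure-ASCII files pass through unchanged (which is correct), and legacy text with accented letters is rarely valid UTF-8 by accident, but spot-checking a few accented words after conversion is still worthwhile.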


You can potentially use SQL_ASCII as a workaround, but I would strongly
recommend against it. SQL_ASCII loads without errors only because the
server does no validation or conversion of bytes above 127 and stores
them as-is, so you end up with data in a mixture of unknown encodings
that you will never be able to output correctly across all platforms.
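A small Python sketch illustrates why such a mixture cannot be displayed correctly. It assumes, hypothetically, that the same word arrives from two dictionaries saved in different code pages and is stored byte-for-byte; the candidate encodings and the helper names `interpretations` and `common_readings` are illustrative only. Because nothing records which encoding each row used, no single client encoding reads both rows back as the original word.

```python
from typing import Dict, List, Optional

# Candidate encodings a reader might assume for bytes stored without any
# record of their encoding (as SQL_ASCII does for bytes above 127).
CANDIDATES = ("utf-8", "cp1252", "latin-1", "cp850")


def interpretations(raw: bytes) -> Dict[str, Optional[str]]:
    """Decode raw under each candidate encoding; None means it is invalid there."""
    result: Dict[str, Optional[str]] = {}
    for enc in CANDIDATES:
        try:
            result[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            result[enc] = None
    return result


def common_readings(values: List[bytes], expected: str) -> List[str]:
    """Encodings under which every stored value decodes to the expected text."""
    return [
        enc for enc in CANDIDATES
        if all(interpretations(v)[enc] == expected for v in values)
    ]


if __name__ == "__main__":
    word = "caf\u00e9"  # "café"
    # Hypothetical: the same word from a Windows-1252 file and a cp850 file.
    from_win_file = word.encode("cp1252")  # b'caf\xe9'
    from_dos_file = word.encode("cp850")   # b'caf\x82'
    for raw in (from_win_file, from_dos_file):
        print(ascii(raw), ascii(interpretations(raw)))
    # No single client encoding displays both rows correctly.
    print(common_readings([from_win_file, from_dos_file], word))  # []
```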



ATB,

Mark.

--
Mark Cave-Ayland - Senior Technical Architect
PostgreSQL - PostGIS
Sirius Corporation plc - control through freedom
http://www.siriusit.co.uk
t: +44 870 608 0063

Sirius Labs: http://www.siriusit.co.uk/labs
___
postgis-users mailing list
postgis-users@postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users


[postgis-users] character encoding problems

2011-11-29 Clay, Bruce
I am trying to learn more about natural language processing and language 
translation.

I have installed the English version of WordNet in Postgres without any 
problems.  I downloaded dictionaries from a variety of sites, such as those 
used in OpenOffice / WinEdt.

When I try to build a table from several of the different languages, I get the 
following error:
 
ERROR:  invalid byte sequence for encoding "UTF8": 0x82
 
I checked the encoding and it is indeed set up for UTF-8.  I also tried 
creating databases with a variety of other encodings, such as WIN1252, and got 
the same error message from all of them except SQL_ASCII.

When I created the database using SQL_ASCII, I received a warning that the 
database could only store 7-bit data.  When I loaded the data into this 
database I did not get any errors, and when I look at the data it seems to be 
the same as in the original text file.
 
Is there a "proper" encoding type that I should use to load the word lists so 
they can interoperate with the WordNet dataset that happily uses the UTF8 
encoding?
 
Bruce
 


