Re: [GENERAL] invalid byte sequence for encoding UTF8: 0x00

2011-11-20 Thread hubert depesz lubaczewski
On Sat, Nov 19, 2011 at 09:32:12AM -0800, pawel_kukawski wrote:
 Is there any way I can store the NUL character (\u0000) in a string?
 
 Or is my only option to change every text field to bytea?

The correct question is: why do you want to store \u0000 in a text field?
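
For anyone landing here with the same error: PostgreSQL text values cannot contain the zero byte, so the usual choices are to strip it on the client side or to keep the raw bytes in a bytea column. A minimal Python sketch of both options follows. The helper names strip_nul and to_bytea_payload are made up for illustration and are not part of any library.

```python
def strip_nul(value: str) -> str:
    """Remove U+0000, which a PostgreSQL text column cannot store."""
    return value.replace("\x00", "")


def to_bytea_payload(value: str) -> bytes:
    """Encode to UTF-8 bytes (NUL preserved) that a driver could bind to a bytea column."""
    return value.encode("utf-8")


sample = "abc\x00def"
print(repr(strip_nul(sample)))
print(to_bytea_payload(sample))
```

Stripping loses information, so it only makes sense when the NUL is an accident (for example a C string terminator that leaked into the data).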

Best regards,

depesz

-- 
The best thing about modern society is how easy it is to avoid contact with it.
 http://depesz.com/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] invalid byte sequence for encoding UTF8: 0x00

2011-11-19 Thread pawel_kukawski
Hi,

Is there any way I can store the NUL character (\u0000) in a string?

Or is my only option to change every text field to bytea?

Regards,
Paweł


Re: [GENERAL] Invalid byte sequence for encoding UTF8: 0xedbebf

2011-06-17 Thread Albe Laurenz
BRUSSER Michael wrote:
 Is there a way to find the records with the text field containing
Unicode bytes 0xedbebf?
 Unfortunately this is a very old version 7.3.10

 This should work on 7.3 (according to the documentation):
 SELECT id FROM nlsdata WHERE position('\360\235\204\236'::bytea IN
val::bytea) = 1;

 Albe, thanks for pointing this out!

 I made a minor change, added decode since text cannot be cast to bytea
and tried something like this:
  SELECT id FROM myTable WHERE position('\360\235\204\236'::bytea IN
decode(myTextField, 'escape')) !=0
  ERROR:  decode: Bad input string for type bytea

Hrm. I didn't know that there was no cast from text to bytea in 7.3.

 Maybe this explains why?
 testdb=# select decode('\360\235\204\236'::text, 'escape');
 ERROR:  Unicode = 0x1 is not supported

No, that is an error on my side. I gave you the wrong byte sequence.

For 0xedbebf you should actually write '\355\276\277'. But that is not a
valid UTF-8 sequence.
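
The hex-to-octal conversion is easy to get wrong by hand, so here is a small Python sketch that does it and also checks whether the bytes are valid UTF-8. The function names are illustrative only.

```python
def octal_escapes(hex_bytes: str) -> str:
    """Turn a hex string such as 'edbebf' into bytea-style octal escapes."""
    raw = bytes.fromhex(hex_bytes)
    return "".join(f"\\{b:03o}" for b in raw)


def is_valid_utf8(raw: bytes) -> bool:
    """True if the bytes decode under Python's strict UTF-8 decoder."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(octal_escapes("edbebf"))                    # \355\276\277
print(octal_escapes("f09d849e"))                  # \360\235\204\236
print(is_valid_utf8(bytes.fromhex("edbebf")))     # False
print(is_valid_utf8(bytes.fromhex("f09d849e")))   # True
```

The second sequence is the one from the documentation example quoted above; it is valid UTF-8, which is why searching for it never matched the broken rows.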

 but I'm not ready to give up yet...

If you know the byte sequence that causes trouble, you could also use
something like sed to search and replace it in the dump file.

Or (if there are not too many) you could search for the pattern and
identify the rows in the database. Then you know which database rows to
update.
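
Searching the dump for the pattern can be sketched in Python as below. This assumes a plain-text dump read in binary mode, one row per line as in COPY output; find_invalid_utf8_lines is a hypothetical helper, not an existing tool.

```python
import io


def find_invalid_utf8_lines(stream):
    """Yield (line_number, byte_offset) for each line that is not valid UTF-8."""
    for lineno, line in enumerate(stream, start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError as exc:
            # exc.start is the offset of the first byte the decoder rejected.
            yield lineno, exc.start


# Stand-in for open("dump.sql", "rb"); the content is invented for the demo.
dump = io.BytesIO(b"COPY t (id, val) FROM stdin;\n1\tfine\n2\tbad \xed\xbe\xbf\n\\.\n")
for lineno, offset in find_invalid_utf8_lines(dump):
    print(f"line {lineno}: invalid byte at offset {offset}")
```

Because the file is read line by line, memory use stays small even for a large dump, and the reported lines usually contain the key needed to find the row in the database.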

Yours,
Laurenz Albe


Re: [GENERAL] Invalid byte sequence for encoding UTF8: 0xedbebf

2011-06-16 Thread Albe Laurenz
BRUSSER Michael wrote:
 Is there a way to find the records with the text field containing
Unicode bytes 0xedbebf?

 Unfortunately this is a very old version 7.3.10

This should work on 7.3 (according to the documentation):

SELECT id FROM nlsdata WHERE position('\360\235\204\236'::bytea IN
val::bytea) = 1;

Here val is the column containing the values in question.

Yours,
Laurenz Albe


Re: [GENERAL] Invalid byte sequence for encoding UTF8: 0xedbebf

2011-06-16 Thread BRUSSER Michael
-----Original Message-----
From: Albe Laurenz [mailto:laurenz.a...@wien.gv.at]
Sent: Thursday, June 16, 2011 5:16 AM
To: BRUSSER Michael; pgsql-general@postgresql.org
Subject: RE: [GENERAL] Invalid byte sequence for encoding UTF8: 0xedbebf

BRUSSER Michael wrote:
 Is there a way to find the records with the text field containing Unicode 
 bytes 0xedbebf?
 Unfortunately this is a very old version 7.3.10

This should work on 7.3 (according to the documentation):
SELECT id FROM nlsdata WHERE position('\360\235\204\236'::bytea IN val::bytea) 
= 1;



Albe, thanks for pointing this out!

I made a minor change, added decode since text cannot be cast to bytea and 
tried something like this:
  SELECT id FROM myTable WHERE position('\360\235\204\236'::bytea IN 
decode(myTextField, 'escape')) != 0
  ERROR:  decode: Bad input string for type bytea

If I limit the query to some healthy records (AND id BETWEEN 100 AND 110), it
works and returns an empty result.
So the problem now is that without decode, myTextField cannot be converted to
bytea; with decode, the query breaks on the first 'bad' value.
Maybe this explains why?
testdb=# select decode('\360\235\204\236'::text, 'escape');
ERROR:  Unicode = 0x1 is not supported

Another thought is that if I get this to work I may need to search for anything 
outside of the standard utf range,
rather than any specific sequence. I am beginning to understand why many people 
dealt with this in the dump file,
but I'm not ready to give up yet...

As usual, any ideas are appreciated!
Thanks.


This email and any attachments are intended solely for the use of the 
individual or entity to whom it is addressed and may be confidential and/or 
privileged.

If you are not one of the named recipients or have received this email in error,

(i) you should not read, disclose, or copy it,

(ii) please notify sender of your receipt by reply email and delete this email 
and all attachments,

(iii) Dassault Systemes does not accept or assume any liability or 
responsibility for any use of or reliance on this email.

For other languages, go to http://www.3ds.com/terms/email-disclaimer


[GENERAL] Invalid byte sequence for encoding UTF8: 0xedbebf

2011-06-15 Thread BRUSSER Michael
This is a follow-up on my previous message 
http://archives.postgresql.org/pgsql-general/2011-06/msg00054.php

I think I now have some understanding of what's causing the problem, but I
don't have a good solution, only more questions.
The release notes for v8.1 at
http://www.postgresql.org/docs/current/interactive/release-8-1.html
make a good suggestion: use iconv to convert the plain-text dump file to
UTF-8.
On Linux this did not work; the input and output files were identical. The
iconv on Solaris refused to open the input file (probably too big), although
it worked on a chunk of it and reported a conversion error.

Unless there are no other options, I don't want to use sed or break the file
into pieces; if possible, I would prefer to identify the bad records in the
database.
I tried SELECT with everything I could think of: ~*, SIMILAR TO, and the
like, but I never got it right.

Is there a way to find the records with the text field containing Unicode bytes 
0xedbebf?
Unfortunately this is a very old version 7.3.10

Thank you.
Michael.


Re: [GENERAL] Invalid byte sequence for encoding UTF8: 0xedbebf

2011-06-15 Thread Alan Hodgson
On June 15, 2011 01:18:27 PM BRUSSER Michael wrote:
 Unless there's no other options I don't want to use sed or break file into
 pieces, if possible,

iconv loads everything into RAM. You can use split, convert the pieces, and
then recombine them. I did that when converting a large database to UTF-8 and
it worked.
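
An alternative to split that avoids holding the file in RAM is to stream it through an incremental decoder. The Python sketch below drops invalid UTF-8 bytes chunk by chunk, similar in spirit to running iconv with its -c option from UTF-8 to UTF-8. The function name, chunk size, and sample data are assumptions for illustration.

```python
import codecs
import io


def clean_utf8_stream(src, dst, chunk_size=1 << 20, errors="ignore"):
    """Copy src to dst in chunks, dropping (or replacing) invalid UTF-8 bytes."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors=errors)
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        # The decoder buffers a multi-byte character split across two chunks.
        dst.write(decoder.decode(chunk).encode("utf-8"))
    # Flush whatever the decoder is still holding from the last chunk.
    dst.write(decoder.decode(b"", final=True).encode("utf-8"))


# Stand-ins for open("dump.sql", "rb") and open("clean.sql", "wb").
src = io.BytesIO("caf\u00e9 ".encode("utf-8") + b"\xed\xbe\xbf" + b"ok\n")
dst = io.BytesIO()
clean_utf8_stream(src, dst, chunk_size=1)
print(dst.getvalue())
```

Note that errors="ignore" silently discards data; errors="replace" keeps a visible marker instead, which may be preferable when the rows will be reviewed later.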

-- 
Obama has now fired more cruise missiles than all other Nobel Peace prize 
winners combined.


Re: [GENERAL] Invalid byte sequence for encoding UTF8: 0xedbebf

2011-06-15 Thread BRUSSER Michael
-----Original Message-----
From: pgsql-general-ow...@postgresql.org 
[mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Alan Hodgson
Sent: Wednesday, June 15, 2011 5:37 PM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Invalid byte sequence for encoding UTF8: 0xedbebf

On June 15, 2011 01:18:27 PM BRUSSER Michael wrote:
 Unless there's no other options I don't want to use sed or break file into
 pieces, if possible,

 iconv loads everything into RAM. You can use split, convert the pieces, and
 then recombine, I did that when converting a large database to utf-8 and it
 worked.

Thanks, but this is exactly what I am trying to avoid!
Using split is good if you have one database to upgrade and no external
customers (not to mention other problems with this approach).


[GENERAL] invalid byte sequence for encoding UTF8

2011-06-02 Thread BRUSSER Michael
We are upgrading an old database (7.3.10 to 8.4.4). This involves running
pg_dump on the old db and loading the datafile into the new db. In case it
matters, we do not use pg_restore; the dump file is just sourced with psql,
and this is where I ran into a problem:

psql: .../postgresql_archive.src/... ERROR:  invalid byte sequence for encoding 
UTF8: 0xedbebf
HINT:  This error can also happen if the byte sequence does not match the 
encoding
expected by the server, which is controlled by client_encoding.

The server and client encoding are both Unicode. I think we may have some
copied-and-pasted MS Word markup and possibly other odd things in the old
database. All this junk is found in the 'text' fields.

I found a number of related postings, but did not see a good solution. Some
folks suggested cleaning the datafile prior to loading, while someone else did
essentially the same thing on the database before dumping it.
I am looking for advice, hopefully the best technique if there is one; any
suggestion is appreciated.

Thanks,
Michael.


Re: [GENERAL] invalid byte sequence for encoding UTF8

2011-06-02 Thread Derrick Rice
That specific character sequence is a result of Unicode implementations
prior to 6.0 mixing with later implementations.  See here:

http://en.wikipedia.org/wiki/Specials_%28Unicode_block%29#Replacement_character

You could replace that sequence with the correct replacement character,
U+FFFD (bytes 0xEFBFBD in UTF-8), using `sed`, for example (if you are using
a plain-text dump format).
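
To see what the offending bytes actually are before replacing them, here is a short Python sketch. Decoding the three bytes by hand gives U+DFBF, a lone low surrogate, which strict UTF-8 decoders reject; the replacement character U+FFFD is the byte sequence EF BF BD. The names BAD, REPLACEMENT, decode_lenient, and replace_sequence are illustrative only.

```python
BAD = bytes.fromhex("edbebf")
REPLACEMENT = "\ufffd".encode("utf-8")


def decode_lenient(raw: bytes) -> str:
    """Decode UTF-8 while letting surrogate code points through."""
    return raw.decode("utf-8", errors="surrogatepass")


def replace_sequence(data: bytes, bad: bytes = BAD, good: bytes = REPLACEMENT) -> bytes:
    """Byte-level search and replace, the same idea as the sed suggestion."""
    return data.replace(bad, good)


print(hex(ord(decode_lenient(BAD))))   # 0xdfbf
print(REPLACEMENT.hex())               # efbfbd
print(replace_sequence(b"a" + BAD + b"b").decode("utf-8"))
```

Working on bytes rather than decoded text matters here, because the input cannot be decoded strictly until the bad sequence is gone.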


Re: [GENERAL] invalid byte sequence for encoding UTF8: 0xf1612220

2011-05-12 Thread Cédric Villemain
2011/5/12 Craig Ringer <cr...@postnewspapers.com.au>:
 On 05/11/2011 03:16 PM, AI Rumman wrote:

 I am trying to migrate a database from Postgresql 8.2 to Postgresql 8.3
 and getting the following error:

 pg_restore: [archiver (db)] Error from TOC entry 2764; 0 29708702 TABLE
 DATA originaldata postgres
 pg_restore: [archiver (db)] COPY failed: ERROR:  invalid byte sequence
 for encoding UTF8: 0xf1612220
 HINT:  This error can also happen if the byte sequence does not match
 the encoding expected by the server, which is controlled by
 client_encoding.
 CONTEXT:  COPY wi_originaldata, line 3592

 I took a dump from 8.2 server and then tried to restore at 8.3.

 Both the client_encoding and server_encoding are UTF8 at both the servers.

 Newer versions of Pg got better at catching bad Unicode. While this helps
 prevent bad data getting into the database, it's a right pain if you're
 moving data over from an older version with less strict checks.

 I don't know of any way to relax the checks for the purpose of importing
 dumps. You'll need to fix your dump files before loading them (by finding
 the faulty text and fixing it) or fix it in the origin database before
 migrating the data. Neither approach is nice or easy, but nobody has yet
 stepped up to write a unicode verifier tool that checks old databases' text
 fields against stricter rules...


The 2 following articles have SQL functions and documentation you may
find useful:

http://tapoueh.org/articles/blog/_Getting_out_of_SQL_ASCII,_part_1.html
http://tapoueh.org/articles/blog/_Getting_out_of_SQL_ASCII,_part_2.html



-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support


[GENERAL] invalid byte sequence for encoding UTF8: 0xf1612220

2011-05-11 Thread AI Rumman
I am trying to migrate a database from Postgresql 8.2 to Postgresql 8.3 and
getting the following error:

pg_restore: [archiver (db)] Error from TOC entry 2764; 0 29708702 TABLE DATA
originaldata postgres
pg_restore: [archiver (db)] COPY failed: ERROR:  invalid byte sequence for
encoding UTF8: 0xf1612220
HINT:  This error can also happen if the byte sequence does not match the
encoding expected by the server, which is controlled by client_encoding.
CONTEXT:  COPY wi_originaldata, line 3592

I took a dump from 8.2 server and then tried to restore at 8.3.

Both the client_encoding and server_encoding are UTF8 at both the servers.

                              Table public.data
       Column        |          Type          |                     Modifiers
---------------------+------------------------+----------------------------------------------------
 orgid               | integer                |
 id                  | integer                | not null default nextval(('data'::text)::regclass)
 datatypecode        | character varying(15)  |
 batchname           | character varying(60)  |
 filename            | character varying(60)  |
 encoding            | character varying(20)  |
 errormessage        | character varying(255) |
 originaldata_backup | bytea                  |
 processeddata       | bytea                  |
 validatedflag       | smallint               |
 processedflag       | smallint               |
 createddate         | date                   |
 createdtime         | time without time zone |
 modifieddate        | date                   |
 modifiedtime        | time without time zone |
 processeddate       | date                   |
 processedtime       | time without time zone |
 deletedflag         | smallint               |
 originaldata        | text                   |
Indexes:
    data_pkey PRIMARY KEY, btree (id)

Any help will be appreciated.


Re: [GENERAL] invalid byte sequence for encoding UTF8: 0xf1612220

2011-05-11 Thread Craig Ringer

On 05/11/2011 03:16 PM, AI Rumman wrote:

I am trying to migrate a database from Postgresql 8.2 to Postgresql 8.3
and getting the following error:

pg_restore: [archiver (db)] Error from TOC entry 2764; 0 29708702 TABLE
DATA originaldata postgres
pg_restore: [archiver (db)] COPY failed: ERROR:  invalid byte sequence
for encoding UTF8: 0xf1612220
HINT:  This error can also happen if the byte sequence does not match
the encoding expected by the server, which is controlled by
client_encoding.
CONTEXT:  COPY wi_originaldata, line 3592

I took a dump from 8.2 server and then tried to restore at 8.3.

Both the client_encoding and server_encoding are UTF8 at both the servers.


Newer versions of Pg got better at catching bad Unicode. While this helps
prevent bad data getting into the database, it's a right pain if you're
moving data over from an older version with less strict checks.


I don't know of any way to relax the checks for the purpose of importing 
dumps. You'll need to fix your dump files before loading them (by 
finding the faulty text and fixing it) or fix it in the origin database 
before migrating the data. Neither approach is nice or easy, but nobody 
has yet stepped up to write a unicode verifier tool that checks old 
databases' text fields against stricter rules...
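
As a small step toward such a verifier, the Python sketch below shows why the reported sequence 0xf1612220 fails: the lead byte 0xF1 announces a four-byte character, but none of the next three bytes is a continuation byte. The function names are hypothetical, and the Latin-1 reading at the end is only a guess at what the data was meant to be.

```python
def announced_length(lead: int) -> int:
    """Number of bytes a UTF-8 lead byte announces (0 if it cannot start a sequence)."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0


def bad_continuations(seq: bytes):
    """Offsets after the lead byte that do not have the 10xxxxxx bit pattern."""
    return [i for i, b in enumerate(seq[1:], start=1) if (b & 0xC0) != 0x80]


seq = bytes.fromhex("f1612220")
print(announced_length(seq[0]))      # 4
print(bad_continuations(seq))        # [1, 2, 3]
print(repr(seq.decode("latin-1")))   # how the same bytes read as Latin-1
```

Read as Latin-1, the bytes are an n with tilde followed by ordinary ASCII, which suggests single-byte text was stored in a database declared as UTF8.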


--
Craig Ringer


Re: [GENERAL] invalid byte sequence for encoding UTF8: 0xab

2009-06-08 Thread Albe Laurenz
Mark D. Grand wrote:
 I am having a vexing problem with a script I am writing to 
 populate reference tables in a new database. 
 
 I am running postgreSQL 8.3 with psql 8.3.7.
 
 Psql reads this SQL statement:
 
 INSERT INTO META_AUTH.DOMAIN_META_ASSERTION (TITLE, DESCRIPTION, 
 META_ASSERTION)
 VALUES ('Super-User Authorization', 
 'This allows a super-user to administer all meta-data.', 
 'UserID «Administer» ()');
 
 and I get this message:
 
 ERROR:  invalid byte sequence for encoding UTF8: 0xab
 
 HINT:  This error can also happen if the byte sequence does 
 not match the encoding expected by the server, which is 
 controlled by client_encoding.
 
 It is complaining about the '«' character.  I do not 
 understand why.  The database is created with the commands
 
 CREATE DATABASE mayyou
 WITH OWNER=meta_auth ENCODING='UTF8';
 
 ALTER DATABASE mayyou SET client_encoding = 'UTF8';
 
 When I give psql the \encoding command, it replies
 UTF8
 
 Why is it complaining about this valid character code?

The database stores characters in UTF-8, and the client
expects UTF-8 characters, but presumably the characters you
feed into psql are not UTF-8.

If this is some kind of UNIX, it might be instructive to
type 'echo « | od -t x1' on the command line.

Also knowing the current locale might help to determine the problem.
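
The same byte-level check can be done without a shell. This Python sketch prints how the character is encoded in UTF-8 and in Latin-1; encoded_forms is a made-up helper name.

```python
def encoded_forms(ch: str) -> dict:
    """Hex dump of one character in UTF-8 and in Latin-1."""
    return {enc: ch.encode(enc).hex() for enc in ("utf-8", "latin-1")}


forms = encoded_forms("\u00ab")   # the left-pointing double angle quotation mark
print(forms)                      # {'utf-8': 'c2ab', 'latin-1': 'ab'}
```

A lone 0xab byte is what a Latin-1 editor or terminal would send, and it cannot start a UTF-8 sequence, which matches the error message.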

Yours,
Laurenz Albe


Re: [GENERAL] invalid byte sequence for encoding UTF8: 0xab

2009-06-08 Thread Grand, Mark D.
It turns out that my problem was that the editor I was using (emacs) does not 
properly support utf8 encoding.

This e-mail message (including any attachments) is for the sole use of
the intended recipient(s) and may contain confidential and privileged
information.  If the reader of this message is not the intended
recipient, you are hereby notified that any dissemination, distribution
or copying of this message (including any attachments) is strictly
prohibited.

If you have received this message in error, please contact
the sender by reply e-mail message and destroy all copies of the
original message (including attachments).


Re: [GENERAL] invalid byte sequence for encoding UTF8: 0xab

2009-06-08 Thread Dimitri Fontaine
Grand, Mark D. <mgr...@emory.edu> writes:

 It turns out that my problem was that the editor I was using (emacs)
 does not properly support utf8 encoding.

Emacs does support utf8 properly.
  http://www.emacswiki.org/emacs/ChangingEncodings

It could be I'm biased because I use emacs from CVS, which is going to
be emacs23, and is as stable as emacs has always been for me.
  http://emacs.orebokech.com/
  http://atomized.org/wp-content/cocoa-emacs-nightly/

From within emacs, to get a ton of information about the char under point,
try C-x = (one-line version) or M-x describe-char (full version):

 Char: < (60, #o74, #x3c) point=1312 of 4162 (31%) 301-4163 column=66

        character: < (60, #o74, #x3c)
preferred charset: ascii (ASCII (ISO646 IRV))
       code point: 0x3C
           syntax: .    which means: punctuation
         category: .:Base, a:ASCII, l:Latin, r:Roman
      buffer code: #x3C
        file code: #x3C (encoded by coding system utf-8-emacs)
          display: by this font (glyph code)
    xft:-bitstream-Bitstream Vera Sans Mono-normal-normal-normal-*-16-*-*-*-m-0-iso10646-1 (#x1F)


But I guess we're off topic now.

HTH, regards,
-- 
dim


[GENERAL] invalid byte sequence for encoding UTF8: 0xab

2009-06-05 Thread Grand, Mark D.
I am having a vexing problem with a script I am writing to populate reference 
tables in a new database.

I am running postgreSQL 8.3 with psql 8.3.7.
Psql reads this SQL statement:
INSERT INTO META_AUTH.DOMAIN_META_ASSERTION (TITLE, DESCRIPTION, 
META_ASSERTION)
VALUES ('Super-User Authorization',
'This allows a super-user to administer all meta-data.',
'UserID «Administer» ()');

and I get this message:
ERROR:  invalid byte sequence for encoding UTF8: 0xab
HINT:  This error can also happen if the byte sequence does not match the 
encoding expected by the server, which is controlled by client_encoding.

It is complaining about the '«' character.  I do not understand why.  The 
database is created with the commands
CREATE DATABASE mayyou
WITH OWNER=meta_auth ENCODING='UTF8';
ALTER DATABASE mayyou SET client_encoding = 'UTF8';

When I give psql the \encoding command, it replies
UTF8

Why is it complaining about this valid character code?



Re: [GENERAL] invalid byte sequence for encoding UTF8: 0xab

2009-06-05 Thread Tom Lane
Grand, Mark D. <mgr...@emory.edu> writes:
 ... I get this message:
 ERROR:  invalid byte sequence for encoding UTF8: 0xab
 HINT:  This error can also happen if the byte sequence does not match the 
 encoding expected by the server, which is controlled by client_encoding.

 It is complaining about the '<' character.  I do not understand why.

The ASCII code for '<' is 0x3c, not 0xab.  I am not sure what you are
actually typing; although it's suggestive that the LATIN1 code 0xab
corresponds to a symbol that looks approximately like '<<'.  The most
likely bet is that you are typing the wrong thing and using a terminal
emulator that is not set to generate UTF8-encoded characters.  You
should try to make sure that client_encoding is set to match what your
keyboard actually generates.

regards, tom lane


Re: [GENERAL] invalid byte sequence for encoding UTF8: 0xab

2009-06-05 Thread Vick Khera
On Fri, Jun 5, 2009 at 9:57 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
 The ASCII code for '<' is 0x3c, not 0xab.  I am not sure what you are
 actually typing; although it's suggestive that the LATIN1 code 0xab
 corresponds to a symbol that looks approximately like '<<'.  The most
 likely bet is that you are typing the wrong thing and using a terminal

Must be something with your mail program, because in the version I am
reading postgres is complaining about the "approximately like '<<'"
symbol.


Re: [GENERAL] invalid byte sequence for encoding UTF8

2007-11-30 Thread Albe Laurenz
Glyn Astill wrote:
 I've set up a postgres 8.2 server and have a database set up with UTF8
 encoding. I intend to read some of our legacy data into the table;
 this legacy data is in ASCII format, and as far as I know is 8-bit
 ASCII.
 
 We have a migration tool from mertechdata.com to convert these files,
 which are in a DataFlex format, into our postgres tables.

In which format are the data? Text files? SQL statements?
Something binary?

 Some files convert over okay, and some come up with the error message
 'invalid byte sequence for encoding UTF8'. The files that come up
 with the error are created correctly and so are their indexes, but as
 soon as we come to insert the data we get this error.

Well, so you claim, but can you prove it?
Do you use a PostgreSQL utility to import the data?
If yes, which tool? What is the exact command line?

 Does anyone know why we're getting this error message? And is there
 a way to suppress it, or can we get around it using another format?

By format I believe that you mean encoding.
It does not matter what encoding you use as long as the data can
be represented in it, you tell PostgreSQL what the encoding is, and
the data are correct.

There is no advantage of one encoding over the other in this respect.

 Our migration utility does ask us to select the correct encoding for
 our database, and we select UTF8, but we still get the error. What do
 you guys think? Possibly the migration tool's fault?

If PostgreSQL says that the data is not UTF-8, we tend to believe it.

To say more, one would need more information.
Can you identify the string about which PostgreSQL complains?
What does it look like?

 I thought we might be able to get around it using SQL_ASCII encoding,
 but it's only 7-bit, so would we lose some data? Also, our conversion
 utility doesn't have the option to use SQL_ASCII.

If you use SQL_ASCII you may succeed in getting the incorrect data into
the database, but that will not make you happy because the data will
not stop being incorrect just because they are in the database.

Yours,
Laurenz Albe

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] invalid byte sequence for encoding UTF8

2007-11-30 Thread Usama Dar
On 11/30/07, Glyn Astill wrote:

 Are there any more flexible formats we could use? I noticed we have
 Latin 1-10 and ISO formats. Is there any reason why we shouldn't use
 these?



Latin1 is a single-byte encoding; I can't think of any reason not to try it
if the characters you have are valid ISO 8859 characters. Posting the hex
codes of some of the failing characters would probably help.
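
Collecting those hex codes is quick with a few lines of Python. This sketch lists every byte above the 7-bit ASCII range together with its offset; non_ascii_bytes is a hypothetical helper and the sample record is invented.

```python
def non_ascii_bytes(raw: bytes):
    """Return (offset, hex code) for every byte outside 7-bit ASCII."""
    return [(offset, f"0x{b:02x}") for offset, b in enumerate(raw) if b >= 0x80]


record = b"caf\xe9 na\xefve"   # stand-in for one field read from a legacy file
for offset, code in non_ascii_bytes(record):
    print(offset, code)
```

With the codes in hand it becomes possible to guess which single-byte code page the legacy files use, and to declare that as the client encoding instead of UTF8.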


-- 
Usama Munir Dar http://linkedin.com/in/usamadar
Consultant Architect
Cell:+92 321 5020666
Skype: usamadar


[GENERAL] invalid byte sequence for encoding UTF8

2007-11-30 Thread Glyn Astill
Hi People,

I've set up a Postgres 8.2 server and have a database set up with UTF8
encoding. I intend to read some of our legacy data into the table;
this legacy data is in ASCII format, and as far as I know it is 8-bit
ASCII.

We have a migration tool from mertechdata.com to convert these files,
which are in a DataFlex format, into our Postgres tables.

Some files convert over okay, and some come up with the error message
'invalid byte sequence for encoding UTF8'. The files that come up
with the error are created correctly and so are their indexes, but as
soon as we come to insert the data we get this error.

Does anyone know why we're getting this error message? And is there
a way to suppress it, or can we get around it using another format?

Our migration utility does ask us to select the correct encoding for
our database, and we select UTF8 but we still get the error. What do
you guys think? Possibly the migration tool's fault?

I thought we might be able to get around it using SQL_ASCII encoding,
but it's only 7-bit, so would we lose some data? Also our conversion
utility doesn't have the option to use SQL_ASCII.

Are there any more flexible formats we could use? I noticed we have
Latin 1-10 and ISO formats. Is there any reason why we shouldn't use
these?

Thanks
Glyn




Re: [GENERAL] invalid byte sequence for encoding UTF8

2007-11-30 Thread Martijn van Oosterhout
On Fri, Nov 30, 2007 at 09:44:36AM +, Glyn Astill wrote:
 I've setup a postgres 8.2 server and have a database setup with UTF8
 encoding. I intend to read some of our legacy data into the table,
 this legacy data is in ASCII format, and as far as I know is 8 bit
 ASCII.

Your problem is that there is no such thing as 8-bit ASCII. Determine
what encoding the data is actually in and use that.

 Our migration utility does ask us to select the correct encoding for
 our database, and we select UTF8 but we still get the error. What do
 you guys think? Possibly the migration tools fault?

I think they mean you should select the encoding of the data; what
encoding the database is in isn't relevant. The database can convert
any encoding you want to use to UTF-8 as required.

Have a nice day,
-- 
Martijn van Oosterhout   [EMAIL PROTECTED]   http://svana.org/kleptog/
 Those who make peaceful revolution impossible will make violent revolution 
 inevitable.
  -- John F Kennedy




Re: [GENERAL] invalid byte sequence for encoding UTF8

2007-11-30 Thread Gregory Stark

[Generally it's not a good idea to start a new thread by responding to an
existing one, it confuses people and makes it more likely for your question to
be missed.]


Glyn Astill [EMAIL PROTECTED] writes:

 Hi People,

 I've setup a postgres 8.2 server and have a database setup with UTF8
 encoding. I intend to read some of our legacy data into the table,
 this legacy data is in ASCII format, and as far as I know is 8 bit
 ASCII.

ASCII is a 7-bit encoding. If you have bytes with the high bit set then you
have something else. Can you give any examples of characters with the high bit
set and what you think they represent?
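
One way to collect the examples asked for here is to count every byte with the high bit set in a sample of the legacy data. The following is only a minimal sketch: the sample bytes are made up for illustration, and in practice the data would be read from one of the DataFlex export files opened in binary mode.

```python
from collections import Counter


def high_bit_bytes(data: bytes) -> Counter:
    """Count every byte value >= 0x80, i.e. everything outside 7-bit ASCII."""
    return Counter(b for b in data if b >= 0x80)


# Hypothetical sample; real input would come from open(path, "rb").read().
sample = b"caf\xe9 na\xefve caf\xe9"
counts = high_bit_bytes(sample)

for value, n in sorted(counts.items()):
    print(f"0x{value:02x} seen {n} time(s)")
```

The hex values that turn up (and what they are supposed to represent) are usually enough to guess the real encoding of the data.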

 We have a migration tool from mertechdata.com to convert these files
 that are in a DataFlex format into out postgres tables.

 Some files convert over okay, and some come up with the error message
 'invalid byte sequence for encoding UTF8'. the files that come up
 with the error are created correctly and so are their index's, but as
 soon as we come to insert the data we get this error.

This error indicates that you are trying to import data with client_encoding
set to UTF8 but the data isn't actually UTF8 and contains invalid byte
sequences for UTF8.

If your migration toolkit lets you set the client encoding separately from the
server encoding then you can set the client encoding to match your data and
the server encoding to the encoding you want the server to use. 

Otherwise you'll have to recode the data to UTF8 or whatever encoding you want
the data to be. There are tools to do this (such as GNU recode for example).
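
As a small illustration of what recoding means, the sketch below takes bytes that are valid Latin-1 but not valid UTF-8 and converts them. The sample string and the choice of Latin-1 as the source encoding are assumptions made for the example; the thread has not established what encoding the legacy data is really in.

```python
def recode_to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode raw bytes from source_encoding and re-encode them as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


# Hypothetical Latin-1 bytes for "Sm\u00f8rrebr\u00f8d".
legacy = b"Sm\xf8rrebr\xf8d"

# The same bytes are rejected when interpreted as UTF-8, which is what
# the server does when client_encoding is UTF8.
try:
    legacy.decode("utf-8")
    valid_as_utf8 = True
except UnicodeDecodeError:
    valid_as_utf8 = False

converted = recode_to_utf8(legacy, "latin-1")
print(valid_as_utf8, converted)
```

If the guess about the source encoding is wrong, the conversion still succeeds for single-byte encodings but produces the wrong characters, so it is worth checking a few known values afterwards.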


 Are there any more flexible formats we could use? I noticed we have
 Latin 1-10 and ISO formats. Is there any reason why we shouldn't use
 these?

Well there are pros and cons. The 1-byte ISO formats will be more space
efficient and also allow some cpu optimizations so they perform somewhat
better. But if you ever need to store a character which doesn't fit in the
encoding you'll be stuck. Postgres doesn't support using multiple encodings in
the same database (or effectively even in the same initdb cluster).
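
The trade-off can be seen with a short sketch. The strings are arbitrary examples, and Python's "latin-1" codec stands in for a LATIN1 database encoding.

```python
def fits(text: str, encoding: str) -> bool:
    """Return True if every character of text can be stored in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


word = "\u00c5ngstr\u00f6m"  # 8 characters, two of them non-ASCII
latin1_size = len(word.encode("latin-1"))  # one byte per character
utf8_size = len(word.encode("utf-8"))      # non-ASCII characters take two bytes

euro_fits_latin1 = fits("\u20ac", "latin-1")  # the euro sign is not in ISO 8859-1
euro_fits_utf8 = fits("\u20ac", "utf-8")

print(latin1_size, utf8_size, euro_fits_latin1, euro_fits_utf8)
```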

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's 24x7 Postgres support!



[GENERAL] invalid byte sequence for encoding UTF8: 0xff

2007-09-03 Thread Ashish Karalkar
Hello All,

I have a data script which runs fine from the PgAdmin SQL Editor, but when I
run it from the command prompt I get the following error:


test=# \i /usr/local/pgsql/qsweb1/QSWEB_100_4_Default_Data.sql

psql:/usr/local/pgsql/qsweb1/QSWEB_100_4_Default_Data.sql:1: ERROR:  invalid byte sequence for encoding UTF8: 0xff
HINT:  This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by client_encoding.


Can anybody suggest what is going wrong?
Database encoding: UTF8

PostgreSQL details:

 version
---
 PostgreSQL 8.2.0 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.4.3 20041212 (Red Hat 3.4.3-9.EL4)



Re: [GENERAL] invalid byte sequence for encoding UTF8: 0xff

2007-09-03 Thread Martijn van Oosterhout
On Mon, Sep 03, 2007 at 01:36:58PM +0530, Ashish Karalkar wrote:
 Hello All,
 
 I have a data script which runs fine from PgAdmin SQL Editor,but when I  run 
 this  from command prompt I get following error:
 test=# \i /usr/local/pgsql/qsweb1/QSWEB_100_4_Default_Data.sql
 
 psql:/usr/local/pgsql/qsweb1/QSWEB_100_4_Default_Data.sql:1: ERROR:  invalid byte sequence for encoding UTF8: 0xff
 HINT:  This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by client_encoding.

Well, the error is correct: that's not a valid UTF-8 byte. I seem
to remember someone saying that occasionally Windows puts BOMs in UTF-8
files (which is completely bogus). Check the file using a simple text
editor and see if there are some odd characters at the beginning of the
file.
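
The same check can be done programmatically. For reference, a UTF-8 BOM is the three bytes EF BB BF, whereas a file that starts with FF FE carries a UTF-16 little-endian BOM, and 0xff can never occur in valid UTF-8. The sketch below only inspects the first bytes of a file; whether a BOM is actually the cause in this case is not established here, and the sample inputs are invented.

```python
import codecs

# Byte order marks to look for, using constants from the standard library.
# UTF-32 BOMs are not handled in this sketch.
BOMS = [
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),
]


def detect_bom(head: bytes):
    """Return the name of the BOM that head starts with, or None."""
    for bom, name in BOMS:
        if head.startswith(bom):
            return name
    return None


# Hypothetical file beginnings; a real check would use open(path, "rb").read(4).
print(detect_bom(b"\xff\xfe\\\x00s\x00"))        # starts with FF FE
print(detect_bom(b"\xef\xbb\xbf\\set"))          # starts with EF BB BF
print(detect_bom(b"\\set ON_ERROR_STOP"))        # no BOM
```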

Have a nice day,
-- 
Martijn van Oosterhout   [EMAIL PROTECTED]   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.




Re: [GENERAL] invalid byte sequence for encoding UTF8: 0xff

2007-09-03 Thread Albe Laurenz
Ashish Karalkar wrote:
 I have a data script which runs fine from PgAdmin SQL 
 Editor,but when I  run this  from command prompt I get 
 following error:
  
  
 test=# \i /usr/local/pgsql/qsweb1/QSWEB_100_4_Default_Data.sql
 
 psql:/usr/local/pgsql/qsweb1/QSWEB_100_4_Default_Data.sql:1: 
 ERROR:  invalid byte sequence for encoding UTF8: 0xff
 HINT:  This error can also happen if the byte sequence does 
 not match the encoding expected by the server, which is
 controlled by client_encoding.
  
 can anybody suggest me what is going wrong.
 database  encoding :UTF8
  
 PostgreSQL details:
  
  version
 --
  PostgreSQL 8.2.0 on i686-pc-linux-gnu, compiled by GCC gcc 
 (GCC) 3.4.3 20041212 (Red Hat 3.4.3-9.EL4)

Can you tell us the following:

- What is the client operating system (where you run psql and
  PgAdmin III)?
- What is the value of the environment variable PGCLIENTENCODING
  set to on the client?
- What does the SQL command show client_encoding; return
  when you issue it in
  a) PgAdmin III
  b) psql
- Please create a file that contains only the first line
  of QSWEB_100_4_Default_Data.sql (I call it l in the following
  commands), run the following two (Linux) commands on it:
  a) od -t c l
  b) od -t x1 l
  and show us the output of both commands.

Yours,
Laurenz Albe



Re: [GENERAL] invalid byte sequence for encoding UTF8: 0xff

2007-09-03 Thread Ashish Karalkar


- Original Message - 
From: Ashish Karalkar [EMAIL PROTECTED]

To: Albe Laurenz [EMAIL PROTECTED]
Sent: Monday, September 03, 2007 4:09 PM
Subject: Re: [GENERAL] invalid byte sequence for encoding UTF8: 0xff




- Original Message - 
From: Albe Laurenz [EMAIL PROTECTED]
To: Ashish Karalkar *EXTERN* [EMAIL PROTECTED]; 
pgsql-general@postgresql.org

Sent: Monday, September 03, 2007 2:12 PM
Subject: Re: [GENERAL] invalid byte sequence for encoding UTF8: 0xff


Ashish Karalkar wrote:

I have a data script which runs fine from PgAdmin SQL
Editor,but when I  run this  from command prompt I get
following error:


test=# \i /usr/local/pgsql/qsweb1/QSWEB_100_4_Default_Data.sql

psql:/usr/local/pgsql/qsweb1/QSWEB_100_4_Default_Data.sql:1:
ERROR:  invalid byte sequence for encoding UTF8: 0xff
HINT:  This error can also happen if the byte sequence does
not match the encoding expected by the server, which is
controlled by client_encoding.

can anybody suggest me what is going wrong.
database  encoding :UTF8

PostgreSQL details:

 version
--
 PostgreSQL 8.2.0 on i686-pc-linux-gnu, compiled by GCC gcc
(GCC) 3.4.3 20041212 (Red Hat 3.4.3-9.EL4)


Can you tell us the following:

Please find my answer below

- What is the client operating system (where you run psql and
 PgAdmin III)?

It's Windows XP - PgAdmin III
RHEL 3.4.3-9.EL4-psql (Server Machine)


- What is the value of the environment variable PGCLIENTENCODING
 set to on the client?
PGCLIENTENCODING is not set, and as per the documentation I think by default
it takes the value of the database encoding, i.e. UTF8


- What does the SQL command show client_encoding; return
 when you issue it in
 a) PgAdmin III
UNICODE
 b) psql
UTF8

- Please create a file that contains only the first line
 of QSWEB_100_4_Default_Data.sql (I call it l in the following
 commands), run the following two (Linux) commands on it:
 a) od -t c l
 b) od -t x1 l
 and show us the output of both commands.


[EMAIL PROTECTED] qsweb]# od -t c test.sql
000   \   s   e   t   O   N   _   E   R   R   O   R   _   S   T
020   O   P
022
[EMAIL PROTECTED] qsweb]# od -t x1 test.sql
000 5c 73 65 74 20 4f 4e 5f 45 52 52 4f 52 5f 53 54
020 4f 50
022
[EMAIL PROTECTED] qsweb]#

Thanks, Albe, for your reply.
Here is the data you wanted.


Yours,
Laurenz Albe



Re: [GENERAL] invalid byte sequence for encoding UTF8: 0xff

2007-09-03 Thread Albe Laurenz
Ashish Karalkar wrote:
 I have a data script which runs fine from PgAdmin SQL
 Editor,but when I  run this  from command prompt I get
 following error:

 test=# \i /usr/local/pgsql/qsweb1/QSWEB_100_4_Default_Data.sql

 psql:/usr/local/pgsql/qsweb1/QSWEB_100_4_Default_Data.sql:1:
 ERROR:  invalid byte sequence for encoding UTF8: 0xff

  version
 --
  PostgreSQL 8.2.0 on i686-pc-linux-gnu, compiled by GCC gcc
 (GCC) 3.4.3 20041212 (Red Hat 3.4.3-9.EL4)

 - What is the client operating system (where you run psql and
  PgAdmin III)?

 Its Windows XP - PgAdmin III
 RHEL 3.4.3-9.EL4-psql (Server Machine)

So I get it that you run psql on Windows XP, right?

 PGCLIENTENCODING is not set and as per documantation I 
 think by default it takes value of database i.e. UTF8

 - What does the SQL command show client_encoding; return
  when you issue it in
  a) PgAdmin III
 UNICODE
  b) psql
 UTF8

Ok, I suspect that's your problem.
You created QSWEB_100_4_Default_Data.sql by using the Save dialog
in PgAdmin III on the Windows machine, right?

Then the file will probably be encoded in Windows-1252.

If your client_encoding is set to UTF8, psql will expect UTF-8
data in the SQL script, and you will get an error if it contains
byte sequences that are not valid UTF-8.

Does the script work as expected when you change the client
encoding to WIN1252?

 - Please create a file that contains only the first line
  of QSWEB_100_4_Default_Data.sql (I call it l in the following
  commands), run the following two (Linux) commands on it:
  a) od -t c l
  b) od -t x1 l
  and show us the output of both commands.

 [EMAIL PROTECTED] qsweb]# od -t c test.sql
 000   \   s   e   t   O   N   _   E   R   R   O   R   _   S   T
 020   O   P
 022
 [EMAIL PROTECTED] qsweb]# od -t x1 test.sql
 000 5c 73 65 74 20 4f 4e 5f 45 52 52 4f 52 5f 53 54
 020 4f 50
 022

That's weird, because psql complained about line 1.

Maybe you messed something up by extracting the first line.

Try the following:

- Use binary file transfer and transfer the SQL script to a Linux
machine.

- Run od -t c -t x1 on the file

- Find the 0xff that psql complains about.

Maybe that helps to locate the problem.
0xff is an unusual Windows-1252 character as well...
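
If scanning od output by eye is tedious, the offending byte can also be located with a few lines of code. This is a rough sketch under the assumption that the script fits in memory; the sample content is invented, not taken from the real file.

```python
def first_invalid_utf8(data: bytes):
    """Return (offset, line_number, byte_value) of the first byte that
    breaks UTF-8 decoding, or None if data is valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # Lines are numbered from 1, counting newlines before the bad byte.
        line = data.count(b"\n", 0, err.start) + 1
        return err.start, line, data[err.start]
    return None


# Hypothetical script content with a stray 0xff on the second line.
sample = b"\\set ON_ERROR_STOP\nINSERT INTO t VALUES ('\xff');\n"
result = first_invalid_utf8(sample)
print(result)
```

For a real file the data would come from open(path, "rb").read(), so that no decoding happens before the check.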

Yours,
Laurenz Albe



Re: [GENERAL] invalid byte sequence for encoding UTF8: 0xff

2007-09-03 Thread Ashish Karalkar


- Original Message - 
From: Albe Laurenz [EMAIL PROTECTED]

To: Ashish Karalkar *EXTERN* [EMAIL PROTECTED]
Cc: pgsql-general@postgresql.org
Sent: Monday, September 03, 2007 4:54 PM
Subject: RE: [GENERAL] invalid byte sequence for encoding UTF8: 0xff


Ashish Karalkar wrote:

I have a data script which runs fine from PgAdmin SQL
Editor,but when I  run this  from command prompt I get
following error:

test=# \i /usr/local/pgsql/qsweb1/QSWEB_100_4_Default_Data.sql

psql:/usr/local/pgsql/qsweb1/QSWEB_100_4_Default_Data.sql:1:
ERROR:  invalid byte sequence for encoding UTF8: 0xff

 version
--
 PostgreSQL 8.2.0 on i686-pc-linux-gnu, compiled by GCC gcc
(GCC) 3.4.3 20041212 (Red Hat 3.4.3-9.EL4)


- What is the client operating system (where you run psql and
 PgAdmin III)?


Its Windows XP - PgAdmin III
RHEL 3.4.3-9.EL4-psql (Server Machine)


So I get it that you run psql on Windows XP, right?

PGCLIENTENCODING is not set and as per documantation I 
think by default it takes value of database i.e. UTF8



- What does the SQL command show client_encoding; return
 when you issue it in
 a) PgAdmin III

UNICODE

 b) psql

UTF8


Ok, I suspect that's your problem.
You created QSWEB_100_4_Default_Data.sql by using the Save dialog
in PgAdmin III on the Windows machine, right?

Then the file will probably be encoded in Windows-1252.

If your client_encoding is set to UTF8, psql will expect UTF-8
data in the SQL script and complain if it meets wrong ones.

Does the script work as expected when you change the client
encoding to WIN1252?


- Please create a file that contains only the first line
 of QSWEB_100_4_Default_Data.sql (I call it l in the following
 commands), run the following two (Linux) commands on it:
 a) od -t c l
 b) od -t x1 l
 and show us the output of both commands.


[EMAIL PROTECTED] qsweb]# od -t c test.sql
000   \   s   e   t   O   N   _   E   R   R   O   R   _   S   T
020   O   P
022
[EMAIL PROTECTED] qsweb]# od -t x1 test.sql
000 5c 73 65 74 20 4f 4e 5f 45 52 52 4f 52 5f 53 54
020 4f 50
022


That's weird, because psql complained about line 1.

Maybe you messed something up by extracting the first line.

Try the following:

- Use binary file transfer and transfer the SQL script to a Linux
machine.

- Run od -t c -t x1 on the file

- Find the 0xff that psql complains about.

Maybe that helps to locate the problem.
0xff is an unusual Windows-1252 character as well...

Hey, thanks Albe, it worked.


Yours,
Laurenz Albe



[GENERAL] invalid byte sequence for encoding UTF8

2007-03-21 Thread Fuzzygoth
Hi,

I am currently trying to set up our new database server; we have upgraded
to PostgreSQL 8.1.8. When I try to restore the backup (which is stored as a
set of SQL statements that my restore script feeds into psql to execute),
it returns the following error:

psql:/mnt/tmp/app/application_data.sql:97425: ERROR:  invalid byte sequence for encoding UTF8: 0xff

HINT:  This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by client_encoding.

Other byte sequences are reported as well, e.g. 0xa1 and 0xac. The two
remaining schemas are roughly 22GB and 66GB in size, and the data is read
into Postgres from flat COBOL data files.

Our data has progressed as shown below:
PostgreSQL 7.?.? Stored in SQL_ASCII (old configuration)
PostgreSQL 8.1.3 Stored in UTF8 (current configuration)
PostgreSQL 8.1.8 Stored in UTF8 (our future configuration)

The encoding set on the server was changed from SQL_ASCII to UTF8
after we moved to version 8.1.3, for purposes of globalisation.

I've searched the forums and found people with similar problems, but not
much on a way to remedy it. I did try using iconv, which was suggested in a
thread, but it returned an error saying even the 22GB file was too large to
work on.

Any help would be gratefully appreciated.

Many Thanks
David P




Re: [GENERAL] invalid byte sequence for encoding UTF8

2007-03-21 Thread Alan Hodgson
On Wednesday 21 March 2007 04:17, Fuzzygoth [EMAIL PROTECTED] wrote:
 I've searched the forums and found people with similar problems but not
 much on a way to remedy it. I did try using iconv which was suggested in a
 thread but it returned an error saying even the 22GB file was too large to
 work on.

iconv needs to read the whole file into RAM.  What you can do is use the 
UNIX split utility to split the dump file into smaller segments, use iconv 
on each segment, and then cat all the converted segments back together into 
a new dump file.  iconv is I think your best option for converting the dump 
to a valid encoding.

-- 
None are more hopelessly enslaved than those who falsely believe they are
free. -- Johann W. Von Goethe




Re: [GENERAL] invalid byte sequence for encoding UTF8

2007-03-21 Thread Martijn van Oosterhout
On Wed, Mar 21, 2007 at 09:54:41AM -0700, Alan Hodgson wrote:
 iconv needs to read the whole file into RAM.  What you can do is use the 
 UNIX split utility to split the dump file into smaller segments, use iconv 
 on each segment, and then cat all the converted segments back together into 
 a new dump file.  iconv is I think your best option for converting the dump 
 to a valid encoding.

The guys at openstreetmap have written a UTF-8 cleaner that doesn't
read the whole file into memory:

http://trac.openstreetmap.org/browser/utils/planet.osm/C

Definitely more convenient for large files.

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.




[GENERAL] invalid byte sequence for encoding UTF8

2007-01-16 Thread Gary Benade
I used shp2pgsql.exe to create an import SQL file for my GIS database.
The resulting SQL has data like this in it:
INSERT INTO gis.sa_area (label,type,level,the_geom) VALUES
('MÔRELIG','0x2','2','01060001000');
The Ô is character 212 (0xD4).
This won't import; psql returns
ERROR: invalid byte sequence for encoding UTF8: 0xd452
HINT: This error can also happen if the byte sequence does not match the
encoding expected by the server, which is controlled by client_encoding

TIA
Gary


Re: [GENERAL] invalid byte sequence for encoding UTF8

2007-01-16 Thread Martijn van Oosterhout
On Tue, Jan 16, 2007 at 03:40:52PM +0200, Gary Benade wrote:
 I used shp2pgsql.exe to create an import sql for my gis database.
 The resultant sql has data like this in it.INSERT INTO gis.sa_area 
 (label,type,level,the_geom) VALUES 
 ('MÔRELIG','0x2','2','01060001000');
 The Ô is ascii char 212.
 This wont import, PSQL returns
 ERROR: invalid byte sequence for encoding UTF8: 0xd452
 HINT: This error can also happen if the byte sequence does not match the 
 encoding expected by the server, which is controlled by client-encoding

Well, your data isn't UTF8 and yet that's what you told the server.
Either make the data UTF8, or tell the server the actual encoding
used...
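
The reported sequence 0xd452 can be reproduced in a few lines. In ISO 8859-1 the Ô is the single byte 0xD4, and the next byte, 0x52, is the letter R. In UTF-8, a byte in the range 0xC2 to 0xDF announces a two-byte character whose second byte must lie between 0x80 and 0xBF, which 0x52 does not. The helper names below are made up for the illustration.

```python
def expects_one_continuation(byte: int) -> bool:
    """True if byte is the lead byte of a two-byte UTF-8 sequence."""
    return 0xC2 <= byte <= 0xDF


def is_continuation(byte: int) -> bool:
    """True if byte is a valid UTF-8 continuation byte (10xxxxxx)."""
    return 0x80 <= byte <= 0xBF


label = "M\u00d4RELIG"
latin1_bytes = label.encode("latin-1")  # the bytes the SQL file contains

lead, following = latin1_bytes[1], latin1_bytes[2]
print(f"0x{lead:02x}{following:02x}")
print(expects_one_continuation(lead), is_continuation(following))
```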

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.




Re: [GENERAL] invalid byte sequence for encoding UTF8

2007-01-16 Thread Chad Wagner

On 1/16/07, Gary Benade [EMAIL PROTECTED] wrote:


I used shp2pgsql.exe to create an import sql for my gis database.
The resultant sql has data like this in it.INSERT INTO gis.sa_area
(label,type,level,the_geom) VALUES
('MÔRELIG','0x2','2','01060001000');
The Ô is ascii char 212.
This wont import, PSQL returns
ERROR: invalid byte sequence for encoding UTF8: 0xd452
HINT: This error can also happen if the byte sequence does not match the
encoding expected by the server, which is controlled by client-encoding




I am not terribly familiar with PostGIS (other than installing it, running
the test cases and saying cool :), but it appears that your source data is
probably ISO-8859-1.  You should probably use the -W switch with shp2pgsql
and specify the client encoding as LATIN1, it should write a dump file
with SET client_encoding to 'LATIN1' instead of UTF8 (or you can manually
tweak the SQL file).


--
Chad
http://www.postgresqlforums.com/