Re: [GENERAL] invalid byte sequence for encoding UTF8: 0x00
On Sat, Nov 19, 2011 at 09:32:12AM -0800, pawel_kukawski wrote:
> Is there any way I can store a NULL character (\u0000) in a string? Or is my only option to change every text field to bytea?

The correct question is: why do you want to store \u0000 in a text field?

Best regards,

depesz

-- 
http://depesz.com/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] invalid byte sequence for encoding UTF8: 0x00
Hi,

Is there any way I can store a NULL character (\u0000) in a string? Or is my only option to change every text field to bytea?

Regards,
Paweł

--
View this message in context: http://postgresql.1045698.n5.nabble.com/invalid-byte-sequence-for-encoding-UTF8-0x00-tp5007173p5007173.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.
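PostgreSQL text values cannot contain the NUL character at all: in UTF-8, U+0000 is the single byte 0x00, which is exactly the byte the error reports. A minimal Python sketch of the two options raised in this thread, assuming the data is cleaned on the client side before insertion; `strip_nul` and `to_bytea_payload` are hypothetical helper names, not part of any driver API.

```python
def strip_nul(value: str, replacement: str = "") -> str:
    """Remove (or substitute) U+0000 so the value fits into a text column."""
    return value.replace("\x00", replacement)


def to_bytea_payload(value: str) -> bytes:
    """Keep every character, NUL included, as UTF-8 bytes for a bytea column."""
    return value.encode("utf-8")


raw = "abc\x00def"
print("\x00".encode("utf-8"))  # b'\x00': the byte named in the error message
print(strip_nul(raw))          # abcdef
print(to_bytea_payload(raw))   # b'abc\x00def'
```

Stripping loses information silently, so it only makes sense when the NULs are junk; if they carry meaning, the column is really binary data and bytea is the honest type.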
Re: [GENERAL] Invalid byte sequence for encoding UTF8: 0xedbebf
BRUSSER Michael wrote:
>>> Is there a way to find the records with the text field containing Unicode bytes 0xedbebf? Unfortunately this is a very old version, 7.3.10.
>>
>> This should work on 7.3 (according to the documentation):
>> SELECT id FROM nlsdata WHERE position('\360\235\204\236'::bytea IN val::bytea) = 1;
>
> Albe, thanks for pointing this out! I made a minor change, added decode since text cannot be cast to bytea, and tried something like this:
> SELECT id FROM myTable WHERE position('\360\235\204\236'::bytea IN decode(myTextField, 'escape')) !=0
> ERROR: decode: Bad input string for type bytea

Hrm. I didn't know that there was no cast from text to bytea in 7.3.

> Maybe this explains why?
> testdb=# select decode('\360\235\204\236'::text, 'escape');
> ERROR: Unicode = 0x1 is not supported

No, that is an error on my side. I gave you the wrong byte sequence. For 0xedbebf you should actually write '\355\276\277'. But that is not a valid UTF-8 sequence.

> but I'm not ready to give up yet...

If you know the byte sequence that causes trouble, you could also use something like sed to search and replace it in the dump file. Or (if there are not too many) you could search for the pattern and identify the rows in the database. Then you know which database rows to update.

Yours,
Laurenz Albe
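The octal escapes in this exchange are just the hex bytes written in base 8, three digits per byte. A small Python sketch (all helper names hypothetical) that turns a byte sequence into the `\nnn` form used in bytea literals, confirms that 0xedbebf is not valid UTF-8 (0xED followed by 0xBE would encode a UTF-16 surrogate, which UTF-8 forbids), and searches raw values for the sequence anywhere in the value, the way `position(...) > 0` would. It assumes the rows have already been fetched as raw bytes by some client.

```python
def octal_escape(data: bytes) -> str:
    """Render bytes as \\nnn octal escapes, as written in a bytea literal."""
    return "".join("\\%03o" % b for b in data)


def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def rows_containing(rows, needle: bytes):
    """Yield ids whose raw value contains needle at any offset."""
    for row_id, raw in rows:
        if needle in raw:
            yield row_id


bad = bytes.fromhex("edbebf")
print(octal_escape(bad))    # \355\276\277
print(is_valid_utf8(bad))   # False
sample = [(1, b"plain text"), (2, b"junk \xed\xbe\xbf here")]
print(list(rows_containing(sample, bad)))  # [2]
```

The same helper shows why the first suggestion missed: `octal_escape(bytes.fromhex("f09d849e"))` gives `\360\235\204\236`, a valid four-byte character (U+1D11E), not the offending sequence.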
Re: [GENERAL] Invalid byte sequence for encoding UTF8: 0xedbebf
BRUSSER Michael wrote:
> Is there a way to find the records with the text field containing Unicode bytes 0xedbebf? Unfortunately this is a very old version, 7.3.10.

This should work on 7.3 (according to the documentation):

SELECT id FROM nlsdata WHERE position('\360\235\204\236'::bytea IN val::bytea) = 1;

Here val is the column containing the values in question.

Yours,
Laurenz Albe
Re: [GENERAL] Invalid byte sequence for encoding UTF8: 0xedbebf
-----Original Message-----
From: Albe Laurenz [mailto:laurenz.a...@wien.gv.at]
Sent: Thursday, June 16, 2011 5:16 AM
To: BRUSSER Michael; pgsql-general@postgresql.org
Subject: RE: [GENERAL] Invalid byte sequence for encoding UTF8: 0xedbebf

> BRUSSER Michael wrote:
>> Is there a way to find the records with the text field containing Unicode bytes 0xedbebf? Unfortunately this is a very old version, 7.3.10.
>
> This should work on 7.3 (according to the documentation):
> SELECT id FROM nlsdata WHERE position('\360\235\204\236'::bytea IN val::bytea) = 1;

Albe, thanks for pointing this out! I made a minor change, added decode since text cannot be cast to bytea, and tried something like this:

SELECT id FROM myTable WHERE position('\360\235\204\236'::bytea IN decode(myTextField, 'escape')) != 0
ERROR: decode: Bad input string for type bytea

If I limit the query to some healthy records (AND id BETWEEN 100 AND 110), it works and returns an empty result. So the problem now is that without decode, myTextField cannot be converted to bytea, and with decode it breaks on the first 'bad' value.

Maybe this explains why?

testdb=# select decode('\360\235\204\236'::text, 'escape');
ERROR: Unicode = 0x1 is not supported

Another thought: if I get this to work, I may need to search for anything outside the standard UTF-8 range rather than for a specific sequence. I am beginning to understand why many people dealt with this in the dump file, but I'm not ready to give up yet...

As usual, any ideas are appreciated!
Thanks.

This email and any attachments are intended solely for the use of the individual or entity to whom it is addressed and may be confidential and/or privileged. If you are not one of the named recipients or have received this email in error, (i) you should not read, disclose, or copy it, (ii) please notify sender of your receipt by reply email and delete this email and all attachments, (iii) Dassault Systemes does not accept or assume any liability or responsibility for any use of or reliance on this email. For other languages, go to http://www.3ds.com/terms/email-disclaimer
[GENERAL] Invalid byte sequence for encoding UTF8: 0xedbebf
This is a follow-up on my previous message: http://archives.postgresql.org/pgsql-general/2011-06/msg00054.php

I think I now have some understanding of what's causing the problem, but I don't have a good solution, only more questions.

The release notes for v8.1 at http://www.postgresql.org/docs/current/interactive/release-8-1.html make a good suggestion: use iconv to convert the plain-text dump file into UTF-8. On Linux this did not work; the input and output files were identical. The iconv on Solaris refused to open the input file (probably too big), although it worked on a chunk of it and reported a conversion error.

Unless there are no other options, I don't want to use sed or break the file into pieces; if possible, I would prefer to identify the bad records in the database. I tried SELECT with everything I could think of (~*, SIMILAR TO, and the like), but I never got it right.

Is there a way to find the records with the text field containing Unicode bytes 0xedbebf? Unfortunately this is a very old version, 7.3.10.

Thank you.
Michael.
Re: [GENERAL] Invalid byte sequence for encoding UTF8: 0xedbebf
On June 15, 2011 01:18:27 PM BRUSSER Michael wrote:
> Unless there are no other options, I don't want to use sed or break the file into pieces, if possible,

iconv loads everything into RAM. You can use split, convert the pieces, and then recombine them; I did that when converting a large database to UTF-8 and it worked.
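If the concern is memory, a streaming check avoids both loading the whole dump and splitting it by hand. A minimal Python sketch, assuming a plain-text dump read in binary mode; `invalid_utf8_lines` is a hypothetical helper, and the file name in the usage note is illustrative. It reports the dump-file line number and the byte offset where UTF-8 decoding first fails.

```python
import io
from typing import BinaryIO, Iterator, Tuple


def invalid_utf8_lines(stream: BinaryIO) -> Iterator[Tuple[int, int, bytes]]:
    """Yield (line_number, byte_offset_in_line, offending_bytes) for each bad line.

    Reads one line at a time, so memory use is bounded by the longest line,
    not by the size of the dump.
    """
    for lineno, line in enumerate(stream, start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError as exc:
            yield lineno, exc.start, line[exc.start:exc.end]


dump = io.BytesIO(b"COPY t (v) FROM stdin;\nok\nbad \xed\xbe\xbf\n\\.\n")
for lineno, offset, chunk in invalid_utf8_lines(dump):
    print(lineno, offset, chunk.hex())
```

On a real file this would be `with open("dump.sql", "rb") as f: for hit in invalid_utf8_lines(f): ...`. Only the first bad spot per line is reported; fixing it and re-running finds any others.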
Re: [GENERAL] Invalid byte sequence for encoding UTF8: 0xedbebf
-----Original Message-----
From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Alan Hodgson
Sent: Wednesday, June 15, 2011 5:37 PM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Invalid byte sequence for encoding UTF8: 0xedbebf

> On June 15, 2011 01:18:27 PM BRUSSER Michael wrote:
>> Unless there are no other options, I don't want to use sed or break the file into pieces, if possible,
>
> iconv loads everything into RAM. You can use split, convert the pieces, and then recombine them; I did that when converting a large database to UTF-8 and it worked.

Thanks, but this is exactly what I am trying to avoid! Using split is fine if you have one database to upgrade and no external customers (not to mention other problems with this approach).
[GENERAL] invalid byte sequence for encoding UTF8
We are upgrading an old database (7.3.10 to 8.4.4). This involves running pg_dump on the old database and loading the dump file into the new one. In case it matters, we do not use pg_restore; the dump file is simply sourced with psql, and this is where I ran into a problem:

psql: .../postgresql_archive.src/...
ERROR: invalid byte sequence for encoding UTF8: 0xedbebf
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by client_encoding.

The server and client encodings are both Unicode. I think we may have some copy/paste MS Word markup and possibly other odd things in the old database; all this junk is in the 'text' fields.

I found a number of related postings but did not see a good solution. Some folks suggested cleaning the data file before loading it, while someone else did essentially the same thing in the database before dumping it. I am looking for advice, ideally the "best technique" if there is one; any suggestion is appreciated.

Thanks,
Michael.
Re: [GENERAL] invalid byte sequence for encoding UTF8
That specific character sequence is a result of Unicode implementations prior to 6.0 mixing with later implementations. See here: http://en.wikipedia.org/wiki/Specials_%28Unicode_block%29#Replacement_character

You could replace that sequence with the correct replacement character, U+FFFD (UTF-8 bytes 0xEFBFBD), using sed, for example (if you are using a plain-text dump format).

On Thu, Jun 2, 2011 at 4:17 PM, BRUSSER Michael michael.brus...@3ds.com wrote:
> We upgrading some old database (7.3.10 to 8.4.4). [...]
> ERROR: invalid byte sequence for encoding UTF8: 0xedbebf
> [...]
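For readers who prefer not to hand-craft a sed expression with raw bytes, the same substitution can be sketched in Python. This assumes a plain-text dump and replaces only the one sequence reported by psql with the UTF-8 encoding of U+FFFD; `replace_sequence` is a hypothetical helper name. It streams line by line, and because the sequence contains no newline byte it can never straddle two lines.

```python
import io
from typing import BinaryIO

BAD = b"\xed\xbe\xbf"                   # the sequence psql rejected
REPLACEMENT = "\ufffd".encode("utf-8")  # U+FFFD REPLACEMENT CHARACTER -> b"\xef\xbf\xbd"


def replace_sequence(src: BinaryIO, dst: BinaryIO,
                     bad: bytes = BAD, good: bytes = REPLACEMENT) -> int:
    """Copy src to dst line by line, replacing bad with good; return the count."""
    count = 0
    for line in src:
        count += line.count(bad)
        dst.write(line.replace(bad, good))
    return count


src = io.BytesIO(b"a\xed\xbe\xbfb\nc\n")
dst = io.BytesIO()
print(replace_sequence(src, dst))  # 1
print(dst.getvalue())              # b'a\xef\xbf\xbdb\nc\n'
```

Any other invalid bytes in the dump are left untouched, so a validation pass afterwards is still worthwhile.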
Re: [GENERAL] invalid byte sequence for encoding UTF8: 0xf1612220
2011/5/12 Craig Ringer cr...@postnewspapers.com.au:
> On 05/11/2011 03:16 PM, AI Rumman wrote:
>> I am trying to migrate a database from Postgresql 8.2 to Postgresql 8.3 and getting the following error: [...]
>> pg_restore: [archiver (db)] COPY failed: ERROR: invalid byte sequence for encoding UTF8: 0xf1612220
>> [...]
>
> Newer versions of Pg got better at catching bad Unicode. While this helps prevent bad data getting into the database, it's a right pain if you're moving data over from an older version with less strict checks.
>
> I don't know of any way to relax the checks for the purpose of importing dumps. You'll need to fix your dump files before loading them (by finding the faulty text and fixing it) or fix it in the origin database before migrating the data. Neither approach is nice or easy, but nobody has yet stepped up to write a Unicode verifier tool that checks old databases' text fields against stricter rules...
>
> --
> Craig Ringer

The two following articles have SQL functions and documentation you may find useful:
http://tapoueh.org/articles/blog/_Getting_out_of_SQL_ASCII,_part_1.html
http://tapoueh.org/articles/blog/_Getting_out_of_SQL_ASCII,_part_2.html

--
Cédric Villemain 2ndQuadrant
http://2ndQuadrant.fr/
PostgreSQL: Expertise, Training and Support
[GENERAL] invalid byte sequence for encoding UTF8: 0xf1612220
I am trying to migrate a database from PostgreSQL 8.2 to PostgreSQL 8.3 and am getting the following error:

pg_restore: [archiver (db)] Error from TOC entry 2764; 0 29708702 TABLE DATA originaldata postgres
pg_restore: [archiver (db)] COPY failed: ERROR: invalid byte sequence for encoding UTF8: 0xf1612220
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by client_encoding.
CONTEXT: COPY wi_originaldata, line 3592

I took a dump from the 8.2 server and then tried to restore it on 8.3. Both client_encoding and server_encoding are UTF8 on both servers.

                                 Table public.data
       Column        |          Type          |                     Modifiers
---------------------+------------------------+----------------------------------------------------
 orgid               | integer                |
 id                  | integer                | not null default nextval(('data'::text)::regclass)
 datatypecode        | character varying(15)  |
 batchname           | character varying(60)  |
 filename            | character varying(60)  |
 encoding            | character varying(20)  |
 errormessage        | character varying(255) |
 originaldata_backup | bytea                  |
 processeddata       | bytea                  |
 validatedflag       | smallint               |
 processedflag       | smallint               |
 createddate         | date                   |
 createdtime         | time without time zone |
 modifieddate        | date                   |
 modifiedtime        | time without time zone |
 processeddate       | date                   |
 processedtime       | time without time zone |
 deletedflag         | smallint               |
 originaldata        | text                   |
Indexes:
    data_pkey PRIMARY KEY, btree (id)

Any help will be appreciated.
Re: [GENERAL] invalid byte sequence for encoding UTF8: 0xf1612220
On 05/11/2011 03:16 PM, AI Rumman wrote:
> I am trying to migrate a database from Postgresql 8.2 to Postgresql 8.3 and getting the following error:
> pg_restore: [archiver (db)] Error from TOC entry 2764; 0 29708702 TABLE DATA originaldata postgres
> pg_restore: [archiver (db)] COPY failed: ERROR: invalid byte sequence for encoding UTF8: 0xf1612220
> HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by client_encoding.
> CONTEXT: COPY wi_originaldata, line 3592
> I took a dump from 8.2 server and then tried to restore at 8.3. Both the client_encoding and server_encoding are UTF8 at both the servers.

Newer versions of Pg got better at catching bad Unicode. While this helps prevent bad data getting into the database, it's a right pain if you're moving data over from an older version with less strict checks.

I don't know of any way to relax the checks for the purpose of importing dumps. You'll need to fix your dump files before loading them (by finding the faulty text and fixing it) or fix it in the origin database before migrating the data. Neither approach is nice or easy, but nobody has yet stepped up to write a Unicode verifier tool that checks old databases' text fields against stricter rules...

--
Craig Ringer
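The reported bytes explain themselves once decoded. 0xF1 can only start a four-byte UTF-8 sequence, so the next byte must be a continuation byte in the range 0x80 to 0xBF; 0x61 is a plain ASCII 'a', so decoding fails at the very first byte. Read as Latin-1 the same bytes are `ña" `, which suggests (but does not prove) that the old data were really Latin-1. Below is a minimal sketch of the kind of verifier Craig mentions, assuming text values are fetched as raw bytes; `first_invalid_offset` and `verify_rows` are hypothetical names.

```python
def first_invalid_offset(data: bytes):
    """Return the offset of the first byte that breaks UTF-8, or None if valid."""
    try:
        data.decode("utf-8")
        return None
    except UnicodeDecodeError as exc:
        return exc.start


def verify_rows(rows):
    """Report (id, offset, next-4-bytes-as-hex) for every value that is not valid UTF-8."""
    report = []
    for row_id, raw in rows:
        offset = first_invalid_offset(raw)
        if offset is not None:
            report.append((row_id, offset, raw[offset:offset + 4].hex()))
    return report


sample = bytes.fromhex("f1612220")
print(first_invalid_offset(sample))  # 0
print(sample.decode("latin-1"))      # ña" (followed by a space)
print(verify_rows([(1, b"ok"), (2, b"x" + sample)]))  # [(2, 1, 'f1612220')]
```

The four-byte context is chosen only because it matches the width of the error message; widen it to see more of the surrounding text.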
Re: [GENERAL] invalid byte sequence for encoding UTF8: 0xab
Mark D. Grand wrote:
> I am having a vexing problem with a script I am writing to populate reference tables in a new database. I am running PostgreSQL 8.3 with psql 8.3.7. psql reads this SQL statement:
>
> INSERT INTO META_AUTH.DOMAIN_META_ASSERTION (TITLE, DESCRIPTION, META_ASSERTION) VALUES ('Super-User Authorization', 'This allows a super-user to administer all meta-data.', 'UserID «Administer» ()');
>
> and I get this message:
>
> ERROR: invalid byte sequence for encoding UTF8: 0xab
> HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by client_encoding.
>
> It is complaining about the '«' character. I do not understand why. The database was created with the commands
>
> CREATE DATABASE mayyou WITH OWNER=meta_auth ENCODING='UTF8';
> ALTER DATABASE mayyou SET client_encoding = 'UTF8';
>
> When I give psql the \encoding command, it replies UTF8.
>
> Why is it complaining about this valid character code?

The database stores characters in UTF-8, and the client expects UTF-8 characters, but presumably the characters you feed into psql are not UTF-8.

If this is some kind of UNIX, it might be instructive to type 'echo « | od -t x1' on the command line. Knowing the current locale might also help to determine the problem.

Yours,
Laurenz Albe
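Laurenz's `od` test can be reproduced in Python to see why a lone 0xab is rejected: « encodes as the two bytes 0xC2 0xAB in UTF-8, but as the single byte 0xAB in Latin-1 and Windows-1252. `as_server_sees` below is a hypothetical, rough model of what happens when client_encoding is set to match the bytes actually sent (here Latin-1) and the server converts them into a UTF8 database; note that Python codec names ("latin-1") differ from PostgreSQL's ("LATIN1").

```python
# Python equivalent of `echo « | od -t x1`: show the bytes each encoding produces.
ch = "\u00ab"  # « LEFT-POINTING DOUBLE ANGLE QUOTATION MARK

for enc in ("utf-8", "latin-1", "cp1252"):
    print(enc, ch.encode(enc).hex(" "))
# utf-8 c2 ab
# latin-1 ab
# cp1252 ab


def as_server_sees(raw: bytes, client_encoding: str) -> bytes:
    """Rough model: reinterpret client bytes in client_encoding, store them as UTF-8."""
    return raw.decode(client_encoding).encode("utf-8")


lone = b"\xab"  # what a Latin-1 editor or terminal sends for «
try:
    lone.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0xab is a UTF-8 continuation byte, so it cannot start a character.
    print("rejected as UTF-8:", exc.reason)

print(as_server_sees(lone, "latin-1").hex(" "))  # c2 ab
```

If the editor or terminal really produces UTF-8, the first line of output is what `od` should show; seeing a bare `ab` instead means client_encoding does not match the input.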
Re: [GENERAL] invalid byte sequence for encoding UTF8: 0xab
It turns out that my problem was that the editor I was using (emacs) does not properly support UTF-8 encoding.

-----Original Message-----
From: Albe Laurenz [mailto:laurenz.a...@wien.gv.at]
Sent: Monday, June 08, 2009 5:59 AM
To: Grand, Mark D.; pgsql-general@postgresql.org
Subject: RE: [GENERAL] invalid byte sequence for encoding UTF8: 0xab

> Mark D. Grand wrote:
>> [...] Why is it complaining about this valid character code?
>
> The database stores characters in UTF-8, and the client expects UTF-8 characters, but presumably the characters you feed into psql are not UTF-8.
>
> If this is some kind of UNIX, it might be instructive to type 'echo « | od -t x1' on the command line. Knowing the current locale might also help to determine the problem.
>
> Yours,
> Laurenz Albe

This e-mail message (including any attachments) is for the sole use of the intended recipient(s) and may contain confidential and privileged information. If the reader of this message is not the intended recipient, you are hereby notified that any dissemination, distribution or copying of this message (including any attachments) is strictly prohibited. If you have received this message in error, please contact the sender by reply e-mail message and destroy all copies of the original message (including attachments).
Re: [GENERAL] invalid byte sequence for encoding UTF8: 0xab
Grand, Mark D. mgr...@emory.edu writes:
> It turns out that my problem was that the editor I was using (emacs) does not properly support UTF-8 encoding.

Emacs does support UTF-8 properly:
http://www.emacswiki.org/emacs/ChangingEncodings

It could be that I'm biased because I use emacs from CVS, which is going to be emacs23, and is as stable as emacs has always been for me:
http://emacs.orebokech.com/
http://atomized.org/wp-content/cocoa-emacs-nightly/

From within emacs, to get a ton of information about the character under point, try C-x = (one-line version) or M-x describe-char (full version):

Char: < (60, #o74, #x3c) point=1312 of 4162 (31%) 301-4163 column=66

          character: < (60, #o74, #x3c)
  preferred charset: ascii (ASCII (ISO646 IRV))
         code point: 0x3C
             syntax: .  which means: punctuation
           category: .:Base, a:ASCII, l:Latin, r:Roman
        buffer code: #x3C
          file code: #x3C (encoded by coding system utf-8-emacs)
            display: by this font (glyph code)
    xft:-bitstream-Bitstream Vera Sans Mono-normal-normal-normal-*-16-*-*-*-m-0-iso10646-1 (#x1F)

But I guess we're off topic now.

HTH, regards,
-- 
dim
[GENERAL] invalid byte sequence for encoding UTF8: 0xab
I am having a vexing problem with a script I am writing to populate reference tables in a new database. I am running PostgreSQL 8.3 with psql 8.3.7. psql reads this SQL statement:

INSERT INTO META_AUTH.DOMAIN_META_ASSERTION (TITLE, DESCRIPTION, META_ASSERTION) VALUES ('Super-User Authorization', 'This allows a super-user to administer all meta-data.', 'UserID «Administer» ()');

and I get this message:

ERROR: invalid byte sequence for encoding UTF8: 0xab
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by client_encoding.

It is complaining about the '«' character. I do not understand why. The database was created with the commands

CREATE DATABASE mayyou WITH OWNER=meta_auth ENCODING='UTF8';
ALTER DATABASE mayyou SET client_encoding = 'UTF8';

When I give psql the \encoding command, it replies UTF8.

Why is it complaining about this valid character code?
Re: [GENERAL] invalid byte sequence for encoding UTF8: 0xab
Grand, Mark D. mgr...@emory.edu writes:
> ... I get this message:
> ERROR: invalid byte sequence for encoding UTF8: 0xab
> HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by client_encoding.
> It is complaining about the '<' character. I do not understand why.

The ASCII code for '<' is 0x3c, not 0xab. I am not sure what you are actually typing; although it's suggestive that the LATIN1 code 0xab corresponds to a symbol that looks approximately like '<'. The most likely bet is that you are typing the wrong thing and using a terminal emulator that is not set to generate UTF8-encoded characters. You should try to make sure that client_encoding is set to match what your keyboard actually generates.

regards, tom lane
Re: [GENERAL] invalid byte sequence for encoding UTF8: 0xab
On Fri, Jun 5, 2009 at 9:57 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> The ASCII code for '<' is 0x3c, not 0xab. I am not sure what you are actually typing; although it's suggestive that the LATIN1 code 0xab corresponds to a symbol that looks approximately like '<'. The most likely bet is that you are typing the wrong thing and using a terminal

It must be something with your mail program, because in the version I am reading, postgres is complaining about the '«' symbol, the one that looks approximately like '<'.
Re: [GENERAL] invalid byte sequence for encoding UTF8
Glyn Astill wrote:
> I've set up a postgres 8.2 server and have a database set up with UTF8 encoding. I intend to read some of our legacy data into the table; this legacy data is in ASCII format, and as far as I know is 8-bit ASCII. We have a migration tool from mertechdata.com to convert these files, which are in a DataFlex format, into our postgres tables.

In which format are the data? Text files? SQL statements? Something binary?

> Some files convert over okay, and some come up with the error message 'invalid byte sequence for encoding UTF8'. The files that come up with the error are created correctly and so are their indexes, but as soon as we come to insert the data we get this error.

Well, so you claim, but can you prove it? Do you use a PostgreSQL utility to import the data? If yes, which tool? What is the exact command line?

> Does anyone know why we're getting this error message? And is there a way to suppress it, or can we get around it using another format?

By format I believe that you mean encoding. It does not matter what encoding you use as long as the data can be represented in it, you tell PostgreSQL what the encoding is, and the data are correct. There is no advantage of one encoding over the other in this respect.

> Our migration utility does ask us to select the correct encoding for our database, and we select UTF8, but we still get the error. What do you guys think? Possibly the migration tool's fault?

If PostgreSQL says that the data is not UTF-8, we tend to believe it. To say more, one would need more information. Can you identify the string about which PostgreSQL complains? What does it look like?

> I thought we may be able to get around it using SQL_ASCII encoding, but it's only 7-bit, so would we lose some data? Also our conversion utility doesn't have the option to use SQL_ASCII.

If you use SQL_ASCII you may succeed in getting the incorrect data into the database, but that will not make you happy, because the data will not stop being incorrect just because they are in the database.

Yours,
Laurenz Albe

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] invalid byte sequence for encoding UTF8
On 11/30/07, Glyn Astill [EMAIL PROTECTED] wrote:
> Hi People,
>
> I've set up a postgres 8.2 server and have a database set up with UTF8 encoding. I intend to read some of our legacy data into the table; this legacy data is in ASCII format, and as far as I know is 8-bit ASCII. [...]
>
> Are there any more flexible formats we could use? I noticed we have Latin 1-10 and ISO formats. Is there any reason why we shouldn't use these?
>
> Thanks
> Glyn

Latin1 is a single-byte encoding; I can't think of any reason not to try it if the characters you have are valid ISO 8859 characters. Posting the hex codes of some of the characters that are failing would probably help.

--
Usama Munir Dar http://linkedin.com/in/usamadar
Consultant Architect
Cell: +92 321 5020666
Skype: usamadar
[GENERAL] invalid byte sequence for encoding UTF8
Hi People,

I've set up a postgres 8.2 server and have a database set up with UTF8 encoding. I intend to read some of our legacy data into the table; this legacy data is in ASCII format, and as far as I know is 8-bit ASCII. We have a migration tool from mertechdata.com to convert these files, which are in a DataFlex format, into our postgres tables.

Some files convert over okay, and some come up with the error message 'invalid byte sequence for encoding UTF8'. The files that come up with the error are created correctly and so are their indexes, but as soon as we come to insert the data we get this error.

Does anyone know why we're getting this error message? And is there a way to suppress it, or can we get around it using another format?

Our migration utility does ask us to select the correct encoding for our database, and we select UTF8, but we still get the error. What do you guys think? Possibly the migration tool's fault?

I thought we may be able to get around it using SQL_ASCII encoding, but it's only 7-bit, so would we lose some data? Also our conversion utility doesn't have the option to use SQL_ASCII.

Are there any more flexible formats we could use? I noticed we have Latin 1-10 and ISO formats. Is there any reason why we shouldn't use these?

Thanks
Glyn
Re: [GENERAL] invalid byte sequence for encoding UTF8
On Fri, Nov 30, 2007 at 09:44:36AM +, Glyn Astill wrote: I've set up a Postgres 8.2 server and have a database set up with UTF8 encoding. I intend to read some of our legacy data into the table; this legacy data is in ASCII format, and as far as I know is 8-bit ASCII. Your problem is that there is no such thing as 8-bit ASCII. Determine what encoding the data is actually in and use that. Our migration utility does ask us to select the correct encoding for our database, and we select UTF8, but we still get the error. What do you guys think? Possibly the migration tool's fault? I think they mean to select the correct encoding for the data; what encoding the database is in isn't relevant. The database can convert any encoding you want to use to UTF-8 as required. Have a nice day, -- Martijn van Oosterhout [EMAIL PROTECTED] http://svana.org/kleptog/ Those who make peaceful revolution impossible will make violent revolution inevitable. -- John F Kennedy
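Working out what encoding the data is actually in can be started mechanically: try decoding a sample under a few candidate encodings and discard those that fail. This is a minimal Python sketch; the candidate list (including cp437, the original IBM PC code page, purely as a guess for DOS-era files) and the function name decodes_cleanly are assumptions. Single-byte encodings such as latin-1 and cp437 accept every byte value, so a clean decode only rules other encodings out; the decoded text still has to be checked by eye.

```python
# Candidate encodings to try; the list is a guess for DOS/Windows-era data.
CANDIDATES = ["utf-8", "cp1252", "cp437", "latin-1"]


def decodes_cleanly(data, candidates=CANDIDATES):
    """Return the candidate encodings under which data decodes without error."""
    ok = []
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue  # this encoding is ruled out
        ok.append(name)
    return ok


print(decodes_cleanly(b"caf\xe9"))      # ['cp1252', 'cp437', 'latin-1']
print(decodes_cleanly(b"caf\xc3\xa9"))  # ['utf-8', 'cp1252', 'cp437', 'latin-1']
```

If utf-8 survives on a sample that contains high-bit bytes, that is a strong hint, because arbitrary 8-bit text rarely happens to form valid UTF-8.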
Re: [GENERAL] invalid byte sequence for encoding UTF8
[Generally it's not a good idea to start a new thread by responding to an existing one; it confuses people and makes it more likely for your question to be missed.] Glyn Astill [EMAIL PROTECTED] writes: Hi People, I've set up a Postgres 8.2 server and have a database set up with UTF8 encoding. I intend to read some of our legacy data into the table; this legacy data is in ASCII format, and as far as I know is 8-bit ASCII. ASCII is a 7-bit encoding. If you have bytes with the high bit set then you have something else. Can you give any examples of characters with the high bit set and what you think they represent? We have a migration tool from mertechdata.com to convert these files, which are in DataFlex format, into our Postgres tables. Some files convert over okay, and some come up with the error message 'invalid byte sequence for encoding UTF8'. The files that come up with the error are created correctly and so are their indexes, but as soon as we come to insert the data we get this error. This error indicates that you are trying to import data with client_encoding set to UTF8, but the data isn't actually UTF8 and contains invalid byte sequences for UTF8. If your migration toolkit lets you set the client encoding separately from the server encoding, then you can set the client encoding to match your data and the server encoding to the encoding you want the server to use. Otherwise you'll have to recode the data to UTF8 or whatever encoding you want it to be in. There are tools to do this (GNU recode, for example). Are there any more flexible formats we could use? I noticed we have Latin 1-10 and ISO formats. Is there any reason why we shouldn't use these? Well, there are pros and cons. The 1-byte ISO formats will be more space efficient and also allow some CPU optimizations, so they perform somewhat better. But if you ever need to store a character which doesn't fit in the encoding, you'll be stuck.
Postgres doesn't support using multiple encodings in the same database (or effectively even in the same initdb cluster). -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support!
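Gregory's two points, that data can be converted from the encoding it is really in to the database encoding, and that single-byte encodings are smaller but cannot hold every character, can be illustrated in Python. This is only an analogy: recode and fits are hypothetical helpers, and in PostgreSQL the actual conversion happens on the server once client_encoding matches the data.

```python
def recode(data, source, target="utf-8"):
    """Decode data from the source encoding and re-encode it as target."""
    return data.decode(source).encode(target)


def fits(text, encoding):
    """True if every character of text can be represented in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


latin1 = b"caf\xe9"               # 'café' in Latin-1: one byte per character
utf8 = recode(latin1, "latin-1")  # b'caf\xc3\xa9': the é now takes two bytes
print(len(latin1), len(utf8))     # 4 5
print(fits("café", "latin-1"), fits("€100", "latin-1"))  # True False
```

The size difference is the space advantage of the 1-byte encodings; the euro sign failing to encode is the "you'll be stuck" case.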
[GENERAL] invalid byte sequence for encoding UTF8: 0xff
Hello All, I have a data script which runs fine from the PgAdmin SQL Editor, but when I run it from the command prompt I get the following error: test=# \i /usr/local/pgsql/qsweb1/QSWEB_100_4_Default_Data.sql psql:/usr/local/pgsql/qsweb1/QSWEB_100_4_Default_Data.sql:1: ERROR: invalid byte sequence for encoding UTF8: 0xff HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by client_encoding. Can anybody suggest what is going wrong? Database encoding: UTF8. PostgreSQL version: PostgreSQL 8.2.0 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.4.3 20041212 (Red Hat 3.4.3-9.EL4)
Re: [GENERAL] invalid byte sequence for encoding UTF8: 0xff
On Mon, Sep 03, 2007 at 01:36:58PM +0530, Ashish Karalkar wrote: Hello All, I have a data script which runs fine from the PgAdmin SQL Editor, but when I run it from the command prompt I get the following error: test=# \i /usr/local/pgsql/qsweb1/QSWEB_100_4_Default_Data.sql psql:/usr/local/pgsql/qsweb1/QSWEB_100_4_Default_Data.sql:1: ERROR: invalid byte sequence for encoding UTF8: 0xff HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by client_encoding. Well, the error is correct: 0xff is never a valid byte in UTF-8. I seem to remember someone saying that occasionally Windows puts BOMs in UTF-8 files (which is completely bogus). Check the file using a simple text editor and see if there are some odd characters at the beginning of the file. Have a nice day, -- Martijn van Oosterhout [EMAIL PROTECTED] http://svana.org/kleptog/ From each according to his ability. To each according to his ability to litigate.
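If a byte-order mark is suspected, the first few bytes of the file are enough to check. One detail worth keeping in mind: the UTF-8 BOM is EF BB BF, which is itself valid UTF-8 and would not trigger an "invalid byte sequence" error on its own, whereas a file starting with FF FE (the UTF-16LE BOM, which some Windows editors write when saving as "Unicode") would be rejected at its first byte, 0xff. The Python sketch below uses the BOM constants from the standard codecs module; sniff_bom is a hypothetical helper name.

```python
import codecs

# Longest marks first: the UTF-32LE mark starts with the UTF-16LE mark (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32LE"),  # FF FE 00 00
    (codecs.BOM_UTF32_BE, "UTF-32BE"),  # 00 00 FE FF
    (codecs.BOM_UTF8, "UTF-8"),         # EF BB BF
    (codecs.BOM_UTF16_LE, "UTF-16LE"),  # FF FE
    (codecs.BOM_UTF16_BE, "UTF-16BE"),  # FE FF
]


def sniff_bom(head):
    """Return the encoding named by a byte-order mark at the start of head, or None."""
    for mark, name in BOMS:
        if head.startswith(mark):
            return name
    return None


print(sniff_bom(codecs.BOM_UTF8 + b"\\set"))                         # UTF-8
print(sniff_bom(codecs.BOM_UTF16_LE + "\\set".encode("utf-16-le")))  # UTF-16LE
print(sniff_bom(b"\\set ON_ERROR_STOP"))                             # None
```

On a real script, sniff_bom(open(path, "rb").read(4)) is enough, since no mark is longer than four bytes.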
Re: [GENERAL] invalid byte sequence for encoding UTF8: 0xff
Ashish Karalkar wrote: I have a data script which runs fine from the PgAdmin SQL Editor, but when I run it from the command prompt I get the following error: test=# \i /usr/local/pgsql/qsweb1/QSWEB_100_4_Default_Data.sql psql:/usr/local/pgsql/qsweb1/QSWEB_100_4_Default_Data.sql:1: ERROR: invalid byte sequence for encoding UTF8: 0xff HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by client_encoding. Can anybody suggest what is going wrong? Database encoding: UTF8. PostgreSQL version: PostgreSQL 8.2.0 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.4.3 20041212 (Red Hat 3.4.3-9.EL4) Can you tell us the following: - What is the client operating system (where you run psql and PgAdmin III)? - What is the environment variable PGCLIENTENCODING set to on the client? - What does the SQL command show client_encoding; return when you issue it in a) PgAdmin III b) psql? - Please create a file that contains only the first line of QSWEB_100_4_Default_Data.sql (I call it l in the following commands), run the following two (Linux) commands on it: a) od -t c l b) od -t x1 l and show us the output of both commands. Yours, Laurenz Albe
Re: [GENERAL] invalid byte sequence for encoding UTF8: 0xff
- Original Message - From: Ashish Karalkar [EMAIL PROTECTED] To: Albe Laurenz [EMAIL PROTECTED] Sent: Monday, September 03, 2007 4:09 PM Subject: Re: [GENERAL] invalid byte sequence for encoding UTF8: 0xff - Original Message - From: Albe Laurenz [EMAIL PROTECTED] To: Ashish Karalkar *EXTERN* [EMAIL PROTECTED]; pgsql-general@postgresql.org Sent: Monday, September 03, 2007 2:12 PM Subject: Re: [GENERAL] invalid byte sequence for encoding UTF8: 0xff Ashish Karalkar wrote: I have a data script which runs fine from the PgAdmin SQL Editor, but when I run it from the command prompt I get the following error: test=# \i /usr/local/pgsql/qsweb1/QSWEB_100_4_Default_Data.sql psql:/usr/local/pgsql/qsweb1/QSWEB_100_4_Default_Data.sql:1: ERROR: invalid byte sequence for encoding UTF8: 0xff HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by client_encoding. Can anybody suggest what is going wrong? Database encoding: UTF8. PostgreSQL version: PostgreSQL 8.2.0 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.4.3 20041212 (Red Hat 3.4.3-9.EL4) Can you tell us the following: Please find my answers below. - What is the client operating system (where you run psql and PgAdmin III)? It's Windows XP - PgAdmin III RHEL 3.4.3-9.EL4-psql (Server Machine) - What is the value of the environment variable PGCLIENTENCODING set to on the client? PGCLIENTENCODING is not set, and as per the documentation I think by default it takes the value of the database, i.e. UTF8. - What does the SQL command show client_encoding; return when you issue it in a) PgAdmin III: UNICODE b) psql: UTF8 - Please create a file that contains only the first line of QSWEB_100_4_Default_Data.sql (I call it l in the following commands), run the following two (Linux) commands on it: a) od -t c l b) od -t x1 l and show us the output of both commands.
[EMAIL PROTECTED] qsweb]# od -t c test.sql
000     \   s   e   t       O   N   _   E   R   R   O   R   _   S   T
020     O   P
022
[EMAIL PROTECTED] qsweb]# od -t x1 test.sql
000    5c 73 65 74 20 4f 4e 5f 45 52 52 4f 52 5f 53 54
020    4f 50
022
[EMAIL PROTECTED] qsweb]#
Thanks, Albe, for your reply. Here is the data you wanted. Yours, Laurenz Albe
Re: [GENERAL] invalid byte sequence for encoding UTF8: 0xff
Ashish Karalkar wrote: I have a data script which runs fine from the PgAdmin SQL Editor, but when I run it from the command prompt I get the following error: test=# \i /usr/local/pgsql/qsweb1/QSWEB_100_4_Default_Data.sql psql:/usr/local/pgsql/qsweb1/QSWEB_100_4_Default_Data.sql:1: ERROR: invalid byte sequence for encoding UTF8: 0xff PostgreSQL version: PostgreSQL 8.2.0 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.4.3 20041212 (Red Hat 3.4.3-9.EL4) - What is the client operating system (where you run psql and PgAdmin III)? It's Windows XP - PgAdmin III RHEL 3.4.3-9.EL4-psql (Server Machine) So I take it that you run psql on Windows XP, right? PGCLIENTENCODING is not set, and as per the documentation I think by default it takes the value of the database, i.e. UTF8. - What does the SQL command show client_encoding; return when you issue it in a) PgAdmin III: UNICODE b) psql: UTF8 OK, I suspect that's your problem. You created QSWEB_100_4_Default_Data.sql by using the Save dialog in PgAdmin III on the Windows machine, right? Then the file will probably be encoded in Windows-1252. If your client_encoding is set to UTF8, psql will expect UTF-8 data in the SQL script and complain if it encounters invalid byte sequences. Does the script work as expected when you change the client encoding to WIN1252? - Please create a file that contains only the first line of QSWEB_100_4_Default_Data.sql (I call it l in the following commands), run the following two (Linux) commands on it: a) od -t c l b) od -t x1 l and show us the output of both commands.
[EMAIL PROTECTED] qsweb]# od -t c test.sql
000     \   s   e   t       O   N   _   E   R   R   O   R   _   S   T
020     O   P
022
[EMAIL PROTECTED] qsweb]# od -t x1 test.sql
000    5c 73 65 74 20 4f 4e 5f 45 52 52 4f 52 5f 53 54
020    4f 50
022
That's weird, because psql complained about line 1. Maybe you messed something up by extracting the first line. Try the following: - Use binary-mode file transfer to copy the SQL script to a Linux machine. - Run od -t c -t x1 on the file. - Find the 0xff that psql complains about.
Maybe that helps to locate the problem. 0xff is an unusual Windows-1252 character as well... Yours, Laurenz Albe
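Finding the 0xff that psql complains about can also be done without scanning a hex dump by eye. A minimal Python sketch, assuming the whole script fits in memory (fine for a script like this one, not for multi-gigabyte dumps): decode as UTF-8 and use the position carried by the UnicodeDecodeError. The helper name first_invalid_utf8 and the sample script are made up.

```python
def first_invalid_utf8(data):
    """Locate the first invalid UTF-8 sequence in data.

    Returns (byte offset, 1-based line number, offending bytes), or None if
    the whole buffer is valid UTF-8.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        line = data.count(b"\n", 0, err.start) + 1  # newlines before the bad byte
        return err.start, line, data[err.start:err.end]
    return None


script = b"\\set ON_ERROR_STOP\nINSERT INTO t VALUES ('\xff');\n"  # made-up script
print(first_invalid_utf8(script))  # (42, 2, b'\xff')
```

On the real file this would be first_invalid_utf8(open("QSWEB_100_4_Default_Data.sql", "rb").read()), run on the copy transferred in binary mode.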
Re: [GENERAL] invalid byte sequence for encoding UTF8: 0xff
- Original Message - From: Albe Laurenz [EMAIL PROTECTED] To: Ashish Karalkar *EXTERN* [EMAIL PROTECTED] Cc: pgsql-general@postgresql.org Sent: Monday, September 03, 2007 4:54 PM Subject: RE: [GENERAL] invalid byte sequence for encoding UTF8: 0xff [...] If your client_encoding is set to UTF8, psql will expect UTF-8 data in the SQL script and complain if it meets wrong ones. Does the script work as expected when you change the client encoding to WIN1252? [...] Try the following: - Use binary file transfer and transfer the SQL script to a Linux machine. - Run od -t c -t x1 on the file - Find the 0xff that psql complains about. Maybe that helps to locate the problem. 0xff is an unusual Windows-1252 character as well... Yours, Laurenz Albe
Hey, thanks Albe, it worked.
[GENERAL] invalid byte sequence for encoding UTF8
Hi, I am currently trying to set up our new database server; we have upgraded to PostgreSQL 8.1.8. When I try to restore the backup (which is stored as a set of SQL statements that my restore script feeds into psql to execute) it returns the following error: psql:/mnt/tmp/app/application_data.sql:97425: ERROR: invalid byte sequence for encoding UTF8: 0xff HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by client_encoding. It also complains about other byte sequences, e.g. 0xa1 and 0xac. The two remaining schemas are roughly 22GB and 66GB in size and are read into Postgres from flat COBOL data files. Our data has progressed as shown below:
PostgreSQL 7.?.? - stored in SQL_ASCII (old configuration)
PostgreSQL 8.1.3 - stored in UTF8 (current configuration)
PostgreSQL 8.1.8 - stored in UTF8 (our future configuration)
The encoding set on the server was changed from SQL_ASCII to UTF8 after we moved to version 8.1.3, for purposes of globalisation. I've searched the forums and found people with similar problems, but not much on how to remedy it. I did try using iconv, which was suggested in a thread, but it returned an error saying even the 22GB file was too large to work on. Any help would be gratefully appreciated. Many thanks, David P
Re: [GENERAL] invalid byte sequence for encoding UTF8
On Wednesday 21 March 2007 04:17, Fuzzygoth [EMAIL PROTECTED] wrote: I've searched the forums and found people with similar problems, but not much on how to remedy it. I did try using iconv, which was suggested in a thread, but it returned an error saying even the 22GB file was too large to work on. iconv needs to read the whole file into RAM. What you can do is use the UNIX split utility to split the dump file into smaller segments, use iconv on each segment, and then cat all the converted segments back together into a new dump file. iconv is, I think, your best option for converting the dump to a valid encoding. -- None are more hopelessly enslaved than those who falsely believe they are free. -- Johann W. Von Goethe
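As an alternative to split plus iconv, a Python incremental decoder can transcode a dump in fixed-size chunks, holding back any multi-byte sequence that straddles a chunk boundary until the next read, so memory use stays around the chunk size. This is a sketch, not a tested migration tool; transcode_stream is a hypothetical name, and it is only appropriate if the whole file really is in source_encoding. If the dump is mostly valid UTF-8 with a few stray bytes, transcoding all of it from a single-byte encoding would garble the characters that were already correct. (If you do use split, split -l keeps each piece on line boundaries, so for ASCII-compatible encodings no character is cut in half.)

```python
import codecs
import io


def transcode_stream(src, dst, source_encoding, chunk_size=1 << 20):
    """Copy binary stream src to dst, converting from source_encoding to UTF-8.

    Only chunk_size bytes are read at a time; the incremental decoder keeps
    any partial multi-byte sequence until the rest of it arrives.
    """
    decoder = codecs.getincrementaldecoder(source_encoding)()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(decoder.decode(chunk).encode("utf-8"))
    # final=True flushes the decoder and raises if the input ends mid-sequence.
    dst.write(decoder.decode(b"", final=True).encode("utf-8"))


# Demo on in-memory streams; a 3-byte chunk size splits UTF-16 characters.
src = io.BytesIO("é€ ok\n".encode("utf-16-le"))
dst = io.BytesIO()
transcode_stream(src, dst, "utf-16-le", chunk_size=3)
print(dst.getvalue().decode("utf-8"), end="")  # é€ ok
```

For the dump in this thread it might be called as transcode_stream(open("application_data.sql", "rb"), open("application_data.utf8.sql", "wb"), "latin-1"), where both the output file name and latin-1 as the source encoding are guesses.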
Re: [GENERAL] invalid byte sequence for encoding UTF8
On Wed, Mar 21, 2007 at 09:54:41AM -0700, Alan Hodgson wrote: iconv needs to read the whole file into RAM. What you can do is use the UNIX split utility to split the dump file into smaller segments, use iconv on each segment, and then cat all the converted segments back together into a new dump file. iconv is, I think, your best option for converting the dump to a valid encoding. The guys at OpenStreetMap have written a UTF-8 cleaner that doesn't read the whole file into memory: http://trac.openstreetmap.org/browser/utils/planet.osm/C Definitely more convenient for large files. Have a nice day, -- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/ From each according to his ability. To each according to his ability to litigate.
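The idea behind such a cleaner, stripping whatever is not valid UTF-8 without holding the file in memory, can be sketched in Python by working line by line: the byte 0x0A never occurs inside a multi-byte UTF-8 sequence, so each line can be checked on its own. This is not the OpenStreetMap tool; clean_utf8_lines is a hypothetical helper, and dropping bytes loses the characters they stood for (an accented letter would simply vanish), so it is a last resort when the original encoding cannot be determined.

```python
import io


def clean_utf8_lines(src, dst, bad_lines=None):
    """Copy binary stream src to dst line by line, dropping invalid UTF-8 bytes.

    Memory use is bounded by the longest line. Returns how many lines needed
    cleaning; their 1-based numbers are appended to bad_lines if given.
    """
    count = 0
    for lineno, raw in enumerate(src, start=1):  # binary streams iterate by line
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError:
            count += 1
            if bad_lines is not None:
                bad_lines.append(lineno)
            # errors="ignore" drops the offending bytes; "replace" would insert U+FFFD.
            raw = raw.decode("utf-8", errors="ignore").encode("utf-8")
        dst.write(raw)
    return count


src = io.BytesIO(b"ok\nbad \xff here\nmore \xd4R\n")  # made-up input
dst = io.BytesIO()
bad = []
count = clean_utf8_lines(src, dst, bad)
print(count, bad)       # 2 [2, 3]
print(dst.getvalue())   # b'ok\nbad  here\nmore R\n'
```

The list of bad line numbers is worth keeping: it tells you which rows to look at before deciding whether dropping bytes was acceptable.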
[GENERAL] invalid byte sequence for encoding UTF8
I used shp2pgsql.exe to create an import SQL file for my GIS database. The resulting SQL has data like this in it: INSERT INTO gis.sa_area (label,type,level,the_geom) VALUES ('MÔRELIG','0x2','2','01060001000'); The Ô is character 212 (0xD4). This won't import; psql returns ERROR: invalid byte sequence for encoding UTF8: 0xd452 HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by client_encoding TIA Gary
Re: [GENERAL] invalid byte sequence for encoding UTF8
On Tue, Jan 16, 2007 at 03:40:52PM +0200, Gary Benade wrote: I used shp2pgsql.exe to create an import SQL file for my GIS database. The resulting SQL has data like this in it: INSERT INTO gis.sa_area (label,type,level,the_geom) VALUES ('MÔRELIG','0x2','2','01060001000'); The Ô is character 212 (0xD4). This won't import; psql returns ERROR: invalid byte sequence for encoding UTF8: 0xd452 HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by client_encoding Well, your data isn't UTF8, and yet that's what you told the server. Either make the data UTF8, or tell the server the actual encoding used... Have a nice day, -- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/ From each according to his ability. To each according to his ability to litigate.
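Why the server reports 0xd452 in particular: in Latin-1, Ô is the single byte 0xD4, but in UTF-8 0xD4 is the lead byte of a two-byte sequence, and the byte after it must be a continuation byte in the range 0x80-0xBF. Here the next byte is 'R' (0x52), so the pair is rejected; the correct UTF-8 encoding of Ô is 0xC3 0x94. The Python sketch below checks the lead-byte and continuation-byte ranges that RFC 3629 defines for UTF-8; the helper names are made up.

```python
def utf8_sequence_length(lead):
    """Bytes a UTF-8 sequence starting with lead must have; 0 if lead cannot start one."""
    if lead <= 0x7F:
        return 1  # plain ASCII
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    # 0x80-0xBF are continuation bytes; 0xC0, 0xC1 and 0xF5-0xFF never appear in UTF-8.
    return 0


def is_continuation(byte):
    """True for bytes 0x80-0xBF, the only values allowed after a lead byte."""
    return 0x80 <= byte <= 0xBF


label = b"M\xd4RELIG"  # 'MÔRELIG' as Latin-1 bytes, as in the failing INSERT
lead, following = label[1], label[2]
print(hex(lead), utf8_sequence_length(lead))       # 0xd4 2
print(hex(following), is_continuation(following))  # 0x52 False
print("Ô".encode("utf-8").hex())                   # c394
```

The same table explains the 0xff errors elsewhere in these threads: utf8_sequence_length(0xFF) is 0, because that byte can never appear in UTF-8 at all.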
Re: [GENERAL] invalid byte sequence for encoding UTF8
On 1/16/07, Gary Benade [EMAIL PROTECTED] wrote: I used shp2pgsql.exe to create an import SQL file for my GIS database. The resulting SQL has data like this in it: INSERT INTO gis.sa_area (label,type,level,the_geom) VALUES ('MÔRELIG','0x2','2','01060001000'); The Ô is character 212 (0xD4). This won't import; psql returns ERROR: invalid byte sequence for encoding UTF8: 0xd452 HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by client_encoding I am not terribly familiar with PostGIS (other than installing it, running the test cases and saying cool :), but it appears that your source data is ISO-8859-1. You should probably use the -W switch with shp2pgsql and specify the client encoding as LATIN1; it should then write a dump file with SET client_encoding TO 'LATIN1' instead of UTF8 (or you can manually tweak the SQL file). -- Chad http://www.postgresqlforums.com/
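Manually tweaking the SQL file amounts to putting a SET client_encoding statement before the data, so the server interprets the bytes that follow as LATIN1 and converts them to UTF8 itself. A small Python sketch, assuming the file really is LATIN1; with_client_encoding is a hypothetical helper that leaves the original bytes untouched.

```python
def with_client_encoding(script, encoding="LATIN1"):
    """Return script (bytes) with a SET client_encoding statement prepended.

    The original bytes are not modified; the server interprets everything after
    the SET as the given encoding and converts it to the database encoding.
    """
    header = f"SET client_encoding TO '{encoding}';\n".encode("ascii")
    if script.startswith(header):  # already tagged: don't add a second SET
        return script
    return header + script


row = "INSERT INTO gis.sa_area (label,type,level,the_geom) VALUES ('MÔRELIG','0x2','2','01060001000');\n"
script = row.encode("latin-1")  # the file as shp2pgsql wrote it, assuming Latin-1
print(with_client_encoding(script).decode("latin-1"), end="")
# SET client_encoding TO 'LATIN1';
# INSERT INTO gis.sa_area (label,type,level,the_geom) VALUES ('MÔRELIG','0x2','2','01060001000');
```

To apply it to a file in place: p = pathlib.Path("sa_area.sql"); p.write_bytes(with_client_encoding(p.read_bytes())), where sa_area.sql is a made-up file name.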