Sim, are you sure that encoding of the dump and the database in which you are trying to restore it are same ?
2010/11/23 Sim Zacks <s...@compulab.co.il> > Thanks for your help. > > I think a trigger will actually be the easiest. The way i can tell if there > is invalid data is simply to do an Upper(text) and if it has invalid data it > fails. > > I dumped the fixed database. Now I have a years worth of backups that I > can't restore a specific table from. It most probably will never mean > anything. Every once in a while, I get asked to check what was in the db > against what is in there, but this table will probably never be audited. > > > Sim > > > > On 11/23/2010 10:33 AM, Dmitriy Igrishin wrote: > > Hey Sim, > > 2010/11/23 Sim Zacks <s...@compulab.co.il> > >> On 11/21/2010 05:55 PM, Dmitriy Igrishin wrote: >> >> Hey Sim, >> >> Maybe this helps: >> >> http://blog.tapoueh.org/articles/blog/_Getting_out_of_SQL_ASCII,_part_2.html >> >> That worked to find some of them. I still needed to find a bunch of >> others manually, such as 0xa0 and 0xd725 which weren't found with that >> function. I finally figured out that >> select * from emaildetails where emailbody like '%\xa0%' and >> select * from emaildetails where emailbody like '%\xd7\x25%' would show me >> all those rows. >> >> My 2 big problems now are: >> >> A) how to make sure that these chars are not inserted in the future. The >> database should prevent them from being inserted. >> > Consider to use domains -- generic-based types with constraints -- instead > of generic types. > Use regular expressions in constraints. > As alternative, you can use triggers for more complex validation. But > domains in you case > IMO will work good. > > >> B) How to fix the backups that I have so that I can restore them. As I >> mentioned, they are being taken with -Fc >> > Oops. Why not dump fixed database ? > >> >> Sim >> >> >> >> 2010/11/21 Sim Zacks <s...@compulab.co.il> >> >>> I am using PG 8.2.17 with UTF8 encoding. >>> "PostgreSQL 8.2.17 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.1 >>> (Gentoo 4.1.1)" >>> >>> One of my tables somehow has invalid characters in it: >>> >>>> ERROR: invalid byte sequence for encoding "UTF8": 0xa9 >>>> HINT: This error can also happen if the byte sequence does not match >>>> the encoding expected by the server, which is controlled by >>>> "client_encoding". >>>> >>> I have already manually found a number of the bad rows by running queries >>> with text functions (upper) between groups of IDs until I found the specific >>> bad row. >>> >>> 1) Is there a quicker way to get a list of all rows with invalid >>> characters >>> 2) Shouldn't the database prevent these rows from being entered in the >>> first place? >>> 3) I have backups of this database (using -Fc) and I noticed that on >>> restore, this table is not restored because of this error. Is there a way to >>> fix the existing backups, or tell the restore to ignore bad rows instead of >>> erroring out the whole table? >>> >>> Thanks >>> Sim >>> >>> -- >>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org) >>> To make changes to your subscription: >>> http://www.postgresql.org/mailpref/pgsql-general >>> >> >> >> >> -- >> // Dmitriy. >> >> >> >> > > > -- > // Dmitriy. > > > > -- // Dmitriy.