Martijn van Oosterhout wrote: -- Start of PGP signed section. > On Thu, Dec 08, 2005 at 05:54:35PM -0500, Gregory Maxwell wrote: > > No, what is needed for people who care about fixing their data is a > > loadable strip_invalid_utf8() that works in older versions.. then just > > select * from bar where foo != strip_invalid_utf8(foo); The function > > would be useful in general, for example, if you have an application > > which doesn't already have much utf8 logic, you want to use a text > > field, and stripping is the behaviour you want. For example, lots of > > simple web applications. > > Would something like the following work? It's written in pl/pgsql and > does (AFAICS) the same checking as the backend in recent releases. > Except the backend only supports up to 4-byte UTF-8 whereas this > function checks upto six byte. For a six byte UTF-8 character, who is > wrong? > > In any case, people should be able to do something like: > > SELECT field FROM table WHERE NOT utf8_verify(field,4); > > To check conformance with PostgreSQL 8.1. Note, I don't have large > chunks of UTF-8 to test with but it works for the characters I tried > with. Tested with 7.4.
I think the problem with any kind of function-call detection is that the data has to get into the database first, and it isn't clear how someone loading a failed dump would do that aside from modifying the column to bytea in the dump, loading it in, then fixing it. The iconv idea has the advantage that it can be fixed before loading into the database. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073 ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend