On 12/4/05, Tom Lane <[EMAIL PROTECTED]> wrote:
> Paul Lindner <[EMAIL PROTECTED]> writes:
> > On Sun, Dec 04, 2005 at 11:34:16AM -0500, Tom Lane wrote:
> >> Paul Lindner <[EMAIL PROTECTED]> writes:
> >>> iconv -c -f UTF8 -t UTF8 -o fixed.sql dump.sql
> >>
> >> Is that really a one-size-fits-all solution?  Especially with -c?
> >
> > I'd say yes, and the -c flag is needed so iconv strips out the
> > invalid characters.
>
> That's exactly what's bothering me about it.  If we recommend that
> we had better put a large THIS WILL DESTROY YOUR DATA warning first.
> The problem is that the data is not "invalid" from the user's point
> of view --- more likely, it's in some non-UTF8 encoding --- and so
> just throwing away some of the characters is unlikely to make people
> happy.

Nor is it even guaranteed to make the data load: if the column has a
unique constraint and removing the non-UTF8 characters makes two rows
hold the same data where they didn't before, the restore will fail on
the duplicate.

The way to preserve the data is to switch the column to bytea.
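
To make the collision concrete, here is a minimal Python sketch. It is
an illustration only: the sample values and the helper name
strip_invalid_utf8 are made up, and decoding with errors="ignore" is
just a rough stand-in for what "iconv -c" does, not a reimplementation
of it. It assumes the original data was really ISO-8859-1.

```python
# Two distinct values stored as ISO-8859-1 bytes ("José" and "Josè").
# Neither byte string is valid UTF-8. Sample data is hypothetical.
rows = [b"Jos\xe9", b"Jos\xe8"]


def strip_invalid_utf8(raw: bytes) -> str:
    """Drop bytes that are not valid UTF-8 (approximates `iconv -c`)."""
    return raw.decode("utf-8", errors="ignore")


# After stripping, both rows lose their last character.
cleaned = [strip_invalid_utf8(r) for r in rows]

# Decoding with the encoding the data was actually written in keeps it.
recovered = [r.decode("latin-1") for r in rows]

print("distinct before:", len(set(rows)))
print("distinct after stripping:", len(set(cleaned)))
print("cleaned:", cleaned)
print("recovered:", recovered)
```

Both stripped values come out as "Jos", so a unique index on that
column would reject the second row, while decoding from the real source
encoding keeps the two values distinct.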