[ADMIN] How to fix bad multibyte data?

Iain Wed, 12 Jan 2005 00:39:30 -0800

Hi All,

I have a v7.1 database whose encoding is EUC_JP, and I'm trying to get it into a v7.4 database whose encoding is also EUC_JP. Unfortunately, it seems that 7.4 is much stricter about its multibyte data than 7.1 was, because attempts to restore into the 7.4 db result in 'Invalid byte sequence for encoding "EUC_JP": 0x8e' errors.

There is no doubt that the data in the 7.1 database is bad, though I'm not sure exactly how it got that way (the data was loaded by a C program from a CSV file). Anyway, I can dump/restore on 7.1 OK, and I can restore into a 7.4 DB with the encoding set to SQL_ASCII, but that isn't really what we want.

I'm thinking that I may have to put the dump file through some kind of filter that will at least ensure that the data is valid EUC_JP, even if it mangles the data a little by dropping the invalid bytes. The question is, how would one go about this? I think a Perl script might do the job (I'm not familiar with Perl at all, though), but there might be other ways, so before I go off down that path, I'm wondering if anyone has any suggestions.

Regards
Iain
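For illustration, the filter idea described above could look something like the sketch below. It is written in Python rather than Perl, and the names strip_invalid_euc_jp and filter_dump are hypothetical. It checks only the standard EUC-JP byte layout (ASCII, 0x8E plus one byte for half-width katakana, 0x8F plus two bytes for JIS X 0212, and two-byte JIS X 0208 pairs) and drops any byte that does not start a well-formed sequence. Whether these ranges match exactly what the 7.4 server checks is an assumption, so the result should be tested against a scratch database first.

```python
def _is_trail(b: int) -> bool:
    # Trailing bytes of multibyte EUC-JP characters fall in 0xA1..0xFE.
    return 0xA1 <= b <= 0xFE


def strip_invalid_euc_jp(data: bytes) -> bytes:
    """Return data with bytes that do not form well-formed EUC-JP removed.

    This is a structural check only; it does not verify that a byte pair
    maps to an assigned character.
    """
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:
            # Plain ASCII, always kept.
            out.append(b)
            i += 1
        elif b == 0x8E and i + 1 < n and 0xA1 <= data[i + 1] <= 0xDF:
            # Half-width katakana: 0x8E followed by 0xA1..0xDF.
            out += data[i:i + 2]
            i += 2
        elif (b == 0x8F and i + 2 < n
              and _is_trail(data[i + 1]) and _is_trail(data[i + 2])):
            # JIS X 0212: 0x8F followed by two trailing bytes.
            out += data[i:i + 3]
            i += 3
        elif _is_trail(b) and i + 1 < n and _is_trail(data[i + 1]):
            # JIS X 0208: two bytes, both in 0xA1..0xFE.
            out += data[i:i + 2]
            i += 2
        else:
            # Not the start of a well-formed sequence: drop this one byte.
            i += 1
    return bytes(out)


def filter_dump(src_path: str, dst_path: str) -> None:
    # Process line by line; 0x0A is never part of a multibyte sequence,
    # so splitting on newlines cannot cut a valid character in half.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:
            dst.write(strip_invalid_euc_jp(line))
```

Calling filter_dump on the 7.1 dump file (with whatever input and output paths apply) would produce a copy to restore into 7.4. Note that after a bad byte is dropped, the following bytes may pair up differently than originally intended, so the output is structurally valid but characters next to the damage can change. Diffing the filtered file against the original shows which rows were touched.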
