On 23-5-2016 03:45, fabia...@itbizolutions.com.au [firebird-support] wrote:
> I have been trying to migrate from FB2.54 to FB 3 for a few weeks, and
> after hitting a string-related error for some time I have got to the
> point where I do understand the issue, but I don't know how to solve it.
> The issue is pretty simple: the FB 2.54 DB contains a few characters
> that are not allowed in the FB 3 database. One example of a character
> causing an error during the restore was "Mcgarrity’s" (note the ’), as it
> appears to be outside the scope of the FB3 string domain. I have tried
> creating a new FB3 DB with many different charsets, but none works. The
> other string causing issues is, for example, "΢ÈíÑźÚ". I have many
> records with this type of string because the DB contains raw emails
> received by the system, stored in Varchars, and apparently some emails
> contain very weird characters. All were handled by FB2.54, but FB3
> rejects the records. I have been able to isolate all records with issues
> using IBExpert's table data comparer function, as it created a script
> with all records from all tables from FB2.54, and when running the script
> against FB3.0 it singles out all the offending records.
>
> Can anyone advise what options I have available to force FB3.0 to accept
> any stuff into string fields?
In your other e-mail you indicate you solved this by changing the character set from ASCII to NONE. The fact that it worked before was a bug; see http://tracker.firebirdsql.org/browse/CORE-3416.

ASCII only supports characters 0-127; characters outside that range are 'extended ASCII', e.g. one of the other single-byte character sets like WIN1252 or ISO8859_1. The characters shown (΢ÈíÑÅºÚ and ’) are all outside the ASCII range. The last one (’) is particularly nasty, because it should have been a ' (U+0027 APOSTROPHE, ASCII 39); instead, U+2019 RIGHT SINGLE QUOTATION MARK (character 146 in Windows-1252) was used.

Given the context of e-mails, either NONE or OCTETS is the only real option, as e-mails can have multiple parts, each with its own character set, and can also have binary parts (although those are usually encoded with something like base64).

Mark
-- 
Mark Rotteveel
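
As an illustration of the ASCII range point made in the reply above, here is a minimal Python sketch (not Firebird code). It checks whether a string stays within code points 0-127 and shows that the typographic quote is byte 146 in Windows-1252. The helper name `is_ascii` and the constants are hypothetical, chosen only for this example.

```python
# Hypothetical names, used only for this illustration.
APOSTROPHE = "\u0027"          # ' plain ASCII apostrophe
RIGHT_SINGLE_QUOTE = "\u2019"  # typographic right single quotation mark


def is_ascii(text: str) -> bool:
    """Return True if every character is in the 7-bit ASCII range 0-127."""
    return all(ord(ch) <= 127 for ch in text)


# The plain apostrophe is ASCII 39.
print(ord(APOSTROPHE))

# The typographic quote is a single byte, 0x92 (146), in Windows-1252.
print(RIGHT_SINGLE_QUOTE.encode("cp1252"))

# The same name with each kind of quote: only the first is pure ASCII.
print(is_ascii("Mcgarrity" + APOSTROPHE + "s"))
print(is_ascii("Mcgarrity" + RIGHT_SINGLE_QUOTE + "s"))
```

A check like this could be run over exported text before a restore to find values that a strict ASCII column would reject; for raw e-mail data, the NONE or OCTETS advice above still applies.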