Use the Encode module to test and convert back and forth between UTF-8 
characters and bytes for the SQL_ASCII database.  Assuming the input is already 
UTF-8 text held as a Perl character string:

use Encode qw(:all);
# connect to db, prepare insert statement, etc.
  # encode the character string into UTF-8 bytes before inserting
  my $bytes = encode('utf8', $utf8_text);
  $sth->execute($bytes, $i)
    or errexit("execute of insert into public_suffixes tbl failed: ", $DBI::errstr);

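If your input is not already UTF-8, you will have to call decode inside an eval 
block (with a CHECK value such as FB_CROAK, so that bad input dies instead of 
being silently replaced) to convert it to characters, then check for failure 
before re-encoding and inserting into the database.  Or something similar.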
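
A minimal sketch of that decode-then-check step, under some assumptions: 
to_utf8_bytes is a hypothetical helper name, and cp1252 is only a guess at what 
Windows cut and paste might produce.  Decoding with 'UTF-8' and FB_CROAK also 
works as a plain validity test for incoming CGI data.

```perl
use strict;
use warnings;
use Encode qw(decode encode FB_CROAK LEAVE_SRC);

# Hypothetical helper: returns UTF-8 bytes ready for the insert, or undef
# if $raw is not valid in the encoding named by $charset.
sub to_utf8_bytes {
    my ($raw, $charset) = @_;
    # FB_CROAK makes decode die on malformed input; eval traps that.
    my $chars = eval { decode($charset, $raw, FB_CROAK | LEAVE_SRC) };
    return unless defined $chars;      # decode failed; $@ holds the reason
    return encode('UTF-8', $chars);    # characters -> UTF-8 bytes
}

# "caf\xE9" is "cafe" with an e-acute in Windows-1252 (an assumed source).
my $ok  = to_utf8_bytes("caf\xE9", 'cp1252');
# The same bytes are not valid UTF-8, so this call returns undef.
my $bad = to_utf8_bytes("caf\xE9", 'UTF-8');

print defined $ok  ? "converted\n" : "rejected\n";
print defined $bad ? "converted\n" : "rejected\n";
```

Rows that come back undef can be rejected or logged for manual cleanup rather 
than inserted.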

This seems to work for me.  When I need to pull the data back out of the 
database, I have to reconvert from the byte string into UTF-8 characters before 
displaying the output.
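
Roughly, the read side looks like the sketch below.  The literal byte string is 
a stand-in for a value fetched from the database (for example via 
$sth->fetchrow_array), and the binmode layer assumes the output should be UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Stand-in for a column value read from the SQL_ASCII database:
# raw UTF-8 bytes, which Perl treats as five separate octets.
my $bytes = "caf\xC3\xA9";

# Convert the byte string into a Perl character string.
my $chars = decode('UTF-8', $bytes);

# Encode again on the way out so the terminal or browser gets UTF-8.
binmode(STDOUT, ':encoding(UTF-8)');
print length($bytes), " bytes, ", length($chars), " characters: $chars\n";
```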

Susan
________________________________
From: [email protected] 
[mailto:[email protected]] On Behalf Of Mike Blackwell
Sent: Thursday, July 21, 2011 7:49 AM
To: [email protected]
Subject: [GENERAL] SQL-ASCII database cleanup

I have an older database that was created with SQL-ASCII encoding.  Over time 
users have managed to enter all manner of interesting characters, mostly via 
cut and paste from Windows documents.  I'm attempting to clean up and 
eventually convert the database to UTF8.  I've managed to find most of the data that 
won't nicely convert from some-random-encoding to UTF8, but it seems the users 
are entering it as fast as I can find it. Is there a way the incoming data from 
a Perl CGI web application can be automatically limited to UTF8 even though the 
database is SQL-ASCII?


Mike
