Re: [GENERAL] [JDBC] Invalid Character Data Problem

2004-11-29 Thread Kris Jurka


On Fri, 26 Nov 2004, Hunter Hillegas wrote:

 When I SELECT from a certain table, I see this JDBC exception:
 
 Invalid character data was found.  This is most likely caused by stored
 data containing characters that are invalid for the character set the
 database was created in.  The most common example of this is storing 8bit
 data in a SQL_ASCII database.
 
 The database is indeed of type SQL_ASCII. The table stores mailing list data
 and has about 400,000 rows.
 
 Looking at the data via psql, I see that some of the rows have strange
 characters in them, such as question marks where I would not expect them,
 etc...
 
 What are my options? Is there a way to identify the 'bad' records, or the
 ones causing trouble?
 

To really solve this problem you need to have a correctly encoded 
database.  This will involve a dump and restore process and possibly 
recoding your data.  This is straightforward if you know what 
encoding your data is, although it will cause some downtime.

To detect the bad data you can try various SELECTs with the JDBC driver 
and see what errors out.  The function below will determine if a 
particular field has data with the high bit set which is something the 
database really doesn't know what to do with.

SELECT pkcolumn, hashighbit(columna), hashighbit(columnb) FROM mytable;

Kris Jurka

CREATE OR REPLACE FUNCTION hashighbit(text) RETURNS boolean AS '
DECLARE
i int;
BEGIN
i := LENGTH($1);
WHILE i  0 LOOP
IF ascii(substring($1, i, 1)) = 128 THEN
RETURN true;
END IF;
i := i-1;
END LOOP;
RETURN false;
END;
' LANGUAGE 'plpgsql';


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [GENERAL] [JDBC] Invalid Character Data Problem

2004-11-29 Thread Hunter Hillegas
I got this going with a dump/reload.

Beware however, 8.0b5 and the JDBC driver don't seem to work with UNICODE
encoding for the database. You have to use b4 or HEAD.

Hunter

 From: Kris Jurka [EMAIL PROTECTED]
 Date: Tue, 30 Nov 2004 01:34:46 -0500 (EST)
 To: Hunter Hillegas [EMAIL PROTECTED]
 Cc: PostgreSQL [EMAIL PROTECTED], Postgre JDBC List
 [EMAIL PROTECTED]
 Subject: Re: [JDBC] Invalid Character Data Problem
 
 
 
 On Fri, 26 Nov 2004, Hunter Hillegas wrote:
 
 When I SELECT from a certain table, I see this JDBC exception:
 
 Invalid character data was found.  This is most likely caused by stored
 data containing characters that are invalid for the character set the
 database was created in.  The most common example of this is storing 8bit
 data in a SQL_ASCII database.
 
 The database is indeed of type SQL_ASCII. The table stores mailing list data
 and has about 400,000 rows.
 
 Looking at the data via psql, I see that some of the rows have strange
 characters in them, such as question marks where I would not expect them,
 etc...
 
 What are my options? Is there a way to identify the 'bad' records, or the
 ones causing trouble?
 
 
 To really solve this problem you need to have a correctly encoded
 database.  This will involve a dump and restore process and possibly
 recoding your data.  This is straightforward if you know what
 encoding your data is, although it will cause some downtime.
 
 To detect the bad data you can try various SELECTs with the JDBC driver
 and see what errors out.  The function below will determine if a
 particular field has data with the high bit set which is something the
 database really doesn't know what to do with.
 
 SELECT pkcolumn, hashighbit(columna), hashighbit(columnb) FROM mytable;
 
 Kris Jurka
 
 CREATE OR REPLACE FUNCTION hashighbit(text) RETURNS boolean AS '
 DECLARE
 i int;
 BEGIN
 i := LENGTH($1);
 WHILE i  0 LOOP
 IF ascii(substring($1, i, 1)) = 128 THEN
 RETURN true;
 END IF;
 i := i-1;
 END LOOP;
 RETURN false;
 END;
 ' LANGUAGE 'plpgsql';
 
 
 ---(end of broadcast)---
 TIP 6: Have you searched our list archives?
 
http://archives.postgresql.org



---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]