RE: Find an unprintable character inside a column....
Just a brief foot-note to this discussion. The reason I selected the data, rather than attempted an automatic correction, was that sometimes two words would be separated by ascii(10) (forcing a line throw), and sometimes the corruption would appear as an additional spurious character in a line. In the former case, one wants to replace ascii(10) with a space (ascii 32), but in the latter case one simply wants to remove the corrupt data character. OK, yes, I could do it with decode, and I'm sure Jared will have the appropriate Perl script out before I finish typing this But but but... peter edinburgh -Original Message- From: Stephane Faroult [mailto:[EMAIL PROTECTED] Sent: Friday, October 10, 2003 10:54 PM To: Multiple recipients of list ORACLE-L Subject: Re: Find an unprintable character inside a column [EMAIL PROTECTED] wrote: I played with this a bit. First, I created some test data with one column corrupted with a single random character of 0-31 replacing a random char in that column 20% of the rows of the table. Peter's function correctly found all of the rows in 7.5 seconds. Stephane's function ran in 3.5 seconds, but didn't find any of the rows. I didn't attempt to correct the code. Then I tried a function based on owa_pattern.regex. My initial attempts didn't return the correct rows, as the regex pattern needed some tuning. I didn't attempt to fix it, as it was woefully slow, about 30 seconds. Regex in the WHERE clause in 10g will be nice. Jared Stephane Faroult [EMAIL PROTECTED] To:Multiple Sent by: [EMAIL PROTECTED] recipients of list ORACLE-L [EMAIL PROTECTED] 10/10/2003 07:09 AM cc: Please respond to ORACLE-LSubject:RE: RE: RE: Find an unprintable character inside a column Some people have requested this code, so I thought you might as well all have the chance to pick it to bits... Its a function called BAD_ASCII, and it hunts out for any ascii characters with an ascii value of less than 32 in a specified field. (Acknowledgments to my colleague Keith Holmes for help with this code.) Use it as follows: Where a field called DATA in a table TABLE_1 may contain an ascci character with a value less than 32 (ie a non-printing character), the following SQL will find the row in question: select rowid,DATA,dump(DATA) from TABLE_1 where BAD_ASCII(DATA) 0; You could use the PK of the table instead of rowid, of course. You will also note that I select the DATA field in both normal and ascii 'dump' mode, the better to locate where the corruption is located. peter edinburgh ... Source as follows: Function BAD_ASCII (V_Text in char) return number is V_Int number; V_Count number; begin -- V_Int := 0; V_Count := 1; while V_Count=length(rtrim(V_Text)) and V_Int=0 loop if ascii(substr(V_Text, V_Count, 1))32 then V_Int := V_Count; end if; V_Count := V_Count + 1; end loop; return V_Int; -- exception when others then return -1; end BAD_ASCII; / Peter, I think that you can make this code 25% faster when the data is clean (which hopefully is the general case) by using 'replace', more efficient than a PL/SQL loop, to check whether you have some rubbish (sort of). It will not tell you where the bad character is, however - which means that then you can loop to look for it. Here is what I would suggest : create or replace Function BAD_ASCII (V_Text in char) return number is V_Int number; V_Count number; begin if (replace(V_text, chr(0)||chr(1)||chr(2)||chr(3)|| chr(4)||chr(5)||chr(6)||chr(7)|| chr(8)||chr(9)||chr(10)||chr(11)|| chr(12)||chr(13)||chr(14)||chr(15)|| chr(16)||chr(17)||chr(18)||chr(19)|| chr(20)||chr(21)||chr(22)||chr(23)|| chr(24)||chr(25)||chr(26)||chr(27)|| chr(28)||chr(29)||chr(30)||chr(31), '') = V_text) then return 0; else V_Int := 0; V_Count := 1; while V_Count=length(rtrim(V_Text)) and V_Int=0 loop if ascii(substr(V_Text, V_Count, 1))32 then V_Int := V_Count; end if; V_Count := V_Count + 1; end loop; return V_Int; end if; -- exception when others then return -1; end BAD_ASCII; / Jared, you're the scourge of people who just write things out of the top of their head and don't test them thoroughly :-). I had made my usual mistake
RE: Find an unprintable character inside a column....
perl -pe s/[[:cntrl:]]//g file_with_ctrl_chars file_without_control_chars Robson, Peter [EMAIL PROTECTED] Sent by: [EMAIL PROTECTED] 10/13/2003 03:49 AM Please respond to ORACLE-L To:Multiple recipients of list ORACLE-L [EMAIL PROTECTED] cc: Subject:RE: Find an unprintable character inside a column Just a brief foot-note to this discussion. The reason I selected the data, rather than attempted an automatic correction, was that sometimes two words would be separated by ascii(10) (forcing a line throw), and sometimes the corruption would appear as an additional spurious character in a line. In the former case, one wants to replace ascii(10) with a space (ascii 32), but in the latter case one simply wants to remove the corrupt data character. OK, yes, I could do it with decode, and I'm sure Jared will have the appropriate Perl script out before I finish typing this But but but... peter edinburgh -Original Message- From: Stephane Faroult [mailto:[EMAIL PROTECTED] Sent: Friday, October 10, 2003 10:54 PM To: Multiple recipients of list ORACLE-L Subject: Re: Find an unprintable character inside a column [EMAIL PROTECTED] wrote: I played with this a bit. First, I created some test data with one column corrupted with a single random character of 0-31 replacing a random char in that column 20% of the rows of the table. Peter's function correctly found all of the rows in 7.5 seconds. Stephane's function ran in 3.5 seconds, but didn't find any of the rows. I didn't attempt to correct the code. Then I tried a function based on owa_pattern.regex. My initial attempts didn't return the correct rows, as the regex pattern needed some tuning. I didn't attempt to fix it, as it was woefully slow, about 30 seconds. Regex in the WHERE clause in 10g will be nice. Jared Stephane Faroult [EMAIL PROTECTED] To:Multiple Sent by: [EMAIL PROTECTED] recipients of list ORACLE-L [EMAIL PROTECTED] 10/10/2003 07:09 AMcc: Please respond to ORACLE-LSubject:RE: RE: RE: Find an unprintable character inside a column Some people have requested this code, so I thought you might as well all have the chance to pick it to bits... Its a function called BAD_ASCII, and it hunts out for any ascii characters with an ascii value of less than 32 in a specified field. (Acknowledgments to my colleague Keith Holmes for help with this code.) Use it as follows: Where a field called DATA in a table TABLE_1 may contain an ascci character with a value less than 32 (ie a non-printing character), the following SQL will find the row in question: select rowid,DATA,dump(DATA) from TABLE_1 where BAD_ASCII(DATA) 0; You could use the PK of the table instead of rowid, of course. You will also note that I select the DATA field in both normal and ascii 'dump' mode, the better to locate where the corruption is located. peter edinburgh ... Source as follows: Function BAD_ASCII (V_Text in char) return number is V_Int number; V_Count number; begin -- V_Int := 0; V_Count := 1; while V_Count=length(rtrim(V_Text)) and V_Int=0 loop if ascii(substr(V_Text, V_Count, 1))32 then V_Int := V_Count; end if; V_Count := V_Count + 1; end loop; return V_Int; -- exception when others then return -1; end BAD_ASCII; / Peter, I think that you can make this code 25% faster when the data is clean (which hopefully is the general case) by using 'replace', more efficient than a PL/SQL loop, to check whether you have some rubbish (sort of). It will not tell you where the bad character is, however - which means that then you can loop to look for it. Here is what I would suggest : create or replace Function BAD_ASCII (V_Text in char) return number is V_Int number; V_Count number; begin if (replace(V_text, chr(0)||chr(1)||chr(2)||chr(3)|| chr(4)||chr(5)||chr(6)||chr(7)|| chr(8)||chr(9)||chr(10)||chr(11)|| chr(12)||chr(13)||chr(14)||chr(15)|| chr(16)||chr(17)||chr(18)||chr(19)|| chr(20)||chr(21)||chr(22)||chr(23)|| chr(24)||chr(25)||chr(26)||chr(27)|| chr(28)||chr(29)||chr(30)||chr(31), '') = V_text) then return 0; else V_Int := 0; V_Count := 1; while V_Count=length(rtrim(V_Text)) and V_Int=0 loop if ascii(substr(V_Text, V_Count, 1))32 then V_Int := V_Count; end if; V_Count := V_Count + 1; end loop; return V_Int; end if; -- exception when others then return -1; end BAD_ASCII; / Jared, you're the scourge of people who just write things out
RE: RE: Find an unprintable character inside a column....
Some people have requested this code, so I thought you might as well all have the chance to pick it to bits... Its a function called BAD_ASCII, and it hunts out for any ascii characters with an ascii value of less than 32 in a specified field. (Acknowledgments to my colleague Keith Holmes for help with this code.) Use it as follows: Where a field called DATA in a table TABLE_1 may contain an ascci character with a value less than 32 (ie a non-printing character), the following SQL will find the row in question: select rowid,DATA,dump(DATA) from TABLE_1 where BAD_ASCII(DATA) 0; You could use the PK of the table instead of rowid, of course. You will also note that I select the DATA field in both normal and ascii 'dump' mode, the better to locate where the corruption is located. peter edinburgh ... Source as follows: Function BAD_ASCII (V_Text in char) return number is V_Int number; V_Count number; begin -- V_Int := 0; V_Count := 1; while V_Count=length(rtrim(V_Text)) and V_Int=0 loop if ascii(substr(V_Text, V_Count, 1))32 then V_Int := V_Count; end if; V_Count := V_Count + 1; end loop; return V_Int; -- exception when others then return -1; end BAD_ASCII; / -- -Original Message- From: Prem Khanna J [mailto:[EMAIL PROTECTED] Sent: Thursday, October 09, 2003 10:49 AM To: Multiple recipients of list ORACLE-L Subject: Re: RE: Find an unprintable character inside a column Peter, i would be interested in that. can u mail it to me ? Jp. 09-10-2003 18:29:33, Robson, Peter [EMAIL PROTECTED] wrote: I have a small PL/SQL piece of code used to detect these things, if anyone wants it. -- Please see the official ORACLE-L FAQ: http://www.orafaq.net -- Author: Prem Khanna J INET: [EMAIL PROTECTED] Fat City Network Services-- 858-538-5051 http://www.fatcity.com San Diego, California-- Mailing list and web hosting services - To REMOVE yourself from this mailing list, send an E-Mail message to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in the message BODY, include a line containing: UNSUB ORACLE-L (or the name of mailing list you want to be removed from). You may also send the HELP command for other information (like subscribing). * This e-mail message, and any files transmitted with it, are confidential and intended solely for the use of the addressee. If this message was not addressed to you, you have received it in error and any copying, distribution or other use of any part of it is strictly prohibited. Any views or opinions presented are solely those of the sender and do not necessarily represent those of the British Geological Survey. The security of e-mail communication cannot be guaranteed and the BGS accepts no liability for claims arising as a result of the use of this medium to transmit messages from or to the BGS. .http://www.bgs.ac.uk * -- Please see the official ORACLE-L FAQ: http://www.orafaq.net -- Author: Robson, Peter INET: [EMAIL PROTECTED] Fat City Network Services-- 858-538-5051 http://www.fatcity.com San Diego, California-- Mailing list and web hosting services - To REMOVE yourself from this mailing list, send an E-Mail message to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in the message BODY, include a line containing: UNSUB ORACLE-L (or the name of mailing list you want to be removed from). You may also send the HELP command for other information (like subscribing).
RE: RE: RE: Find an unprintable character inside a column....
Some people have requested this code, so I thought you might as well all have the chance to pick it to bits... Its a function called BAD_ASCII, and it hunts out for any ascii characters with an ascii value of less than 32 in a specified field. (Acknowledgments to my colleague Keith Holmes for help with this code.) Use it as follows: Where a field called DATA in a table TABLE_1 may contain an ascci character with a value less than 32 (ie a non-printing character), the following SQL will find the row in question: select rowid,DATA,dump(DATA) from TABLE_1 where BAD_ASCII(DATA) 0; You could use the PK of the table instead of rowid, of course. You will also note that I select the DATA field in both normal and ascii 'dump' mode, the better to locate where the corruption is located. peter edinburgh ... Source as follows: Function BAD_ASCII (V_Text in char) return number is V_Int number; V_Count number; begin -- V_Int := 0; V_Count := 1; while V_Count=length(rtrim(V_Text)) and V_Int=0 loop if ascii(substr(V_Text, V_Count, 1))32 then V_Int := V_Count; end if; V_Count := V_Count + 1; end loop; return V_Int; -- exception when others then return -1; end BAD_ASCII; / Peter, I think that you can make this code 25% faster when the data is clean (which hopefully is the general case) by using 'replace', more efficient than a PL/SQL loop, to check whether you have some rubbish (sort of). It will not tell you where the bad character is, however - which means that then you can loop to look for it. Here is what I would suggest : create or replace Function BAD_ASCII (V_Text in char) return number is V_Int number; V_Count number; begin if (replace(V_text, chr(0)||chr(1)||chr(2)||chr(3)|| chr(4)||chr(5)||chr(6)||chr(7)|| chr(8)||chr(9)||chr(10)||chr(11)|| chr(12)||chr(13)||chr(14)||chr(15)|| chr(16)||chr(17)||chr(18)||chr(19)|| chr(20)||chr(21)||chr(22)||chr(23)|| chr(24)||chr(25)||chr(26)||chr(27)|| chr(28)||chr(29)||chr(30)||chr(31), '') = V_text) then return 0; else V_Int := 0; V_Count := 1; while V_Count=length(rtrim(V_Text)) and V_Int=0 loop if ascii(substr(V_Text, V_Count, 1))32 then V_Int := V_Count; end if; V_Count := V_Count + 1; end loop; return V_Int; end if; -- exception when others then return -1; end BAD_ASCII; / Regards, Stephane Faroult Oriole -- Please see the official ORACLE-L FAQ: http://www.orafaq.net -- Author: Stephane Faroult INET: [EMAIL PROTECTED] Fat City Network Services-- 858-538-5051 http://www.fatcity.com San Diego, California-- Mailing list and web hosting services - To REMOVE yourself from this mailing list, send an E-Mail message to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in the message BODY, include a line containing: UNSUB ORACLE-L (or the name of mailing list you want to be removed from). You may also send the HELP command for other information (like subscribing).
RE: RE: RE: Find an unprintable character inside a column....
I played with this a bit. First, I created some test data with one column corrupted with a single random character of 0-31 replacing a random char in that column 20% of the rows of the table. Peter's function correctly found all of the rows in 7.5 seconds. Stephane's function ran in 3.5 seconds, but didn't find any of the rows. I didn't attempt to correct the code. Then I tried a function based on owa_pattern.regex. My initial attempts didn't return the correct rows, as the regex pattern needed some tuning. I didn't attempt to fix it, as it was woefully slow, about 30 seconds. Regex in the WHERE clause in 10g will be nice. Jared Stephane Faroult [EMAIL PROTECTED] Sent by: [EMAIL PROTECTED] 10/10/2003 07:09 AM Please respond to ORACLE-L To:Multiple recipients of list ORACLE-L [EMAIL PROTECTED] cc: Subject:RE: RE: RE: Find an unprintable character inside a column Some people have requested this code, so I thought you might as well all have the chance to pick it to bits... Its a function called BAD_ASCII, and it hunts out for any ascii characters with an ascii value of less than 32 in a specified field. (Acknowledgments to my colleague Keith Holmes for help with this code.) Use it as follows: Where a field called DATA in a table TABLE_1 may contain an ascci character with a value less than 32 (ie a non-printing character), the following SQL will find the row in question: select rowid,DATA,dump(DATA) from TABLE_1 where BAD_ASCII(DATA) 0; You could use the PK of the table instead of rowid, of course. You will also note that I select the DATA field in both normal and ascii 'dump' mode, the better to locate where the corruption is located. peter edinburgh ... Source as follows: Function BAD_ASCII (V_Text in char) return number is V_Int number; V_Count number; begin -- V_Int := 0; V_Count := 1; while V_Count=length(rtrim(V_Text)) and V_Int=0 loop if ascii(substr(V_Text, V_Count, 1))32 then V_Int := V_Count; end if; V_Count := V_Count + 1; end loop; return V_Int; -- exception when others then return -1; end BAD_ASCII; / Peter, I think that you can make this code 25% faster when the data is clean (which hopefully is the general case) by using 'replace', more efficient than a PL/SQL loop, to check whether you have some rubbish (sort of). It will not tell you where the bad character is, however - which means that then you can loop to look for it. Here is what I would suggest : create or replace Function BAD_ASCII (V_Text in char) return number is V_Int number; V_Count number; begin if (replace(V_text, chr(0)||chr(1)||chr(2)||chr(3)|| chr(4)||chr(5)||chr(6)||chr(7)|| chr(8)||chr(9)||chr(10)||chr(11)|| chr(12)||chr(13)||chr(14)||chr(15)|| chr(16)||chr(17)||chr(18)||chr(19)|| chr(20)||chr(21)||chr(22)||chr(23)|| chr(24)||chr(25)||chr(26)||chr(27)|| chr(28)||chr(29)||chr(30)||chr(31), '') = V_text) then return 0; else V_Int := 0; V_Count := 1; while V_Count=length(rtrim(V_Text)) and V_Int=0 loop if ascii(substr(V_Text, V_Count, 1))32 then V_Int := V_Count; end if; V_Count := V_Count + 1; end loop; return V_Int; end if; -- exception when others then return -1; end BAD_ASCII; / Regards, Stephane Faroult Oriole -- Please see the official ORACLE-L FAQ: http://www.orafaq.net -- Author: Stephane Faroult INET: [EMAIL PROTECTED] Fat City Network Services -- 858-538-5051 http://www.fatcity.com San Diego, California-- Mailing list and web hosting services - To REMOVE yourself from this mailing list, send an E-Mail message to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in the message BODY, include a line containing: UNSUB ORACLE-L (or the name of mailing list you want to be removed from). You may also send the HELP command for other information (like subscribing).
RE: RE: RE: Find an unprintable character inside a column....
Actually, I was toying with the idea of writing an external procedure that would allow me to call pcre library (PCRE=Perl Compatible Regular Expressions) which would be nice, but then again, the whole perl is available through the set of external procedures, so it wouldn't be very useful. External procedures can be used in the where clause, provided they're declared as deterministic. Actually, it wouldn't be that hard to extend 9.2 database with regular expressions. On Fri, 2003-10-10 at 15:39, [EMAIL PROTECTED] wrote: I played with this a bit. First, I created some test data with one column corrupted with a single random character of 0-31 replacing a random char in that column 20% of the rows of the table. Peter's function correctly found all of the rows in 7.5 seconds. Stephane's function ran in 3.5 seconds, but didn't find any of the rows. I didn't attempt to correct the code. Then I tried a function based on owa_pattern.regex. My initial attempts didn't return the correct rows, as the regex pattern needed some tuning. I didn't attempt to fix it, as it was woefully slow, about 30 seconds. Regex in the WHERE clause in 10g will be nice. Jared Stephane Faroult [EMAIL PROTECTED] Sent by: [EMAIL PROTECTED] 10/10/2003 07:09 AM Please respond to ORACLE-L To: Multiple recipients of list ORACLE-L [EMAIL PROTECTED] cc: Subject: RE: RE: RE: Find an unprintable character inside a column Some people have requested this code, so I thought you might as well all have the chance to pick it to bits... Its a function called BAD_ASCII, and it hunts out for any ascii characters with an ascii value of less than 32 in a specified field. (Acknowledgments to my colleague Keith Holmes for help with this code.) Use it as follows: Where a field called DATA in a table TABLE_1 may contain an ascci character with a value less than 32 (ie a non-printing character), the following SQL will find the row in question: select rowid,DATA,dump(DATA) from TABLE_1 where BAD_ASCII(DATA) 0; You could use the PK of the table instead of rowid, of course. You will also note that I select the DATA field in both normal and ascii 'dump' mode, the better to locate where the corruption is located. peter edinburgh ... Source as follows: Function BAD_ASCII (V_Text in char) return number is V_Int number; V_Count number; begin -- V_Int := 0; V_Count := 1; while V_Count=length(rtrim(V_Text)) and V_Int=0 loop if ascii(substr(V_Text, V_Count, 1))32 then V_Int := V_Count; end if; V_Count := V_Count + 1; end loop; return V_Int; -- exception when others then return -1; end BAD_ASCII; / Peter, I think that you can make this code 25% faster when the data is clean (which hopefully is the general case) by using 'replace', more efficient than a PL/SQL loop, to check whether you have some rubbish (sort of). It will not tell you where the bad character is, however - which means that then you can loop to look for it. Here is what I would suggest : create or replace Function BAD_ASCII (V_Text in char) return number is V_Int number; V_Count number; begin if (replace(V_text, chr(0)||chr(1)||chr(2)||chr(3)|| chr(4)||chr(5)||chr(6)||chr(7)|| chr(8)||chr(9)||chr(10)||chr(11)|| chr(12)||chr(13)||chr(14)||chr(15)|| chr(16)||chr(17)||chr(18)||chr(19)|| chr(20)||chr(21)||chr(22)||chr(23)|| chr(24)||chr(25)||chr(26)||chr(27)|| chr(28)||chr(29)||chr(30)||chr(31), '') = V_text) then return 0; else V_Int := 0; V_Count := 1; while V_Count=length(rtrim(V_Text)) and V_Int=0 loop if ascii(substr(V_Text, V_Count, 1))32 then V_Int := V_Count; end if; V_Count := V_Count + 1; end loop; return V_Int; end if; -- exception when others then return -1; end BAD_ASCII; / Regards, Stephane Faroult Oriole -- Please see the official ORACLE-L FAQ: http://www.orafaq.net -- Author: Stephane Faroult INET: [EMAIL PROTECTED] Fat City Network Services-- 858-538-5051 http://www.fatcity.com San Diego, California-- Mailing list and web hosting services - To REMOVE yourself from this mailing list, send an E-Mail message to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in the message BODY, include a line containing: UNSUB ORACLE-L (or the name of mailing list you want to be removed from). You may also send the HELP command for other information (like subscribing). -- Mladen Gogala Oracle DBA
RE: RE: RE: Find an unprintable character inside a column....
Definitely worth trying if you have a need for it. I don't, and it's more work than I want to do just because I can. Mladen Gogala [EMAIL PROTECTED] Sent by: [EMAIL PROTECTED] 10/10/2003 01:44 PM Please respond to ORACLE-L To:Multiple recipients of list ORACLE-L [EMAIL PROTECTED] cc: Subject:RE: RE: RE: Find an unprintable character inside a column Actually, I was toying with the idea of writing an external procedure that would allow me to call pcre library (PCRE=Perl Compatible Regular Expressions) which would be nice, but then again, the whole perl is available through the set of external procedures, so it wouldn't be very useful. External procedures can be used in the where clause, provided they're declared as deterministic. Actually, it wouldn't be that hard to extend 9.2 database with regular expressions. On Fri, 2003-10-10 at 15:39, [EMAIL PROTECTED] wrote: I played with this a bit. First, I created some test data with one column corrupted with a single random character of 0-31 replacing a random char in that column 20% of the rows of the table. Peter's function correctly found all of the rows in 7.5 seconds. Stephane's function ran in 3.5 seconds, but didn't find any of the rows. I didn't attempt to correct the code. Then I tried a function based on owa_pattern.regex. My initial attempts didn't return the correct rows, as the regex pattern needed some tuning. I didn't attempt to fix it, as it was woefully slow, about 30 seconds. Regex in the WHERE clause in 10g will be nice. Jared Stephane Faroult [EMAIL PROTECTED] Sent by: [EMAIL PROTECTED] 10/10/2003 07:09 AM Please respond to ORACLE-L To: Multiple recipients of list ORACLE-L [EMAIL PROTECTED] cc: Subject: RE: RE: RE: Find an unprintable character inside a column Some people have requested this code, so I thought you might as well all have the chance to pick it to bits... Its a function called BAD_ASCII, and it hunts out for any ascii characters with an ascii value of less than 32 in a specified field. (Acknowledgments to my colleague Keith Holmes for help with this code.) Use it as follows: Where a field called DATA in a table TABLE_1 may contain an ascci character with a value less than 32 (ie a non-printing character), the following SQL will find the row in question: select rowid,DATA,dump(DATA) from TABLE_1 where BAD_ASCII(DATA) 0; You could use the PK of the table instead of rowid, of course. You will also note that I select the DATA field in both normal and ascii 'dump' mode, the better to locate where the corruption is located. peter edinburgh ... Source as follows: Function BAD_ASCII (V_Text in char) return number is V_Int number; V_Count number; begin -- V_Int := 0; V_Count := 1; while V_Count=length(rtrim(V_Text)) and V_Int=0 loop if ascii(substr(V_Text, V_Count, 1))32 then V_Int := V_Count; end if; V_Count := V_Count + 1; end loop; return V_Int; -- exception when others then return -1; end BAD_ASCII; / Peter, I think that you can make this code 25% faster when the data is clean (which hopefully is the general case) by using 'replace', more efficient than a PL/SQL loop, to check whether you have some rubbish (sort of). It will not tell you where the bad character is, however - which means that then you can loop to look for it. Here is what I would suggest : create or replace Function BAD_ASCII (V_Text in char) return number is V_Int number; V_Count number; begin if (replace(V_text, chr(0)||chr(1)||chr(2)||chr(3)|| chr(4)||chr(5)||chr(6)||chr(7)|| chr(8)||chr(9)||chr(10)||chr(11)|| chr(12)||chr(13)||chr(14)||chr(15)|| chr(16)||chr(17)||chr(18)||chr(19)|| chr(20)||chr(21)||chr(22)||chr(23)|| chr(24)||chr(25)||chr(26)||chr(27)|| chr(28)||chr(29)||chr(30)||chr(31), '') = V_text) then return 0; else V_Int := 0; V_Count := 1; while V_Count=length(rtrim(V_Text)) and V_Int=0 loop if ascii(substr(V_Text, V_Count, 1))32 then V_Int := V_Count; end if; V_Count := V_Count + 1; end loop; return V_Int; end if; -- exception when others then return -1; end BAD_ASCII; / Regards, Stephane Faroult Oriole -- Please see the official ORACLE-L FAQ: http://www.orafaq.net -- Author: Stephane Faroult INET: [EMAIL PROTECTED] Fat City Network Services -- 858-538-5051 http://www.fatcity.com San Diego, California-- Mailing list and web hosting services - To REMOVE yourself from this mailing list, send an E-Mail message to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in the message BODY
Re: Find an unprintable character inside a column....
[EMAIL PROTECTED] wrote: I played with this a bit. First, I created some test data with one column corrupted with a single random character of 0-31 replacing a random char in that column 20% of the rows of the table. Peter's function correctly found all of the rows in 7.5 seconds. Stephane's function ran in 3.5 seconds, but didn't find any of the rows. I didn't attempt to correct the code. Then I tried a function based on owa_pattern.regex. My initial attempts didn't return the correct rows, as the regex pattern needed some tuning. I didn't attempt to fix it, as it was woefully slow, about 30 seconds. Regex in the WHERE clause in 10g will be nice. Jared Stephane Faroult [EMAIL PROTECTED] To:Multiple Sent by: [EMAIL PROTECTED] recipients of list ORACLE-L [EMAIL PROTECTED] 10/10/2003 07:09 AM cc: Please respond to ORACLE-LSubject:RE: RE: RE: Find an unprintable character inside a column Some people have requested this code, so I thought you might as well all have the chance to pick it to bits... Its a function called BAD_ASCII, and it hunts out for any ascii characters with an ascii value of less than 32 in a specified field. (Acknowledgments to my colleague Keith Holmes for help with this code.) Use it as follows: Where a field called DATA in a table TABLE_1 may contain an ascci character with a value less than 32 (ie a non-printing character), the following SQL will find the row in question: select rowid,DATA,dump(DATA) from TABLE_1 where BAD_ASCII(DATA) 0; You could use the PK of the table instead of rowid, of course. You will also note that I select the DATA field in both normal and ascii 'dump' mode, the better to locate where the corruption is located. peter edinburgh ... Source as follows: Function BAD_ASCII (V_Text in char) return number is V_Int number; V_Count number; begin -- V_Int := 0; V_Count := 1; while V_Count=length(rtrim(V_Text)) and V_Int=0 loop if ascii(substr(V_Text, V_Count, 1))32 then V_Int := V_Count; end if; V_Count := V_Count + 1; end loop; return V_Int; -- exception when others then return -1; end BAD_ASCII; / Peter, I think that you can make this code 25% faster when the data is clean (which hopefully is the general case) by using 'replace', more efficient than a PL/SQL loop, to check whether you have some rubbish (sort of). It will not tell you where the bad character is, however - which means that then you can loop to look for it. Here is what I would suggest : create or replace Function BAD_ASCII (V_Text in char) return number is V_Int number; V_Count number; begin if (replace(V_text, chr(0)||chr(1)||chr(2)||chr(3)|| chr(4)||chr(5)||chr(6)||chr(7)|| chr(8)||chr(9)||chr(10)||chr(11)|| chr(12)||chr(13)||chr(14)||chr(15)|| chr(16)||chr(17)||chr(18)||chr(19)|| chr(20)||chr(21)||chr(22)||chr(23)|| chr(24)||chr(25)||chr(26)||chr(27)|| chr(28)||chr(29)||chr(30)||chr(31), '') = V_text) then return 0; else V_Int := 0; V_Count := 1; while V_Count=length(rtrim(V_Text)) and V_Int=0 loop if ascii(substr(V_Text, V_Count, 1))32 then V_Int := V_Count; end if; V_Count := V_Count + 1; end loop; return V_Int; end if; -- exception when others then return -1; end BAD_ASCII; / Jared, you're the scourge of people who just write things out of the top of their head and don't test them thoroughly :-). I had made my usual mistake of using REPLACE instead of TRANSLATE. Just tried it with 'regular' data, since this is the only case where it can be faster that Peter's routine. Works like Peter's routine with TRANSLATE, only somewhat faster. Ooops again. -- Regards, Stephane Faroult Oriole Software -- Please see the official ORACLE-L FAQ: http://www.orafaq.net -- Author: Stephane Faroult INET: [EMAIL PROTECTED] Fat City Network Services-- 858-538-5051 http://www.fatcity.com San Diego, California-- Mailing list and web hosting services - To REMOVE yourself from this mailing list, send an E-Mail message to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in the message BODY, include a line containing: UNSUB ORACLE-L (or the name of mailing list you want to be removed from). You may also send the HELP command for other information (like subscribing).
Re: Find an unprintable character inside a column....
DROP TABLE table_1; CREATE TABLE table_1(data VARCHAR2(10)); INSERT INTO table_1 VALUES(CHR(1)||'ABC'); INSERT INTO table_1 VALUES('ABC'||CHR(25)); INSERT INTO table_1 VALUES(CHR(25)||'@'||CHR(30)); INSERT INTO table_1 VALUES(CHR(25)||'@'||CHR(31)); INSERT INTO table_1 VALUES('ABC'); COMMIT; VARIABLE npc VARCHAR2(33); VARIABLE np0 VARCHAR2(33); VARIABLE np VARCHAR2(33); BEGIN :npc := ''; :np0 := ''; :np := ''; FOR i IN 0 .. 31 LOOP :npc := :npc || CHR(i); :np0 := :np0 || CHR(0); END LOOP; :np := '@' || :npc; END; / COLUMN data FORMAT A10 COLUMN dump FORMAT A30 SELECT ROWID , data , DUMP(data) dump , LENGTH(data) - LENGTH(TRANSLATE(data, :np, '@')) numer_of_np_chars , INSTR(TRANSLATE(data, :npc, :np0), CHR(0)) first_position FROM table_1 WHERE TRANSLATE(data, :np, '@') data / It's not for unicode. FBI could be used as well. -- Vladimir Begun The statements and opinions expressed here are my own and do not necessarily represent those of Oracle Corporation. Robson, Peter wrote: Some people have requested this code, so I thought you might as well all have the chance to pick it to bits... Its a function called BAD_ASCII, and it hunts out for any ascii characters with an ascii value of less than 32 in a specified field. (Acknowledgments to my colleague Keith Holmes for help with this code.) Use it as follows: Where a field called DATA in a table TABLE_1 may contain an ascci character with a value less than 32 (ie a non-printing character), the following SQL will find the row in question: select rowid,DATA,dump(DATA) from TABLE_1 where BAD_ASCII(DATA) 0; You could use the PK of the table instead of rowid, of course. You will also note that I select the DATA field in both normal and ascii 'dump' mode, the better to locate where the corruption is located. peter edinburgh -- Please see the official ORACLE-L FAQ: http://www.orafaq.net -- Author: Vladimir Begun INET: [EMAIL PROTECTED] Fat City Network Services-- 858-538-5051 http://www.fatcity.com San Diego, California-- Mailing list and web hosting services - To REMOVE yourself from this mailing list, send an E-Mail message to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in the message BODY, include a line containing: UNSUB ORACLE-L (or the name of mailing list you want to be removed from). You may also send the HELP command for other information (like subscribing).
Re: Find an unprintable character inside a column....
See notes, 113827.1, 119426.1, 154880.1. Could be done and done, but not to solve this particular task -- it would be an overkill. :) -- Vladimir Begun The statements and opinions expressed here are my own and do not necessarily represent those of Oracle Corporation. [EMAIL PROTECTED] wrote: Definitely worth trying if you have a need for it. I don't, and it's more work than I want to do just because I can. *Mladen Gogala* Actually, I was toying with the idea of writing an external procedure that would allow me to call pcre library (PCRE=Perl Compatible Regular Expressions) which would be nice, but then again, the whole perl is available through the set of external procedures, so it wouldn't be very useful. External procedures can be used in the where clause, provided they're declared as deterministic. Actually, it wouldn't be that hard to extend 9.2 database with regular expressions. -- Please see the official ORACLE-L FAQ: http://www.orafaq.net -- Author: Vladimir Begun INET: [EMAIL PROTECTED] Fat City Network Services-- 858-538-5051 http://www.fatcity.com San Diego, California-- Mailing list and web hosting services - To REMOVE yourself from this mailing list, send an E-Mail message to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in the message BODY, include a line containing: UNSUB ORACLE-L (or the name of mailing list you want to be removed from). You may also send the HELP command for other information (like subscribing).
Re: Find an unprintable character inside a column....
Always glad to be of service. It works with translate(), about 53% faster. Stephane Faroult [EMAIL PROTECTED] Sent by: [EMAIL PROTECTED] 10/10/2003 02:54 PM Please respond to ORACLE-L To:Multiple recipients of list ORACLE-L [EMAIL PROTECTED] cc: Subject:Re: Find an unprintable character inside a column [EMAIL PROTECTED] wrote: I played with this a bit. First, I created some test data with one column corrupted with a single random character of 0-31 replacing a random char in that column 20% of the rows of the table. Peter's function correctly found all of the rows in 7.5 seconds. Stephane's function ran in 3.5 seconds, but didn't find any of the rows. I didn't attempt to correct the code. Then I tried a function based on owa_pattern.regex. My initial attempts didn't return the correct rows, as the regex pattern needed some tuning. I didn't attempt to fix it, as it was woefully slow, about 30 seconds. Regex in the WHERE clause in 10g will be nice. Jared Stephane Faroult [EMAIL PROTECTED] To:Multiple Sent by: [EMAIL PROTECTED] recipients of list ORACLE-L [EMAIL PROTECTED] 10/10/2003 07:09 AMcc: Please respond to ORACLE-LSubject:RE: RE: RE: Find an unprintable character inside a column Some people have requested this code, so I thought you might as well all have the chance to pick it to bits... Its a function called BAD_ASCII, and it hunts out for any ascii characters with an ascii value of less than 32 in a specified field. (Acknowledgments to my colleague Keith Holmes for help with this code.) Use it as follows: Where a field called DATA in a table TABLE_1 may contain an ascci character with a value less than 32 (ie a non-printing character), the following SQL will find the row in question: select rowid,DATA,dump(DATA) from TABLE_1 where BAD_ASCII(DATA) 0; You could use the PK of the table instead of rowid, of course. You will also note that I select the DATA field in both normal and ascii 'dump' mode, the better to locate where the corruption is located. peter edinburgh ... Source as follows: Function BAD_ASCII (V_Text in char) return number is V_Int number; V_Count number; begin -- V_Int := 0; V_Count := 1; while V_Count=length(rtrim(V_Text)) and V_Int=0 loop if ascii(substr(V_Text, V_Count, 1))32 then V_Int := V_Count; end if; V_Count := V_Count + 1; end loop; return V_Int; -- exception when others then return -1; end BAD_ASCII; / Peter, I think that you can make this code 25% faster when the data is clean (which hopefully is the general case) by using 'replace', more efficient than a PL/SQL loop, to check whether you have some rubbish (sort of). It will not tell you where the bad character is, however - which means that then you can loop to look for it. Here is what I would suggest : create or replace Function BAD_ASCII (V_Text in char) return number is V_Int number; V_Count number; begin if (replace(V_text, chr(0)||chr(1)||chr(2)||chr(3)|| chr(4)||chr(5)||chr(6)||chr(7)|| chr(8)||chr(9)||chr(10)||chr(11)|| chr(12)||chr(13)||chr(14)||chr(15)|| chr(16)||chr(17)||chr(18)||chr(19)|| chr(20)||chr(21)||chr(22)||chr(23)|| chr(24)||chr(25)||chr(26)||chr(27)|| chr(28)||chr(29)||chr(30)||chr(31), '') = V_text) then return 0; else V_Int := 0; V_Count := 1; while V_Count=length(rtrim(V_Text)) and V_Int=0 loop if ascii(substr(V_Text, V_Count, 1))32 then V_Int := V_Count; end if; V_Count := V_Count + 1; end loop; return V_Int; end if; -- exception when others then return -1; end BAD_ASCII; / Jared, you're the scourge of people who just write things out of the top of their head and don't test them thoroughly :-). I had made my usual mistake of using REPLACE instead of TRANSLATE. Just tried it with 'regular' data, since this is the only case where it can be faster that Peter's routine. Works like Peter's routine with TRANSLATE, only somewhat faster.
RE: Find an unprintable character inside a column....
Steve, If you are patient, I guess that something like where dump(problem_column) like '%target hex%' should more or less answer your question. HTH SF - --- Original Message --- - From: Steve Main [EMAIL PROTECTED] To: Multiple recipients of list ORACLE-L [EMAIL PROTECTED] Sent: Wed, 08 Oct 2003 15:44:26 Hello list, I have an application that is choking on the following error, ...invalid character (Unicode: 0x19) was found in the element content... Does anyone know how I could go about searching for this invalid character? Thanks Steve -- Please see the official ORACLE-L FAQ: http://www.orafaq.net -- Author: Stephane Faroult INET: [EMAIL PROTECTED] Fat City Network Services-- 858-538-5051 http://www.fatcity.com San Diego, California-- Mailing list and web hosting services - To REMOVE yourself from this mailing list, send an E-Mail message to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in the message BODY, include a line containing: UNSUB ORACLE-L (or the name of mailing list you want to be removed from). You may also send the HELP command for other information (like subscribing).
RE: Find an unprintable character inside a column....
Yes, exactly Stephane - Non-printable characters like this are a proper pest in our environment, to the extent that I have exception reports running every night looking for them (cannot trust the users...). I have a small PL/SQL piece of code used to detect these things, if anyone wants it. peter edinburgh -Original Message- Sent: Thursday, October 09, 2003 9:54 AM To: Multiple recipients of list ORACLE-L Steve, If you are patient, I guess that something like where dump(problem_column) like '%target hex%' should more or less answer your question. HTH SF - --- Original Message --- - From: Steve Main [EMAIL PROTECTED] To: Multiple recipients of list ORACLE-L [EMAIL PROTECTED] Sent: Wed, 08 Oct 2003 15:44:26 Hello list, I have an application that is choking on the following error, ...invalid character (Unicode: 0x19) was found in the element content... Does anyone know how I could go about searching for this invalid character? Thanks Steve -- Please see the official ORACLE-L FAQ: http://www.orafaq.net -- Author: Stephane Faroult INET: [EMAIL PROTECTED] Fat City Network Services-- 858-538-5051 http://www.fatcity.com San Diego, California-- Mailing list and web hosting services - To REMOVE yourself from this mailing list, send an E-Mail message to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in the message BODY, include a line containing: UNSUB ORACLE-L (or the name of mailing list you want to be removed from). You may also send the HELP command for other information (like subscribing). * This e-mail message, and any files transmitted with it, are confidential and intended solely for the use of the addressee. If this message was not addressed to you, you have received it in error and any copying, distribution or other use of any part of it is strictly prohibited. Any views or opinions presented are solely those of the sender and do not necessarily represent those of the British Geological Survey. The security of e-mail communication cannot be guaranteed and the BGS accepts no liability for claims arising as a result of the use of this medium to transmit messages from or to the BGS. .http://www.bgs.ac.uk * -- Please see the official ORACLE-L FAQ: http://www.orafaq.net -- Author: Robson, Peter INET: [EMAIL PROTECTED] Fat City Network Services-- 858-538-5051 http://www.fatcity.com San Diego, California-- Mailing list and web hosting services - To REMOVE yourself from this mailing list, send an E-Mail message to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in the message BODY, include a line containing: UNSUB ORACLE-L (or the name of mailing list you want to be removed from). You may also send the HELP command for other information (like subscribing).
Re: RE: Find an unprintable character inside a column....
Peter, i would be interested in that. can u mail it to me ? Jp. 09-10-2003 18:29:33, Robson, Peter [EMAIL PROTECTED] wrote: I have a small PL/SQL piece of code used to detect these things, if anyone wants it. -- Please see the official ORACLE-L FAQ: http://www.orafaq.net -- Author: Prem Khanna J INET: [EMAIL PROTECTED] Fat City Network Services-- 858-538-5051 http://www.fatcity.com San Diego, California-- Mailing list and web hosting services - To REMOVE yourself from this mailing list, send an E-Mail message to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in the message BODY, include a line containing: UNSUB ORACLE-L (or the name of mailing list you want to be removed from). You may also send the HELP command for other information (like subscribing).
Re: Find an unprintable character inside a column....
Peter, i would be interested in that. can you mail it to me ? Dias Costa Robson, Peter wrote: Yes, exactly Stephane - Non-printable characters like this are a proper pest in our environment, to the extent that I have exception reports running every night looking for them (cannot trust the users...). I have a small PL/SQL piece of code used to detect these things, if anyone wants it. peter edinburgh -Original Message- Sent: Thursday, October 09, 2003 9:54 AM To: Multiple recipients of list ORACLE-L Steve, If you are patient, I guess that something like where dump(problem_column) like '%target hex%' should more or less answer your question. HTH SF - --- Original Message --- - From: Steve Main [EMAIL PROTECTED] To: Multiple recipients of list ORACLE-L [EMAIL PROTECTED] Sent: Wed, 08 Oct 2003 15:44:26 Hello list, I have an application that is choking on the following error, ...invalid character (Unicode: 0x19) was found in the element content... Does anyone know how I could go about searching for this invalid character? Thanks Steve -- Please see the official ORACLE-L FAQ: http://www.orafaq.net -- Author: Dias Costa INET: [EMAIL PROTECTED] Fat City Network Services-- 858-538-5051 http://www.fatcity.com San Diego, California-- Mailing list and web hosting services - To REMOVE yourself from this mailing list, send an E-Mail message to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in the message BODY, include a line containing: UNSUB ORACLE-L (or the name of mailing list you want to be removed from). You may also send the HELP command for other information (like subscribing).