Re: pl/sql question and owa_pattern question
Would extproc_perl fit well enough, though, until 10g is here? On Fri, 21 Nov 2003, Mladen Gogala wrote: > PL/SQL is the fastest thing of them all when it comes to executing > SQL commands, but there are things which simply aren't practical > in 9.2 PL/SQL. Regular expression processing is one of those things. > Fortunately, you can mix the two. Without DBI, perl scripts simply > woudn't be very useful. Of course, there are things that are faster > then even the fastest perl script. Lexer written in C is one of them > and you don't need much work to write one, either, but using OCI is > not easy. OCI is a library written to confuse the enemy, not to help > developer. Using plain and simple regex or PCRE within a C program > is the same thing as above, but slightly more complicated then a lexer. > For the specific task of manipulating patterns and resolving regular > expressions, I use perl almost exclusively because I find it an optimal > tradeoff between ease of use and performance. If performance is a > paramount, as in real time application processing, then you'll have to > resort to C and, possibly, write an external procedure and, thus, > enabling oracle to use C regex calls or even pcre. I was toying with the > idea of enabling oracle to use PCRE but I gave up when I read that 10g > will have that included. > > On 11/21/2003 11:59:31 AM, Guang Mei wrote: > > Perl is a good tool for text processing. But our program is already written > > in pl/sql long time ago and there are intensive db calls in this pl/sql > > program. (text processing is only part of it). So I can not change that. > > > > BTW I did a comparison study a while ago for some of our pl/sql packages > > (specifically for our application). When there are lots of db calls (select, > > insert, update and delete), pl/sql package is faster than correponding perl > > program (I made sure sqls are prepared once and used bind variables in perl. > > All code were executed on the unix server, no other programs were running, > > etc). That's why we stick to pl/sql because our app need the performance. > > Others may have different results, it all depends on what the code does. > > > > Guang > > > > -Original Message- > > Mladen Gogala > > Sent: Thursday, November 20, 2003 11:14 PM > > To: Multiple recipients of list ORACLE-L > > > > > > I don't know about PL/SQL but here is how I would get separate words from a > > big string: > > > > #!/usr/bin/perl -w > > use strict; > > my (@ARR); > > while (<>) { > > chomp; > > @ARR = split(/[^0-9a-zA-Z_\.,<>]/); > > foreach (@ARR) { > > print "$_\n"; > > } > > } > > > > There is something called DBI and it can be used to insert separated words > > into the database, instead > > of printing them. The bottom line is that perl is an excellent tool for > > parsing strings and all sorts of string > > manipulation. > > > > On 2003.11.20 22:39, Guang Mei wrote: > > > Hi: > > > > > > In my pl/sql program, I want to process each "word" in a string. The > > > string is selected from a varchar2(300) column. The delimit that separates > > > the words is not necessary space character. The definition of the delimit > > > in this program is set as > > > > > > 1. Any character that is NOT AlphaNumerical (0-9, A-Z,a-z) > > > and > > > 2. the character is not one of these: '-.,/<*>_' > > > > > > Now my program is basically checking each character, find the delimit, and > > > rebuild each word. After that I process each "word". The code looks like > > > this: > > > > > > --- > > > str := "This will be a long string with length upto 300 characters, it > > > may contain some invisible characters'; > > > len := length(str)+1; > > > for i in 1..len loop > > > ch := substr(str,i,1); > > > if (not strings.isAlnum(ch) and instr('-.,/<*>_', ch)<1) then > > > if word is not null then > > > -- do some processing to variable word ! > > > word := null;-- reset it > > > end if; > > > else > > > word := word || ch; -- concat ch to word > > > end if; > > > end loop; > > > > > > --- > > > > > > I think It's taking too long because it loops through each characters. I > > > hope I could find a way to speed it up. I don't have experiience in > > > owa_pattern, but I thought there might be a way to do it here: > > > > > > > > > str := "This will be a long string with length upto 300 characters, it > > > may contain some invisible characters'; > > > newstr := str; > > > pos := 1; > > > while pos != 0 loop > > > pos := owa_pattern.amatch(newstr, 1, '\W');-- how can I mask out > > > these '-.,/<*>_' ??? > > > word := substr(newstr, 1, pos-1); > > > -- do some processing to variable word ! > > > if pos != 0 then > > > newstr := substr(newstr, pos+1); > > > end if; > > > end loop; > > > -- > > > > > > My simple tests showed that owa_pattern call is much slower than direct > > > string manupil
Re: pl/sql question and owa_pattern question
PL/SQL is the fastest thing of them all when it comes to executing SQL commands, but there are things which simply aren't practical in 9.2 PL/SQL. Regular expression processing is one of those things. Fortunately, you can mix the two. Without DBI, perl scripts simply woudn't be very useful. Of course, there are things that are faster then even the fastest perl script. Lexer written in C is one of them and you don't need much work to write one, either, but using OCI is not easy. OCI is a library written to confuse the enemy, not to help developer. Using plain and simple regex or PCRE within a C program is the same thing as above, but slightly more complicated then a lexer. For the specific task of manipulating patterns and resolving regular expressions, I use perl almost exclusively because I find it an optimal tradeoff between ease of use and performance. If performance is a paramount, as in real time application processing, then you'll have to resort to C and, possibly, write an external procedure and, thus, enabling oracle to use C regex calls or even pcre. I was toying with the idea of enabling oracle to use PCRE but I gave up when I read that 10g will have that included. On 11/21/2003 11:59:31 AM, Guang Mei wrote: > Perl is a good tool for text processing. But our program is already written > in pl/sql long time ago and there are intensive db calls in this pl/sql > program. (text processing is only part of it). So I can not change that. > > BTW I did a comparison study a while ago for some of our pl/sql packages > (specifically for our application). When there are lots of db calls (select, > insert, update and delete), pl/sql package is faster than correponding perl > program (I made sure sqls are prepared once and used bind variables in perl. > All code were executed on the unix server, no other programs were running, > etc). That's why we stick to pl/sql because our app need the performance. > Others may have different results, it all depends on what the code does. > > Guang > > -Original Message- > Mladen Gogala > Sent: Thursday, November 20, 2003 11:14 PM > To: Multiple recipients of list ORACLE-L > > > I don't know about PL/SQL but here is how I would get separate words from a > big string: > > #!/usr/bin/perl -w > use strict; > my (@ARR); > while (<>) { > chomp; > @ARR = split(/[^0-9a-zA-Z_\.,<>]/); > foreach (@ARR) { > print "$_\n"; > } > } > > There is something called DBI and it can be used to insert separated words > into the database, instead > of printing them. The bottom line is that perl is an excellent tool for > parsing strings and all sorts of string > manipulation. > > On 2003.11.20 22:39, Guang Mei wrote: > > Hi: > > > > In my pl/sql program, I want to process each "word" in a string. The > > string is selected from a varchar2(300) column. The delimit that separates > > the words is not necessary space character. The definition of the delimit > > in this program is set as > > > > 1. Any character that is NOT AlphaNumerical (0-9, A-Z,a-z) > > and > > 2. the character is not one of these: '-.,/<*>_' > > > > Now my program is basically checking each character, find the delimit, and > > rebuild each word. After that I process each "word". The code looks like > > this: > > > > --- > > str := "This will be a long string with length upto 300 characters, it > > may contain some invisible characters'; > > len := length(str)+1; > > for i in 1..len loop > > ch := substr(str,i,1); > > if (not strings.isAlnum(ch) and instr('-.,/<*>_', ch)<1) then > > if word is not null then > > -- do some processing to variable word ! > > word := null;-- reset it > > end if; > > else > > word := word || ch; -- concat ch to word > > end if; > > end loop; > > > > --- > > > > I think It's taking too long because it loops through each characters. I > > hope I could find a way to speed it up. I don't have experiience in > > owa_pattern, but I thought there might be a way to do it here: > > > > > > str := "This will be a long string with length upto 300 characters, it > > may contain some invisible characters'; > > newstr := str; > > pos := 1; > > while pos != 0 loop > > pos := owa_pattern.amatch(newstr, 1, '\W');-- how can I mask out > > these '-.,/<*>_' ??? > > word := substr(newstr, 1, pos-1); > > -- do some processing to variable word ! > > if pos != 0 then > > newstr := substr(newstr, pos+1); > > end if; > > end loop; > > -- > > > > My simple tests showed that owa_pattern call is much slower than direct > > string manupilation. But I would like to try it in this case if I could > > easily get the "wrods" from the string. Any suggestions? > > > > TIA. > > > > Guang > > > > > > -- > > Please see the official ORACLE-L FAQ: http://www.orafaq.net > > -- > > Author: Guang Mei > > INET: [EMAIL PROTECTED] > > > > Fat City Network Services-- 858-538-5051 ht
RE: pl/sql question and owa_pattern question
Guang, Well you are almost there ... you need fifo structure namely a pl/sql array 1. create a local pl/sql array to store the delimiter (store the ascii value of the delimiter to be safe) my_array (varchar2(5)) 2. as you find a delimiter insert into the first position in the array and replace the delimiting character with # 3. lather.rinse.repeat. when it is time to put it back use a loop nIndex := 0; nPos := 0; loop npos := instr(my_str,'#',1); exit when npos := 0; nIndex := nindex + 1; my_str := substr(my_str,1,nPos-1) || chr(my_array(nIndex)) || sybstr(my_str, nPos+1); end loop; something like this should help, proof-read though ... Raj Rajendra dot Jamadagni at nospamespn dot com All Views expressed in this email are strictly personal. QOTD: Any clod can have facts, having an opinion is an art ! -Original Message- Sent: Friday, November 21, 2003 11:44 AM To: Multiple recipients of list ORACLE-L Hi Stephane: Thanks for your good suggestion. I compared the method you suggested and the orginal one and it indeed boosted the performance (in my simple test). However the ONLY problem I am having is that by doing TRANSLATE, I lost the original delimits. The new method (you suggested) correctly "extract" the "words" (and sent for processing), But after processing I need to put processed-words back to the orginal string with orginal demilters un-changed. I tried to track to position of delimit from the orginal string by doing global_pos := global_pos + pos ; in my "while" loop, but ltrim(substr(string, pos + 1), '#') will make "global_pos" wrong when ltrim trims '#'. Any work-around? TIA. Guang -Original Message- Stephane Faroult Sent: Friday, November 21, 2003 4:19 AM To: Multiple recipients of list ORACLE-L Guang, I agree with your analysis, looping on characters is not the faster you can do, simply because there is a significant overhead (compared to C code for instance) in a language such as PL/SQL - which might be perfectly acceptable in some circumstances, much less so in very repetitive tasks. 'Native compiling', ie turning PL/SQL in C, might improve performance. However, in my view the best performance gains you may get is by, so to speak, pushing the bulk of the processing deeper into the kernel (which isn't by the way exclusive of native compiling). Using a function such as INSTR() will be much more efficient than looping on characters. I would suggest something such as : - First use TRANSLATE() to replace all the characters you want to get rid of by a single, well identified character, say # (use CHR() || ... for non printable characters - you can build up the string of characters to translate in the initialisation section of a package rather than typing it). - Start with initializing your string to LTRIM(string, '#') - Then as long as pos := INSTR(string, '#') isn't 0, get your token as substr(string, 1, pos - 1) then assign ltrim(substr(string, pos + 1), '#') to string (very similar to what you were planning to do with owa). This will be probably much faster than a character-by-character loop and calls to an owa package. HTH, Stephane Faroult >- --- Original Message --- - >From: Guang Mei <[EMAIL PROTECTED]> >To: Multiple recipients of list ORACLE-L ><[EMAIL PROTECTED]> >Sent: Thu, 20 Nov 2003 19:39:55 > >Hi: > >In my pl/sql program, I want to process each "word" >in a string. The >string is selected from a varchar2(300) column. The >delimit that separates >the words is not necessary space character. The >definition of the delimit >in this program is set as > >1. Any character that is NOT AlphaNumerical (0-9, >A-Z,a-z) >and >2. the character is not one of these: '-.,/<*>_' > >Now my program is basically checking each >character, find the delimit, and >rebuild each word. After that I process each >"word". The code looks like >this: > >--- >str := "This will be a long string with length >upto 300 characters, it >may contain some invisible characters'; >len := length(str)+1; > for i in 1..len loop >ch := substr(str,i,1); >if (not strings.isAlnum(ch) and >instr('-.,/<*>_', ch)<1) then > if word is not null then >-- do some processing to variable word ! >word := null;-- reset it > end if; >else > word := word || ch; -- concat ch to word >end if; > end loop; > >--- > >I think It's taking too long because it loops >through each characters. I >hope I could find a way to speed it up. I don't >have experiience in >owa_pattern, but I thought there might be a way to >do it here: > > >str := "This will be a long string with length >upto 300 characters, it >may contain some invisible characters'; >newstr := str; >pos := 1; >while pos != 0 loop >pos := owa_pattern.amatch(newstr, 1, '\W'); >-- how can I mask out >these '-.,/<*>_' ??? >word :
RE: pl/sql question and owa_pattern question
Perl is a good tool for text processing. But our program is already written in pl/sql long time ago and there are intensive db calls in this pl/sql program. (text processing is only part of it). So I can not change that. BTW I did a comparison study a while ago for some of our pl/sql packages (specifically for our application). When there are lots of db calls (select, insert, update and delete), pl/sql package is faster than correponding perl program (I made sure sqls are prepared once and used bind variables in perl. All code were executed on the unix server, no other programs were running, etc). That's why we stick to pl/sql because our app need the performance. Others may have different results, it all depends on what the code does. Guang -Original Message- Mladen Gogala Sent: Thursday, November 20, 2003 11:14 PM To: Multiple recipients of list ORACLE-L I don't know about PL/SQL but here is how I would get separate words from a big string: #!/usr/bin/perl -w use strict; my (@ARR); while (<>) { chomp; @ARR = split(/[^0-9a-zA-Z_\.,<>]/); foreach (@ARR) { print "$_\n"; } } There is something called DBI and it can be used to insert separated words into the database, instead of printing them. The bottom line is that perl is an excellent tool for parsing strings and all sorts of string manipulation. On 2003.11.20 22:39, Guang Mei wrote: > Hi: > > In my pl/sql program, I want to process each "word" in a string. The > string is selected from a varchar2(300) column. The delimit that separates > the words is not necessary space character. The definition of the delimit > in this program is set as > > 1. Any character that is NOT AlphaNumerical (0-9, A-Z,a-z) > and > 2. the character is not one of these: '-.,/<*>_' > > Now my program is basically checking each character, find the delimit, and > rebuild each word. After that I process each "word". The code looks like > this: > > --- > str := "This will be a long string with length upto 300 characters, it > may contain some invisible characters'; > len := length(str)+1; > for i in 1..len loop > ch := substr(str,i,1); > if (not strings.isAlnum(ch) and instr('-.,/<*>_', ch)<1) then > if word is not null then > -- do some processing to variable word ! > word := null;-- reset it > end if; > else > word := word || ch; -- concat ch to word > end if; > end loop; > > --- > > I think It's taking too long because it loops through each characters. I > hope I could find a way to speed it up. I don't have experiience in > owa_pattern, but I thought there might be a way to do it here: > > > str := "This will be a long string with length upto 300 characters, it > may contain some invisible characters'; > newstr := str; > pos := 1; > while pos != 0 loop > pos := owa_pattern.amatch(newstr, 1, '\W');-- how can I mask out > these '-.,/<*>_' ??? > word := substr(newstr, 1, pos-1); > -- do some processing to variable word ! > if pos != 0 then > newstr := substr(newstr, pos+1); > end if; > end loop; > -- > > My simple tests showed that owa_pattern call is much slower than direct > string manupilation. But I would like to try it in this case if I could > easily get the "wrods" from the string. Any suggestions? > > TIA. > > Guang > > > -- > Please see the official ORACLE-L FAQ: http://www.orafaq.net > -- > Author: Guang Mei > INET: [EMAIL PROTECTED] > > Fat City Network Services-- 858-538-5051 http://www.fatcity.com > San Diego, California-- Mailing list and web hosting services > - > To REMOVE yourself from this mailing list, send an E-Mail message > to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in > the message BODY, include a line containing: UNSUB ORACLE-L > (or the name of mailing list you want to be removed from). You may > also send the HELP command for other information (like subscribing). > -- Mladen Gogala Oracle DBA -- Please see the official ORACLE-L FAQ: http://www.orafaq.net -- Author: Mladen Gogala INET: [EMAIL PROTECTED] Fat City Network Services-- 858-538-5051 http://www.fatcity.com San Diego, California-- Mailing list and web hosting services - To REMOVE yourself from this mailing list, send an E-Mail message to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in the message BODY, include a line containing: UNSUB ORACLE-L (or the name of mailing list you want to be removed from). You may also send the HELP command for other information (like subscribing). -- Please see the official ORACLE-L FAQ: http://www.orafaq.net -- Author: Guang Mei INET: [EMAIL PROTECTED] Fat City Network Services-- 858-538-5051 http://www.fatcity.com San Diego, California-- Mailing list and web hosting services -
RE: pl/sql question and owa_pattern question
Hi Stephane: Thanks for your good suggestion. I compared the method you suggested and the orginal one and it indeed boosted the performance (in my simple test). However the ONLY problem I am having is that by doing TRANSLATE, I lost the original delimits. The new method (you suggested) correctly "extract" the "words" (and sent for processing), But after processing I need to put processed-words back to the orginal string with orginal demilters un-changed. I tried to track to position of delimit from the orginal string by doing global_pos := global_pos + pos ; in my "while" loop, but ltrim(substr(string, pos + 1), '#') will make "global_pos" wrong when ltrim trims '#'. Any work-around? TIA. Guang -Original Message- Stephane Faroult Sent: Friday, November 21, 2003 4:19 AM To: Multiple recipients of list ORACLE-L Guang, I agree with your analysis, looping on characters is not the faster you can do, simply because there is a significant overhead (compared to C code for instance) in a language such as PL/SQL - which might be perfectly acceptable in some circumstances, much less so in very repetitive tasks. 'Native compiling', ie turning PL/SQL in C, might improve performance. However, in my view the best performance gains you may get is by, so to speak, pushing the bulk of the processing deeper into the kernel (which isn't by the way exclusive of native compiling). Using a function such as INSTR() will be much more efficient than looping on characters. I would suggest something such as : - First use TRANSLATE() to replace all the characters you want to get rid of by a single, well identified character, say # (use CHR() || ... for non printable characters - you can build up the string of characters to translate in the initialisation section of a package rather than typing it). - Start with initializing your string to LTRIM(string, '#') - Then as long as pos := INSTR(string, '#') isn't 0, get your token as substr(string, 1, pos - 1) then assign ltrim(substr(string, pos + 1), '#') to string (very similar to what you were planning to do with owa). This will be probably much faster than a character-by-character loop and calls to an owa package. HTH, Stephane Faroult >- --- Original Message --- - >From: Guang Mei <[EMAIL PROTECTED]> >To: Multiple recipients of list ORACLE-L ><[EMAIL PROTECTED]> >Sent: Thu, 20 Nov 2003 19:39:55 > >Hi: > >In my pl/sql program, I want to process each "word" >in a string. The >string is selected from a varchar2(300) column. The >delimit that separates >the words is not necessary space character. The >definition of the delimit >in this program is set as > >1. Any character that is NOT AlphaNumerical (0-9, >A-Z,a-z) >and >2. the character is not one of these: '-.,/<*>_' > >Now my program is basically checking each >character, find the delimit, and >rebuild each word. After that I process each >"word". The code looks like >this: > >--- >str := "This will be a long string with length >upto 300 characters, it >may contain some invisible characters'; >len := length(str)+1; > for i in 1..len loop >ch := substr(str,i,1); >if (not strings.isAlnum(ch) and >instr('-.,/<*>_', ch)<1) then > if word is not null then >-- do some processing to variable word ! >word := null;-- reset it > end if; >else > word := word || ch; -- concat ch to word >end if; > end loop; > >--- > >I think It's taking too long because it loops >through each characters. I >hope I could find a way to speed it up. I don't >have experiience in >owa_pattern, but I thought there might be a way to >do it here: > > >str := "This will be a long string with length >upto 300 characters, it >may contain some invisible characters'; >newstr := str; >pos := 1; >while pos != 0 loop >pos := owa_pattern.amatch(newstr, 1, '\W'); >-- how can I mask out >these '-.,/<*>_' ??? >word := substr(newstr, 1, pos-1); >-- do some processing to variable word ! >if pos != 0 then > newstr := substr(newstr, pos+1); >end if; >end loop; >-- > >My simple tests showed that owa_pattern call is >much slower than direct >string manupilation. But I would like to try it in >this case if I could >easily get the "wrods" from the string. Any >suggestions? > >TIA. > >Guang > -- Please see the official ORACLE-L FAQ: http://www.orafaq.net -- Author: Stephane Faroult INET: [EMAIL PROTECTED] Fat City Network Services-- 858-538-5051 http://www.fatcity.com San Diego, California-- Mailing list and web hosting services - To REMOVE yourself from this mailing list, send an E-Mail message to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in the message BODY, include a line containing: UNSUB ORACLE-L (or the name of mailing list you want to be removed from). You may also send the HELP command for other information
RE: pl/sql question and owa_pattern question
Guang, I agree with your analysis, looping on characters is not the faster you can do, simply because there is a significant overhead (compared to C code for instance) in a language such as PL/SQL - which might be perfectly acceptable in some circumstances, much less so in very repetitive tasks. 'Native compiling', ie turning PL/SQL in C, might improve performance. However, in my view the best performance gains you may get is by, so to speak, pushing the bulk of the processing deeper into the kernel (which isn't by the way exclusive of native compiling). Using a function such as INSTR() will be much more efficient than looping on characters. I would suggest something such as : - First use TRANSLATE() to replace all the characters you want to get rid of by a single, well identified character, say # (use CHR() || ... for non printable characters - you can build up the string of characters to translate in the initialisation section of a package rather than typing it). - Start with initializing your string to LTRIM(string, '#') - Then as long as pos := INSTR(string, '#') isn't 0, get your token as substr(string, 1, pos - 1) then assign ltrim(substr(string, pos + 1), '#') to string (very similar to what you were planning to do with owa). This will be probably much faster than a character-by-character loop and calls to an owa package. HTH, Stephane Faroult >- --- Original Message --- - >From: Guang Mei <[EMAIL PROTECTED]> >To: Multiple recipients of list ORACLE-L ><[EMAIL PROTECTED]> >Sent: Thu, 20 Nov 2003 19:39:55 > >Hi: > >In my pl/sql program, I want to process each "word" >in a string. The >string is selected from a varchar2(300) column. The >delimit that separates >the words is not necessary space character. The >definition of the delimit >in this program is set as > >1. Any character that is NOT AlphaNumerical (0-9, >A-Z,a-z) >and >2. the character is not one of these: '-.,/<*>_' > >Now my program is basically checking each >character, find the delimit, and >rebuild each word. After that I process each >"word". The code looks like >this: > >--- >str := "This will be a long string with length >upto 300 characters, it >may contain some invisible characters'; >len := length(str)+1; > for i in 1..len loop >ch := substr(str,i,1); >if (not strings.isAlnum(ch) and >instr('-.,/<*>_', ch)<1) then > if word is not null then >-- do some processing to variable word ! >word := null;-- reset it > end if; >else > word := word || ch; -- concat ch to word >end if; > end loop; > >--- > >I think It's taking too long because it loops >through each characters. I >hope I could find a way to speed it up. I don't >have experiience in >owa_pattern, but I thought there might be a way to >do it here: > > >str := "This will be a long string with length >upto 300 characters, it >may contain some invisible characters'; >newstr := str; >pos := 1; >while pos != 0 loop >pos := owa_pattern.amatch(newstr, 1, '\W'); >-- how can I mask out >these '-.,/<*>_' ??? >word := substr(newstr, 1, pos-1); >-- do some processing to variable word ! >if pos != 0 then > newstr := substr(newstr, pos+1); >end if; >end loop; >-- > >My simple tests showed that owa_pattern call is >much slower than direct >string manupilation. But I would like to try it in >this case if I could >easily get the "wrods" from the string. Any >suggestions? > >TIA. > >Guang > -- Please see the official ORACLE-L FAQ: http://www.orafaq.net -- Author: Stephane Faroult INET: [EMAIL PROTECTED] Fat City Network Services-- 858-538-5051 http://www.fatcity.com San Diego, California-- Mailing list and web hosting services - To REMOVE yourself from this mailing list, send an E-Mail message to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in the message BODY, include a line containing: UNSUB ORACLE-L (or the name of mailing list you want to be removed from). You may also send the HELP command for other information (like subscribing).
Re: pl/sql question and owa_pattern question
I don't know about PL/SQL but here is how I would get separate words from a big string: #!/usr/bin/perl -w use strict; my (@ARR); while (<>) { chomp; @ARR = split(/[^0-9a-zA-Z_\.,<>]/); foreach (@ARR) { print "$_\n"; } } There is something called DBI and it can be used to insert separated words into the database, instead of printing them. The bottom line is that perl is an excellent tool for parsing strings and all sorts of string manipulation. On 2003.11.20 22:39, Guang Mei wrote: > Hi: > > In my pl/sql program, I want to process each "word" in a string. The > string is selected from a varchar2(300) column. The delimit that separates > the words is not necessary space character. The definition of the delimit > in this program is set as > > 1. Any character that is NOT AlphaNumerical (0-9, A-Z,a-z) > and > 2. the character is not one of these: '-.,/<*>_' > > Now my program is basically checking each character, find the delimit, and > rebuild each word. After that I process each "word". The code looks like > this: > > --- > str := "This will be a long string with length upto 300 characters, it > may contain some invisible characters'; > len := length(str)+1; > for i in 1..len loop > ch := substr(str,i,1); > if (not strings.isAlnum(ch) and instr('-.,/<*>_', ch)<1) then > if word is not null then > -- do some processing to variable word ! > word := null;-- reset it > end if; > else > word := word || ch; -- concat ch to word > end if; > end loop; > > --- > > I think It's taking too long because it loops through each characters. I > hope I could find a way to speed it up. I don't have experiience in > owa_pattern, but I thought there might be a way to do it here: > > > str := "This will be a long string with length upto 300 characters, it > may contain some invisible characters'; > newstr := str; > pos := 1; > while pos != 0 loop > pos := owa_pattern.amatch(newstr, 1, '\W');-- how can I mask out > these '-.,/<*>_' ??? > word := substr(newstr, 1, pos-1); > -- do some processing to variable word ! > if pos != 0 then > newstr := substr(newstr, pos+1); > end if; > end loop; > -- > > My simple tests showed that owa_pattern call is much slower than direct > string manupilation. But I would like to try it in this case if I could > easily get the "wrods" from the string. Any suggestions? > > TIA. > > Guang > > > -- > Please see the official ORACLE-L FAQ: http://www.orafaq.net > -- > Author: Guang Mei > INET: [EMAIL PROTECTED] > > Fat City Network Services-- 858-538-5051 http://www.fatcity.com > San Diego, California-- Mailing list and web hosting services > - > To REMOVE yourself from this mailing list, send an E-Mail message > to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in > the message BODY, include a line containing: UNSUB ORACLE-L > (or the name of mailing list you want to be removed from). You may > also send the HELP command for other information (like subscribing). > -- Mladen Gogala Oracle DBA -- Please see the official ORACLE-L FAQ: http://www.orafaq.net -- Author: Mladen Gogala INET: [EMAIL PROTECTED] Fat City Network Services-- 858-538-5051 http://www.fatcity.com San Diego, California-- Mailing list and web hosting services - To REMOVE yourself from this mailing list, send an E-Mail message to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in the message BODY, include a line containing: UNSUB ORACLE-L (or the name of mailing list you want to be removed from). You may also send the HELP command for other information (like subscribing).
pl/sql question and owa_pattern question
Hi: In my pl/sql program, I want to process each "word" in a string. The string is selected from a varchar2(300) column. The delimit that separates the words is not necessary space character. The definition of the delimit in this program is set as 1. Any character that is NOT AlphaNumerical (0-9, A-Z,a-z) and 2. the character is not one of these: '-.,/<*>_' Now my program is basically checking each character, find the delimit, and rebuild each word. After that I process each "word". The code looks like this: --- str := "This will be a long string with length upto 300 characters, it may contain some invisible characters'; len := length(str)+1; for i in 1..len loop ch := substr(str,i,1); if (not strings.isAlnum(ch) and instr('-.,/<*>_', ch)<1) then if word is not null then -- do some processing to variable word ! word := null;-- reset it end if; else word := word || ch; -- concat ch to word end if; end loop; --- I think It's taking too long because it loops through each characters. I hope I could find a way to speed it up. I don't have experiience in owa_pattern, but I thought there might be a way to do it here: str := "This will be a long string with length upto 300 characters, it may contain some invisible characters'; newstr := str; pos := 1; while pos != 0 loop pos := owa_pattern.amatch(newstr, 1, '\W');-- how can I mask out these '-.,/<*>_' ??? word := substr(newstr, 1, pos-1); -- do some processing to variable word ! if pos != 0 then newstr := substr(newstr, pos+1); end if; end loop; -- My simple tests showed that owa_pattern call is much slower than direct string manupilation. But I would like to try it in this case if I could easily get the "wrods" from the string. Any suggestions? TIA. Guang -- Please see the official ORACLE-L FAQ: http://www.orafaq.net -- Author: Guang Mei INET: [EMAIL PROTECTED] Fat City Network Services-- 858-538-5051 http://www.fatcity.com San Diego, California-- Mailing list and web hosting services - To REMOVE yourself from this mailing list, send an E-Mail message to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in the message BODY, include a line containing: UNSUB ORACLE-L (or the name of mailing list you want to be removed from). You may also send the HELP command for other information (like subscribing).