Re: pl/sql question and owa_pattern question

2003-11-21 Thread Daniel Hanks
Would extproc_perl fit well enough, though, until 10g is here?

On Fri, 21 Nov 2003, Mladen Gogala wrote:

> PL/SQL is the fastest thing of them all when it comes to executing 
> SQL commands, but there are things which simply aren't practical 
> in 9.2 PL/SQL. Regular expression processing is one of those things.
> Fortunately, you can mix the two. Without DBI, perl scripts simply
> woudn't be very useful. Of course, there are things that are faster
> then even the fastest perl script. Lexer written in C is one of them
> and you don't need much work to write one, either, but using OCI is
> not easy. OCI is a library written to confuse the enemy, not to help
> developer. Using plain and simple regex or PCRE within a C program
> is the same thing as above, but slightly more complicated then a lexer.
> For the specific task of manipulating patterns and resolving regular
> expressions, I use perl almost exclusively because I find it an optimal 
> tradeoff between ease of use and performance. If performance is a 
> paramount, as in real time application processing, then you'll have to 
> resort to C and, possibly, write an external procedure and, thus,
> enabling oracle to use C regex calls or even pcre. I was toying with the 
> idea of enabling oracle to use PCRE but I gave up when I read that 10g 
> will have that included.
> 
> On 11/21/2003 11:59:31 AM, Guang Mei wrote:
> > Perl is a good tool for text processing. But our program is already written
> > in pl/sql long time ago and there are intensive db calls in this pl/sql
> > program. (text processing is only part of it). So I can not change that.
> > 
> > BTW I did a comparison study a while ago for some of our pl/sql packages
> > (specifically for our application). When there are lots of db calls (select,
> > insert, update and delete), pl/sql package is faster than correponding perl
> > program (I made sure sqls are prepared once and used bind variables in perl.
> > All code were executed on the unix server, no other programs were running,
> > etc). That's why we stick to pl/sql because our app need the performance.
> > Others may have different results, it all depends on what the code does.
> > 
> > Guang
> > 
> > -Original Message-
> > Mladen Gogala
> > Sent: Thursday, November 20, 2003 11:14 PM
> > To: Multiple recipients of list ORACLE-L
> > 
> > 
> > I don't know about PL/SQL but here is how I would get separate words from a
> > big string:
> > 
> > #!/usr/bin/perl -w
> > use strict;
> > my (@ARR);
> > while (<>) {
> > chomp;
> > @ARR = split(/[^0-9a-zA-Z_\.,<>]/);
> > foreach (@ARR) {
> > print "$_\n";
> > }
> > }
> > 
> > There is something called DBI and it can be used to insert separated words
> > into the database, instead
> > of printing them. The bottom line is that perl is an excellent tool for
> > parsing strings and  all sorts of string
> > manipulation.
> > 
> > On 2003.11.20 22:39, Guang Mei wrote:
> > > Hi:
> > >
> > > In my pl/sql program, I want to process each "word" in a string. The
> > > string is selected from a varchar2(300) column. The delimit that separates
> > > the words is not necessary space character. The definition of the delimit
> > > in this program is set as
> > >
> > > 1. Any character that is NOT AlphaNumerical (0-9, A-Z,a-z)
> > > and
> > > 2. the character is not one of these:  '-.,/<*>_'
> > >
> > > Now my program is basically checking each character, find the delimit, and
> > > rebuild each word. After that I process each "word". The code looks like
> > > this:
> > >
> > > ---
> > > str :=  "This will be a long string with length upto 300 characters, it
> > > may contain some invisible characters';
> > > len := length(str)+1;
> > >   for i in 1..len loop
> > > ch := substr(str,i,1);
> > > if (not strings.isAlnum(ch) and instr('-.,/<*>_', ch)<1)  then
> > >   if word is not null then
> > > -- do some processing to variable word !
> > > word := null;-- reset it
> > >   end if;
> > > else
> > >   word := word || ch;   -- concat ch to word
> > > end if;
> > >   end loop;
> > >
> > > ---
> > >
> > > I think It's taking too long because it loops through each characters. I
> > > hope I could find a way to speed it up. I don't have experiience in
> > > owa_pattern, but I thought there might be a way to do it here:
> > >
> > > 
> > > str :=  "This will be a long string with length upto 300 characters, it
> > > may contain some invisible characters';
> > > newstr := str;
> > > pos := 1;
> > > while pos != 0 loop
> > > pos := owa_pattern.amatch(newstr, 1, '\W');-- how can I mask out
> > > these  '-.,/<*>_'  ???
> > > word := substr(newstr, 1, pos-1);
> > > -- do some processing to variable word !
> > > if pos != 0 then
> > >   newstr := substr(newstr, pos+1);
> > > end if;
> > > end loop;
> > > --
> > >
> > > My simple tests showed that owa_pattern call is much slower than direct
> > > string manupil

Re: pl/sql question and owa_pattern question

2003-11-21 Thread Mladen Gogala
PL/SQL is the fastest thing of them all when it comes to executing 
SQL commands, but there are things which simply aren't practical 
in 9.2 PL/SQL. Regular expression processing is one of those things.
Fortunately, you can mix the two. Without DBI, perl scripts simply
woudn't be very useful. Of course, there are things that are faster
then even the fastest perl script. Lexer written in C is one of them
and you don't need much work to write one, either, but using OCI is
not easy. OCI is a library written to confuse the enemy, not to help
developer. Using plain and simple regex or PCRE within a C program
is the same thing as above, but slightly more complicated then a lexer.
For the specific task of manipulating patterns and resolving regular
expressions, I use perl almost exclusively because I find it an optimal 
tradeoff between ease of use and performance. If performance is a 
paramount, as in real time application processing, then you'll have to 
resort to C and, possibly, write an external procedure and, thus,
enabling oracle to use C regex calls or even pcre. I was toying with the 
idea of enabling oracle to use PCRE but I gave up when I read that 10g 
will have that included.

On 11/21/2003 11:59:31 AM, Guang Mei wrote:
> Perl is a good tool for text processing. But our program is already written
> in pl/sql long time ago and there are intensive db calls in this pl/sql
> program. (text processing is only part of it). So I can not change that.
> 
> BTW I did a comparison study a while ago for some of our pl/sql packages
> (specifically for our application). When there are lots of db calls (select,
> insert, update and delete), pl/sql package is faster than correponding perl
> program (I made sure sqls are prepared once and used bind variables in perl.
> All code were executed on the unix server, no other programs were running,
> etc). That's why we stick to pl/sql because our app need the performance.
> Others may have different results, it all depends on what the code does.
> 
> Guang
> 
> -Original Message-
> Mladen Gogala
> Sent: Thursday, November 20, 2003 11:14 PM
> To: Multiple recipients of list ORACLE-L
> 
> 
> I don't know about PL/SQL but here is how I would get separate words from a
> big string:
> 
> #!/usr/bin/perl -w
> use strict;
> my (@ARR);
> while (<>) {
> chomp;
> @ARR = split(/[^0-9a-zA-Z_\.,<>]/);
> foreach (@ARR) {
> print "$_\n";
> }
> }
> 
> There is something called DBI and it can be used to insert separated words
> into the database, instead
> of printing them. The bottom line is that perl is an excellent tool for
> parsing strings and  all sorts of string
> manipulation.
> 
> On 2003.11.20 22:39, Guang Mei wrote:
> > Hi:
> >
> > In my pl/sql program, I want to process each "word" in a string. The
> > string is selected from a varchar2(300) column. The delimit that separates
> > the words is not necessary space character. The definition of the delimit
> > in this program is set as
> >
> > 1. Any character that is NOT AlphaNumerical (0-9, A-Z,a-z)
> > and
> > 2. the character is not one of these:  '-.,/<*>_'
> >
> > Now my program is basically checking each character, find the delimit, and
> > rebuild each word. After that I process each "word". The code looks like
> > this:
> >
> > ---
> > str :=  "This will be a long string with length upto 300 characters, it
> > may contain some invisible characters';
> > len := length(str)+1;
> >   for i in 1..len loop
> > ch := substr(str,i,1);
> > if (not strings.isAlnum(ch) and instr('-.,/<*>_', ch)<1)  then
> >   if word is not null then
> > -- do some processing to variable word !
> > word := null;-- reset it
> >   end if;
> > else
> >   word := word || ch;   -- concat ch to word
> > end if;
> >   end loop;
> >
> > ---
> >
> > I think It's taking too long because it loops through each characters. I
> > hope I could find a way to speed it up. I don't have experiience in
> > owa_pattern, but I thought there might be a way to do it here:
> >
> > 
> > str :=  "This will be a long string with length upto 300 characters, it
> > may contain some invisible characters';
> > newstr := str;
> > pos := 1;
> > while pos != 0 loop
> > pos := owa_pattern.amatch(newstr, 1, '\W');-- how can I mask out
> > these  '-.,/<*>_'  ???
> > word := substr(newstr, 1, pos-1);
> > -- do some processing to variable word !
> > if pos != 0 then
> >   newstr := substr(newstr, pos+1);
> > end if;
> > end loop;
> > --
> >
> > My simple tests showed that owa_pattern call is much slower than direct
> > string manupilation. But I would like to try it in this case if I could
> > easily get the "wrods" from the string. Any suggestions?
> >
> > TIA.
> >
> > Guang
> >
> >
> > --
> > Please see the official ORACLE-L FAQ: http://www.orafaq.net
> > --
> > Author: Guang Mei
> >   INET: [EMAIL PROTECTED]
> >
> > Fat City Network Services-- 858-538-5051 ht

RE: pl/sql question and owa_pattern question

2003-11-21 Thread Jamadagni, Rajendra
Guang,

Well you are almost there ...  you need fifo structure  namely a pl/sql array

1. create a local pl/sql array to store the delimiter (store the ascii value of the 
delimiter to be safe) my_array (varchar2(5))
2. as you find a delimiter insert into the first position in the array and replace the 
delimiting character with #
3. lather.rinse.repeat.

when it is time to put it back
use a loop

nIndex := 0;
nPos   := 0;
loop
  npos := instr(my_str,'#',1);
  exit when npos := 0;
  nIndex := nindex + 1;
  my_str := substr(my_str,1,nPos-1) || chr(my_array(nIndex)) || sybstr(my_str, nPos+1);
end loop;


something like this should help, proof-read though ...

Raj

Rajendra dot Jamadagni at nospamespn dot com
All Views expressed in this email are strictly personal.
QOTD: Any clod can have facts, having an opinion is an art !


-Original Message-
Sent: Friday, November 21, 2003 11:44 AM
To: Multiple recipients of list ORACLE-L


Hi Stephane:

Thanks for your good suggestion. I compared the method you suggested and the orginal 
one and it indeed boosted the performance (in my simple test). However the ONLY 
problem I am having is that by doing TRANSLATE, I lost the original delimits. The new 
method (you suggested) correctly "extract" the "words" (and sent for processing), But 
after processing I need to put processed-words back to the orginal string with orginal 
demilters un-changed. I tried to track to position of delimit from the orginal string 
by doing

global_pos := global_pos + pos ;

in my "while" loop, but  ltrim(substr(string, pos + 1), '#')  will make "global_pos" 
wrong when ltrim trims '#'. Any work-around?

TIA.

Guang

-Original Message-
Stephane Faroult
Sent: Friday, November 21, 2003 4:19 AM
To: Multiple recipients of list ORACLE-L


Guang,

   I agree with your analysis, looping on characters is not the faster you can do, 
simply because there is a significant overhead (compared to C code for instance) in a 
language such as PL/SQL - which might be perfectly acceptable in some circumstances, 
much less so in very repetitive tasks. 'Native compiling', ie turning PL/SQL in C, 
might improve performance. However, in my view the best performance gains you may get 
is by, so to speak, pushing the bulk of the processing deeper into the kernel (which 
isn't by the way exclusive of native compiling). Using a function such as INSTR() will 
be much more efficient than looping on characters.
 I would suggest something such as :
   - First use TRANSLATE() to replace all the characters you want to get rid of by a 
single, well identified character, say # (use CHR() || ... for non printable 
characters - you can build up the string of characters to translate in the 
initialisation section of a package rather than typing it).
   - Start with initializing your string to LTRIM(string, '#')
   - Then as long as pos := INSTR(string, '#') isn't 0,
 get your token as substr(string, 1, pos - 1) then assign ltrim(substr(string, pos 
+ 1), '#') to string (very similar to what you were planning to do with owa).

This will be probably much faster than a character-by-character loop and calls to an 
owa package.

HTH,

Stephane Faroult

>- --- Original Message --- -
>From: Guang Mei <[EMAIL PROTECTED]>
>To: Multiple recipients of list ORACLE-L
><[EMAIL PROTECTED]>
>Sent: Thu, 20 Nov 2003 19:39:55
>
>Hi:
>
>In my pl/sql program, I want to process each "word"
>in a string. The
>string is selected from a varchar2(300) column. The
>delimit that separates
>the words is not necessary space character. The
>definition of the delimit
>in this program is set as
>
>1. Any character that is NOT AlphaNumerical (0-9,
>A-Z,a-z)
>and
>2. the character is not one of these:  '-.,/<*>_'
>
>Now my program is basically checking each
>character, find the delimit, and
>rebuild each word. After that I process each
>"word". The code looks like
>this:
>
>---
>str :=  "This will be a long string with length
>upto 300 characters, it
>may contain some invisible characters';
>len := length(str)+1;
>  for i in 1..len loop
>ch := substr(str,i,1);
>if (not strings.isAlnum(ch) and
>instr('-.,/<*>_', ch)<1)  then
>  if word is not null then
>-- do some processing to variable word !
>word := null;-- reset it
>  end if;
>else
>  word := word || ch;   -- concat ch to word
>end if;
>  end loop;
>
>---
>
>I think It's taking too long because it loops
>through each characters. I
>hope I could find a way to speed it up. I don't
>have experiience in
>owa_pattern, but I thought there might be a way to
>do it here:
>
>
>str :=  "This will be a long string with length
>upto 300 characters, it
>may contain some invisible characters';
>newstr := str;
>pos := 1;
>while pos != 0 loop
>pos := owa_pattern.amatch(newstr, 1, '\W');   
>-- how can I mask out
>these  '-.,/<*>_'  ???
>word :

RE: pl/sql question and owa_pattern question

2003-11-21 Thread Guang Mei
Perl is a good tool for text processing. But our program is already written
in pl/sql long time ago and there are intensive db calls in this pl/sql
program. (text processing is only part of it). So I can not change that.

BTW I did a comparison study a while ago for some of our pl/sql packages
(specifically for our application). When there are lots of db calls (select,
insert, update and delete), pl/sql package is faster than correponding perl
program (I made sure sqls are prepared once and used bind variables in perl.
All code were executed on the unix server, no other programs were running,
etc). That's why we stick to pl/sql because our app need the performance.
Others may have different results, it all depends on what the code does.

Guang

-Original Message-
Mladen Gogala
Sent: Thursday, November 20, 2003 11:14 PM
To: Multiple recipients of list ORACLE-L


I don't know about PL/SQL but here is how I would get separate words from a
big string:

#!/usr/bin/perl -w
use strict;
my (@ARR);
while (<>) {
chomp;
@ARR = split(/[^0-9a-zA-Z_\.,<>]/);
foreach (@ARR) {
print "$_\n";
}
}

There is something called DBI and it can be used to insert separated words
into the database, instead
of printing them. The bottom line is that perl is an excellent tool for
parsing strings and  all sorts of string
manipulation.

On 2003.11.20 22:39, Guang Mei wrote:
> Hi:
>
> In my pl/sql program, I want to process each "word" in a string. The
> string is selected from a varchar2(300) column. The delimit that separates
> the words is not necessary space character. The definition of the delimit
> in this program is set as
>
> 1. Any character that is NOT AlphaNumerical (0-9, A-Z,a-z)
> and
> 2. the character is not one of these:  '-.,/<*>_'
>
> Now my program is basically checking each character, find the delimit, and
> rebuild each word. After that I process each "word". The code looks like
> this:
>
> ---
> str :=  "This will be a long string with length upto 300 characters, it
> may contain some invisible characters';
> len := length(str)+1;
>   for i in 1..len loop
> ch := substr(str,i,1);
> if (not strings.isAlnum(ch) and instr('-.,/<*>_', ch)<1)  then
>   if word is not null then
> -- do some processing to variable word !
> word := null;-- reset it
>   end if;
> else
>   word := word || ch;   -- concat ch to word
> end if;
>   end loop;
>
> ---
>
> I think It's taking too long because it loops through each characters. I
> hope I could find a way to speed it up. I don't have experiience in
> owa_pattern, but I thought there might be a way to do it here:
>
> 
> str :=  "This will be a long string with length upto 300 characters, it
> may contain some invisible characters';
> newstr := str;
> pos := 1;
> while pos != 0 loop
> pos := owa_pattern.amatch(newstr, 1, '\W');-- how can I mask out
> these  '-.,/<*>_'  ???
> word := substr(newstr, 1, pos-1);
> -- do some processing to variable word !
> if pos != 0 then
>   newstr := substr(newstr, pos+1);
> end if;
> end loop;
> --
>
> My simple tests showed that owa_pattern call is much slower than direct
> string manupilation. But I would like to try it in this case if I could
> easily get the "wrods" from the string. Any suggestions?
>
> TIA.
>
> Guang
>
>
> --
> Please see the official ORACLE-L FAQ: http://www.orafaq.net
> --
> Author: Guang Mei
>   INET: [EMAIL PROTECTED]
>
> Fat City Network Services-- 858-538-5051 http://www.fatcity.com
> San Diego, California-- Mailing list and web hosting services
> -
> To REMOVE yourself from this mailing list, send an E-Mail message
> to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in
> the message BODY, include a line containing: UNSUB ORACLE-L
> (or the name of mailing list you want to be removed from).  You may
> also send the HELP command for other information (like subscribing).
>

--
Mladen Gogala
Oracle DBA
--
Please see the official ORACLE-L FAQ: http://www.orafaq.net
--
Author: Mladen Gogala
  INET: [EMAIL PROTECTED]

Fat City Network Services-- 858-538-5051 http://www.fatcity.com
San Diego, California-- Mailing list and web hosting services
-
To REMOVE yourself from this mailing list, send an E-Mail message
to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in
the message BODY, include a line containing: UNSUB ORACLE-L
(or the name of mailing list you want to be removed from).  You may
also send the HELP command for other information (like subscribing).

-- 
Please see the official ORACLE-L FAQ: http://www.orafaq.net
-- 
Author: Guang Mei
  INET: [EMAIL PROTECTED]

Fat City Network Services-- 858-538-5051 http://www.fatcity.com
San Diego, California-- Mailing list and web hosting services
-

RE: pl/sql question and owa_pattern question

2003-11-21 Thread Guang Mei
Hi Stephane:

Thanks for your good suggestion. I compared the method you suggested and the orginal 
one and it indeed boosted the performance (in my simple test). However the ONLY 
problem I am having is that by doing TRANSLATE, I lost the original delimits. The new 
method (you suggested) correctly "extract" the "words" (and sent for processing), But 
after processing I need to put processed-words back to the orginal string with orginal 
demilters un-changed. I tried to track to position of delimit from the orginal string 
by doing

global_pos := global_pos + pos ;

in my "while" loop, but  ltrim(substr(string, pos + 1), '#')  will make "global_pos" 
wrong when ltrim trims '#'. Any work-around?

TIA.

Guang

-Original Message-
Stephane Faroult
Sent: Friday, November 21, 2003 4:19 AM
To: Multiple recipients of list ORACLE-L


Guang,

   I agree with your analysis, looping on characters is not the faster you can do, 
simply because there is a significant overhead (compared to C code for instance) in a 
language such as PL/SQL - which might be perfectly acceptable in some circumstances, 
much less so in very repetitive tasks. 'Native compiling', ie turning PL/SQL in C, 
might improve performance. However, in my view the best performance gains you may get 
is by, so to speak, pushing the bulk of the processing deeper into the kernel (which 
isn't by the way exclusive of native compiling). Using a function such as INSTR() will 
be much more efficient than looping on characters.
 I would suggest something such as :
   - First use TRANSLATE() to replace all the characters you want to get rid of by a 
single, well identified character, say # (use CHR() || ... for non printable 
characters - you can build up the string of characters to translate in the 
initialisation section of a package rather than typing it).
   - Start with initializing your string to LTRIM(string, '#')
   - Then as long as pos := INSTR(string, '#') isn't 0,
 get your token as substr(string, 1, pos - 1) then assign ltrim(substr(string, pos 
+ 1), '#') to string (very similar to what you were planning to do with owa).

This will be probably much faster than a character-by-character loop and calls to an 
owa package.

HTH,

Stephane Faroult

>- --- Original Message --- -
>From: Guang Mei <[EMAIL PROTECTED]>
>To: Multiple recipients of list ORACLE-L
><[EMAIL PROTECTED]>
>Sent: Thu, 20 Nov 2003 19:39:55
>
>Hi:
>
>In my pl/sql program, I want to process each "word"
>in a string. The
>string is selected from a varchar2(300) column. The
>delimit that separates
>the words is not necessary space character. The
>definition of the delimit
>in this program is set as
>
>1. Any character that is NOT AlphaNumerical (0-9,
>A-Z,a-z)
>and
>2. the character is not one of these:  '-.,/<*>_'
>
>Now my program is basically checking each
>character, find the delimit, and
>rebuild each word. After that I process each
>"word". The code looks like
>this:
>
>---
>str :=  "This will be a long string with length
>upto 300 characters, it
>may contain some invisible characters';
>len := length(str)+1;
>  for i in 1..len loop
>ch := substr(str,i,1);
>if (not strings.isAlnum(ch) and
>instr('-.,/<*>_', ch)<1)  then
>  if word is not null then
>-- do some processing to variable word !
>word := null;-- reset it
>  end if;
>else
>  word := word || ch;   -- concat ch to word
>end if;
>  end loop;
>
>---
>
>I think It's taking too long because it loops
>through each characters. I
>hope I could find a way to speed it up. I don't
>have experiience in
>owa_pattern, but I thought there might be a way to
>do it here:
>
>
>str :=  "This will be a long string with length
>upto 300 characters, it
>may contain some invisible characters';
>newstr := str;
>pos := 1;
>while pos != 0 loop
>pos := owa_pattern.amatch(newstr, 1, '\W');   
>-- how can I mask out
>these  '-.,/<*>_'  ???
>word := substr(newstr, 1, pos-1);
>-- do some processing to variable word !
>if pos != 0 then
>  newstr := substr(newstr, pos+1);
>end if;
>end loop;
>--
>
>My simple tests showed that owa_pattern call is
>much slower than direct
>string manupilation. But I would like to try it in
>this case if I could
>easily get the "wrods" from the string. Any
>suggestions?
>
>TIA.
>
>Guang
>
-- 
Please see the official ORACLE-L FAQ: http://www.orafaq.net
-- 
Author: Stephane Faroult
  INET: [EMAIL PROTECTED]

Fat City Network Services-- 858-538-5051 http://www.fatcity.com
San Diego, California-- Mailing list and web hosting services
-
To REMOVE yourself from this mailing list, send an E-Mail message
to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in
the message BODY, include a line containing: UNSUB ORACLE-L
(or the name of mailing list you want to be removed from).  You may
also send the HELP command for other information

RE: pl/sql question and owa_pattern question

2003-11-21 Thread Stephane Faroult
Guang,

   I agree with your analysis, looping on characters is not the faster you can do, 
simply because there is a significant overhead (compared to C code for instance) in a 
language such as PL/SQL - which might be perfectly acceptable in some circumstances, 
much less so in very repetitive tasks. 'Native compiling', ie turning PL/SQL in C, 
might improve performance. However, in my view the best performance gains you may get 
is by, so to speak, pushing the bulk of the processing deeper into the kernel (which 
isn't by the way exclusive of native compiling). Using a function such as INSTR() will 
be much more efficient than looping on characters.
 I would suggest something such as :
   - First use TRANSLATE() to replace all the characters you want to get rid of by a 
single, well identified character, say # (use CHR() || ... for non printable 
characters - you can build up the string of characters to translate in the 
initialisation section of a package rather than typing it).
   - Start with initializing your string to LTRIM(string, '#')
   - Then as long as pos := INSTR(string, '#') isn't 0,
 get your token as substr(string, 1, pos - 1) then assign ltrim(substr(string, pos 
+ 1), '#') to string (very similar to what you were planning to do with owa).

This will be probably much faster than a character-by-character loop and calls to an 
owa package.

HTH,

Stephane Faroult

>- --- Original Message --- -
>From: Guang Mei <[EMAIL PROTECTED]>
>To: Multiple recipients of list ORACLE-L
><[EMAIL PROTECTED]>
>Sent: Thu, 20 Nov 2003 19:39:55
>
>Hi:
>
>In my pl/sql program, I want to process each "word"
>in a string. The
>string is selected from a varchar2(300) column. The
>delimit that separates
>the words is not necessary space character. The
>definition of the delimit
>in this program is set as
>
>1. Any character that is NOT AlphaNumerical (0-9,
>A-Z,a-z)
>and
>2. the character is not one of these:  '-.,/<*>_'
>
>Now my program is basically checking each
>character, find the delimit, and
>rebuild each word. After that I process each
>"word". The code looks like
>this:
>
>---
>str :=  "This will be a long string with length
>upto 300 characters, it
>may contain some invisible characters';
>len := length(str)+1;
>  for i in 1..len loop
>ch := substr(str,i,1);
>if (not strings.isAlnum(ch) and
>instr('-.,/<*>_', ch)<1)  then
>  if word is not null then
>-- do some processing to variable word !
>word := null;-- reset it
>  end if;
>else
>  word := word || ch;   -- concat ch to word
>end if;
>  end loop;
>
>---
>
>I think It's taking too long because it loops
>through each characters. I
>hope I could find a way to speed it up. I don't
>have experiience in
>owa_pattern, but I thought there might be a way to
>do it here:
>
>
>str :=  "This will be a long string with length
>upto 300 characters, it
>may contain some invisible characters';
>newstr := str;
>pos := 1;
>while pos != 0 loop
>pos := owa_pattern.amatch(newstr, 1, '\W');   
>-- how can I mask out
>these  '-.,/<*>_'  ???
>word := substr(newstr, 1, pos-1);
>-- do some processing to variable word !
>if pos != 0 then
>  newstr := substr(newstr, pos+1);
>end if;
>end loop;
>--
>
>My simple tests showed that owa_pattern call is
>much slower than direct
>string manupilation. But I would like to try it in
>this case if I could
>easily get the "wrods" from the string. Any
>suggestions?
>
>TIA.
>
>Guang
>
-- 
Please see the official ORACLE-L FAQ: http://www.orafaq.net
-- 
Author: Stephane Faroult
  INET: [EMAIL PROTECTED]

Fat City Network Services-- 858-538-5051 http://www.fatcity.com
San Diego, California-- Mailing list and web hosting services
-
To REMOVE yourself from this mailing list, send an E-Mail message
to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in
the message BODY, include a line containing: UNSUB ORACLE-L
(or the name of mailing list you want to be removed from).  You may
also send the HELP command for other information (like subscribing).


Re: pl/sql question and owa_pattern question

2003-11-20 Thread Mladen Gogala
I don't know about PL/SQL but here is how I would get separate words from a big string:

#!/usr/bin/perl -w
use strict;
my (@ARR);
while (<>) {
chomp;
@ARR = split(/[^0-9a-zA-Z_\.,<>]/);
foreach (@ARR) {
print "$_\n";
}
}

There is something called DBI and it can be used to insert separated words into the 
database, instead
of printing them. The bottom line is that perl is an excellent tool for parsing 
strings and  all sorts of string
manipulation.

On 2003.11.20 22:39, Guang Mei wrote:
> Hi:
> 
> In my pl/sql program, I want to process each "word" in a string. The
> string is selected from a varchar2(300) column. The delimit that separates
> the words is not necessary space character. The definition of the delimit
> in this program is set as
> 
> 1. Any character that is NOT AlphaNumerical (0-9, A-Z,a-z)
> and
> 2. the character is not one of these:  '-.,/<*>_'
> 
> Now my program is basically checking each character, find the delimit, and
> rebuild each word. After that I process each "word". The code looks like
> this:
> 
> ---
> str :=  "This will be a long string with length upto 300 characters, it
> may contain some invisible characters';
> len := length(str)+1;
>   for i in 1..len loop
> ch := substr(str,i,1);
> if (not strings.isAlnum(ch) and instr('-.,/<*>_', ch)<1)  then
>   if word is not null then
> -- do some processing to variable word !
> word := null;-- reset it
>   end if;
> else
>   word := word || ch;   -- concat ch to word
> end if;
>   end loop;
> 
> ---
> 
> I think It's taking too long because it loops through each characters. I
> hope I could find a way to speed it up. I don't have experiience in
> owa_pattern, but I thought there might be a way to do it here:
> 
> 
> str :=  "This will be a long string with length upto 300 characters, it
> may contain some invisible characters';
> newstr := str;
> pos := 1;
> while pos != 0 loop
> pos := owa_pattern.amatch(newstr, 1, '\W');-- how can I mask out
> these  '-.,/<*>_'  ???
> word := substr(newstr, 1, pos-1);
> -- do some processing to variable word !
> if pos != 0 then
>   newstr := substr(newstr, pos+1);
> end if;
> end loop;
> --
> 
> My simple tests showed that owa_pattern call is much slower than direct
> string manupilation. But I would like to try it in this case if I could
> easily get the "wrods" from the string. Any suggestions?
> 
> TIA.
> 
> Guang
> 
> 
> -- 
> Please see the official ORACLE-L FAQ: http://www.orafaq.net
> -- 
> Author: Guang Mei
>   INET: [EMAIL PROTECTED]
> 
> Fat City Network Services-- 858-538-5051 http://www.fatcity.com
> San Diego, California-- Mailing list and web hosting services
> -
> To REMOVE yourself from this mailing list, send an E-Mail message
> to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in
> the message BODY, include a line containing: UNSUB ORACLE-L
> (or the name of mailing list you want to be removed from).  You may
> also send the HELP command for other information (like subscribing).
> 

-- 
Mladen Gogala
Oracle DBA
-- 
Please see the official ORACLE-L FAQ: http://www.orafaq.net
-- 
Author: Mladen Gogala
  INET: [EMAIL PROTECTED]

Fat City Network Services-- 858-538-5051 http://www.fatcity.com
San Diego, California-- Mailing list and web hosting services
-
To REMOVE yourself from this mailing list, send an E-Mail message
to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in
the message BODY, include a line containing: UNSUB ORACLE-L
(or the name of mailing list you want to be removed from).  You may
also send the HELP command for other information (like subscribing).


pl/sql question and owa_pattern question

2003-11-20 Thread Guang Mei
Hi:

In my pl/sql program, I want to process each "word" in a string. The
string is selected from a varchar2(300) column. The delimit that separates
the words is not necessary space character. The definition of the delimit
in this program is set as

1. Any character that is NOT AlphaNumerical (0-9, A-Z,a-z)
and
2. the character is not one of these:  '-.,/<*>_'

Now my program is basically checking each character, find the delimit, and
rebuild each word. After that I process each "word". The code looks like
this:

---
str :=  "This will be a long string with length upto 300 characters, it
may contain some invisible characters';
len := length(str)+1;
  for i in 1..len loop
ch := substr(str,i,1);
if (not strings.isAlnum(ch) and instr('-.,/<*>_', ch)<1)  then
  if word is not null then
-- do some processing to variable word !
word := null;-- reset it
  end if;
else
  word := word || ch;   -- concat ch to word
end if;
  end loop;

---

I think It's taking too long because it loops through each characters. I
hope I could find a way to speed it up. I don't have experiience in
owa_pattern, but I thought there might be a way to do it here:


str :=  "This will be a long string with length upto 300 characters, it
may contain some invisible characters';
newstr := str;
pos := 1;
while pos != 0 loop
pos := owa_pattern.amatch(newstr, 1, '\W');-- how can I mask out
these  '-.,/<*>_'  ???
word := substr(newstr, 1, pos-1);
-- do some processing to variable word !
if pos != 0 then
  newstr := substr(newstr, pos+1);
end if;
end loop;
--

My simple tests showed that owa_pattern call is much slower than direct
string manupilation. But I would like to try it in this case if I could
easily get the "wrods" from the string. Any suggestions?

TIA.

Guang


-- 
Please see the official ORACLE-L FAQ: http://www.orafaq.net
-- 
Author: Guang Mei
  INET: [EMAIL PROTECTED]

Fat City Network Services-- 858-538-5051 http://www.fatcity.com
San Diego, California-- Mailing list and web hosting services
-
To REMOVE yourself from this mailing list, send an E-Mail message
to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in
the message BODY, include a line containing: UNSUB ORACLE-L
(or the name of mailing list you want to be removed from).  You may
also send the HELP command for other information (like subscribing).