Re: Once upon a time?

2007-02-03 Thread David Bovill

Thanks guys!

Now I wonder which will be fastest:

get wordOffset(word1, tText)
if (it > 0) AND (it = (wordOffset(word2, tText) - 1)) then

or


put space into P
put replaceText("once   upon a  time", "\s+", P) into cleanVar
put (P & "once upon" & P) is in (P & cleanVar & P)  -- true

___
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution


Re: Once upon a time?

2007-02-03 Thread Mark Smith

The first (wordOffset), by a factor of 100 or so, on my machine.
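
A comparison like this can be reproduced with a small timing harness. The sketch below is only an illustration: the handler name `compareSpeed`, the run count, and the sample text are assumptions, not the script that produced the figure above, and the variables are left undeclared, so it assumes strict variable checking is off.

```livecode
-- Hypothetical timing harness; handler name, run count and sample text are assumptions.
on compareSpeed
   put "once   upon a  time" into tText
   put 10000 into tRuns
   set the wholeMatches to true

   -- Approach 1: chunk-based test with wordOffset
   put the milliseconds into tStart
   repeat with i = 1 to tRuns
      get wordOffset("once", tText)
      put (it > 0) and (it = wordOffset("upon", tText) - 1) into tFound1
   end repeat
   put the milliseconds - tStart into tChunkTime

   -- Approach 2: collapse whitespace runs with replaceText, then a padded "is in" test
   put space into P
   put the milliseconds into tStart
   repeat with i = 1 to tRuns
      put replaceText(tText, "\s+", P) into cleanVar
      put (P & "once upon" & P) is in (P & cleanVar & P) into tFound2
   end repeat
   put the milliseconds - tStart into tRegexTime

   -- Report both elapsed times in the message box
   put "wordOffset:" && tChunkTime && "ms" & cr & "replaceText:" && tRegexTime && "ms"
end compareSpeed
```

The ratio will vary with the machine, the engine version, and the length of the text, so treat any single measurement as indicative only.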

Best,

Mark

On 3 Feb 2007, at 13:24, David Bovill wrote:


> Thanks guys!
>
> Now I wonder which will be fastest:
>
> get wordOffset(word1, tText)
> if (it > 0) AND (it = (wordOffset(word2, tText) - 1)) then
>
> or
>
> put space into P
> put replaceText("once   upon a  time", "\s+", P) into cleanVar
> put (P & "once upon" & P) is in (P & cleanVar & P)  -- true



Re: Once upon a time?

2007-02-03 Thread David Bovill

That is a big difference!


Re: Once upon a time?

2007-02-03 Thread Jim Ault
On 2/3/07 5:24 AM, David Bovill [EMAIL PROTECTED] wrote:
> Now I wonder which will be fastest:
> get wordOffset(word1, tText)
> if (it > 0) AND (it = (wordOffset(word2, tText) - 1)) then
> or
>
> put space into P
> put replaceText("once   upon a  time", "\s+", P) into cleanVar
> put (P & "once upon" & P) is in (P & cleanVar & P)  -- true

RegEx is the slower technique almost every time.
The larger the text block, the more hits, and the larger the number of text
blocks, the greater the demand.

Regex is an engine that actually scans back and forth through a text block
and follows rules.  The simpler the rules you give it, the shorter the
processing time.

Using Rev's chunking ability will always be about 10-100 times faster.
However, on a field of 100 lines the difference will not be noticeable.  I use
some heavy regEx to parse web pages every day, every minute, because I need
pin-point accuracy and data mining rather than fastest execution.  Lots of
rules, lots of steps.
Chunking just won't do it without a lot of 'IF' statements.

In Rev, this is actually very fast.  You can extract only the words on the
lines where they live by:

repeat for each line LNN in textBlock
   repeat for each word WRD in LNN
      put WRD & space after newTextBlock
   end repeat
   delete last char of newTextBlock
   put cr after newTextBlock
end repeat
delete last char of newTextBlock

Of course this example strips the punctuation and tabs

Jim Ault
Las Vegas




Once upon a time?

2007-02-02 Thread David Bovill

put "once upon" is among the words of "once upon a time"  -- false
(which surprised me!)

put "once,upon" is among the items of "once,upon,a,time"  -- true

set the wholeMatches to true
set the itemDelimiter to space
put "once upon" is among the items of "once upon a time"  -- true

but with 2 spaces:

set the wholeMatches to true
set the itemDelimiter to space
put "once upon" is among the items of "once  upon a time"  -- false

Is there a simple way to avoid regular expressions here? The aim is to be
able to detect the exact two words together, but tolerant of variable white
space.

Something like this works but is ugly:

put "(?mi)\W*once\s+upon\s+.*" into regularExpression
put matchChunk(" once  upon  a time", regularExpression)


Re: Once upon a time?

2007-02-02 Thread Brian Yennie

If you need exactly two words, how about something like this:

set the wholeMatches to true
get wordOffset(word1, tText)
if (it > 0) AND (it = (wordOffset(word2, tText) - 1)) then
  ...
end if

It seems wordOffset() will ignore multiple spaces for you.
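
One caveat with the test above: wordOffset() reports only the first match of each word, so a pair can be missed when either word also appears earlier in the text (in "upon once upon a time", "once" is word 2 but the first "upon" is word 1). A minimal sketch that checks every adjacent pair instead is shown here; the function name `wordsAdjacent` and its parameters are hypothetical, chosen only for illustration.

```livecode
-- Hypothetical helper; the name and parameters are assumptions for illustration.
-- Returns true if pWord2 immediately follows pWord1 anywhere in pText.
function wordsAdjacent pWord1, pWord2, pText
   put empty into tPrevious
   repeat for each word tWord in pText
      -- word chunks are split on runs of spaces, tabs and returns,
      -- so extra whitespace between the two words does not matter
      if tPrevious is pWord1 and tWord is pWord2 then return true
      put tWord into tPrevious
   end repeat
   return false
end wordsAdjacent
```

For example, `put wordsAdjacent("once", "upon", "upon once   upon a time")` should give true, whereas the two-call wordOffset() comparison gives false for the same text. The comparison uses `is`, so it follows the caseSensitive property, which is false by default.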

HTH,
Brian



Re: Once upon a time?

2007-02-02 Thread Jim Ault
> put "once upon" is among the words of "once upon a time"  -- false
> (which surprised me!)
This shouldn't surprise you if you consider:
put "apple,grape" is among the items of "apple,grape,pear"  -- false
--since "apple,grape" is not an item, it is two of them
--just as "once upon" is not a word

If you want to detect the string "once upon" then
put space into s
put (s & "once upon" & s) is in (s & "once upon a time" & s)

--now the string can be located anywhere in the target phrase, beginning or
end, and the test returns true.  No regex required.
Watch out for non-white-space delims such as "," or "."

get "once upon a time, the sky was blue"
replace "," with space in it
put space into s
put (s & "once upon" & s) is in (s & it & s)


Hope this helps
Jim Ault
Las Vegas


On 2/2/07 4:55 AM, David Bovill [EMAIL PROTECTED] wrote:
> put "once,upon" is among the items of "once,upon,a,time"  -- true
>
> set the wholeMatches to true
> set the itemDelimiter to space
> put "once upon" is among the items of "once upon a time"  -- true
>
> but with 2 spaces:
>
> set the wholeMatches to true
> set the itemDelimiter to space
> put "once upon" is among the items of "once  upon a time"  -- false
>
> Is there a simple way to avoid regular expressions here? The aim is to be
> able to detect the exact two words together, but tolerant of variable white
> space.
>
> Something like this works but is ugly:
>
> put "(?mi)\W*once\s+upon\s+.*" into regularExpression
> put matchChunk(" once  upon  a time", regularExpression)


Re: Once upon a time?

2007-02-02 Thread Jim Ault
I apologize for my first reply.
A phone call came in before I wrote the reply, then I did it from my
recollection of the problem, and did not re-read the email carefully.
Disregard it because it does not address the problem (yes it is a hectic
Friday)


On 2/2/07 4:55 AM, David Bovill [EMAIL PROTECTED] wrote:
> put "once upon" is among the words of "once upon a time"  -- false
> (which surprised me!)
>
> put "once,upon" is among the items of "once,upon,a,time"  -- true
>
> set the wholeMatches to true
> set the itemDelimiter to space
> put "once upon" is among the items of "once upon a time"  -- true
>
> but with 2 spaces:
>
> set the wholeMatches to true
> set the itemDelimiter to space
> put "once upon" is among the items of "once  upon a time"  -- false
>
> Is there a simple way to avoid regular expressions here? The aim is to be
> able to detect the exact two words together, but tolerant of variable white
> space.
>
> Something like this works but is ugly:
>
> put "(?mi)\W*once\s+upon\s+.*" into regularExpression
> put matchChunk(" once  upon  a time", regularExpression)

I think the key limitation you put forth is 'white space' and you could
convert all white space runs to one space char

put space into P
put replaceText("once   upon a  time", "\s+", P) into cleanVar
put (P & "once upon" & P) is in (P & cleanVar & P)  -- true

--The reason for using a pad char is that, without it, you will find that
put "once upon" is in "put the wall sconce upon the workbench"
--returns true, because "sconce upon" contains "once upon"

This will avoid having to construct ugly regex that depends on the phrase
you are testing.
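
The normalise-and-pad idea can be wrapped in a reusable function. This is a minimal sketch under stated assumptions: the function name `containsPhrase` and its parameter names are hypothetical, and the phrase is normalised as well as the text so that it may itself contain runs of white space.

```livecode
-- Hypothetical helper; the name and parameters are assumptions for illustration.
-- Returns true if pPhrase occurs in pText as whole words, ignoring
-- differences in the amount of white space between words.
function containsPhrase pPhrase, pText
   put space into P
   -- collapse every run of white space to a single space
   put replaceText(pText, "\s+", P) into cleanText
   put replaceText(pPhrase, "\s+", P) into cleanPhrase
   -- pad both sides so a match cannot start or end in the middle of a word
   return (P & cleanPhrase & P) is in (P & cleanText & P)
end containsPhrase
```

With this, `put containsPhrase("once upon", "once   upon a  time")` should give true, while `put containsPhrase("once upon", "put the wall sconce upon the workbench")` should give false, since the padded phrase " once upon " does not occur in the padded text.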

Hopefully this answer will address your actual question.

Jim Ault
Las Vegas

