Re: Once upon a time?
Thanks guys! Now I wonder which will be fastest:

    get wordOffset(word1, tText)
    if (it > 0) and (it = (wordOffset(word2, tText) - 1)) then

or

    put space into P
    put replaceText("once upon a time", "\s+", P) into cleanVar
    put (P & "once upon" & P) is in (P & cleanVar & P)  -- = true

___
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution
Re: Once upon a time?
The first (wordOffset), by a factor of 100 or so, on my machine.

Best,

Mark

On 3 Feb 2007, at 13:24, David Bovill wrote:

> Thanks guys! Now I wonder which will be fastest:
>
>     get wordOffset(word1, tText)
>     if (it > 0) and (it = (wordOffset(word2, tText) - 1)) then
>
> or
>
>     put space into P
>     put replaceText("once upon a time", "\s+", P) into cleanVar
>     put (P & "once upon" & P) is in (P & cleanVar & P)  -- = true
Re: Once upon a time?
That is a big difference!
Re: Once upon a time?
On 2/3/07 5:24 AM, David Bovill [EMAIL PROTECTED] wrote:

> Now I wonder which will be fastest:
>
>     get wordOffset(word1, tText)
>     if (it > 0) and (it = (wordOffset(word2, tText) - 1)) then
>
> or
>
>     put space into P
>     put replaceText("once upon a time", "\s+", P) into cleanVar
>     put (P & "once upon" & P) is in (P & cleanVar & P)  -- = true

RegEx is a slower technique almost every time. The larger the text block, the more hits, and the larger the number of text blocks all add to the demand. Regex is an engine that actually scans back and forth through a text block and follows rules. The simpler the rules you give it, the shorter the processing time. Using Rev's chunking ability will always be about 10-100 times faster. However, with a field of 100 lines the difference will not be noticeable.

I use some heavy regEx to parse web pages every day, every minute, because I need pin-point accuracy and data mining rather than fastest execution. Lots of rules, lots of steps. Chunking just won't do it without a lot of 'IF' statements.

In Rev, chunking is actually very fast. You can extract only the words on the lines where they live by:

    repeat for each line LNN in textBlock
      repeat for each word WRD in LNN
        put WRD & space after newTextBlock
      end repeat
      delete last char of newTextBlock
      put cr after newTextBlock
    end repeat
    delete last char of newTextBlock

Of course this example strips the punctuation and tabs

Jim Ault
Las Vegas
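For readers who do not have Rev to hand, the nested repeat loops above can be approximated in Python. This is only a rough sketch of the same idea, not Rev code: the function name squeeze_whitespace is made up for illustration, and Python's str.split() stands in for Rev's word chunks.

```python
def squeeze_whitespace(text_block):
    """Rebuild each line from its whitespace-separated words,
    joined by single spaces (hypothetical helper, not a Rev function)."""
    lines = []
    for line in text_block.splitlines():
        # split() with no arguments splits on runs of spaces and tabs
        # and drops leading/trailing whitespace.
        lines.append(" ".join(line.split()))
    return "\n".join(lines)


print(squeeze_whitespace("once   upon\ta time\n  the  end "))
```

In this Python version only runs of spaces and tabs are collapsed; punctuation attached to a word stays where it is.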
Once upon a time?
    put "once upon" is among the words of "once upon a time"  -- = false (which surprised me!)
    put "once,upon" is among the items of "once,upon,a,time"  -- = true

    set the wholeMatches to true
    set the itemDelimiter to space
    put "once upon" is among the items of "once upon a time"  -- = true

but with 2 spaces:

    set the wholeMatches to true
    set the itemDelimiter to space
    put "once upon" is among the items of "once  upon a time"  -- = false

Is there a simple way to avoid regular expressions here? The aim is to be able to detect the exact two words together, but tolerant of variable white space. Something like this works but is ugly:

    put "(?mi)\W*once\s+upon\s+.*" into regularExpression
    put matchChunk("once upon a time", regularExpression)
Re: Once upon a time?
If you need exactly two words, how about something like this:

    set the wholeMatches to true
    get wordOffset(word1, tText)
    if (it > 0) and (it = (wordOffset(word2, tText) - 1)) then
      ...
    end if

It seems wordOffset() will ignore multiple spaces for you.

HTH,
Brian
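The logic of the wordOffset test above can be sketched in Python for comparison. This is an illustration under assumptions, not Rev's implementation: word_offset and adjacent_first_hits are hypothetical names, and the lower-casing mimics Rev's default case-insensitive matching.

```python
def word_offset(word, text):
    """Return the 1-based position of the first whole-word match of `word`
    among the whitespace-separated words of `text`, or 0 if absent."""
    words = text.lower().split()
    target = word.lower()
    return words.index(target) + 1 if target in words else 0


def adjacent_first_hits(word1, word2, text):
    """True when the first hit of word2 comes directly after the first hit of word1."""
    first = word_offset(word1, text)
    return first > 0 and first == word_offset(word2, text) - 1


print(adjacent_first_hits("once", "upon", "once   upon a time"))
```

Note that only the first occurrence of each word is compared, so a text such as "upon once upon" is reported as no match even though the pair is present later on.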
Re: Once upon a time?
> put "once upon" is among the words of "once upon a time"  -- = false (which surprised me!)

This shouldn't surprise you if you consider:

    put "apple,grape" is among the items of "apple,grape,pear"  -- = false
    --since "apple,grape" is not an item, it is two of them
    --just as "once upon" is not a word

If you want to detect the string "once upon" then

    put space into s
    put (s & "once upon" & s) is in (s & "once upon a time" & s)
    --now the string can be located anywhere in the target phrase, beginning or end, and return true.

No regex required. Watch out for non white space delims such as , or .

    get "once upon a time, the sky was blue"
    replace "," with space in it
    put space into s
    put (s & "once upon" & s) is in (s & it & s)

Hope this helps

Jim Ault
Las Vegas

On 2/2/07 4:55 AM, David Bovill [EMAIL PROTECTED] wrote:

>     put "once,upon" is among the items of "once,upon,a,time"  -- = true
>
>     set the wholeMatches to true
>     set the itemDelimiter to space
>     put "once upon" is among the items of "once upon a time"  -- = true
>
> but with 2 spaces:
>
>     set the wholeMatches to true
>     set the itemDelimiter to space
>     put "once upon" is among the items of "once  upon a time"  -- = false
>
> Is there a simple way to avoid regular expressions here? The aim is to be able to detect the exact two words together, but tolerant of variable white space. Something like this works but is ugly:
>
>     put "(?mi)\W*once\s+upon\s+.*" into regularExpression
>     put matchChunk("once upon a time", regularExpression)
Re: Once upon a time?
I apologize for my first reply. A phone call came in before I wrote the reply, then I did it from my recollection of the problem, and did not re-read the email carefully. Disregard it because it does not address the problem (yes it is a hectic Friday)

On 2/2/07 4:55 AM, David Bovill [EMAIL PROTECTED] wrote:

>     put "once upon" is among the words of "once upon a time"  -- = false (which surprised me!)
>     put "once,upon" is among the items of "once,upon,a,time"  -- = true
>
>     set the wholeMatches to true
>     set the itemDelimiter to space
>     put "once upon" is among the items of "once upon a time"  -- = true
>
> but with 2 spaces:
>
>     set the wholeMatches to true
>     set the itemDelimiter to space
>     put "once upon" is among the items of "once  upon a time"  -- = false
>
> Is there a simple way to avoid regular expressions here? The aim is to be able to detect the exact two words together, but tolerant of variable white space. Something like this works but is ugly:
>
>     put "(?mi)\W*once\s+upon\s+.*" into regularExpression
>     put matchChunk("once upon a time", regularExpression)

I think the key limitation you put forth is 'white space' and you could convert all white space runs to one space char

    put space into P
    put replaceText("once upon a time", "\s+", P) into cleanVar
    put (P & "once upon" & P) is in (P & cleanVar & P)  -- = true

--The reason for using a pad char is that, without it, you will find that "once upon" is in "put the wall sconce upon the workbench" (it matches inside "sconce upon").

This will avoid having to construct ugly regex that depends on the phrase you are testing. Hopefully this answer will address your actual question.

Jim Ault
Las Vegas
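The pad-character idea above translates directly to other languages. Here is a minimal Python sketch of it, assuming the phrase itself is already single-spaced; the function name contains_phrase is hypothetical, and re.sub plays the role of Rev's replaceText.

```python
import re


def contains_phrase(phrase, text):
    """Collapse whitespace runs to one space, then pad both strings with a
    space so the phrase can only match on whole-word boundaries."""
    pad = " "
    clean = re.sub(r"\s+", pad, text)
    return (pad + phrase + pad) in (pad + clean + pad)


print(contains_phrase("once upon", "once \t  upon a time"))
print(contains_phrase("once upon", "put the wall sconce upon the workbench"))
```

The padding is what rejects "sconce upon". As noted earlier in the thread, punctuation is not whitespace, so "once upon, a time" would need its comma replaced with a space first.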