Matchtext for multiple words

2006-11-29 Thread J. Landman Gay

Sorry if this comes through twice, I'm having trouble sending to the list.

I need a matchtext/regex that will tell me if all supplied words exist 
in a block of text, regardless of their order, and ignoring carriage 
returns.


For example, see if all these words:  dog dinosaur cat

exist in this text:

The purple dinosaur inadvertently stepped on the cat.<cr>
The white dog howled.

Should return true. Is there such a thing?

--
Jacqueline Landman Gay | [EMAIL PROTECTED]
HyperActive Software   | http://www.hyperactivesw.com
___
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution


Re: Matchtext for multiple words

2006-11-29 Thread Mark Smith
Do you really need to do it with MatchText? Aren't "is in", "is among
the words of", etc. going to work? Or do you really need it to be a
one-liner?


Best,

Mark

ps. That's the third one ;-0



Re: Matchtext for multiple words

2006-11-29 Thread Eric Chatonet

Hi Jacque,

Are you sure you need a regex? ;-)

function AreWordsIn pText,pWords
  repeat for each word tWord in pWords
    if space & tWord & space is not in pText then return false
  end repeat
  return true
end AreWordsIn

Since this approach returns as soon as it hits a word that is not in the
text, it should be very fast...





Best Regards from Paris,
Eric Chatonet
 
--

http://www.sosmartsoftware.com/[EMAIL PROTECTED]/




Re: Matchtext for multiple words

2006-11-29 Thread Brian Yennie

Jacque,

I think the "in any order" part will make a single RegEx a nightmare
(although it's probably technically possible).
How about using something simple like this (or scroll down for a non-RegEx
idea):


.*(dinosaur|dog|cat).*

Then capture the actual text matched, and remove that from the
expression. So in your example, you would first match "dinosaur".
Then you would run the RegEx again as:


.*(dog|cat).*

Which would match "cat".

Then finally:

.*dog.*
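
The repeated-match idea above can be sketched as a loop: match the alternation, drop whichever word was captured, and try again until the list is empty or a match fails. This is only a rough Python sketch, not Revolution code; the function name contains_all_words is made up for illustration, and the \b word boundaries and re.escape call are additions that are not in the patterns above.

```python
import re

def contains_all_words(text, words):
    """Repeatedly match an alternation, removing each word once it is found."""
    remaining = {w.lower() for w in words}
    while remaining:
        # Build \b(cat|dog|...)\b from the words that have not been seen yet.
        alternation = "|".join(re.escape(w) for w in sorted(remaining))
        m = re.search(r"\b(" + alternation + r")\b", text, re.IGNORECASE)
        if m is None:
            return False  # none of the remaining words occurs in the text
        remaining.discard(m.group(1).lower())
    return True
```

Each pass removes exactly one word, so the loop runs at most once per word in the list.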

If you're not married to RegEx, you could just do something like  
this. It should be pretty speedy, as it uses array lookups, simple  
comparisons, and only one pass through your text.


## put the words into an array for quick lookup

repeat for each word w in wordList
  put 0 into myWords[w]
end repeat

## loop through your text and mark all of the words you find

repeat for each word w in myText
  if (myWords[w] = 0) then
    put 1 into myWords[w]
  end if
end repeat

## check that all of your words were marked with a 1

put TRUE into foundThemAll
repeat for each word w in wordList
  if (myWords[w] <> 1) then
    put FALSE into foundThemAll
    exit repeat
  end if
end repeat





Re: Matchtext for multiple words

2006-11-29 Thread J. Landman Gay

Mark Smith wrote:
> Do you really need to do it with MatchText? Aren't "is in", "is among the
> words of", etc. going to work? Or do you really need it to be a one-liner?
>
> Best,
>
> Mark
>
> ps. That's the third one ;-0


Yeah, I noticed that, and I'm not sure how it happened. I only sent one, 
then waited an hour or so. Then I changed the outgoing server I was 
using and sent again. Then three of them showed up. I didn't do it! ;)


Anyway, thanks to Ken, Eric, and yourself for the suggestions. I 
probably didn't explain enough. If I were only checking a single block 
of text then I'd use some of the built-in commands, but I have to loop 
through a couple of zillion blocks. So I figured matchtext would be 
faster if, hopefully, I could issue a single command for each lookup. If 
I have to do multiple lookups for each text block, then I end up with:


if "dinosaur" is in tText and "dog" is in tText and "cat" is in tText

and that would require 3 times the number of lookups over a single 
matchtext. Also, the number of words can vary so I'd have to construct a 
repeat loop to build the command itself, and use a do statement to 
execute it -- and both of those are slow. But if I'm wrong, I'd like to 
know. Has anyone done any speed tests on this stuff?


Basically I need the fastest possible way to scan a large number of text 
blocks for an indefinite number of words which occur in any portion of 
the text.
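
One way to get a single regex call per text block for a variable number of words, without building a command and running it through do, might be a pattern with one lookahead per word. Nobody in this thread proposed or timed this, so treat it as an untested alternative: the sketch below is Python rather than Revolution, the name build_all_words_regex is hypothetical, and it assumes the engine behind matchText accepts the same lookahead syntax.

```python
import re

def build_all_words_regex(words):
    """One lookahead per word, so a single search covers every word in any order."""
    lookaheads = "".join(r"(?=.*\b" + re.escape(w) + r"\b)" for w in words)
    # (?i) ignores case; (?s) lets . cross line breaks, which handles the returns.
    return re.compile(r"(?is)\A" + lookaheads)

pattern = build_all_words_regex(["dog", "dinosaur", "cat"])
text = "The purple dinosaur inadvertently stepped on the cat.\nThe white dog howled."
print(bool(pattern.search(text)))  # prints True
```

The pattern string only has to be built once per word list and can then be reused for every block. Whether it beats the array or "is in" approaches is something only a timing test on real data can answer.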


I'll try Ken's thing too -- thanks Ken.

(I'll send this once and cross my fingers.)
--
Jacqueline Landman Gay | [EMAIL PROTECTED]
HyperActive Software   | http://www.hyperactivesw.com


Re: Matchtext for multiple words

2006-11-29 Thread John Craig
Although you can invert character matching using [^ ... ], I don't think
there's an equivalent for words.

You could have used:
(?is)\b(cat|dinosaur|dog)\b.*\b_(cat|dinosaur|dog)\b

... if there was a way to say 'not beginning with the first match' where 
the underscore appears in the above - then
it would be possible to do a quick 1 liner regex - we can use '\1' to 
back reference the first match.


:-(



Re: Matchtext for multiple words

2006-11-29 Thread Ken Ray
On 11/29/06 5:07 PM, J. Landman Gay [EMAIL PROTECTED] wrote:


> if "dinosaur" is in tText and "dog" is in tText and "cat" is in tText
>
> and that would require 3 times the number of lookups over a single
> matchtext.

Plus, it would match paragraphs with "catastrophe", "doggedly", "muscat",
etc., which you may also not want.
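
To make the difference concrete, here is a small check written in Python rather than Revolution (so only an analogy for "is in"), comparing a plain substring test with a regex that uses \b word boundaries. The sample sentence is invented for the demonstration.

```python
import re

text = "The catastrophe left the muscat vines doggedly clinging on."

# Plain substring tests succeed even though neither word stands alone.
substring_hits = [w for w in ("cat", "dog") if w in text]

# \b anchors each word at word boundaries, so embedded occurrences are ignored.
whole_word_hits = [w for w in ("cat", "dog") if re.search(r"\b" + w + r"\b", text)]

print(substring_hits)   # ['cat', 'dog']
print(whole_word_hits)  # []
```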

> Also, the number of words can vary so I'd have to construct a
> repeat loop to build the command itself, and use a do statement to
> execute it -- and both of those are slow. But if I'm wrong, I'd like to
> know. Has anyone done any speed tests on this stuff?
>
> Basically I need the fastest possible way to scan a large number of text
> blocks for an indefinite number of words which occur in any portion of
> the text.
>
> I'll try Ken's thing too -- thanks Ken.

:-)


Ken Ray
Sons of Thunder Software, Inc.
Web site: http://www.sonsothunder.com/
Email: [EMAIL PROTECTED]




Re: Matchtext for multiple words

2006-11-29 Thread Dick Kriesel

Since Rev says "cat" and "cat." are different words, punctuation poses a
problem.  Here's an approach that's simple and fast but depends on the
programmer to include a replace statement for each punctuation mark.

-- Dick

on mouseUp
  put 
  put "The purple dinosaur inadvertently stepped on the cat." & cr & \
      "The white dog howled." into tText
  put "dog dinosaur cat" into tWords
  putLines textContainsAllWords(tText,tWords)
end mouseUp

function textContainsAllWords tText,pWords
  replace "." with space in tText
  replace "," with space in tText
  repeat for each word tWord in tText
    put 1 into tArray[tWord]
  end repeat
  repeat for each word tWord in pWords
    if tArray[tWord] is empty then return false
  end repeat
  return true
end textContainsAllWords
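
The same idea can avoid listing punctuation marks one by one if the words are pulled out with a regex instead. A minimal Python sketch of that variation (not Revolution code, and not part of the function above): it assumes that treating any run of letters, digits or underscores as a word is acceptable, which also means "don't" would be split into "don" and "t".

```python
import re

def text_contains_all_words(text, words):
    """Collect the words of the text into a set, then test each wanted word."""
    # \w+ keeps runs of letters, digits and underscores, so "cat." yields "cat"
    # without a separate replace for each punctuation mark.
    seen = set(re.findall(r"\w+", text.lower()))
    return all(w.lower() in seen for w in words.split())
```

As in the handler above, the text is scanned once and each wanted word then costs a single lookup.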




Re: Matchtext for multiple words

2006-11-29 Thread John Craig

I still think it's working ok - someone slap me if I'm wrong.
The (?! is looking ahead and saying "you can't begin with".

(?!\1) - you can't begin with the first match
(?!\1|\2) - you can't begin with the 1st or second match
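
A quick way to see the lookahead at work is to run the three-word pattern against one text that has all three words and one that repeats "dinosaur" instead of mentioning "dog". The check below uses Python's re module, on the assumption that it treats \b, backreferences and (?!...) the same way the engine behind matchText does.

```python
import re

regex = re.compile(
    r"(?is)\b(cat|dinosaur|dog)\b.*"
    r"\b(?!\1)(cat|dinosaur|dog)\b.*"
    r"\b(?!\1|\2)(cat|dinosaur|dog)\b"
)

all_three = ("The purple dinosaur inadvertently stepped on the cat.\n"
             "The white dog howled.")
only_two = ("The purple dinosaur inadvertently stepped on the cat.\n"
            "The white dinosaur howled.")

print(bool(regex.search(all_three)))  # True
print(bool(regex.search(only_two)))   # False
```

One caveat that follows from "can't begin with": if one search word is a prefix of another (say "cat" and "catalog"), the lookahead would also reject the longer word once the shorter one has been captured.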

JC



Re: Matchtext for multiple words

2006-11-29 Thread J. Landman Gay

John Craig wrote:

>   -- build the whole damn regex

LOL! I know exactly what you mean. I'll test this. I'm building a test 
suite of all the responses and will report here what I find. So far, I'm 
surprised at the results.


I'm kind of pleased with this whole thread. Scripting contests are cool. 
We should make it a monthly affair.



And a script to create the regex from a word list.  My apologies if this 
stuff turns out useless - but you can get absorbed in this mince...



on mouseUp
  -- string to search * SHOULD MATCH
  put "The purple dinosaur inadvertently stepped on the cat." & return & \
      "The white dog howled." into tString

  -- DUFF string to search ** SHOULD NOT MATCH
  put "The purple dinosaur inadvertently stepped on the cat." & return & \
      "The white dinosaur howled." into tString2

  -- words to find (comma separated, no parentheses)
  put "cat,dinosaur,dog" into tWords

  -- build the pattern to match the words, e.g. (cat|dinosaur|dog)
  put "(" into tWordsPattern
  repeat for each item tWord in tWords
    put tWord & "|" after tWordsPattern
  end repeat
  put ")" into char -1 of tWordsPattern

  -- build the whole damn regex
  put the number of items in tWords into tTotalWords
  put 0 into tCurrentWord
  put "(?is)" into tRegex
  repeat for each item tWord in tWords
    add 1 to tCurrentWord
    put "\b" after tRegex
    if tCurrentWord > 1 then
      -- add a lookahead that rejects every earlier capture: (?!\1|\2|...)
      put "(?!" after tRegex
      repeat with i = 1 to tCurrentWord - 1
        put "\" & i & "|" after tRegex
      end repeat
      delete char -1 of tRegex
      put ")" after tRegex
    end if
    put tWordsPattern & "\b" after tRegex
    if tCurrentWord < tTotalWords then
      put ".*" after tRegex
    end if
  end repeat

  -- test our regex against the 2 test strings
  put matchText(tString, tRegex) & return & matchText(tString2, tRegex)

end mouseUp
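
For anyone who wants to check the generated pattern outside Revolution, the builder loop above can be ported roughly like this. It is a Python sketch only; the function name build_regex is made up for illustration, and like the script it does not escape regex metacharacters in the words.

```python
def build_regex(words):
    """Port of the builder loop: one alternation per word, each later
    alternation guarded by a lookahead that rejects the earlier captures."""
    alternation = "(" + "|".join(words) + ")"
    parts = []
    for n in range(1, len(words) + 1):
        guard = ""
        if n > 1:
            # (?!\1|\2|...) lists a backreference to every earlier group.
            guard = "(?!" + "|".join("\\" + str(i) for i in range(1, n)) + ")"
        parts.append(r"\b" + guard + alternation + r"\b")
    # The groups are joined with .* so other text may sit between the words.
    return "(?is)" + ".*".join(parts)

print(build_regex(["list", "house", "dog"]))
```

The pattern grows with the square of the word count, because every group repeats the whole alternation and the lookahead lists every earlier group.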






--
Jacqueline Landman Gay | [EMAIL PROTECTED]
HyperActive Software   | http://www.hyperactivesw.com


Re: Matchtext for multiple words

2006-11-29 Thread J. Landman Gay

John Craig wrote:
> And a script to create the regex from a word list.  My apologies if this
> stuff turns out useless - but you can get absorbed in this mince...


I passed three random words to your script (list,house,dog) and got this 
regex from it:


(?is)\b(list|house|dog)\b\b(?!\1)(list|house|dog)\b\b(?!\1|\2)(list|house|dog)\b

My test then goes through a bunch of text files on disk and applies the 
regex to the text of each file like this:


put matchText(tText, tRegex) into tMatch

I don't get any matches though, and my knowledge of regex is too limited 
for me to know if I'm doing something wrong. Does this look right to 
you? I think there should have been at least 2 matching files (that's 
what some of the other scripts produced.)


--
Jacqueline Landman Gay | [EMAIL PROTECTED]
HyperActive Software   | http://www.hyperactivesw.com


Re: Matchtext for multiple words

2006-11-29 Thread John Craig

J. Landman Gay wrote:

> John Craig wrote:
>> And a script to create the regex from a word list.  My apologies if
>> this stuff turns out useless - but you can get absorbed in this mince...
>
> I passed three random words to your script (list,house,dog) and got
> this regex from it:




Here is the correct regex I get when I substitute your new words into 
the script (check that list is passed as list|house|dog)

(?is)\b(list|house|dog)\b.*\b(?!\1)(list|house|dog)\b.*\b(?!\1|\2)(list|house|dog)\b

A few bits (the .* between the groups) are missing from the one below!
(?is)\b(list|house|dog)\b\b(?!\1)(list|house|dog)\b\b(?!\1|\2)(list|house|dog)\b 
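
Without the .* between the groups, the three alternations have to match back to back with nothing in between. After a whole word, \b demands a non-word character next while the following group demands a word character at the same spot, so that pattern cannot match any text. A quick Python check, assuming Python's re module behaves like the engine behind matchText for these constructs (the sample text is invented):

```python
import re

with_gaps = r"(?is)\b(list|house|dog)\b.*\b(?!\1)(list|house|dog)\b.*\b(?!\1|\2)(list|house|dog)\b"
without_gaps = r"(?is)\b(list|house|dog)\b\b(?!\1)(list|house|dog)\b\b(?!\1|\2)(list|house|dog)\b"

text = "The dog has a house.\nIt is on the list."

print(bool(re.search(with_gaps, text)))     # True
print(bool(re.search(without_gaps, text)))  # False
```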



> My test then goes through a bunch of text files on disk and applies
> the regex to the text of each file like this:
>
> put matchText(tText, tRegex) into tMatch
>
> I don't get any matches though, and my knowledge of regex is too
> limited for me to know if I'm doing something wrong. Does this look
> right to you? I think there should have been at least 2 matching files
> (that's what some of the other scripts produced.)






Re: Matchtext for multiple words

2006-11-29 Thread J. Landman Gay

John Craig wrote:

> Oops.
>
> I meant to say - check the list is passed as list,dog,house (comma
> separated, and without parentheses)


Yeah, that was the problem. I was altering the scripts from the list so 
they would fit into my tests and I didn't change yours right. Now that 
I've made the correction it works fine. Thanks for the pointer, that was 
exactly what was wrong.


--
Jacqueline Landman Gay | [EMAIL PROTECTED]
HyperActive Software   | http://www.hyperactivesw.com