Matchtext for multiple words
Sorry if this comes through twice, I'm having trouble sending to the list.

I need a matchtext/regex that will tell me if all supplied words exist in a block of text, regardless of their order, and ignoring carriage returns. For example, see if all these words:

  dog dinosaur cat

exist in this text:

  "The purple dinosaur inadvertently stepped on the cat." & cr & "The white dog howled."

Should return true. Is there such a thing?

--
Jacqueline Landman Gay | [EMAIL PROTECTED]
HyperActive Software | http://www.hyperactivesw.com
___
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution
Re: Matchtext for multiple words
Do you really need to do it with MatchText? Aren't "is in", "is among the words of", etc. going to work? Or do you really need it to be a one-liner?

Best,
Mark

ps. That's the third one ;-0

On 29 Nov 2006, at 21:39, J. Landman Gay wrote:

> I need a matchtext/regex that will tell me if all supplied words exist in a block of text, regardless of their order, and ignoring carriage returns. [...]
Re: Matchtext for multiple words
Hi Jacque,

Are you sure you need a regex? ;-)

  function AreWordsIn pText,pWords
    repeat for each word tWord in pWords
      if space & tWord & space is not in pText then return false
    end repeat
    return true
  end AreWordsIn

Since this approach returns as soon as it finds a word that is not in the text, it should be very fast...

On 29 Nov 06, at 22:39, J. Landman Gay wrote:

> I need a matchtext/regex that will tell me if all supplied words exist in a block of text, regardless of their order, and ignoring carriage returns. [...]

Best Regards from Paris,
Eric Chatonet
--
http://www.sosmartsoftware.com/[EMAIL PROTECTED]/
Re: Matchtext for multiple words
Jacque,

I think the "in any order" part will make a single RegEx a nightmare (although it's probably technically possible). How about using something simple like this (or scroll down for a non-RegEx idea):

  .*(dinosaur|dog|cat).*

Then capture the actual text matched, and remove that from the expression. So in your example, you would first match "dinosaur". Then you would run the RegEx again as:

  .*(dog|cat).*

Which would match "cat". Then finally:

  .*dog.*

If you're not married to RegEx, you could just do something like this. It should be pretty speedy, as it uses array lookups, simple comparisons, and only one pass through your text.

  ## put the words into an array for quick lookup
  repeat for each word w in wordList
    put 0 into myWords[w]
  end repeat

  ## loop through your text and mark all of the words you find
  repeat for each word w in myText
    if (myWords[w] = 0) then
      put 1 into myWords[w]
    end if
  end repeat

  ## check that all of your words were marked with a 1
  put TRUE into foundThemAll
  repeat for each word w in wordList
    if (myWords[w] <> 1) then
      put FALSE into foundThemAll
      exit repeat
    end if
  end repeat
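The "match one word, drop it from the alternation, match again" idea above can be sketched as follows. This is only an illustration in Python rather than Revolution; the function name `contains_all_words` is hypothetical, and the `\b` word boundaries are an added assumption that the original `.*(dinosaur|dog|cat).*` pattern does not include.

```python
import re

def contains_all_words(text, words):
    """Return True if every word in `words` occurs as a whole word in `text`."""
    remaining = {w.lower() for w in words}
    while remaining:
        # Alternation of the words not found yet, e.g. \b(cat|dog)\b
        alternation = "|".join(re.escape(w) for w in sorted(remaining))
        match = re.search(r"\b(" + alternation + r")\b", text, flags=re.IGNORECASE)
        if match is None:
            return False  # none of the remaining words occurs in the text
        # Drop the word that was just matched and search again for the rest
        remaining.discard(match.group(1).lower())
    return True

text = "The purple dinosaur inadvertently stepped on the cat.\nThe white dog howled."
print(contains_all_words(text, ["dog", "dinosaur", "cat"]))
print(contains_all_words(text, ["dog", "dinosaur", "bird"]))
```

Each pass removes one word, so the text is searched at most once per supplied word.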
Re: Matchtext for multiple words
Mark Smith wrote:

> Do you really need to do it with MatchText? Aren't "is in", "is among the words of", etc. going to work? Or do you really need it to be a one-liner?
>
> ps. That's the third one ;-0

Yeah, I noticed that, and I'm not sure how it happened. I only sent one, then waited an hour or so. Then I changed the outgoing server I was using and sent again. Then three of them showed up. I didn't do it! ;)

Anyway, thanks to Ken, Eric, and yourself for the suggestions. I probably didn't explain enough. If I were only checking a single block of text then I'd use some of the built-in commands, but I have to loop through a couple of zillion blocks. So I figured matchtext would be faster if, hopefully, I could issue a single command for each lookup. If I have to do multiple lookups for each text block, then I end up with:

  if "dinosaur" is in tText and "dog" is in tText and "cat" is in tText

and that would require 3 times the number of lookups over a single matchtext. Also, the number of words can vary so I'd have to construct a repeat loop to build the command itself, and use a "do" statement to execute it -- and both of those are slow.

But if I'm wrong, I'd like to know. Has anyone done any speed tests on this stuff? Basically I need the fastest possible way to scan a large number of text blocks for an indefinite number of words which occur in any portion of the text.

I'll try Ken's thing too -- thanks Ken.

(I'll send this once and cross my fingers.)

--
Jacqueline Landman Gay | [EMAIL PROTECTED]
HyperActive Software | http://www.hyperactivesw.com
Re: Matchtext for multiple words
Although you can invert character matching using [^ ... ], I don't think there's an equivalent for words. You could have used:

  (?is)\b(cat|dinosaur|dog)\b.*\b_(cat|dinosaur|dog)\b ...

if there was a way to say 'not beginning with the first match' where the underscore appears in the above. Then it would be possible to do a quick one-liner regex, since we can use '\1' to back-reference the first match. :-(
Re: Matchtext for multiple words
On 11/29/06 5:07 PM, J. Landman Gay [EMAIL PROTECTED] wrote:

> if "dinosaur" is in tText and "dog" is in tText and "cat" is in tText
>
> and that would require 3 times the number of lookups over a single matchtext.

Plus, it would match paragraphs with "catastrophe", "doggedly", "muscat", etc., which you may also not want.

> Also, the number of words can vary so I'd have to construct a repeat loop to build the command itself, and use a "do" statement to execute it -- and both of those are slow.
>
> But if I'm wrong, I'd like to know. Has anyone done any speed tests on this stuff? Basically I need the fastest possible way to scan a large number of text blocks for an indefinite number of words which occur in any portion of the text.
>
> I'll try Ken's thing too -- thanks Ken.

:-)

Ken Ray
Sons of Thunder Software, Inc.
Web site: http://www.sonsothunder.com/
Email: [EMAIL PROTECTED]
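The substring problem Ken points out is easy to reproduce. The following is a small Python illustration (not Revolution code); the sample sentence and the helper names `substring_hits` and `whole_word_hits` are made up for the example.

```python
import re

# Sample text containing "catastrophe", "muscat" and "doggedly",
# but neither "cat" nor "dog" as a word of its own.
text = "A catastrophe: the muscat vine was doggedly eaten by a dinosaur."
words = ["dinosaur", "dog", "cat"]

def substring_hits(text, words):
    # Plain containment, comparable to chaining "is in" tests
    return all(w in text for w in words)

def whole_word_hits(text, words):
    # Same test, but each word must be delimited by word boundaries
    return all(
        re.search(r"\b" + re.escape(w) + r"\b", text, re.IGNORECASE)
        for w in words
    )

print(substring_hits(text, words))   # substring test is fooled
print(whole_word_hits(text, words))  # whole-word test is not
```

The containment test reports a hit because "dog" and "cat" occur inside longer words, while the word-boundary version rejects the same text.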
Re: Matchtext for multiple words
On 11/29/06 1:39 PM, J. Landman Gay [EMAIL PROTECTED] wrote:

> I need a matchtext/regex that will tell me if all supplied words exist in a block of text, regardless of their order, and ignoring carriage returns. [...]

Since Rev says "cat" and "cat." are different words, punctuation poses a problem. Here's an approach that's simple and fast but depends on the programmer to include a replace statement for each punctuation mark.

-- Dick

  on mouseUp
    put "The purple dinosaur inadvertently stepped on the cat." & cr & \
        "The white dog howled." into tText
    put "dog dinosaur cat" into tWords
    put textContainsAllWords(tText,tWords)
  end mouseUp

  function textContainsAllWords tText,pWords
    replace "." with space in tText
    replace "," with space in tText
    repeat for each word tWord in tText
      put 1 into tArray[tWord]
    end repeat
    repeat for each word tWord in pWords
      if tArray[tWord] is empty then return false
    end repeat
    return true
  end textContainsAllWords
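For comparison, here is a rough Python rendering of the same idea (replace punctuation with spaces, index the words of the text once, then look up each wanted word). It is a sketch rather than a translation: the `punctuation` parameter is an addition for illustration, and the lookup is case-sensitive as written.

```python
def text_contains_all_words(text, words, punctuation=".,"):
    """Return True if every space-separated word in `words` is a word of `text`."""
    # Turn each listed punctuation mark into a space so "cat." becomes "cat"
    for mark in punctuation:
        text = text.replace(mark, " ")
    # One pass over the text builds the lookup set (the array in the original)
    seen = set(text.split())
    # Each wanted word is then a single set lookup
    return all(w in seen for w in words.split())

text = "The purple dinosaur inadvertently stepped on the cat.\nThe white dog howled."
print(text_contains_all_words(text, "dog dinosaur cat"))
print(text_contains_all_words(text, "dog dinosaur bird"))
```

As in the original, any punctuation mark not listed stays attached to its word, so the list has to cover whatever appears in the real text.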
Re: Matchtext for multiple words
I still think it's working ok - someone slap me if I'm wrong. The (?! is looking ahead and saying 'you can't begin with'.

  (?!\1)    - you can't begin with the first match
  (?!\1|\2) - you can't begin with the 1st or second match

JC
Re: Matchtext for multiple words
John Craig wrote:

> -- build the whole damn regex

LOL! I know exactly what you mean.

I'll test this. I'm building a test suite of all the responses and will report here what I find. So far, I'm surprised at the results.

I'm kind of pleased with this whole thread. Scripting contests are cool. We should make it a monthly affair.

> And a script to create the regex from a word list. My apologies if this stuff turns out useless - but you can get absorbed in this mince...
>
> on mouseUp
>   -- string to search * SHOULD MATCH
>   put "The purple dinosaur inadvertently stepped on the cat." & return & "The white dog howled." into tString
>   -- DUFF string to search ** SHOULD NOT MATCH
>   put "The purple dinosaur inadvertently stepped on the cat." & return & "The white dinosaur howled." into tString2
>   -- words to find
>   put "cat,dinosaur,dog" into tWords
>   -- build the pattern to match the words
>   put "(" into tWordsPattern
>   repeat for each item tWord in tWords
>     put tWord & "|" after tWordsPattern
>   end repeat
>   put ")" into char -1 of tWordsPattern
>   -- build the whole damn regex
>   put the number of items in tWords into tTotalWords
>   put 0 into tCurrentWord
>   put "(?is)" into tRegex
>   repeat for each item tWord in tWords
>     add 1 to tCurrentWord
>     put "\b" after tRegex
>     if tCurrentWord > 1 then
>       put "(?!" after tRegex
>       repeat with i = 1 to tCurrentWord - 1
>         put "\" & i & "|" after tRegex
>       end repeat
>       delete char -1 of tRegex
>       put ")" after tRegex
>     end if
>     put tWordsPattern & "\b" after tRegex
>     if tCurrentWord < tTotalWords then
>       put ".*" after tRegex
>     end if
>   end repeat
>   -- test our regex against the 2 test strings
>   put matchText(tString, tRegex) & return & matchText(tString2, tRegex)
> end mouseUp

--
Jacqueline Landman Gay | [EMAIL PROTECTED]
HyperActive Software | http://www.hyperactivesw.com
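To make the structure of the generated pattern easier to follow, here is the same construction sketched in Python instead of Revolution. The function name `build_regex` is hypothetical; the pattern it produces is the one described in the thread, with one alternation group per word and a negative lookahead that back-references every earlier group.

```python
import re

def build_regex(words):
    """Build a pattern that needs len(words) distinct words from the list, in any order."""
    alternation = "(" + "|".join(re.escape(w) for w in words) + ")"
    parts = []
    for n in range(1, len(words) + 1):
        if n == 1:
            parts.append(r"\b" + alternation + r"\b")
        else:
            # (?!\1|\2|...) : this word must not start with any earlier match
            backrefs = "|".join("\\" + str(i) for i in range(1, n))
            parts.append(r"\b(?!" + backrefs + ")" + alternation + r"\b")
    # (?is): ignore case, and let "." match line breaks as well
    return "(?is)" + ".*".join(parts)

good = "The purple dinosaur inadvertently stepped on the cat.\nThe white dog howled."
duff = "The purple dinosaur inadvertently stepped on the cat.\nThe white dinosaur howled."

pattern = build_regex(["cat", "dinosaur", "dog"])
print(pattern)
print(re.search(pattern, good) is not None)
print(re.search(pattern, duff) is not None)
```

One caveat with this design: the lookahead rejects any word that merely begins with an earlier match, so a list containing both "cat" and "category" can fail to match text that contains both.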
Re: Matchtext for multiple words
John Craig wrote:

> And a script to create the regex from a word list. My apologies if this stuff turns out useless - but you can get absorbed in this mince...

I passed three random words to your script (list,house,dog) and got this regex from it:

  (?is)\b(list|house|dog)\b\b(?!\1)(list|house|dog)\b\b(?!\1|\2)(list|house|dog)\b

My test then goes through a bunch of text files on disk and applies the regex to the text of each file like this:

  put matchText(tText, tRegex) into tMatch

I don't get any matches though, and my knowledge of regex is too limited for me to know if I'm doing something wrong. Does this look right to you? I think there should have been at least 2 matching files (that's what some of the other scripts produced.)

--
Jacqueline Landman Gay | [EMAIL PROTECTED]
HyperActive Software | http://www.hyperactivesw.com
Re: Matchtext for multiple words
J. Landman Gay wrote:

> I passed three random words to your script (list,house,dog) and got this regex from it:

Here is the correct regex I get when I substitute your new words into the script (check that list is passed as list|house|dog):

  (?is)\b(list|house|dog)\b.*\b(?!\1)(list|house|dog)\b.*\b(?!\1|\2)(list|house|dog)\b

There are a few bits missing from the one below (the .* between the word groups)!

> (?is)\b(list|house|dog)\b\b(?!\1)(list|house|dog)\b\b(?!\1|\2)(list|house|dog)\b
>
> My test then goes through a bunch of text files on disk and applies the regex to the text of each file like this:
>
> put matchText(tText, tRegex) into tMatch
>
> I don't get any matches though, and my knowledge of regex is too limited for me to know if I'm doing something wrong. Does this look right to you? I think there should have been at least 2 matching files (that's what some of the other scripts produced.)
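A different single-pattern technique, not the one used in this thread, is to anchor at the start of the text and add one positive lookahead per word. The sketch below is Python, the name `all_words_regex` is hypothetical, and whether matchText accepts the same pattern is an assumption based only on the lookahead syntax already working above.

```python
import re

def all_words_regex(words):
    """One (?=.*\\bword\\b) lookahead per word, all anchored at the start of the text."""
    lookaheads = "".join(r"(?=.*\b" + re.escape(w) + r"\b)" for w in words)
    # (?is): ignore case, and let "." cross line breaks
    return "(?is)^" + lookaheads

pattern = all_words_regex(["list", "house", "dog"])
print(pattern)
print(re.search(pattern, "The dog sat in the house\nand made a list.") is not None)
print(re.search(pattern, "The dog sat in the house.") is not None)
```

Because every lookahead starts from the same position, word order does not matter and no back-references are needed; each word is still searched for separately inside the one call.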
Re: Matchtext for multiple words
John Craig wrote:

> Oops. I meant to say - check the list is passed as list,dog,house (comma separated, and without parentheses)

Yeah, that was the problem. I was altering the scripts from the list so they would fit into my tests and I didn't change yours right. Now that I've made the correction it works fine. Thanks for the pointer, that was exactly what was wrong.

--
Jacqueline Landman Gay | [EMAIL PROTECTED]
HyperActive Software | http://www.hyperactivesw.com