Re: Matchtext to find a series of words
> I meant a single pass for each block. The filter solution has to make a
> new pass through the text for each word we want to filter on. But
> regardless, it still shows very well in my tests.

Actually, the filter solution does not have to make a new pass through *ALL* of the text. Since I wrote it to work on the same variable successively, if the first filter command does not find a match, the next two work on an empty variable. Quite fast. If it does find matching lines, the successive commands work only on the hit text, thus optimizing by elimination.

The last email I sent shows that if you make each block a single line by replacing the cr's, then concatenate the next block, you can make a single pass for each word for all blocks at ONCE. If there are no matches for the first word, then the following words are filtering an empty variable.

By tagging each line with a header as you concatenate, you can even tell which lines (blocks) meet all the criteria without any speed difference, since the residual variable will contain only hits.

The slowest would obviously be the 'all three words found in all the blocks' scenario.

Glad you are having fun

Jim Ault
Las Vegas

On 11/29/06 6:02 PM, "J. Landman Gay" <[EMAIL PROTECTED]> wrote:

> Jim Ault wrote:
>> On 11/29/06 3:37 PM, "J. Landman Gay" <[EMAIL PROTECTED]> wrote:
>>> This looks promising, thanks. It looks like there is no single-pass
>>> method, but since filter is pretty fast it may do okay. I didn't even
>>> quote your regex explanation, I don't want to touch it. :)
>>
>> You mention single pass...
>> Question: Single pass of what?
>> Single pass of each text block or all text blocks together?

___
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution
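The elimination idea described above can be sketched outside Revolution. This is a minimal Python illustration, not the Transcript code from the thread: the function name `blocks_with_all_words` and the block names are made up for the example, and each block is tagged with its name so the survivors can be identified. It uses `\b` word boundaries, so "dog" does not match "dogma"; a plain wildcard such as `*dog*` would.

```python
import re

def blocks_with_all_words(blocks, words):
    """Return the names of the blocks that contain every word in `words`.

    `blocks` maps a (hypothetical) block name to its text.
    """
    # Fold each block onto one line and keep its name as the tag.
    survivors = [(name, text.replace("\n", " ")) for name, text in blocks.items()]
    for word in words:
        pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
        # Each pass scans only the blocks that survived the previous word.
        survivors = [(n, t) for n, t in survivors if pattern.search(t)]
        if not survivors:
            break  # nothing left, so the remaining words have nothing to scan
    return [n for n, _ in survivors]

blocks = {
    "block1": "The purple dinosaur inadvertently stepped on the cat.\nThe white dog howled.",
    "block2": "The white dog howled.",
}
print(blocks_with_all_words(blocks, ["dog", "cat", "dinosaur"]))  # ['block1']
```

As in the filter version, the worst case is when every block contains every word, because nothing is eliminated between passes.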
Re: Matchtext to find a series of words
Jim Ault wrote:
> On 11/29/06 3:37 PM, "J. Landman Gay" <[EMAIL PROTECTED]> wrote:
>> This looks promising, thanks. It looks like there is no single-pass
>> method, but since filter is pretty fast it may do okay. I didn't even
>> quote your regex explanation, I don't want to touch it. :)
>
> You mention single pass...
> Question: Single pass of what?
> Single pass of each text block or all text blocks together?

I meant a single pass for each block. The filter solution has to make a new pass through the text for each word we want to filter on. But regardless, it still shows very well in my tests.

-- 
Jacqueline Landman Gay | [EMAIL PROTECTED]
HyperActive Software | http://www.hyperactivesw.com
Re: Matchtext to find a series of words
On 30 Nov 2006, at 00:31, Brian Yennie wrote:

> You just need to pass through the text once, and "cross off" each word
> as you find it. If everything is crossed off when you're done, then
> you're done =).

That's a much better idea than mine, so:

function aMatch pWords,tText
  -- pWords holds one search word per line
  -- first remove punctuation marks from the word list, perhaps unnecessary
  repeat for each char C in pWords
    if C is cr OR charToNum(C) >= 65 then put C after tWords
  end repeat
  repeat for each word W in tText
    -- keep only the characters in the ASCII range 65..122 of each word
    put empty into newWord
    repeat for each char C in W
      get charToNum(C)
      if it >= 65 AND it <= 122 then put C after newWord
    end repeat
    -- cross the word off the list once it has been seen
    if newWord is among the lines of tWords then
      filter tWords without newWord
    end if
    if tWords is empty then return true
  end repeat
  return false
end aMatch
Re: Matchtext to find a series of words
On 11/29/06 4:31 PM, "Brian Yennie" <[EMAIL PROTECTED]> wrote:

> I do think that algorithmically
> one-pass is definitely possible. You just need to pass through the
> text once, and "cross off" each word as you find it. If everything is
> crossed off when you're done, then you're done =).

Good idea, Brian.

-- Dick

on mouseUp
  put "The purple dinosaur inadvertently stepped on the cat." & cr \
      & "The white dog howled." into tText
  put "dog dinosaur cat" into tWords
  put textContainsAllWords(tText,tWords)
end mouseUp

function textContainsAllWords tText,tWords
  -- turn punctuation and line breaks into spaces so every word stands alone
  replace "." with space in tText
  replace "," with space in tText
  replace cr with space in tText
  -- make both lists arrays keyed by word
  split tText using space and space
  split tWords using space and space
  -- cross off every search word that occurs in the text
  repeat for each key tWord in tText
    delete variable tWords[tWord]
  end repeat
  return the keys of tWords is empty
end textContainsAllWords
Re: Matchtext to find a series of words
On 11/29/06 3:37 PM, "J. Landman Gay" <[EMAIL PROTECTED]> wrote:

> This looks promising, thanks. It looks like there is no single-pass
> method, but since filter is pretty fast it may do okay. I didn't even
> quote your regex explanation, I don't want to touch it. :)

You mention single pass...
Question: Single pass of what?
Single pass of each text block or all text blocks together?

Doing all as one block
--with tracing to know which are matches

  put 0 into cnt
  put empty into newBlock
  repeat for each line LNN in variableList
    add 1 to cnt
    do "get "&LNN
    replace cr with tab in it
    put cnt & LNN && it & cr after newBlock
  end repeat
  --now all the blocks are their own line in the aggregate
  --filter the aggregate directly: allWordsPresent folds the lines together and returns only true or false
  put newBlock into residualBlock
  repeat for each word WRD in wordList
    filter residualBlock with ("*" & WRD & "*")
  end repeat
  if residualBlock is empty then
    put "no matches anywhere"
  else
    --word 1 of each line =
    -- (the variable number & variable name)
    --by concatenating it is unlikely they will form a match to one of your search words or tokens
  end if

>> function allWordsPresent textStr, wordList
>>   replace cr with tab in textStr
>>   set the wholematches to true
>>   repeat for each word WRD in wordList
>>     filter textStr with ("*" & WRD & "*")
>>   end repeat
>>   return not (textStr is empty)
>> end allWordsPresent

Jim Ault
Las Vegas

> Jim Ault wrote:
>
>> I would tackle this using the filter command
>>
>> replace cr with tab in textStr
>> set the wholematches to true
>> filter textStr with "*"& token1&"*"
>> filter textStr with "*"& token2&"*"
>> filter textStr with "*"& token3&"*"
>> if textStr is empty then return false
>> else return true
>>
>> A better form would be
>>
>> function allWordsPresent textStr, wordList
>>   replace cr with tab in textStr
>>   set the wholematches to true
>>   repeat for each word WRD in wordList
>>     filter textStr with ("*" & WRD & "*")
>>   end repeat
>>   return not (textStr is empty)
>> end allWordsPresent
Re: Matchtext to find a series of words
> This looks promising, thanks. It looks like there is no single-pass
> method, but since filter is pretty fast it may do okay.

Not sure how robust my stab was, but I do think that algorithmically one-pass is definitely possible. You just need to pass through the text once, and "cross off" each word as you find it. If everything is crossed off when you're done, then you're done =).

HTH

- Brian
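The "cross off" idea can be sketched in a few lines. This is a Python illustration rather than Transcript, and it makes some assumptions: the function name `contains_all_words` is made up, a word is taken to be a run of ASCII letters, and matching ignores case.

```python
import re

def contains_all_words(text, words):
    """Walk the text once, crossing each wanted word off as it is seen."""
    remaining = {w.lower() for w in words}
    # finditer yields the words of the text one at a time, left to right.
    for match in re.finditer(r"[A-Za-z]+", text):
        remaining.discard(match.group().lower())
        if not remaining:
            return True  # everything is crossed off, so stop early
    return not remaining

text = "The purple dinosaur inadvertently stepped on the cat.\nThe white dog howled."
print(contains_all_words(text, ["dog", "cat", "dinosaur"]))  # True
```

Using a set makes each "cross off" a constant-time operation, so the cost is one walk over the text regardless of how many words are wanted.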
Re: Matchtext to find a series of words
Jim Ault wrote:

> I would tackle this using the filter command
>
> replace cr with tab in textStr
> set the wholematches to true
> filter textStr with "*"& token1&"*"
> filter textStr with "*"& token2&"*"
> filter textStr with "*"& token3&"*"
> if textStr is empty then return false
> else return true
>
> A better form would be
>
> function allWordsPresent textStr, wordList
>   replace cr with tab in textStr
>   set the wholematches to true
>   repeat for each word WRD in wordList
>     filter textStr with ("*" & WRD & "*")
>   end repeat
>   return not (textStr is empty)
> end allWordsPresent

This looks promising, thanks. It looks like there is no single-pass method, but since filter is pretty fast it may do okay. I didn't even quote your regex explanation, I don't want to touch it. :)

-- 
Jacqueline Landman Gay | [EMAIL PROTECTED]
HyperActive Software | http://www.hyperactivesw.com
Re: Matchtext to find a series of words
On 11/29/06 1:26 PM, "J. Landman Gay" <[EMAIL PROTECTED]> wrote:

> I need a matchtext/regex that will find a series of words in a block of
> text, no matter whether they are together or not, and ignoring carriage
> returns. For example:
>
> See if all of these words: dog cat dinosaur
>
> are in this text:
>
> "The purple dinosaur inadvertently stepped on the cat.
> The white dog howled."
>
> Should return true. Is there such a thing?

I would tackle this using the filter command

  replace cr with tab in textStr
  set the wholematches to true
  filter textStr with "*"& token1&"*"
  filter textStr with "*"& token2&"*"
  filter textStr with "*"& token3&"*"
  if textStr is empty then return false
  else return true

A better form would be

function allWordsPresent textStr, wordList
  replace cr with tab in textStr
  set the wholematches to true
  repeat for each word WRD in wordList
    filter textStr with ("*" & WRD & "*")
  end repeat
  return not (textStr is empty)
end allWordsPresent

regEx would be as follows

the OR condition is
  \b(dog|cat|dinosaur)\b
--where the \b says 'word boundary' to regEx

the AND condition
  (?(?=condition)(then1|then2|then3)|(else1|else2|else3))
--major drawback is that you would have to structure the exact number of words to check [you used 3 in your example], and the text would also be scanned multiple times (starting with the hit for 'dog'), since you would be trying 4 combinations. RegEx would stop looking as soon as one of these tested TRUE.

  dog + positive lookbehind (?<=cat) + positive lookbehind (?<=dinosaur)
  dog + positive lookahead (?=cat) + positive lookbehind (?<=dinosaur)
  dog + positive lookahead (?=cat) + positive lookahead (?=dinosaur)
  dog + positive lookbehind (?<=cat) + positive lookahead (?=dinosaur)
-- where if any of these = true, then return TRUE, else FALSE

The filter command is far easier to build and debug, and is likely faster than the complex regex positive lookahead/lookbehind algorithm.

Someone more conversant in regEx may show a better solution that is the better answer to your question.

Jim Ault
Las Vegas
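One common way to express the AND condition in a single pattern is to chain one positive lookahead per word, anchored at the start of the text, for example `(?=.*\bdog\b)(?=.*\bcat\b)(?=.*\bdinosaur\b)`. The sketch below builds that pattern in Python's `re` module. The helper name `all_words_pattern` is made up for the example. Whether the same pattern behaves identically in Revolution's matchText is an assumption that was not tested here.

```python
import re

def all_words_pattern(words):
    """Build one pattern that succeeds only if every word occurs somewhere."""
    # One lookahead per word; each one scans forward from the same start position.
    lookaheads = "".join(r"(?=.*\b" + re.escape(w) + r"\b)" for w in words)
    # DOTALL lets "." cross line breaks; IGNORECASE ignores letter case.
    return re.compile(lookaheads, re.IGNORECASE | re.DOTALL)

pattern = all_words_pattern(["dog", "cat", "dinosaur"])
text = "The purple dinosaur inadvertently stepped on the cat.\nThe white dog howled."
print(bool(pattern.match(text)))  # True
```

The word order does not matter and the pattern grows by one lookahead per word, so the number of words need not be fixed in advance. Each lookahead still scans the text on its own, so this is a single pattern but not a single pass.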
Re: Matchtext to find a series of words
On 11/29/06 3:26 PM, "J. Landman Gay" <[EMAIL PROTECTED]> wrote:

> I need a matchtext/regex that will find a series of words in a block of
> text, no matter whether they are together or not, and ignoring carriage
> returns. For example:
>
> See if all of these words: dog cat dinosaur
>
> are in this text:
>
> "The purple dinosaur inadvertently stepped on the cat.
> The white dog howled."
>
> Should return true. Is there such a thing?

Well, you can do this, but there may be a more efficient way:

  put (matchText(tText,"(?si)\bdog\b") and \
      matchText(tText,"(?si)\bcat\b") and matchText(tText,"(?si)\bdinosaur\b"))

If I keep trying, maybe I can come up with a more efficient one-liner...

Ken Ray
Sons of Thunder Software, Inc.
Web site: http://www.sonsothunder.com/
Email: [EMAIL PROTECTED]
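The one-search-per-word approach extends to any number of words by looping over the list. A minimal Python sketch, assuming the same case-insensitive, dot-matches-newline behavior as the `(?si)` flags above; the function name `has_all_words` is made up for the example.

```python
import re

def has_all_words(text, words):
    """Run one whole-word search per word; all of them must succeed."""
    flags = re.IGNORECASE | re.DOTALL  # the counterpart of (?si)
    return all(
        re.search(r"\b" + re.escape(w) + r"\b", text, flags) is not None
        for w in words
    )

text = "The purple dinosaur inadvertently stepped on the cat.\nThe white dog howled."
print(has_all_words(text, ["dog", "cat", "dinosaur"]))  # True
```

Because `all` stops at the first failing search, a missing word early in the list avoids the remaining scans.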
Matchtext to find a series of words
I need a matchtext/regex that will find a series of words in a block of text, no matter whether they are together or not, and ignoring carriage returns. For example:

See if all of these words: dog cat dinosaur

are in this text:

"The purple dinosaur inadvertently stepped on the cat.
The white dog howled."

Should return true. Is there such a thing?

-- 
Jacqueline Landman Gay | [EMAIL PROTECTED]
HyperActive Software | http://www.hyperactivesw.com