Re: Matchtext to find a series of words

2006-11-29 Thread Jim Ault
> I meant a single pass for each block. The filter solution has to make a
> new pass through the text for each word we want to filter on. But
> regardless, it still shows very well in my tests.

Actually, the filter pass does not have to make a new pass through *ALL* of
the text.

Since I wrote it to work on the same variable successively, if the first
filter command does not find a match, the next two work on an empty
variable.  Quite fast.

If it does find matching lines, the successive commands work only on the hit
text, thus optimizing by elimination.

The last email I sent shows that if you make each block a single line by
replacing its CRs with tabs, then append the next block, you can make a single
pass for each word over all blocks at ONCE.  If there are no matches for the
first word, then the following words are filtering an empty variable.

By tagging each line with a header as you concatenate, you can even tell
which lines (blocks) meet all the criteria without any speed difference
since the residual variable will contain only hits.

The slowest would obviously be the 'all three words found in all the blocks'
scenario.
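A minimal Python sketch of the tag-and-filter idea (Python only so it runs outside Revolution; `blocks_with_all_words` and its arguments are hypothetical names). Each block is folded onto one line and kept with its block number, and each word filters only the survivors of the previous pass, so once the residue is empty the remaining words cost nothing. The `in` test is a plain case-insensitive substring check, which is roughly what `filter ... with "*word*"` does.

```python
def blocks_with_all_words(blocks, words):
    """Return the numbers (1-based) of the blocks that contain every word."""
    # Fold each block onto a single line and keep its number as a tag,
    # so the survivors can be traced back to the block they came from.
    residue = [(n, block.replace("\n", "\t").lower())
               for n, block in enumerate(blocks, start=1)]
    for word in words:
        # Each pass only scans what survived the previous pass.
        residue = [(n, line) for n, line in residue if word.lower() in line]
        if not residue:
            break  # nothing left, so the remaining words need no work
    return [n for n, _ in residue]


blocks = [
    "The purple dinosaur inadvertently stepped on the cat.\nThe white dog howled.",
    "The white dog howled.",
    "A cat, a dog and a dinosaur.",
]
print(blocks_with_all_words(blocks, ["dog", "cat", "dinosaur"]))  # [1, 3]
```

The worst case is still the one Jim names: every word present in every block, so no pass shrinks the residue.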

Glad you are having fun

Jim Ault
Las Vegas


On 11/29/06 6:02 PM, "J. Landman Gay" <[EMAIL PROTECTED]> wrote:

> Jim Ault wrote:
>> On 11/29/06 3:37 PM, "J. Landman Gay" <[EMAIL PROTECTED]> wrote:
>>> This looks promising, thanks. It looks like there is no single-pass
>>> method, but since filter is pretty fast it may do okay. I didn't even
>>> quote your regex explanation, I don't want to touch it. :)
>> 
>> You mention single pass...
>> Question: Single pass of what?
>> Single pass of each text block or all text blocks together?
> 



___
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution


Re: Matchtext to find a series of words

2006-11-29 Thread J. Landman Gay

Jim Ault wrote:

> On 11/29/06 3:37 PM, "J. Landman Gay" <[EMAIL PROTECTED]> wrote:
>> This looks promising, thanks. It looks like there is no single-pass
>> method, but since filter is pretty fast it may do okay. I didn't even
>> quote your regex explanation, I don't want to touch it. :)
>
> You mention single pass...
> Question: Single pass of what?
> Single pass of each text block or all text blocks together?


I meant a single pass for each block. The filter solution has to make a 
new pass through the text for each word we want to filter on. But 
regardless, it still shows very well in my tests.


--
Jacqueline Landman Gay | [EMAIL PROTECTED]
HyperActive Software   | http://www.hyperactivesw.com


Re: Matchtext to find a series of words

2006-11-29 Thread Mark Smith

On 30 Nov 2006, at 00:31, Brian Yennie wrote:

> You just need to pass through the text once, and "cross off" each
> word as you find it. If everything is crossed off when you're done,
> then you're done =).


That's a much better idea than mine, so:

function aMatch pWords,tText
  -- first remove punctuation marks from the word list, perhaps unnecessary
  -- (spaces are dropped too, so pWords should hold one word per line)
  repeat for each char C in pWords
    if C is cr OR charToNum(C) >= 65 then put C after tWords
  end repeat

  repeat for each word W in tText
    put empty into newWord

    -- keep only the chars in the ASCII range A..z (letters plus a few symbols)
    repeat for each char C in W
      get charToNum(C)
      if it >= 65 AND it <= 122 then put C after newWord
    end repeat

    -- cross the word off the list if it is one we are looking for
    if newWord is among the lines of tWords then
      filter tWords without newWord
    end if

    -- everything crossed off: done
    if tWords is empty then return true
  end repeat

  return false
end aMatch


Re: Matchtext to find a series of words

2006-11-29 Thread Dick Kriesel
On 11/29/06 4:31 PM, "Brian Yennie" <[EMAIL PROTECTED]> wrote:

> I do think that algorithmically
> one-pass is definitely possible. You just need to pass through the
> text once, and "cross off" each word as you find it. If everything is
> crossed off when you're done, then you're done =).

Good idea, Brian.

-- Dick

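on mouseUp
  put "The purple dinosaur inadvertently stepped on the cat." & cr \
      & "The white dog howled." into tText
  put "dog dinosaur cat" into tWords
  put textContainsAllWords(tText,tWords)
end mouseUp

function textContainsAllWords tText,tWords
  -- turn punctuation and line breaks into spaces so each word becomes a key;
  -- without the cr replace, a word at the start of a line would be keyed as cr & word
  replace "." with space in tText
  replace "," with space in tText
  replace cr with space in tText
  split tText using space and space
  split tWords using space and space
  -- cross off every search word that appears as a key of the text
  repeat for each key tWord in tText
    delete variable tWords[tWord]
  end repeat
  return the keys of tWords is empty
end textContainsAllWords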
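Dick's handler amounts to a set difference: collect the words of the text as keys, delete them from the search words, and see whether anything is left. A rough Python analogue follows; treating a word as a run of ASCII letters is an assumption of this sketch (it also takes the place of the separate replace calls for periods, commas and line breaks), and the function name simply mirrors the Transcript one.

```python
import re


def text_contains_all_words(text, words):
    # Distinct words of the text, lower-cased (a "word" here is a run of letters).
    text_words = set(re.findall(r"[a-z]+", text.lower()))
    # Search words minus text words = search words that never appeared.
    missing = {w.lower() for w in words} - text_words
    return not missing


text = "The purple dinosaur inadvertently stepped on the cat.\nThe white dog howled."
print(text_contains_all_words(text, "dog dinosaur cat".split()))  # True
```

Because whole words are compared, "cat" is not found inside "concatenate", unlike the `*cat*` filter pattern.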




Re: Matchtext to find a series of words

2006-11-29 Thread Jim Ault
On 11/29/06 3:37 PM, "J. Landman Gay" <[EMAIL PROTECTED]> wrote:
> This looks promising, thanks. It looks like there is no single-pass
> method, but since filter is pretty fast it may do okay. I didn't even
> quote your regex explanation, I don't want to touch it. :)

You mention single pass...
Question: Single pass of what?
Single pass of each text block or all text blocks together?

Doing all as one block 
--with tracing to know which are matches

put 0 into cnt
repeat for each line LNN in variableList
  add 1 to cnt
  do "get "&LNN
  replace cr with tab in it
  put cnt & LNN && it & cr after newBlock
end repeat
--now all the blocks are their own line in the aggregate

--filter the aggregate directly: allWordsPresent() would turn these crs
--back into tabs and return only true/false, not the residual lines
put newBlock into residualBlock
repeat for each word WRD in wordList
  filter residualBlock with ("*" & WRD & "*")
end repeat

if residualBlock is empty then
  put "no matches anywhere"
else
  --word 1 of each line = (the variable number & variable name)
  --by concatenating, it is unlikely they will form a match to one of your
  --search words or tokens
end if

-
>> function allWordsPresent textStr, wordList
>>   replace cr with tab in textStr
>>   set the wholematches to true
>>   repeat for each word WRD in wordList
>>     filter textStr with ("*" & WRD & "*")
>>   end repeat
>>   return not (textStr is empty)
>> end  allWordsPresent


Jim Ault
Las Vegas


> Jim Ault wrote:
> 
>> I would tackle this using the filter command
>> 
>> replace cr with tab in textStr
>> set the wholematches to true
>> filter textStr with "*"& token1&"*"
>> filter textStr with "*"& token2&"*"
>> filter textStr with "*"& token3&"*"
>> if textStr  is empty then return false
>> else return true
>> 
>> A better form would be
>> 
>> function allWordsPresent textStr, wordList
>>   replace cr with tab in textStr
>>   set the wholematches to true
>>   repeat for each word WRD in wordList
>>     filter textStr with ("*" & WRD & "*")
>>   end repeat
>>   return not (textStr is empty)
>> end  allWordsPresent
>




Re: Matchtext to find a series of words

2006-11-29 Thread Brian Yennie
> This looks promising, thanks. It looks like there is no single-pass
> method, but since filter is pretty fast it may do okay.


Not sure how robust my stab was, but I do think that algorithmically  
one-pass is definitely possible. You just need to pass through the  
text once, and "cross off" each word as you find it. If everything is  
crossed off when you're done, then you're done =).
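One way to read the "cross off" idea, sketched in Python (the function name and the letters-only tokenizer are assumptions): walk the text once, discard each word from the set still being looked for, and stop the moment that set is empty.

```python
import re


def contains_all_words(text, words):
    remaining = {w.lower() for w in words}  # words not yet crossed off
    if not remaining:
        return True
    # Single pass over the text, one word at a time.
    for match in re.finditer(r"[A-Za-z]+", text):
        remaining.discard(match.group().lower())  # cross it off if it is wanted
        if not remaining:
            return True  # everything crossed off: no need to read further
    return False


text = "The purple dinosaur inadvertently stepped on the cat.\nThe white dog howled."
print(contains_all_words(text, ["dog", "cat", "dinosaur"]))  # True
```

Unlike building the full word list of the text first, this can return before reaching the end of the text.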


HTH

- Brian


Re: Matchtext to find a series of words

2006-11-29 Thread J. Landman Gay

Jim Ault wrote:

> I would tackle this using the filter command
>
> replace cr with tab in textStr
> set the wholematches to true
> filter textStr with "*"& token1&"*"
> filter textStr with "*"& token2&"*"
> filter textStr with "*"& token3&"*"
> if textStr  is empty then return false
> else return true
>
> A better form would be
>
> function allWordsPresent textStr, wordList
>   replace cr with tab in textStr
>   set the wholematches to true
>   repeat for each word WRD in wordList
>     filter textStr with ("*" & WRD & "*")
>   end repeat
>   return not (textStr is empty)
> end allWordsPresent



This looks promising, thanks. It looks like there is no single-pass 
method, but since filter is pretty fast it may do okay. I didn't even 
quote your regex explanation, I don't want to touch it. :)


--
Jacqueline Landman Gay | [EMAIL PROTECTED]
HyperActive Software   | http://www.hyperactivesw.com


Re: Matchtext to find a series of words

2006-11-29 Thread Jim Ault

On 11/29/06 1:26 PM, "J. Landman Gay" <[EMAIL PROTECTED]> wrote:

> I need a matchtext/regex that will find a series of words in a block of
> text, no matter whether they are together or not, and ignoring carriage
> returns. For example:
> 
> See if all of these words: dog cat dinosaur
> 
> are in this text:
> 
> "The purple dinosaur inadvertently stepped on the cat.
> The white dog howled."
> 
> Should return true. Is there such a thing?

I would tackle this using the filter command

replace cr with tab in textStr
set the wholematches to true
filter textStr with "*"& token1&"*"
filter textStr with "*"& token2&"*"
filter textStr with "*"& token3&"*"
if textStr  is empty then return false
else return true

A better form would be

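function allWordsPresent textStr, wordList
  replace cr with tab in textStr
  -- note: wholeMatches affects lineOffset, wordOffset and itemOffset, not
  -- filter, so "*cat*" will also match words such as "concatenate"
  set the wholematches to true
  repeat for each word WRD in wordList
    filter textStr with ("*" & WRD & "*")
  end repeat
  return not (textStr is empty)
end allWordsPresent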
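To see what the filter loop actually tests, here is a Python emulation using `fnmatch` wildcards. It is an approximation: it assumes filter's `*` behaves like a shell-style wildcard and that matching is case-insensitive, and the function name just mirrors the Transcript one. Because `*cat*` is a substring pattern, a line merely containing "cat" inside another word also survives.

```python
from fnmatch import fnmatchcase


def all_words_present(text_str, word_list):
    # Fold the block onto a single line, as "replace cr with tab" does.
    lines = [text_str.replace("\n", "\t").lower()]
    for wrd in word_list.split():
        # Keep only the lines matching "*WRD*" (case-insensitive here).
        lines = [ln for ln in lines if fnmatchcase(ln, "*" + wrd.lower() + "*")]
    return bool(lines)


text = "The purple dinosaur inadvertently stepped on the cat.\nThe white dog howled."
print(all_words_present(text, "dog cat dinosaur"))             # True
print(all_words_present("concatenate the hotdog", "dog cat"))  # True: substrings match
```

In this emulation, `?` and `[` in a search word also act as wildcards.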


regEx would be as follows

the OR condition is \b(dog|cat|dinosaur)\b
--where the \b says 'word boundary' to regEx

the AND condition
 (?(?=condition)(then1|then2|then3)|(else1|else2|else3))
--major drawback is that you would have to structure the pattern for the exact
--number of words to check [you used 3 in your example], and the text would
--also be scanned multiple times (starting with the hit for 'dog') since you
--would be trying 4 combinations.  RegEx would stop looking as soon as one of
--these tested TRUE.
dog
   + positive lookbehind (?<=cat)
   + positive lookbehind (?<=dinosaur)
dog
   + positive lookahead (?=cat)
   + positive lookbehind (?<=dinosaur)
dog
   + positive lookahead (?=cat)
   + positive lookahead (?=dinosaur)
dog
   + positive lookbehind (?<=cat)
   + positive lookahead (?=dinosaur)

-- where if any of these = true, then return TRUE, else FALSE
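A common way to express the AND condition without enumerating orderings is one lookahead per word, all anchored at the start of the text: each `(?=.*\bword\b)` checks that the word occurs somewhere ahead, so order no longer matters. Below is a Python sketch (the function name is hypothetical). The pattern text is ordinary PCRE-style syntax, so something similar may work with matchText, but that is untested here. Python's `re` only allows fixed-width lookbehind, so "cat somewhere before dog" cannot be written as `(?<=.*cat)` there, which is another reason to prefer lookaheads from the start.

```python
import re


def all_words_pattern(words):
    # One zero-width lookahead per word; \A pins every check to the start,
    # and DOTALL lets ".*" run across line breaks.
    lookaheads = "".join(r"(?=.*\b" + re.escape(w) + r"\b)" for w in words)
    return re.compile(r"\A" + lookaheads, re.IGNORECASE | re.DOTALL)


text = "The purple dinosaur inadvertently stepped on the cat.\nThe white dog howled."
pattern = all_words_pattern(["dog", "cat", "dinosaur"])
print(pattern.pattern)             # \A(?=.*\bdog\b)(?=.*\bcat\b)(?=.*\bdinosaur\b)
print(bool(pattern.search(text)))  # True
```

This is a one-liner rather than a one-pass solution: each lookahead can scan the text again, so the worst case is still one scan per word.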


The filter command is far easier to build and debug, and is likely faster
than the complex regex positive lookahead/behind algorithm.

Someone more conversant in regEx may show a better solution and be the better
answer to your question.

Jim Ault
Las Vegas




Re: Matchtext to find a series of words

2006-11-29 Thread Ken Ray
On 11/29/06 3:26 PM, "J. Landman Gay" <[EMAIL PROTECTED]> wrote:

> I need a matchtext/regex that will find a series of words in a block of
> text, no matter whether they are together or not, and ignoring carriage
> returns. For example:
> 
> See if all of these words: dog cat dinosaur
> 
> are in this text:
> 
> "The purple dinosaur inadvertently stepped on the cat.
> The white dog howled."
> 
> Should return true. Is there such a thing?

Well, you can do this, but there may be a more efficient way:

  put (matchText(tText,"(?si)\bdog\b") and \
      matchText(tText,"(?si)\bcat\b") and \
      matchText(tText,"(?si)\bdinosaur\b"))
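The same approach in Python, with one case-insensitive whole-word search per word joined by AND (the function name is hypothetical); `all()` stops at the first word that is missing.

```python
import re


def match_all(text, words):
    # One case-insensitive whole-word search per word, combined with AND.
    return all(re.search(r"\b" + re.escape(w) + r"\b", text, re.IGNORECASE)
               for w in words)


text = "The purple dinosaur inadvertently stepped on the cat.\nThe white dog howled."
print(match_all(text, ["dog", "cat", "dinosaur"]))  # True
```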

If I keep trying, maybe I can come up with a more efficient one-liner...

Ken Ray
Sons of Thunder Software, Inc.
Web site: http://www.sonsothunder.com/
Email: [EMAIL PROTECTED]




Matchtext to find a series of words

2006-11-29 Thread J. Landman Gay

I need a matchtext/regex that will find a series of words in a block of
text, no matter whether they are together or not, and ignoring carriage
returns. For example:

See if all of these words: dog cat dinosaur

are in this text:

"The purple dinosaur inadvertently stepped on the cat.
The white dog howled."

Should return true. Is there such a thing?

--
Jacqueline Landman Gay | [EMAIL PROTECTED]
HyperActive Software   | http://www.hyperactivesw.com


