Re: How to filter a big list

2009-10-19 Thread Andre Garzia
Bonjour Jérôme,

You could achieve better results by converting this list to a SQLite
database.

If you want to keep using strings for this, then I suppose you could use
some clever combination of "sort ... by each" to sort things by specific
columns.

You can also use RegEx to find the chunks you want, but I believe it won't
be that fast.

And last but not least, if this is not a product but a tool for your own
use, or if you're only deploying on Unix-like systems, why not dump the
list to a text file and use shell() with grep? That would likely be the
fastest solution.
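As an illustration of the grep approach, here is a hypothetical Python sketch (the data, file handling, and search term are made up; it assumes a Unix-like system with grep on the PATH):

```python
import subprocess
import tempfile

# Sample address list, one tab-delimited record per line.
lines = [
    "Mme\tDOS SANTOS albertina\true GOURGAS 23BIS\t1205 Geneve",
    "M\tROSAT jerome\true du Rhone 10\t1204 Geneve",
    "Mme\tMARTIN claire\tav. de la Gare 5\t1003 Lausanne",
]

# Dump the list to a text file, then let grep do the filtering.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("\n".join(lines))
    path = f.name

# -i = case-insensitive; grep prints every matching line.
result = subprocess.run(
    ["grep", "-i", "santos", path],
    capture_output=True, text=True,
)
print(result.stdout.strip())
```

In Rev terms this corresponds to `shell("grep -i santos addresses.txt")`; grep's line scanner is highly optimized, which is why it can beat scripted filtering on large files.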

andre


On Mon, Oct 19, 2009 at 6:49 PM, Jérôme Rosat  wrote:

> I wish to filter a list which contains approximately 300'000 lines. I try
> the "filter ... with" command. It's slow.
>
> I try the "Repeat for each" loop but it's slower.
>
> Is it possible to use the "filter ... with" command and to "force" RunRev
> to check only one "item" of the line and not the whole line with the
> "filterPattern" ?
>
> Thanks.
>
> Jerome
> ___
> use-revolution mailing list
> use-revolution@lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your
> subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-revolution
>



-- 
http://www.andregarzia.com All We Do Is Code.


Re: How to filter a big list

2009-10-19 Thread Chris Sheffield

Jérôme,

Are you applying the filter to a text field? If so, try temporarily
saving the field's text to a variable, applying the filter to the
variable, then saving the variable back to the field. The speed should
increase dramatically. If you're already doing this, I'm afraid I don't
have another suggestion.


Chris


On Oct 19, 2009, at 2:49 PM, Jérôme Rosat wrote:

I wish to filter a list which contains approximately 300'000 lines.  
I try the "filter ... with" command. It's slow.


I try the "Repeat for each" loop but it's slower.

Is it possible to use the "filter ... with" command and to "force"  
RunRev to check only one "item" of the line and not the whole line  
with the "filterPattern" ?


Thanks.

Jerome


--
Chris Sheffield
Read Naturally, Inc.
www.readnaturally.com



Re: How to filter a big list

2009-10-19 Thread Sarah Reichelt
On Tue, Oct 20, 2009 at 6:49 AM, Jérôme Rosat  wrote:
> I wish to filter a list which contains approximately 300'000 lines. I try
> the "filter ... with" command. It's slow.
>
> I try the "Repeat for each" loop but it's slower.
>
> Is it possible to use the "filter ... with" command and to "force" RunRev to
> check only one "item" of the line and not the whole line with the
> "filterPattern" ?


What filter pattern are you using? I have found filter to be very fast
even with large lists, provided the filter pattern is simple.
E.g. filtering with "*blue*" is fast, but filtering with "*" & tab & "*" &
tab & "*blue*" will be slow.

If this is the sort of thing you are trying, then I suggest a
preliminary fast filter to get the overall length down, then maybe
changing to a "repeat for each" loop to refine the filter.
And as Chris said, make sure you are always operating on a variable,
not the contents of a field.
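To illustrate the difference between the two patterns, here is a hypothetical Python sketch using fnmatch-style wildcards, which behave much like Rev's filter patterns (the data is made up):

```python
from fnmatch import fnmatch

TAB = "\t"
lines = [
    "red" + TAB + "apple" + TAB + "blue box",
    "blue" + TAB + "pear" + TAB + "green box",
    "green" + TAB + "plum" + TAB + "red box",
]

# Simple pattern: match "blue" anywhere in the line.
simple = [l for l in lines if fnmatch(l, "*blue*")]

# Column-specific pattern: "blue" must appear after two tab-delimited
# columns, which forces much more backtracking work per line.
column = [l for l in lines if fnmatch(l, "*" + TAB + "*" + TAB + "*blue*")]

print(len(simple), len(column))
```

The simple pattern matches two lines; the column-anchored pattern matches only the line whose third column contains "blue", and it costs more per line to evaluate.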

Cheers,
Sarah


Re: How to filter a big list

2009-10-19 Thread Jérôme Rosat

Thank you Sarah, Chris and Andre,

I already use a variable, and my filter pattern is "*[' -]" & myString & "*".


I'm going to try with a SQLite database.
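For the SQLite route, a minimal sketch (hypothetical Python/sqlite3 code; the table and column names are made up). The key point is that a prefix search such as LIKE 'DOS%' can be served by an index, whereas a pattern with a leading wildcard cannot:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE people (title TEXT, name TEXT, street TEXT, city TEXT)")
cur.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [
        ("Mme", "DOS SANTOS albertina", "rue GOURGAS 23BIS", "1205 Geneve"),
        ("M",   "ROSAT jerome",         "rue du Rhone 10",   "1204 Geneve"),
        ("Mme", "DOSSIER marie",        "av. de la Gare 5",  "1003 Lausanne"),
    ],
)
# An index on name supports fast prefix lookups (SQLite can use it for
# LIKE prefixes under the right collation settings).
cur.execute("CREATE INDEX idx_name ON people (name)")

cur.execute("SELECT name FROM people WHERE name LIKE ? ORDER BY name", ("DOS%",))
matches = [row[0] for row in cur.fetchall()]
print(matches)
```

With 400K rows, the indexed prefix query avoids rescanning the whole table on every keystroke.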


On 19 Oct 2009, at 23:49, Sarah Reichelt wrote:

> What filter pattern are you using? I have found filter to be very fast
> even with large lists, provided the filter pattern is simple.




Re: How to filter a big list

2009-10-19 Thread Mark Wieder
Jérôme-

Monday, October 19, 2009, 3:51:30 PM, you wrote:

> I use already a variable and my filter pattern is "*[' -]" & myString
> & "*".

Can you post the code you're using? Do you have enough physical memory
to hold the entire list without page swapping?

-- 
-Mark Wieder
 mwie...@ahsoftware.net



Re: How to filter a big list

2009-10-19 Thread Brian Yennie
I would suspect that the leading "*" is causing most of the slowdown.
It forces the filter command to search the entire line every time, since
the "[' -]" portion could be anywhere.

Switching to SQLite could work, but you are going to have to completely
reformat your data. If you just throw your lines into records in a
database, the problem will be the same.


Maybe there is a higher-level solution, for example:

1) Move your searchable strings to the beginning of each line (filtering
on "myString*" will be much faster)

2) Sort your data and process it in chunks, showing the results as they
come in (total time will be the same, but a better user experience)

3) Try using lineOffset() to find one match at a time

HTH
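Option 1 can be sketched as follows (hypothetical Python; the reordering helper and data are made up). Once the search key leads the line, a cheap prefix test replaces a full-line scan:

```python
# Original lines: title, name, street, city (tab-delimited).
lines = [
    "Mme\tDOS SANTOS albertina\true GOURGAS 23BIS\t1205 Geneve",
    "M\tROSAT jerome\true du Rhone 10\t1204 Geneve",
]

def name_first(line):
    """Move the searchable column (the name) to the front of the line."""
    cols = line.split("\t")
    return "\t".join([cols[1]] + cols[:1] + cols[2:])

reordered = [name_first(l) for l in lines]

# Now a prefix match only inspects the first characters of each line.
query = "dos"
hits = [l for l in reordered if l.lower().startswith(query)]
print(hits)
```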




Re: How to filter a big list

2009-10-19 Thread Richard Gaskin

Jérôme Rosat wrote:

> I wish to filter a list which contains approximately 300'000 lines. I try
> the "filter ... with" command. It's slow.
>
> I try the "Repeat for each" loop but it's slower.
>
> Is it possible to use the "filter ... with" command and to "force" RunRev
> to check only one "item" of the line and not the whole line with the
> "filterPattern"?

RegEx is a complex subsystem designed for ease of use more than performance.

Depending on what you want to do, you may find that "repeat for each" 
will be your fastest option (at least until Alex Tweedly chimes in with 
a three-line solution using "split" ).


What does your data look like, and what are you looking for in it?

--
 Richard Gaskin
 Fourth World
 Rev training and consulting: http://www.fourthworld.com
 Webzine for Rev developers: http://www.revjournal.com
 revJournal blog: http://revjournal.com/blog.irv


Re: How to filter a big list

2009-10-19 Thread Jim Ault

On Oct 19, 2009, at 1:49 PM, Jérôme Rosat wrote:

I wish to filter a list which contains approximately 300'000 lines.  
I try the "filter ... with" command. It's slow.


I try the "Repeat for each" loop but it's slower.

Is it possible to use the "filter ... with" command and to "force"  
RunRev to check only one "item" of the line and not the whole line  
with the "filterPattern" ?






First, what do you mean by 'slow'? 'slower'?

There are many items to consider in the optimization of filtering:
-1- do you create the 300,000 lines yourself or inherit them?
-2- are the lines long strings or short (how many characters)?
-3- are the lines structured, or more like descriptions or phrases?
-4- is the part to be filtered at the beginning or the end of each line?
-5- there are numerous other considerations depending on the exact task
at hand

Your request is far too vague to give definitive answers.
As Mark Wieder said, please post some example lines of the data and
the code you are trying. There are many innovative ways to use the
Rev chunking functions that make sure you get speed without
sacrificing accuracy (false hits, false misses).

Looking forward to more details.
It is fun to consider the possible variations :-)

Jim Ault
Las Vegas



Re: How to filter a big list

2009-10-21 Thread Jérôme Rosat

Thank you Jim, Richard, Brian and Mark,

Please excuse my late reply; I posted a message yesterday, but it was not
published on the list. I am making a new attempt.

I explained in my message that I wish to filter a list of names and
addresses dynamically as I type a name in a field. This list contains
400'000 lines like this: Mme [TAB] DOS SANTOS albertina [TAB] rue
GOURGAS 23BIS [TAB] 1205 Genève

I made various tests using the "repeat for each" loop and the
"filter ... with" command. Filtering takes the most time when I type the
first and the second letter: approximately 800 milliseconds for the first
char and about 570 milliseconds for the second char. The repeat loop with
the "contains" operator is a little bit slower (by about 50 milliseconds)
than "filter ... with". There is no significant difference once the third
char or more is typed. Of course I filter a variable before putting it
into the list field.

Obviously, 800 milliseconds to filter a list of 400'000 lines is fast.
But it is too slow for what I want to do. Filtering would need to take
less than 300 milliseconds so that the user is not slowed down in his
typing.

Sorry to have been insufficiently precise in my first message. I will
continue my tests and publish the fastest code.


Jerome Rosat

On 20 Oct 2009, at 03:41, Jim Ault wrote:

> First, what do you mean by 'slow'? 'slower'?
> There are many items to consider in the optimization of filtering.




Re: How to filter a big list

2009-10-21 Thread Richard Gaskin

Jérôme Rosat wrote:

> I explained in my message that I wish to filter a list of names and
> addresses dynamically when I type a name in a field. This list contains
> 400'000 lines like this: Mme [TAB] DOS SANTOS albertina [TAB] rue
> GOURGAS 23BIS [TAB] 1205 Genève
>
> Obviously, 800 milliseconds to filter a list of 400'000 lines is fast.
> But it is too slow for what I want to do.


Would it be practical to break your list into 26 sublists by first letter?
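That suggestion might look like this (a hypothetical Python sketch; the list is split by the first letter of the name column, so each keystroke only scans one bucket):

```python
from collections import defaultdict

lines = [
    "Mme\tDOS SANTOS albertina\true GOURGAS 23BIS\t1205 Geneve",
    "M\tROSAT jerome\true du Rhone 10\t1204 Geneve",
    "Mme\tDUPONT claire\tav. de la Gare 5\t1003 Lausanne",
]

# Pre-pass: bucket every line by the first letter of its name column.
buckets = defaultdict(list)
for line in lines:
    name = line.split("\t")[1]
    buckets[name[0].lower()].append(line)

# At search time, only the relevant bucket is scanned.
query = "do"
candidates = buckets.get(query[0].lower(), [])
hits = [l for l in candidates if l.split("\t")[1].lower().startswith(query)]
print(len(candidates), len(hits))
```

With a roughly even distribution, each bucket holds only a fraction of the 400K lines, so the per-keystroke scan shrinks accordingly.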

--
 Richard Gaskin
 Fourth World
 Rev training and consulting: http://www.fourthworld.com
 Webzine for Rev developers: http://www.revjournal.com
 revJournal blog: http://revjournal.com/blog.irv


Re: How to filter a big list

2009-10-21 Thread David Glasgow


On 21 Oct 2009, at 6:00 pm, Jérôme Rosat wrote:

> Filtering takes the most time when I type the first and the second
> letter. That takes approximately 800 milliseconds for the first char and
> about 570 milliseconds for the second char. The repeat loop with the
> "contains" operator is a little bit slower (about 50 milliseconds) than
> the "filter ... with". There is no significant difference when the third
> char or more is typed. Of course I filter a variable before putting it
> into the list field.

How about having the filter kick in only after the second or third
character is typed? The first two characters take a disproportionate
amount of time, and probably don't reduce the list size substantially.


David Glasgow


Re: How to filter a big list

2009-10-21 Thread Jim Ault
You're asking a lot of a chunking function to scan a large body of text
between keystrokes.

Start with the following steps to see if these help:

-1- Showing a list of more than 50 hits may not be useful
-2- Doing a filter operation with fewer than 3 chars may not be useful
-3- Showing the number of lines (hits) at the top of the field is useful
-4- Most likely you will need to pre-index the 400K lines to get more
speed

Indexing is what databases do to boost speed. You need to decide what the
logic is, such as any char in any string, or words beginning with the
user input, etc.

Is the 400K set of lines dynamic or static?
Does the user type logical words, or phrases?
eg.  santos  -- single word
eg.  Gourgas  -- single word
eg.  dos santos  -- phrase in order
eg.  rue Gourgas  -- phrase in order

If link tables are required, then you should consider a database,
since this is something they do well.



   if the number of chars in userInput < 3 then exit to top

   put "Number of lines =" && \
         the number of lines in filteredBlock into theOutput

   if the number of lines in filteredBlock > 50 then
      put line 1 to 10 of filteredBlock & cr & "MORE..." after theOutput
   end if


The fewer characters in the block of lines to be filtered, the better.
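The pre-indexing idea (-4- above) can be sketched like this (hypothetical Python; each 3-char prefix of the name column maps to the line numbers that contain it, so a keystroke does one lookup plus a small verification pass instead of scanning 400K lines):

```python
from collections import defaultdict

lines = [
    "Mme\tDOS SANTOS albertina\true GOURGAS 23BIS\t1205 Geneve",
    "M\tROSAT jerome\true du Rhone 10\t1204 Geneve",
    "Mme\tDOSSIER marie\tav. de la Gare 5\t1003 Lausanne",
]

# Build once: 3-char prefix of every word in the name column -> line numbers.
index = defaultdict(set)
for n, line in enumerate(lines):
    for word in line.split("\t")[1].lower().split():
        index[word[:3]].add(n)

# At search time: one dictionary lookup, then verify the few candidates.
query = "san"
candidates = sorted(index.get(query[:3], set()))
hits = [lines[n] for n in candidates
        if any(w.startswith(query) for w in lines[n].split("\t")[1].lower().split())]
print(hits)
```

The index is rebuilt only when the data changes; per keystroke, only the candidate lines are re-checked.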


Hope this helps.

Jim Ault
Las Vegas



On Oct 21, 2009, at 8:47 AM, Jérôme Rosat wrote:


Thank you Jim, Richard, Brian and Mark,

Please excuse me to answer so tardily, I posted a message yesterday,  
but it was not published in the list. I make a new attempt.


I explained in my message that I wish to filter a list of names and  
addresses dynamically when I type a name in a field. This list  
contains 400'000 lines like this:  Mme [TAB] DOS SANTOS albertina  
[TAB] rue GOURGAS 23BIS [TAB] 1205 Genève


I made various tests using the "repeat for each" loop and the  
"filter ... with" command. Filtering takes the most time when I type  
the first and the second letter. That takes approximately 800  
milliseconds for the first char and about 570 milliseconds for the  
second char. The repeat loop with the "contains" operator is a  
little beat slower (about 50 milliseconds) than the "filter ...  
with". There is no significant difference when the third char or  
more is typed. Of course I filter a variable before to put it in the  
list field.


Obviously, 800 milliseconds to filter a list of 400'000 lines, it is  
fast. But it is too slow for what I want to do. It would take a time  
of filtering lower than 300 milliseconds so that the user is not  
slowed down in his typing.


Sorry to have been insufficiently precise in my first message. I  
continue my tests and I will publish the fastest code.


Jerome Rosat



Re: How to filter a big list

2009-10-22 Thread Alex Tweedly

Richard Gaskin wrote:

> Would it be practical to break your list into 26 sublists by first
> letter?

That's a pragmatic approach - but I think it's the wrong one.

The fundamental problem is that the idea of scanning an entire list at
keystroke speed is not robust. Even if splitting into multiple lists
works for now, there's no guarantee that it will work tomorrow - when
the database doubles in size, or the data becomes skewed because it
contains too many people with the same first letter... or the users
demand a similar feature for address as well as surname, or they want to
match a string anywhere within the name...

What you ought to do (imnsho) is to change the algorithm to one which is
inherently responsive, using either 'send' or 'wait-with-messages' to
ensure that the matching process does not interfere with responsiveness.
In this case, I think it's easier to use wait-with-messages.


So in outline:

- each time the match data changes, you restart the matching process
- the matching process checks a fixed, and relatively small, number of
  possible matches
- it updates the field showing the user what matches have been found
- and then allows other things to happen before continuing with the
  matching


I'd have a single handler that is always called when any change happens
to the user input, which can kick off a new matching process (by sending
to the match handler). Then within the handler, I'd periodically check
whether there is a pending message to restart a new handler.


So a brief version of the whole script would be:

local sShow, sStart, sData, sFound, sMatch
global gData

on keyUp
   matchStringHasChanged
   pass keyUp
end keyUp

on matchStringHasChanged
   send "processamatch" to me in 0 millisecs
end matchStringHasChanged

on processamatch
   local tCount
   put gData into sData
   put the text of field "Field" into sMatch
   put empty into field "Show"
   put empty into sShow
   repeat for each line L in sData
      add 1 to tCount
      if L begins with sMatch then
         put L & CR after sShow
      end if
      if tCount mod 100 = 0 then
         put sShow & "..." & CR into field "Show"
         wait 0 millisecs with messages
         if the pendingMessages contains ",processamatch," then
            put "exiting" & CR after field "StatusLog"
            exit processamatch
         end if
      end if
   end repeat
   put sShow into field "Show"
   put "Completed" && the number of lines in sShow & CR after field "StatusLog"
end processamatch




Note the use of "..." to give an indication that work is still in
progress and there may be more matches to come.


You could easily add refinements to this:

1a. If a matching process has completed (rather than exited), and if the
previous match string was a substring of the new match string, then
instead of starting with

   put gData into sData

you could instead do

   put sShow into sData

(i.e. re-use the filtered list - but be sure to remember that if you exit
before completing, or if the match string changes in any other way, you
need to restart with the whole of gData)

1b. If you do 1a, and you are *nearly* complete with a match when the
match string changes, then just go ahead and complete it, so you get to
work on the subset.

(good luck deciding what *nearly* means :-)

btw - I don't think there is any magic 'split'-based method possible here.
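Alex's restartable approach translates roughly to this (a hypothetical Python sketch; Rev's message queue is simulated by a simple "restart requested" callback checked between chunks):

```python
def match_in_chunks(data, query, chunk=100, restart_requested=lambda: False):
    """Scan data a chunk at a time; bail out early if a restart is requested."""
    found = []
    for n, line in enumerate(data, start=1):
        if line.startswith(query):
            found.append(line)
        # Every `chunk` lines, yield control and check for newer input.
        if n % chunk == 0 and restart_requested():
            return found, False          # partial result, interrupted
    return found, True                   # complete result

data = ["dos santos", "dupont", "dossier"] * 100   # 300 lines of fake names
full, done = match_in_chunks(data, "dos")
partial, done2 = match_in_chunks(data, "dos", restart_requested=lambda: True)

print(len(full), done, len(partial), done2)
```

The total work is the same, but the scan never blocks input for longer than one chunk, which is the whole point of the wait-with-messages pattern.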


-- Alex.


Re: How to filter a big list

2009-10-22 Thread Alex Tweedly

Alex Tweedly wrote:
Note the use of ".." to give an indication that work is still in 
progress and there may be more matches to come.



Hmmm... that's a bit of a cheap way to do it.

Much better to put the number of lines in sData into tDataSize, and then do:

   put 20 - (20 * tCount / tDataSize) into t
   put sShow & char 1 to t of "...................." & CR into field "Show"

(so the trail of dots shrinks as the scan progresses)
 

-- Alex.


Re: How to filter a big list

2009-10-23 Thread Jérôme Rosat

David,

Yes, that's right. But I have noticed that displaying the list in the
field takes time. I ran tests displaying the result of the filtering only
after the third letter is typed, and it is fast enough not to slow down
the user's typing.


Jerome Rosat

On 21 Oct 2009, at 20:07, David Glasgow wrote:

> How about having the filter kick in only after the second or third
> character is typed? The first two characters take a disproportionate
> amount of time, and probably don't reduce the list size substantially.


David Glasgow


Re: How to filter a big list

2009-10-23 Thread DunbarX
Yesterday I went back to an old HC stack that filtered just such a list. I 
tried a Rev version of it on a list of 400,000 lines and it seems to run 
reasonably responsively. If this has already been dealt with better, then just 
ignore...

There is a search field and another field named "results". I put the list 
in a global called "clientlist", so one could just go and type into the 
search field at any time. In the search field script:

on rawkeyUp tkey
   global clientlist
   if length(line 1 of me) < 4 then pass rawkeyUp
   put line 1 of me into tofind  -- lose errant returns
   get revFullFind(clientlist, tofind, "lineNum", "false")
   repeat with y = 1 to the number of lines of it
      put line (line y of it) of clientlist & return after temp1
   end repeat
   if temp1 = "" then
      put "No such" && quote & tofind & quote && "found" into fld "results"
   else
      put temp1 into fld "results"
   end if
end rawkeyUp

function revFullFind theText, toFind, form, exactly
   local counter, temp22
   repeat for each line theLine in theText
      add 1 to counter
      if exactly = "true" then
         if toFind = theLine then
            if form = "lineNum" then
               put counter & return after temp22
            else if form = "txt" then
               put theLine & return after temp22
            end if
         end if
      else
         if toFind is in theLine then
            if form = "lineNum" then
               put counter & return after temp22
            else if form = "txt" then
               put theLine & return after temp22
            end if
         end if
      end if
   end repeat
   return temp22
end revFullFind

"revFullFind" was one of my earliest Rev efforts; it duplicates much of 
Rinaldi's "fullFind". It gives here a return delimited list of line numbers of 
all occurrances of a string found in another string, which is then mapped to 
the riginal data.

Craig Newman


Re: How to filter a big list

2009-10-23 Thread Jérôme Rosat

Jim,

On 22 Oct 2009, at 05:41, Jim Ault wrote:

> You're asking a lot of a chunking function to scan a large body of text
> between keystrokes.
>
> Start with the following steps to see if these help.
> -1- Showing a list of more than 50 hits may not be useful

Good idea.

> -2- Doing a filter operation with fewer than 3 chars may not be useful

Filtering with fewer than 3 chars still reduces the list.

> -3- Showing the number of lines (hits) at the top of the field is useful

Good idea.

> -4- Most likely you will need to pre-index the 400K lines to get more
> speed. Indexing is what databases do to boost speed. You need to decide
> what the logic is, such as any char in any string, or words beginning
> with the user input, etc.

I should think about this solution.

> Is the 400K set of lines dynamic or static?

Static for my tests. Dynamic from 0 to about 400k lines in production.

> Does the user type logical words, or phrases?

Both.

> eg.  santos  -- single word
> eg.  Gourgas  -- single word
> eg.  dos santos  -- phrase in order
> eg.  rue Gourgas  -- phrase in order
>
> If link tables are required, then you should consider a database, since
> this is something they do well.
>
> Hope this helps.

Yes, thank you.

If somebody on the list is looking for a file of names and addresses
(400K lines), I created one with French surnames; French, Spanish,
Italian and Portuguese first names; and addresses in Geneva, Switzerland.
It can be downloaded here: http://files.me.com/jrosat/tac1b4

In the current state of my tests, here is my "fastest" code:

on keyUp
   set the itemDelimiter to tab
   switch the number of chars of me
      case 1
         put empty into vListe
         repeat for each line theLine in vNoms
            if item 2 of theLine contains me then put theLine & cr after vListe
         end repeat
         break
      case 2
         put vListe into maListe
         put empty into vListe
         repeat for each line theLine in maListe
            if item 2 of theLine contains me then put theLine & cr after vListe
         end repeat
         break
      default
         put vListe into maListe
         put empty into vListe
         repeat for each line theLine in maListe
            if item 2 of theLine contains me then put theLine & cr after vListe
         end repeat
         delete the last char of vListe
         put "Number of lines =" && (the number of lines of vListe) into theOutput
         if the number of lines in vListe > 60 then
            put theOutput & cr & line 1 to 60 of vListe & cr & "..." into field "fListe"
         else
            put theOutput & cr & vListe into field "fListe"
         end if
   end switch
end keyUp
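The narrowing trick in this handler (each keystroke filters the previous result set rather than the full list) looks like this in a hypothetical Python sketch (data and helper are made up):

```python
full_list = [
    "Mme\tDOS SANTOS albertina\true GOURGAS 23BIS\t1205 Geneve",
    "M\tROSAT jerome\true du Rhone 10\t1204 Geneve",
    "Mme\tDOSSIER marie\tav. de la Gare 5\t1003 Lausanne",
]

def narrow(previous, typed):
    """Filter on the name column (item 2) only, like the Rev handler."""
    return [l for l in previous if typed.lower() in l.split("\t")[1].lower()]

# Simulate typing "dos" one character at a time, reusing each result.
current = full_list
for i in range(1, len("dos") + 1):
    typed = "dos"[:i]
    source = full_list if i == 1 else current   # narrow the previous hits
    current = narrow(source, typed)

print(len(current))
```

Only the first keystroke pays the full-list cost; every later keystroke scans an already-reduced set, which matches the timings reported earlier in the thread.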

Jerome Rosat


Re: How to filter a big list

2009-10-23 Thread Jérôme Rosat

Alex,

Thank you for your message. I'm not very comfortable with messages and
the "send" command, but I will test your code. I think it is the right
way to avoid slowing down the user's typing.


Jerome Rosat

Le 23 oct. 2009 à 00:33, Alex Tweedly a écrit :



Would it be practical to break your list into 26 sublists by first  
letter?

That's a pragmatic approach - but I think it's the wrong one.

The fundamental problem is that the idea of scanning an entire list  
at keystroke speed is not robust. Even if splitting into multiple  
lists works for now, there's no guarantee that it will work tomorrow  
- when the database doubles in size, or the data becomes skewed  
because it contains too many people with the same first letter,  
or  or the users demand a similar feature for address as well as  
surname, or they want to match string anywhere within the name,  
or 


What you ought to do (imnsho) is to change the algorithm to one  
which is inherently responsive, using either 'send' or 'wait-with- 
messages' to ensure that this matching process does not interfere  
with responsiveness. In this case, I think it's easier to use wait- 
with-messages.


So in outline

each time the match data changes, you restart the matching process

the matching process checks a fixed, and relatively small, number of  
possible matches

updates the field showing the user what matches have been found
and then allows other things to happen before continuing with  
the matching.


I'd have a single handler that is always called when any change  
happens to the user input, which can kick off a new matching process  
(by sending to the match handler). Then within the handler, I'd  
periodically check whether there is a pending message to restart a  
new matching pass.


So a brief version of the whole script would be:


local sShow, sStart, sData, sFound, sMatch
global gData

on keyUp
   matchStringHasChanged
   pass keyUp
end keyUp

on matchStringHasChanged
   send "processamatch" to me in 0 millisecs
end matchStringHasChanged

on processamatch
   local tCount
   put 0 into tCount
   put gData into sData
   put the text of field "Field" into sMatch
   put empty into field "Show"
   put empty into sShow
   repeat for each line L in sData
      add 1 to tCount
      if L begins with sMatch then
         put L & CR after sShow
      end if
      if tCount mod 100 = 0 then
         put sShow & "." & CR into field "Show"
         wait 0 millisecs with messages
         if the pendingMessages contains ",processamatch," then
            put "exiting" & CR after field "StatusLog"
            exit processamatch
         end if
      end if
   end repeat
   put sShow into field "Show"
   put "Completed" && the number of lines in sShow & CR after field "StatusLog"
end processamatch




Note the use of the trailing "." to give an indication that work is  
still in progress and there may be more matches to come.


You could easily add refinements to this:

1a. If a matching process has completed (rather than exited), and  
if the previous match string was a substring of the new match string,  
then instead of starting with

   put gData into sData

you could instead do

   put sShow into sData

(i.e. re-use the filtered list - but be sure to remember that if you  
exit before completing, or if the match string changes in any other  
way, you need to restart with the whole of gData)
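
A minimal sketch of refinement 1a, assuming a new script-local  
sPrevMatch (my name, not part of Alex's script) remembers the string  
used by the last *completed* pass:

```livecode
local sPrevMatch -- match string of the last completed pass (hypothetical addition)

on processamatch
   put the text of field "Field" into sMatch
   -- reuse the previous results only when the new string extends the old one
   if sPrevMatch is not empty and sMatch begins with sPrevMatch then
      put sShow into sData
   else
      put gData into sData
   end if
   put empty into sPrevMatch -- invalidated until this pass completes
   -- ... matching loop as in the full script above ...
   put sMatch into sPrevMatch -- only reached when the pass runs to completion
end processamatch
```

Clearing sPrevMatch up front means an early exit can never cause a  
stale, incomplete result set to be reused.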


1b. If you do 1a, and you are *nearly* complete with a match  
when the matchstring changes, just go ahead and complete it, so  
you get to work on the subset.

(good luck deciding what *nearly* means :-)

btw - I don't think there is any magic 'split'-based method possible  
here.



-- Alex.


Re: How to filter a big list

2009-10-23 Thread Richard Gaskin

Jérôme Rosat wrote:

Alex,

Thank you for your message. I'm not very comfortable with messages and the  
"send" command, but I will test your code. I think it is the right way  
to avoid slowing down the user's typing.


IMO well worth the time to try it.

Alex Tweedly regularly comes up with non-obvious but highly effective 
solutions.  His inventiveness has made many helpful contributions to 
algorithms explored on this list.


My only disappointment with his latest contribution is that it wasn't the 
three-liner he usually comes up with.


Alex, you're slipping. :)

--
 Richard Gaskin
 Fourth World
 Rev training and consulting: http://www.fourthworld.com
 Webzine for Rev developers: http://www.revjournal.com
 revJournal blog: http://revjournal.com/blog.irv


Re: How to filter a big list

2009-10-23 Thread Mark Wieder
Jérôme-

Friday, October 23, 2009, 8:51:18 AM, you wrote:

> If somebody on the list is looking for a file with names and addresses
> (400K lines), I created a file with French surnames; French, Spanish,
> Italian, and Portuguese first names; and addresses in Geneva, Switzerland.
> It is possible to download here: http://files.me.com/jrosat/tac1b4

> In the current state of my tests, here is my “fastest” code:

Since you seem to want to limit the display to 60 lines, here's a much
faster version. This takes roughly 90 milliseconds to do an entire
list, irrespective of the number of chars. Note as well that your text
file is actually delimited by commas rather than tabs.

-- time of the filtered list
on keyUp
   local vListe
   local tTime

   put the milliseconds into tTime
   put test1(vNoms) into vListe -- vNoms: script local holding the full list
   put the milliseconds - tTime into field "fElapsedTime"
   put vListe into field "fListe"
end keyUp

function test1 pNoms
   local vListe
   local x

   set the itemDelimiter to comma
   put empty into vListe
   put 1 into x
   repeat for each line theLine in pNoms
      if item 2 of theLine contains me then
         put theLine & cr after vListe
         add 1 to x
         if x > 60 then
            put "..." & cr after vListe
            exit repeat
         end if
      end if
   end repeat
   return vListe
end test1

-- 
-Mark Wieder
 mwie...@ahsoftware.net



Re: How to filter a big list

2009-10-23 Thread Jérôme Rosat

Mark,

My text file is a "raw" file. When I open the stack, I create a simpler  
tab-delimited list.

Your code is much faster for 1 to 3 chars. But if the name you  
search for is at the end of the list (for example "rosat"), it takes  
more time than my code.


Jerome Rosat

On Oct 23, 2009, at 7:13 PM, Mark Wieder wrote:

> Since you seem to want to limit the display to 60 lines, here's a much
> faster version. This takes roughly 90 milliseconds to do an entire
> list, irrespective of the number of chars. Note as well that your text
> file is actually delimited by commas rather than tabs.
> [snip - code quoted in full above]



Re: How to filter a big list

2009-10-23 Thread Mark Wieder
Jérôme-

Friday, October 23, 2009, 10:55:26 AM, you wrote:

> Your code is much faster for 1 to 3 chars. But if the name you
> search for is at the end of the list (for example "rosat"), it takes
> more time than my code.

Well, of course. There's no "rosat" in the data set, so you have to go
through the entire 400,000 lines rather than limiting it to 60 hits. In
that case yours is faster because you've already trimmed your data set
down by the first three chars. But you lose that advantage the first
three times through.

You might try something like

   put test1(vNoms) into vListe
   -- add this:
   if the number of lines in vListe < 60 then
      put vListe into vNoms
   end if

so that you only go through the whole list once.
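
One caveat (my addition, not something Mark states): reusing vListe is  
only safe while the new match string extends the one that produced it;  
if the user deletes or changes a character, you have to fall back to the  
full list first. A hedged sketch, with sPrevious and sFullList as  
hypothetical script locals:

```livecode
-- before searching, fall back to the full list if this isn't a refinement
if not (me begins with sPrevious) then
   put sFullList into vNoms -- sFullList: untouched copy of the 400K-line list
end if
put test1(vNoms) into vListe
if the number of lines in vListe < 60 then
   put vListe into vNoms -- next keystroke searches only these hits
   put me into sPrevious
else
   put sFullList into vNoms -- result was truncated at 60; don't reuse it
   put empty into sPrevious
end if
```

The "< 60" test doubles as a completeness check: a truncated result set  
must never be reused, since it may be missing later matches.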

Search optimization would require foreknowledge of the data set.

-- 
-Mark Wieder
 mwie...@ahsoftware.net
