I've just added a new script to the Library: skimp, an indexing 
program:
http://www.rebol.org/cgi-bin/cgiwrap/rebol/view-script.r?script=skimp.r
http://www.rebol.org/cgi-bin/cgiwrap/rebol/documentation.r?script=skimp.r

There's quite a back story to this that shows the strengths of the 
REBOL community. Let me try to tell it in brief...

                                        
MAILING LIST ARCHIVE
About three years ago, Graham was hosting the best rendition of 
the Mailing List Archive on his personal website, but was strapped 
for bandwidth. The Library team took it over from him and hosted 
it at REBOL.org:
http://www.rebol.org/cgi-bin/cgiwrap/rebol/ml-index.r

As you can see from the announcement of that, one of the questions 
was: when will it be searchable?
http://www.rebol.org/cgi-bin/cgiwrap/rebol/ml-display-thread.r?m=rmlYDLQ

My reply was that we wanted to do it the lazy way -- using the 
Google SOAP API.


MAILING LIST SEARCH
However, months drifted by and Google did not index much of the 
Mailing List. So I wrote a quick and dirty indexer:
http://www.rebol.org/cgi-bin/cgiwrap/rebol/ml-display-message.r?m=rmlGGWC

This turns out to have been a good move for several reasons: not 
least because Google has since withdrawn their SOAP API.


WIDER SEARCH
We soon extended the quick and dirty utility, and it is now used 
throughout the Library for searching scripts, documentation, and 
articles.
http://www.rebol.org/cgi-bin/cgiwrap/rebol/site-search.r


LIBRARY SPIN-OFFS
A while back, the Library team started extracting some of the 
technology that runs the Library and releasing it for wider 
use. You can see some of the products of that here:
http://www.rebol.org/cgi-bin/cgiwrap/rebol/cpt-list-scripts.r?user-name=peterwood
and
http://www.rebol.org/cgi-bin/cgiwrap/rebol/cpt-list-scripts.r?user-name=sunanda

So around January this year, I was dusting off skimp (the quick and 
dirty indexer) to release as a Library spin-off. But I wasn't 
happy with it for two reasons:

1. The algorithms it used for handling long lists of usually 
adjacent integers were distinctly hacked together;

2. Its definition of what a "word" is (crucial if you want to find 
and index words) was loose and flabby; plus the code to handle 
word extraction was also a hack.

To address the first issue, we threw out a challenge on the 
Mailing List:
http://www.rebol.org/cgi-bin/cgiwrap/rebol/ml-display-message.r?m=rmlPCJC
The responses were astounding. We now have some world-class code 
at the heart of skimp, and in the Library for anyone else to use:               
http://www.rebol.org/cgi-bin/cgiwrap/rebol/view-script.r?script=rse-ids.r
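
To give a flavour of the problem (this is NOT the rse-ids code, just 
a minimal Python sketch of one common approach), a long list of mostly 
adjacent integers can be stored as a much shorter list of (start, end) 
runs. The function names compress_ids and expand_ids are made up for 
this illustration:

```python
def compress_ids(ids):
    """Collapse integers into sorted, inclusive (start, end) runs.

    Illustrative only: a simple run encoding, not the algorithm
    used by rse-ids.
    """
    runs = []
    for n in sorted(set(ids)):
        if runs and n == runs[-1][1] + 1:
            # n extends the current run by one
            runs[-1] = (runs[-1][0], n)
        else:
            # n starts a new run
            runs.append((n, n))
    return runs


def expand_ids(runs):
    """Rebuild the sorted list of integers from (start, end) runs."""
    return [n for start, end in runs for n in range(start, end + 1)]


if __name__ == "__main__":
    print(compress_ids([1, 2, 3, 4, 7, 9, 10, 11]))
```

The more adjacent the integers, the bigger the saving: a thousand 
consecutive message ids collapse to a single pair.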

Peter Wood rose to the second challenge, teaching himself 
ninja-level parse to do so. You can see the results of his work here:
http://www.rebol.org/cgi-bin/cgiwrap/rebol/view-script.r?script=make-word-list.r
make-word-list is a general purpose utility for extracting words 
from data, and is highly configurable too.
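
To show why the definition of a "word" matters (again, a hedged Python 
sketch, not Peter's parse-based code), here is one possible word 
extractor. The name extract_words, its parameters, and the default 
pattern are all assumptions made for this example; make-word-list has 
its own options, described in its documentation:

```python
import re

# Assumed definition of a "word": runs of letters/digits, optionally
# joined by single apostrophes or hyphens (e.g. "isn't", "quick-and-dirty").
DEFAULT_PATTERN = r"[a-z0-9]+(?:['-][a-z0-9]+)*"


def extract_words(text, min_length=2, stop_words=frozenset(),
                  pattern=DEFAULT_PATTERN):
    """Return the sorted, unique, lowercased words found in text."""
    words = set()
    for match in re.finditer(pattern, text.lower()):
        word = match.group(0)
        # Skip words that are too short or explicitly excluded
        if len(word) >= min_length and word not in stop_words:
            words.add(word)
    return sorted(words)


if __name__ == "__main__":
    print(extract_words("The quick-and-dirty indexer isn't slow.",
                        stop_words={"the"}))
```

Changing the pattern changes what gets indexed: drop the hyphen from 
it and "quick-and-dirty" becomes three separate words.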


THE END RESULT
That left me with almost nothing to do but:
1. Retro-fit skimp to use rse-ids and make-word-list
2. Retro-fit the Library to use the new version of skimp
3. Publish it in the Library.

Steps 1 and 3 are complete.

I'm a third of the way through Step 2 (the Articles search uses 
the new version; the Scripts and Mailing List new versions will be 
done in the next couple of weeks). The Articles search is nearly 
twice as fast as the previous code. Dev versions of the website 
show similar speed improvements for the other indexes too.


***


And that just leaves me to thank the REBOL Community again for 
helping to create a general-purpose indexing utility. I hope you 
can find many more uses for it -- I know I can :-)

Thanks all!

Sunanda



