I've just added a new script to the Library, skimp (an indexing
program):
http://www.rebol.org/cgi-bin/cgiwrap/rebol/view-script.r?script=skimp.r
http://www.rebol.org/cgi-bin/cgiwrap/rebol/documentation.r?script=skimp.r
There's quite a back story to this, and it shows the strengths of the
REBOL community. Let me try to tell it briefly.
MAILING LIST ARCHIVE
About three years ago, Graham was hosting the best rendition of
the Mailing List Archive on his personal website, but was strapped
for bandwidth. The Library team took it over from him and hosted
it at REBOL.org:
http://www.rebol.org/cgi-bin/cgiwrap/rebol/ml-index.r
As you can see from the announcement thread, one of the questions
was: when will it be searchable?
http://www.rebol.org/cgi-bin/cgiwrap/rebol/ml-display-thread.r?m=rmlYDLQ
My reply was that we wanted to do it the lazy way, using the
Google SOAP API.
MAILING LIST SEARCH
However, months drifted by and Google did not index much of the
Mailing List. So I wrote a quick and dirty indexer:
http://www.rebol.org/cgi-bin/cgiwrap/rebol/ml-display-message.r?m=rmlGGWC
This turned out to be a good move for several reasons, not least
because Google has since withdrawn its SOAP API.
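skimp's internals aren't described in this post, so what follows is
only a generic sketch of what an inverted-index search does. It is
written in Python rather than REBOL for illustration, and the names
(`tokenize`, `build_index`, `search`) and sample messages are
hypothetical. Each word maps to the sorted list of message ids that
contain it, and a multi-word query intersects those lists.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Hypothetical word rule: runs of lowercase ASCII letters/digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(messages):
    """Map each word to the sorted list of message ids containing it."""
    index = defaultdict(set)
    for msg_id, text in messages.items():
        for word in tokenize(text):
            index[word].add(msg_id)
    return {word: sorted(ids) for word, ids in index.items()}


def search(index, query):
    """Return ids of messages containing every query word (AND search)."""
    words = tokenize(query)
    if not words:
        return []
    result = set(index.get(words[0], []))
    for word in words[1:]:
        result &= set(index.get(word, []))
    return sorted(result)


messages = {
    1: "How do I parse a block?",
    2: "Parse rules for email addresses",
    3: "Searching the mailing list archive",
}
index = build_index(messages)
print(search(index, "parse"))        # [1, 2]
print(search(index, "parse email"))  # [2]
```

Note that every id list in such an index is sorted, and on a large
archive many of those ids are consecutive.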
WIDER SEARCH
We soon extended the quick and dirty utility, and it is now used
throughout the Library for searching scripts, documentation, and
articles.
http://www.rebol.org/cgi-bin/cgiwrap/rebol/site-search.r
LIBRARY SPIN-OFFS
A while back, the Library team started trying to extract some of
the technology that runs the Library and release it for wider
use. You can see some of the products of that here:
http://www.rebol.org/cgi-bin/cgiwrap/rebol/cpt-list-scripts.r?user-name=peterwood
and
http://www.rebol.org/cgi-bin/cgiwrap/rebol/cpt-list-scripts.r?user-name=sunanda
So, around January this year, I was dusting off skimp (the quick
and dirty indexer) to release it as a Library spin-off. But I
wasn't happy with it, for two reasons:
1. The algorithms it used for handling long lists of usually
adjacent integers were distinctly hacked together;
2. Its definition of what a "word" is (crucial if you want to find
and index words) was loose and flabby, and the code that handled
word extraction was also a hack.
To address the first issue, we threw out a challenge on the
Mailing List:
http://www.rebol.org/cgi-bin/cgiwrap/rebol/ml-display-message.r?m=rmlPCJC
The responses were astounding. We now have some world-class code
at the heart of skimp, and in the Library for anyone else to use:
http://www.rebol.org/cgi-bin/cgiwrap/rebol/view-script.r?script=rse-ids.r
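This post doesn't say exactly how rse-ids stores its integer lists,
so here is a hedged Python sketch of one common technique for long
lists of mostly adjacent integers: range (run) encoding. Consecutive
ids collapse into [start, end] pairs, and two encoded lists can be
intersected without expanding them. The function names are
hypothetical and are not rse-ids' API.

```python
def to_ranges(ids):
    """Collapse a sorted list of distinct ints into [start, end] runs."""
    ranges = []
    for n in ids:
        if ranges and n == ranges[-1][1] + 1:
            ranges[-1][1] = n        # n extends the current run
        else:
            ranges.append([n, n])    # gap found: start a new run
    return ranges


def from_ranges(ranges):
    """Expand [start, end] runs back into the full list of ints."""
    ids = []
    for start, end in ranges:
        ids.extend(range(start, end + 1))
    return ids


def intersect_ranges(a, b):
    """Intersect two sorted, non-overlapping run lists without expanding them."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo <= hi:
            out.append([lo, hi])     # the two runs overlap on [lo, hi]
        # Advance past whichever run finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


hits_a = to_ranges([3, 4, 5, 6, 10, 11, 12, 20])
hits_b = to_ranges([5, 6, 7, 8, 11, 20, 21])
print(hits_a)                                         # [[3, 6], [10, 12], [20, 20]]
print(intersect_ranges(hits_a, hits_b))               # [[5, 6], [11, 11], [20, 20]]
print(from_ranges(intersect_ranges(hits_a, hits_b)))  # [5, 6, 11, 20]
```

When most ids are adjacent, the run form is far shorter than the raw
list. The trade-off is that a sparse list with no adjacent ids
doubles in size, because each isolated id becomes an [n, n] pair.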
Peter Wood rose to the second challenge, teaching himself
ninja-level parse to do so. You can see the results of his work here:
http://www.rebol.org/cgi-bin/cgiwrap/rebol/view-script.r?script=make-word-list.r
make-word-list is a general-purpose utility for extracting words
from data, and it is highly configurable too.
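make-word-list's actual options aren't listed here, so the following
Python sketch only illustrates what a configurable definition of a
"word" can look like. The function `extract_words` and its
parameters (`min_length`, `joiners`, `stop_words`, `lowercase`) are
hypothetical and are not make-word-list's interface. The sketch also
uses a regular expression, whereas Peter's utility is built with
REBOL's parse.

```python
import re


def extract_words(text, min_length=2, joiners="-'", stop_words=frozenset(),
                  lowercase=True):
    """Return the distinct words in `text`, in order of first appearance.

    A word is a run of letters or digits; single characters from `joiners`
    may glue runs together, so "e-mail" and "don't" stay whole.
    """
    core = r"[^\W_]+"  # letters/digits (Unicode-aware), excluding underscore
    if joiners:
        pattern = core + "(?:[" + re.escape(joiners) + "]" + core + ")*"
    else:
        pattern = core
    words, seen = [], set()
    for match in re.finditer(pattern, text):
        word = match.group(0).lower() if lowercase else match.group(0)
        # stop_words is compared after any case folding, so with
        # lowercase=True the stop words should be given in lowercase.
        if len(word) < min_length or word in stop_words or word in seen:
            continue
        seen.add(word)
        words.append(word)
    return words


print(extract_words("Don't re-index the REBOL e-mail list; it's big!",
                    stop_words={"the"}))
# ["don't", 're-index', 'rebol', 'e-mail', 'list', "it's", 'big']
```

Changing `joiners` or `min_length` changes what counts as a word,
and that in turn changes what the index can find. With `joiners=""`,
for example, "e-mail" splits into "e" and "mail", and "e" is then
dropped for being shorter than `min_length`.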
THE END RESULT
That left me with almost nothing to do but:
1. Retro-fit skimp to use rse-ids and make-word-list
2. Retro-fit the Library to use the new version of skimp
3. Publish it in the Library.
Steps 1 and 3 are complete.
I'm a third of the way through Step 2: the Articles search uses
the new version, and the new Scripts and Mailing List versions will
be done in the next couple of weeks. The Articles search is nearly
twice as fast as with the previous code, and development versions of
the website show similar speed improvements for the other indexes.
***
And that just leaves me to thank the REBOL Community again for
helping to create a general-purpose indexing utility. I hope you
can find many more uses for it; I know I can :-)
Thanks all!
Sunanda