"Jochen F. Rick" <[EMAIL PROTECTED]> wrote:
> Upgrade to a faster processor.
> 
> Searching is slow. If the ramdisk doesn't help, I don't think anything 
> (besides a massive code rewrite) will help.
> 

Maybe it's not so massive.  People have hacked things together in a day
or two in the past, and simply not followed through with releasable
code.

If you are willing to spend a couple of minutes at startup to build an
initial index, then you need no additional disk files.  That's nice!
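For concreteness, that startup pass might look something like this (just a
sketch; "buildIndex", "index", and "pages" are names I'm assuming here, and
keywordsIn: is defined further down):

        buildIndex
                "Build the in-memory keyword index from scratch, mapping
                each keyword to the Set of pages that contain it."
                index := Dictionary new.
                pages do: [ :page |
                        (self keywordsIn: page text) do: [ :word |
                                (index at: word ifAbsentPut: [ Set new ]) add: page ] ]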

Then, you need to adjust the index after each page save.  To adjust, you
remove all entries for the old version of the page and then add entries
for the new version:

        "first, remove entries for the old version of the page"
        (self keywordsIn: page text) do: [ :word |
                index at: word ifPresent: [ :entry | entry remove: page ifAbsent: [] ] ].
        "update the page, then index the new text"
        (self keywordsIn: page text) do: [ :word |
                (index at: word ifAbsentPut: [ Set new ]) add: page ]


Extracting the keywords could be very simple to begin with, something
like:

        keywordsIn: aText
                ^aText findTokens: String separators, '.,<>/?=+!@#$%^&*()'''


Finally, the searching algorithm can be adjusted to use the nice index. 
Here's a simple one which may well be fast enough:

        pagesMatchingSearch: aQuery
                | keywords matchingPages |
                keywords := self keywordsIn: aQuery.
                matchingPages := pages asSet.  "start with all pages"
                keywords do: [ :keyword |
                        "intersect with pages matching this keyword"
                        matchingPages := matchingPages
                                intersection: (index at: keyword ifAbsent: [#()]) ].
                ^matchingPages



So this is just a sketch of how to do it, but it looks like it should be
possible if someone wants to blow a day or two putting it all together.
Bonus points for saving the index on disk so that swiki startup
remains fast, but that gets tricky in a hurry.  Maybe a sufficient way
would be to add an ordered shutdown for swikis (if it doesn't exist
already) and to save the index then; when swikis are started, the
on-disk file is deleted so that it doesn't mislead future invocations. 
Trying to keep the on-disk index up to date looks quite hard!
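If someone does try the shutdown-time save, a minimal sketch (using the
index Dictionary from above; ReferenceStream and FileDirectory are standard
Squeak, but saveIndex/loadIndex and the file name are made up here) might be:

        saveIndex
                "At ordered shutdown: serialize the index to disk."
                | stream |
                stream := ReferenceStream fileNamed: 'swiki.index'.
                stream nextPut: index.
                stream close

        loadIndex
                "At startup: read the index back, then delete the file so a
                stale copy can't mislead a future invocation."
                | stream |
                stream := ReferenceStream fileNamed: 'swiki.index'.
                index := stream next.
                stream close.
                FileDirectory default deleteFileNamed: 'swiki.index'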

NOTE: if there is a complex in-memory data structure that involves
multiple pages, then threading issues will be very significant.  I think
we should single-thread the processing of individual requests anyway, up
until the data is ready to be spooled out, because on a uni-processor
machine the CPU is already fully occupied while handling one request.
It's the network I/O that we want to multiplex, and that part happens
mostly at the beginning and the end of a request/response session.
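For what it's worth, the cheap way to get that single-threading is an
ordinary mutual-exclusion semaphore around the per-request processing
(a sketch only; requestLock and processRequest: are names I'm making up):

        "set up once, e.g. in the server's initializer"
        requestLock := Semaphore forMutualExclusion.

        "wrap the processing of each request, up to the point where the
        response is ready to be spooled out over the network"
        requestLock critical: [ response := self processRequest: aRequest ]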


-Lex
